by Christopher Kullenberg, University of Gothenburg
Back in the mid-1990s, when I was growing up as a teenager in a rural village in the south of Sweden, internet connections came in the form of dial-up modems and data was stored on floppy disks and noisy desktop computers. In those days there was barely a concept of what “my data” was, because there were no social network sites, digital cameras were still science fiction and mobile phones could only make voice calls. If there was such a thing as my data, it was just some boring bank account, a medical file or a database at the tax authorities. The data I actually owned was stored on floppy disks, and it consisted of pirated computer games and school assignments.
But one day in 1997 the notion of personal data became clearer to me. My friends and I were bored with life in the computer lab of our high school. There were no YouTube channels, no Instagram influencers, and the endless scroll of social media was yet to be invented. There was nothing to keep our attention hooked to the never-ending visual flashes of the lives of people we did not really know. In other words, the experience of the internet was rather dull, even though it was said to be the future. But boredom is sometimes the origin of creativity, and we had discovered that it was possible to bypass the login screen of the most popular free e-mail provider, Hotmail, if the previous user had not properly closed the web browser. So we broke into a classmate’s account one day, as the opportunity presented itself out of the boredom of the world wide web version 1.0. As we started reading the classmate’s e-mails (this was back in the days when people wrote e-mails to friends as one wrote physical letters to a pen pal), we suddenly felt the burden of scruples, heavy as a rock on our shoulders. We were eavesdropping on other people’s data, and it was wrong in a very straightforward fashion.
Today we call it “the cloud”. My classmate had signed up for a free e-mail account in the cloud, which in concrete terms meant storing his data on someone else’s computer, and my friends and I had not only committed a crime, we had exploited a vulnerability that was beyond the control of the victim of this vicious attack. But we still used floppy disks, and we even had storage boxes for them, which sometimes came with a lock and key. It made sense, since physical objects need to be protected with padlocks, keys and robust doors. So does the cloud, but in the early days that was seldom the case.
Then came the turn of the millennium and the dot-com crash. Hotmail would still be around for more than a decade, but it was slowly eclipsed by Gmail, which promised its users that they would never have to delete an e-mail again thanks to an impressive 1 gigabyte of storage space. Storage for my data, and for the data of more than 1.5 billion other users.
From the ashes of the crash came the social network sites, and with them came the business model of using your data to optimise ad revenues. As a university student I got access to Facebook before it was released to everyone, and with my sub-megapixel digital camera I started uploading pictures of my travels, my life, and other people’s lives, since nobody had yet come to think about what it meant for those images to be shared online. I embraced “tagging” my friends in the photos, not knowing that this function would become obsolete once facial recognition was fully mature, as it is today. Neither did I consider the fact that every interaction I made – be it “likes”, “retweets” or dropping a comment in a heated discussion – was recorded and stored in a form that was more and more difficult to define as “my data”. Somewhere in those massive “End User License Agreement” texts there was probably an explanation, but who had time for explanations when you were suddenly getting a lot of “likes” on your most recent profile picture? Just agree to it and get going already!
Around 2010 the social web started to interest me as a social scientist and academic scholar. At the time, the endless production of data by users of social network sites was often referred to as the “data deluge”. This biblical flood of data, consisting of pictures, likes, posts, reactions, check-ins and an updated list of your favourite books, movies and things to do in your spare time, was the perfect data for studying what social life was really about. Some social scientists (myself included) argued that this was the rebirth of an almost impossible type of statistics. In 1895 the French sociologist Gabriel de Tarde had written Les Lois de l’imitation (The Laws of Imitation), in which he envisioned a new type of data as the foundation of social facts in the discipline of sociology. Instead of relying on conventional statistics that merely described births, deaths, people’s opinions, their income and other boring facts, de Tarde wanted to gaze into every little micro-decision that ordinary people made in their everyday lives. What brand of canned soup was in their kitchen, what books were on their bookshelves, what seasonal colours of clothes were in their wardrobes and what brand of gum were they chewing? In the late 19th century such questions were dismissed as pure science fiction, just like submarines and space programmes. Such data could not exist, since it would require hundreds of work hours to record all such details using the technologies of the time – pencil and paper.
But in the early years of the 2010s, Gabriel de Tarde’s vision seemed to be coming true. People really were posting to Facebook, Instagram and Twitter exactly what was on their breakfast plates, what clothes they had recently purchased, what books they had read and what their thoughts on them were, even laying out in detail how they felt about it. Not only were they sharing it online in an open and easily retrievable format, they were also liking, retweeting, commenting and interacting with such information in a format that was far too tempting not to download and analyse. Before the social network sites began restricting how they shared their information, I was able to build a huge database of what people in Sweden were talking about in Facebook groups, all with the seemingly good intention of realising the 19th-century sociological vision of a data analysis that really made use of social facts. This was the new social science, a science that could look into everyone’s drawer, closet or cupboard to study people’s behaviour, thoughts and emotions. However, the same scruples that had struck me in the late 1990s, when I broke into a classmate’s Hotmail account, suddenly re-appeared. I had not broken a law, not even a license agreement typed up by a law firm somewhere in California. But again, there was something inherently wrong with even downloading this data, not to mention analysing it. I quickly removed it from my hard drive, and soon everyone was talking about research ethics for online research.
Then came the so-called Cambridge Analytica scandal of 2018. A British firm called Cambridge Analytica (conveniently borrowing the name of a reputable UK university) had harvested Facebook data for political advertising purposes. Even though their data collection was designed for a particular purpose, it was not entirely different from my own curious use of social media data. As the scandal hit the news agenda, Facebook, and most other social network sites, put an end to the data deluge, at least in its open form, accessible to almost everyone. European legislation had also caught up, giving us the General Data Protection Regulation (GDPR) and making any such inquiries legally impossible. Once again the old saying that “legislation lags behind technology” was there to remind us that the Promethean fire fuelling our desire to use technology must be controlled with ethics.
Here the story could come to an end. The social media sites first upgraded the security of the cloud so that people’s private data was protected. Then they complied with public opinion, and further down the line with legislation.
However, the 2020s took off with a new scandal, one which again complicated the question of “who owns your data?”, but this time with a twist of artificial intelligence. The American company Clearview AI made headlines across the world when it was revealed that it had scraped the internet for pictures of people’s faces. These pictures were then used to train its facial recognition AI, which in turn was sold to law enforcement agencies. Allegedly very effective at preventing crime, the AI depended for its accuracy on access to my data, especially that profile picture I once uploaded to my social media accounts. But it does not take much for this technology to be used for malicious purposes. A few years earlier the Russian service FindFace had used a similar technology to match faces with profiles on the social network site VKontakte. It did not take long for this technology to be abused. Anonymous users of a Russian imageboard used FindFace to identify actresses in erotic movies, then posted their identities online with the intention of shaming them for what they regarded as immoral behaviour.
Today, your data is not only a valued commodity over which you are increasingly losing control. It is also something that, in an era of artificial intelligence, can be used against you.
Photo: thisisengineering-raeng