By Dennis Nguyen
The emerging field of critical data studies has produced a wealth of sophisticated theorisation and eloquent critiques of the datafication and automation of society. However, most approaches are qualitative in nature and the discipline still lacks a stronger grounding in more diverse empirical research (Flensburg & Lomborg, 2021). This is not a surprise: while inherently interdisciplinary, critical data studies and related research strands still largely stem from critical media and culture studies, disciplines that traditionally gravitate towards decidedly qualitative-reflective approaches to selected empirical material.
An argument can be made that research into the social and cultural dimensions of the digital transformation needs to draw on a wider range of empirical methods. These may include the “traditional” methods along the qualitative-quantitative spectrum (e.g., interviews, ethnography, surveys, experiments) but also more recent computational and digital methods.
Such data-driven methods may allow researchers in critical data studies to test concrete hypotheses and generalise from representative samples, which in turn can further inform theory development. They can also help save time and resources. This is not to say that research designs that apply large-scale, quantitatively oriented methods are per se ‘objective’ or superior. On the contrary, each method needs to be understood and assessed within its specific context of use and situatedness (Dobson, 2019). They come with their own assumptions, biases, and limitations. In this respect, they are no different from any other method in the wider toolset of the social sciences (incl. social and media psychology) and humanities. Yet digital and computational methods offer possibilities for researchers to ask novel research questions and expand knowledge about datafication and automation based on observations that go beyond the anecdotal (Nguyen, 2020).
However, it is not always easy or intuitive to use large amounts of data and algorithms for research that is firmly rooted in fields that only recently started to embrace digital and computational methods (Karsdorp et al., 2021). Researchers themselves need to develop digital literacy and digital research skills first, before they can apply new approaches that not only supplement but expand and possibly even substitute previous methods for empirical investigation. This also includes an understanding of how to differentiate between a growing number of digital and computational methods that can serve a variety of research purposes.
Digital Literacy for (Aspiring) Researchers
Digital literacy is a pillar of critical data studies (and the closely related media, culture, and communication studies in general), both as an attribute of researchers in the field and as a discernible research topic.
Critical data studies deal with digital technologies that transform how people interact with each other and participate in culture, politics, and the economy. Digital media that create data and are driven by automated algorithms are ubiquitous (Nguyen, 2021). Put simply, the discipline’s main interest is to understand how data and artificial intelligence shape relationships of power, social hierarchies, processes of meaning-making, performances of identity, and the ethical challenges that novel technologies raise. Researchers usually consider themselves digitally literate, and a sub-strand of critical data studies seeks to understand (and potentially foster) citizens’ digital literacy (including data literacy, internet skills etc.). However, what exactly does ‘digital literacy’ stand for when it comes to researchers themselves? What does the concept entail in terms of knowledge, competencies, and skills?
Generally speaking, one may draw a distinction between two broader dimensions of understanding and practical skills/strategic behaviour for research. First, there is the conceptual knowledge and critical thinking about how digital media shape social relationships and interactions. Aspiring researchers acquire this through education, as a growing number of (post-graduate) study programs place explicit emphasis on building this kind of knowledge among their students. More specifically, this first dimension includes critical data literacy and data infrastructure literacy (Gray et al., 2018) as two closely linked sub-dimensions of digital literacy.
Across educational programs related to critical data studies, students learn early on what datafication and automation are, and how they are embedded in digital media. They understand, for example, that data are not a given but constructed, that algorithms implement human intentions, and what the societal effects of platformization are. Since societies observe themselves through data to make decisions, data literacy as part of digital literacy further covers a basic understanding of statistics. Conceptual knowledge thus includes topics such as algorithmic awareness, privacy protection, data bias, and at least some foundational numeric/statistical knowledge. In sum, all of this already determines to a large degree which issues researchers want to engage with and in which methodological ways.
Second, digital literacy concerns practical skills in using computers and the Internet for learning and research purposes. At a basic level, this means operating digital devices, navigating interfaces, finding and filtering information with search engines, using online resources for primary and secondary research, and so on. More advanced practical skills link to coding literacy (Vee, 2017) – the ability to read and write code (mostly Python or R in the computational social sciences and digital humanities). This is where many educational programs still have a lot of work to do, as equipping students and future researchers with these skills is a considerable logistical challenge for university departments where the necessary expertise in content and didactics is rather scarce. Many researchers in the broader field of the social sciences and humanities who want to use digital and computational methods need to learn how to do so themselves, as formalised training is only gradually being integrated into curricula.
A holistic approach to digital literacy requires both theory-focused and methods-oriented training; ideally, the two are combined (Lindgren, 2020). For example, in a course co-taught by the author, participants were asked to raise a critical question about how social media platforms shape public discourses and to conduct empirical research on it. First, they had to explore how tech affordances and discourse cultures affect each other. Then they had to figure out ways to collect data to address their research questions (for example, about how anti-Asian sentiments are part of COVID-19 discussions on Twitter). Conceptual knowledge and practical skills thus came together. Such exercises illustrate how specific questions can be approached with specific digital and/or computational methods; they showcase what value these methods offer but also what limitations they have. Seeing them in use may inspire researchers for their own work. Understanding and using digital and computational methods relies on digital literacy but can also be a path towards increasing one’s digital literacy.
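A first step in such an exercise is typically a simple, transparent data-collection filter. The sketch below illustrates the idea with invented posts and invented query terms (the real course projects retrieved data via platform APIs or archives; everything here is hypothetical):

```python
# Hypothetical sketch: filtering a small sample of posts by query terms,
# a common first data-collection step in a discourse-analysis project.
# The posts and keywords below are invented for illustration.
posts = [
    "COVID-19 case numbers are rising again in Europe.",
    "Stop blaming entire communities for the pandemic!",
    "New vaccine trial results published today.",
]

keywords = {"covid-19", "pandemic", "virus"}

def matches(post, terms):
    """Return True if any search term occurs in the post (case-insensitive)."""
    text = post.lower()
    return any(term in text for term in terms)

sample = [p for p in posts if matches(p, keywords)]
print(len(sample))  # size of the collected sample
```

Even a toy filter like this makes the methodological point tangible: the choice of query terms already shapes which parts of the discourse become visible to the researcher.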
Using Digital Methods and Computational Methods
While digital methods and computational methods both deal with usually larger amounts of data and apply algorithmic procedures for data analysis, they form two distinguishable classes of methodological approaches. Broadly speaking, the main differences concern data sources, analytical foci, and levels of technical difficulty. However, there are considerable overlaps, and the two terms are often used interchangeably.
Computational methods are posited here as the broader and more general class of approaches for data-driven empirical research. The use of programming languages such as Python or R for data collection and analysis can serve a wide spectrum of research interests. Once researchers have acquired sufficient expertise and coding literacy, they are in a position to build methods and tools that specifically fit their research objectives “from scratch”. One can use Python scripts to analyse anything from social media postings to 18th century French poetry; it is also quite feasible to use one’s programming skills to set up an experiment or conduct surveys, all within one self-coded methodological framework. The extent to which different analytical steps are combined within one method, i.e., its complexity but also its usability (e.g., does it have an interface or not?), depends on the researcher’s individual skill level. With a programming language such as Python, researchers can build methods that collect, analyse, and visualise data in one seamless process.
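Such a self-coded pipeline can be very small. The following sketch stands in for the collect–analyse–report process described above, using a tiny built-in corpus instead of scraped or archived texts (a real project would read from files or an API, and might plot the results with a visualisation library):

```python
# Minimal sketch of a one-script pipeline: "collect" (a tiny built-in
# corpus standing in for scraped or archived texts), analyse (word
# frequencies), and report. All texts here are invented for illustration.
from collections import Counter
import re

corpus = [
    "Data are not a given but constructed.",
    "Algorithms implement human intentions.",
    "Data and algorithms shape public discourse.",
]

def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

# Count how often each token occurs across the whole corpus
counts = Counter(tok for doc in corpus for tok in tokenize(doc))

# Report the most frequent tokens
for word, n in counts.most_common(3):
    print(f"{word}: {n}")
```

The same script structure scales from three sentences to millions of documents; what changes is the data source and the sophistication of the analysis step, not the overall shape of the method.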
Computational skills in this broader sense describe capabilities to use computer code to write computer programs that serve as methods for empirical research with data that are either natively digital (coming from web-based sources; Rogers, 2013) or digitised (e.g., books scanned and stored in a database). The entry level for using computational methods can appear high for researchers (and students) who have very little or no experience with programming. However, acquiring these skills can enable them to build customised, case-specific tools and to push method development in their research domain (Brooker, 2020).
Digital methods describe a closely related spectrum of research strategies that draw on data born from digital media interactions, most of which happen online. The term ‘digital methods’ was largely coined by Richard Rogers and the Digital Methods Initiative (DMI) at the University of Amsterdam. Simply put, digital methods often exploit the affordances and application programming interfaces (APIs) of existing platforms. For example, the DMI developed a range of API-based tools to collect and analyse data from, e.g., social media networks or popular search engines. These tools often come with an interface that researchers can use without any pre-existing coding skills. A Canadian example is Netlytic, which allows users to collect limited data from social media platforms such as Twitter and to perform forms of automated content analysis and network analysis. The advantages of accessibility are accompanied by several limitations. First, each tool depends on the respective platform’s terms and conditions tied to its APIs (which can change at any time; computational methods face the same challenge, however). Second, they cannot always be easily adjusted to more specific research needs. They come with the functions that they have, and researchers may need to seek additional tools and methods to supplement them.
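To make concrete what a tool like Netlytic automates behind its interface, consider a bare-bones version of a network analysis: turning interaction records into a directed “who replies to whom” network and ranking users by incoming replies. The records below are invented for illustration; a real tool would retrieve them via a platform API:

```python
# Illustrative sketch of the kind of network analysis that interface-based
# tools automate: building a directed reply network from interaction
# records and ranking users by incoming replies. All usernames and
# interactions are invented.
from collections import Counter

replies = [
    ("alice", "bob"),    # alice replied to bob
    ("carol", "bob"),
    ("alice", "carol"),
    ("dave",  "bob"),
]

# In-degree: how often each user is replied to
in_degree = Counter(target for _, target in replies)

for user, n in in_degree.most_common():
    print(user, n)
```

Seeing the mechanism laid bare also clarifies the limitation noted above: a ready-made tool fixes choices such as which interactions count as edges, whereas a self-coded version can redefine them for the research question at hand.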
One could also argue that digital methods in a broader sense further include research methods that build on direct interactions with a digital interface, such as ‘app walkthroughs’ (Light et al., 2018) or ‘datawalking’ (van Es & de Lange, 2020). These focus more on the subjective experience of datafication than on the (semi-)automated retrieval and analysis of larger data volumes. Digital methods can thus be heavily computerised in the form of large-scale data collection and analysis, or less computerised and centred on engagement with digital media. In both cases, however, data (big and small) are obtained through the digital media in focus.
Computational Methods, Digital Methods, and Critical Data Studies
The lines between computational and digital methods are rather fuzzy and not clearly delineated. However, the above distinction highlights the conceptual relationship as well as key differences between the two. In a sense, all digital methods that come in the form of programmed tools are the outcome of a computational approach. However, digital methods appear more “end-user”-oriented, with all the benefits and limitations that such ready-made methods entail. Deciding when to use which depends, as always, on the research interest; both can support explorative-descriptive research and hypothesis testing. When researching the behaviour of specific populations, especially in online environments, computational and digital methods may generally offer less intrusive approaches.
However, all computational and digital methods face similar challenges and ethical issues, especially when they deal with data retrieved via APIs and/or potentially sensitive data. Data are always context-dependent and have their own histories that need to be considered (who created them, where do they come from, what do they represent); they are communication devices and thus frame parts of (social) reality in specific ways. Furthermore, large quantities of data do not render questions about representativeness obsolete (for example, how representative are Google searches of a society’s views on dating? What do 10,000,000 tweets really represent?).
Regardless of the exact implementation of computational and/or digital methods, researchers need to be critical about what data they collect and what these data represent. It is also advisable to explore how such methods can be combined with other, traditional methods in multi-method research designs, if resources allow. For each new method, it is important to critically reflect on 1) how it is embedded in the existing toolset of methods, especially with respect to what it can add and what its limitations, pitfalls, and challenges are; and 2) what other tools/methods are available, which ones researchers need to build themselves, and how to make a fitting choice based on the research interest. Furthermore, it is imperative to address questions of reliability, validity, transparency, and research ethics, and to follow good-practice advice in data presentation/visualisation. Interdisciplinary cooperation can greatly support the advancement and enhancement of computational and digital methods here.
By developing, testing, and critiquing computational and digital methods, researchers in critical data studies can not only put their research on a more robust empirical basis with data relevant for their research interests, but also ensure that they themselves adhere to the normative standards by which they criticise current data practices in the private as well as public sectors. Concerning concrete research applications, computational and/or digital methods can help critical data studies with empirically charting platformisation trends (e.g., by analysing which software infrastructures enable these developments), conducting experiments that simulate digital environments, exploring technology discourses through, e.g., automated content analyses, and more.
Eventually, researchers in the field will need to connect computational and digital methods with existing qualitative and quantitative methods to design novel, purpose-specific, and integrative research designs. Different methodological traditions each have their own strengths and limitations, but ultimately all of them are valuable as long as they make a clear, logical fit for the research objective(s). It is recommendable to overcome ‘silo thinking’ and to seek ways to combine different methodological approaches that place emphasis on either hypothesis development or hypothesis testing (Nguyen, 2020). Ultimately, both are needed.
List of References
Brooker, P. D. (2020). Programming with Python for Social Scientists. Sage.
Dobson, J. E. (2019). Critical Digital Humanities: The Search for a Methodology. University of Illinois Press.
Flensburg, S., & Lomborg, S. (2021). Datafication research: Mapping the field for a future agenda. New Media & Society. https://doi.org/10.1177/14614448211046616
Gray, J., Gerlitz, C., & Bounegru, L. (2018). Data infrastructure literacy. Big Data & Society. https://doi.org/10.1177/2053951718786316
Karsdorp, F., Kestemont, M., & Riddell, A. (2021). Humanities Data Analysis: Case Studies with Python. Princeton University Press.
Light, B., Burgess, J., & Duguay, S. (2018). The walkthrough method: An approach to the study of apps. New Media & Society, 20(3), 881–900. https://doi.org/10.1177/1461444816675438
Lindgren, S. (2020). Data Theory. Polity.
Nguyen, D. (2020). Media and communication studies in the age of digitalization and datafication: How practical factors and research interests determine methodological choices. In D. Nguyen, I. Dekker, & S. Nguyen (Eds.), Understanding Media and Society in the Age of Digitalisation. Palgrave Macmillan. https://doi.org/10.1007/978-3-030-38577-4_2
Nguyen, D. (2021). Mediatisation and datafication in the global COVID-19 pandemic: On the urgency of data literacy. Media International Australia, 178(1), 210–214. https://doi.org/10.1177/1329878X20947563
Rogers, R. (2013). Digital Methods. MIT Press.
van Es, K., & de Lange, M. (2020). Data with its boots on the ground: Datawalking as research method. European Journal of Communication, 35(3), 278–289. https://doi.org/10.1177/0267323120922087
Vee, A. (2017). Coding Literacy: How Computer Programming Is Changing Writing. MIT Press.