Computational Analysis of Communication (eBook)
John Wiley & Sons (publisher)
978-1-119-68028-4 (ISBN)
Provides clear guidance on leveraging computational techniques to answer social science questions
In disciplines such as political science, sociology, psychology, and media studies, the use of computational analysis is rapidly increasing. Statistical modeling, machine learning, and other computational techniques are revolutionizing the way electoral results are predicted, social sentiment is measured, consumer interest is evaluated, and much more. Computational Analysis of Communication teaches social science students and practitioners how computational methods can be used in a broad range of applications, providing discipline-relevant examples, clear explanations, and practical guidance.
Assuming little or no background in data science or computational linguistics, this accessible textbook teaches readers how to use state-of-the-art computational methods to perform data-driven analyses of social science issues. A cross-disciplinary team of authors, with expertise in both the social sciences and computer science, explains how to gather and clean data, manage textual, audio-visual, and network data, conduct statistical and quantitative analysis, and interpret, summarize, and visualize the results. Offered in a unique hybrid format that integrates print, ebook, and open-access online viewing, this innovative resource:
- Covers the essential skills for social sciences courses on big data, data visualization, text analysis, predictive analytics, and others
- Integrates theory, methods, and tools to provide a unified approach to the subject
- Includes sample code in Python and links to actual research questions and cases from social science and communication studies
- Discusses ethical and normative issues relevant to privacy, data ownership, and reproducible social science
- Was developed in partnership with the International Communication Association and by the editors of Computational Communication Research
Computational Analysis of Communication is an invaluable textbook and reference for students taking computational methods courses in social sciences, and for professional social scientists looking to incorporate computational methods into their work.
Dr. Wouter van Atteveldt is an Associate Professor of Political Communication at Vrije Universiteit, Amsterdam. He is co-founder of the Computational Methods division of the International Communication Association, and Founding Chief Editor of Computational Communication Research. He has published extensively on innovative methods for analyzing political text and contributed to a number of relevant R and Python packages.
Dr. Damian Trilling is an Associate Professor, Department of Communication Science, at the University of Amsterdam, and Associate Editor of Computational Communication Research. His research uses computational methods such as the analysis of digital trace data and large-scale text analysis to study the use and effects of news media. He has developed extensive teaching materials to introduce social scientists to the Python programming language.
Dr. Carlos Arcila Calderón is an Associate Professor, Department of Sociology and Communication, at the University of Salamanca, Chief Editor of the journal Disertaciones, and member of the Editorial Board of Computational Communication Research. He has published extensively on new media and social media studies, and has led the prototype Autocop, a Spark-based environment to run distributed supervised sentiment analysis of Twitter messages.
1 Introduction
Abstract
This chapter explains how the methods outlined in this book are situated within the methodological and epistemological frameworks used by social scientists. It explains why the use of Python and R is fundamental for the computational analysis of communication. Finally, it shows how this book can be used by students and scholars.
Keywords: computational social science, Python, R
- Understand the role of computational analysis in the social sciences
- Understand the choice between Python and/or R
- Know how to read this book
1.1 The Role of Computational Analysis in the Social Sciences
The use of computers is nothing new in the social sciences. In fact, one could argue that some disciplines within the social sciences have even been early adopters of computational approaches. Take the gathering and analyzing of large-scale survey data, dating back to the use of the Hollerith Machine in the 1890 US census. Long before every scholar had a personal computer on their desk, social scientists were using punch cards and mainframe computers to deal with such data. If we think of the analysis of communication more specifically, we already see attempts to automate content analysis in the 1960s (see, e.g., Scharkow, 2017).
However, something has profoundly changed in recent decades. The amount and type of data we can collect as well as the computational power we have access to have increased dramatically. In particular, digital traces that we leave when communicating online, from access logs to comments we place, have required new approaches (e.g., Trilling, 2017). At the same time, better computational facilities now allow us to ask questions we could not answer before.
González-Bailón (2017), for instance, argued that the computational analysis of communication now allows us to test theories that were formulated a century ago, such as Tarde’s theory of social imitation. Salganik (2019) tells an impressive methodological story of continuity in showing how new digital research methods build on and relate to established methods such as surveys and experiments, while offering new possibilities by observing behavior in new ways.
A frequent misunderstanding, then, about computational approaches is that they would somehow be a-theoretical. This is probably fueled by clichés coined during the “Big Data” hype in the 2010s, such as the infamous saying that in the age of Big Data, correlation is enough (Mayer-Schönberger and Cukier, 2013); but one could not be more wrong: as the work of Kitchin (2014a, b) shows, computational approaches can be well situated within existing epistemologies. For the field to advance, computational and theoretical work should be symbiotic, with each informing the other and with neither claiming superiority (Margolin, 2019). Thus, the computational scientists’ toolbox includes both more data-driven and more theory-driven techniques; some are more bottom-up and inductive, others are more top-down and deductive. What matters here, and what is often overlooked, is the stage of the research process at which they are employed. In other words, both inductive and deductive approaches as they are distinguished in more traditional social-science textbooks (e.g., Bryman, 2012) have their equivalent in the computational social sciences.
Therefore, we suggest that the data collection and data analysis process is thought of as a pipeline. To test, for instance, a theoretically grounded hypothesis about personalization in the news, we could imagine a pipeline that starts with scraping online news, proceeds with some natural-language processing techniques such as Named Entity Recognition, and finally tests whether the mentioning of persons has an influence on the placement of the stories. We can distinguish here between parts of the pipeline that are just necessary but not inherently interesting to us, and parts of the pipeline that answer a genuinely interesting question. In this example, the inner workings of the Named Entity Recognition step are not genuinely interesting for us – we just need to do it to answer our question. We do care about how well it works and especially which biases it may have that could affect our substantive outcomes, but we are not really evaluating any theory on Named Entity Recognition here. We are, however, answering a theoretically interesting question when we look at the pipeline as a whole, that is, when we apply the tools in order to tackle a social scientific problem. Of course, what is genuinely interesting depends on one’s discipline: For a computational linguist, the inner workings of the named entity recognition may actually be the interesting part, and our research question just one possible “downstream task”.
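The pipeline idea can be made concrete with a minimal Python sketch. It is not code from the book: the article records are invented, `collect_articles` is a hypothetical stand-in for a scraper, and the name lookup in `mentions_person` is only a placeholder for a real Named Entity Recognition step. The final stage is a descriptive comparison of mean positions, not a statistical test.

```python
from statistics import mean

# Hypothetical list of person names; a real pipeline would obtain person
# mentions from a Named Entity Recognition tool instead of a fixed lookup.
KNOWN_PERSONS = {"Jane Doe", "John Smith"}


def collect_articles():
    """Stage 1 (stand-in for scraping): return invented example records.

    "position" is the assumed rank of the story on the front page (1 = top).
    """
    return [
        {"position": 1, "text": "Jane Doe announces new climate plan"},
        {"position": 2, "text": "John Smith criticizes the budget"},
        {"position": 3, "text": "Parliament debates new housing rules"},
        {"position": 4, "text": "Inflation falls for the third month"},
    ]


def mentions_person(text):
    """Stage 2 (placeholder for NER): does the text contain a known name?"""
    return any(name in text for name in KNOWN_PERSONS)


def annotate(articles):
    """Add a has_person flag to a copy of each article record."""
    return [{**a, "has_person": mentions_person(a["text"])} for a in articles]


def mean_position_by_group(annotated):
    """Stage 3: compare the average placement of the two groups."""
    with_person = [a["position"] for a in annotated if a["has_person"]]
    without_person = [a["position"] for a in annotated if not a["has_person"]]
    return {
        "with_person": mean(with_person),
        "without_person": mean(without_person),
    }


def run_pipeline():
    """Chain the stages: collect, annotate, compare."""
    return mean_position_by_group(annotate(collect_articles()))


result = run_pipeline()
print(result)
```

Only the last stage speaks to the substantive question about personalization. The first two stages are the necessary but not inherently interesting parts, and each could be swapped for a better tool without changing the overall design.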
This distinction is also sometimes referred to as “building a better mousetrap” versus “understanding”. For instance, Breiman (2001) remarked: “My attitude toward new and/or complicated methods is pragmatic. Prove that you’ve got a better mousetrap and I’ll buy it. But the proof had better be concrete and convincing.” (p. 230). In contrast, many social scientists are using statistical models to test theories and to understand social processes: they want to specifically understand how x relates to y, even if y may be better predicted by another (theoretically uninteresting) variable.
This book is to some extent about both building mousetraps and understanding. When you are building a supervised machine learning classifier to determine the topic of each text in a large collection of news articles or parliamentary speeches, you are building a (better) mousetrap. But as a social scientist, your work does not stop there. You need to use the mousetrap to answer some theoretically interesting question.
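To illustrate what such a "mousetrap" might look like, here is a toy supervised topic classifier in plain Python. It is a sketch under stated assumptions, not the book's implementation: it uses a multinomial naive Bayes model with add-one smoothing, whitespace tokenization, and four invented training sentences. In practice one would more likely rely on an established machine learning library and far more training data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Very simple tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def train(labeled_texts):
    """Count words per topic and documents per topic."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, topic in labeled_texts:
        doc_counts[topic] += 1
        word_counts[topic].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocabulary


def predict(text, model):
    """Return the topic with the highest log posterior score."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for topic in doc_counts:
        # Log prior: share of training documents with this topic.
        score = math.log(doc_counts[topic] / total_docs)
        total_words = sum(word_counts[topic].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[topic][word] + 1) / (total_words + len(vocabulary))
            )
        scores[topic] = score
    return max(scores, key=scores.get)


# Invented training examples, for illustration only.
training_data = [
    ("parliament votes on new budget", "politics"),
    ("minister defends election campaign", "politics"),
    ("striker scores in cup final", "sports"),
    ("team wins championship match", "sports"),
]

model = train(training_data)
prediction = predict("minister wins election", model)
print(prediction)
```

The classifier itself answers no social science question. The predicted topics only become useful once they serve as a variable in a substantive analysis, for example when comparing topic attention across outlets or over time.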
Actually, we expect that the contents of this book will provide a background that helps you to face the current research challenges in both academia and industry. On the one hand, the emerging field of Computational Social Science has become one of the most promising areas of knowledge, and many universities and research institutes are looking for scholars with this profile. On the other hand, computational skills are widely known to increase your job opportunities in private companies, public organizations, or NGOs, given the growing interest in data-driven solutions.
When planning this book, we needed to make a couple of tough choices. We aimed to at least give an introduction to all techniques that students and scholars who want to computationally analyze communication will probably be confronted with. Of course, specific – technical – literature on techniques such as, for instance, machine learning can cover the subject in more depth, and the interested student may indeed want to dive into one or several of the techniques we cover more deeply. Our goal here is to offer enough working knowledge to apply these techniques and to know what to look for. While trying to cover the breadth of the field without sacrificing too much depth when covering each technique, we still needed to draw some boundaries. One technique that some readers may miss is agent-based modeling (ABM). Arguably, such simulation techniques are an important approach in the computational social sciences more broadly (Cioffi-Revilla, 2014), and they have recently been applied to the analysis of communication as well (Waldherr, 2014; Wettstein, 2020). Nevertheless, when reviewing the curricula of current courses teaching the computational analysis of communication, we found that simulation approaches do not seem to be at the core of such analyses (yet). Instead, when looking at the use of computational techniques in fields such as journalism studies (e.g., Boumans and Trilling, 2016), media studies (e.g., Rieder, 2017), or the text-as-data movement (Grimmer and Stewart, 2013), we see a core of techniques that are used over and over again, and that we have therefore included in our book. In particular, besides general data analysis and visualization techniques, these are techniques for gathering data such as web scraping or the use of APIs; techniques for dealing with text such as natural language processing and different ways to turn text into numbers; supervised and unsupervised machine learning techniques; and network analysis.
1.2 Why Python and/or R?
By far most work in the computational social sciences is done using Python and/or R. Sure, for some specific tasks there are standalone programs that are occasionally used; and there are some useful applications written in other languages such as C or Java. But we believe it is fair to say that it is very hard to delve into the computational analysis of communication without learning at least either Python or R, and preferably both of them. There are very few tasks that you cannot do with at least one of them.
Some people have strong beliefs as to which language is “better” – we do not subscribe to that view. Most techniques that are relevant to us can be done in either language, and personal preference is a big factor. R started out as a statistical programming environment, and that heritage is still visible, for instance in the strong emphasis on vectors, factors, et cetera, or the possibility to estimate complex statistical models in just one line of code. Python started out as a general-purpose programming language, which means that some of the things we do feel a bit more “low-level” – Python abstracts away less of the underlying...
| Field | Value |
|---|---|
| Publication date (per publisher) | 10 March 2022 |
| Language | English |
| Subject areas | Mathematics / Computer Science ► Mathematics |
| | Social Sciences ► Communication / Media |
| | Social Sciences ► Politics / Administration |
| | Social Sciences ► Sociology |
| Keywords | Communication & Media Studies • Communication Analysis • communication science analysis • Communication Studies • computational analysis methods • computational analysis social science • computational analysis social science textbook • how to use computational analysis • Communication Science • Communication and Media Research • social science data analysis • Statistics • Statistics for Social Sciences • Statistics in the Social Sciences |
| ISBN-10 | 1-119-68028-X / 111968028X |
| ISBN-13 | 978-1-119-68028-4 / 9781119680284 |
Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to protect the eBook against misuse. The eBook is authorized to your personal Adobe ID at the time of download. You can then read the eBook only on devices that are also registered to your Adobe ID.
File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly well suited to fiction and non-fiction. The body text adapts dynamically to the display and font size, which also makes EPUB a good fit for mobile reading devices.
System requirements:
PC/Mac: You can read this eBook on a PC or Mac.
eReader: This eBook can be read on (almost) all eBook readers. It is not, however, compatible with the Amazon Kindle.
Smartphone/Tablet: You can read this eBook on both Apple and Android devices.
Buying eBooks from abroad
For tax law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.