Taught by: Department of Sociology

Status: Compulsory course

When taught: Year 1, Module 2

Instructor

Маслинский Кирилл Александрович

Abstract

For the social and political sciences, written texts provide essential data for studying ideology and political discourse, conflict, sentiment, and political affiliation, among many other things. With the growing availability of large collections of text in digital form, it is tempting to scale research up in terms of the population studied (e.g. “all of Twitter”), time span (e.g. “all of American history”), and geographical scope (e.g. “all foreign ties of China”). Computational methods for text analysis promise to help at scales where traditional content analysis is not feasible. We will use the R programming environment as a toolbox for text analysis. To “learn by doing”, we will work with real text collections and replicate some methods from recent social research that employs computational text analysis.

Learning Objectives

The goal of the course is to provide a basic understanding of how to properly use collections of texts as quantitative evidence, and to make this knowledge practical.

Expected Learning Outcomes

Being able to adequately interpret and report the results of computational text analysis in research papers.
Understanding the possibilities of automated text analysis, as well as its pitfalls and the important caveats about applying statistical tests to language data.
Being able to apply computational methods of text analysis (e.g. analysis of word frequency and co-occurrence, document classification, topic modeling) to collections of texts.

Course Contents

Counting words — Preprocessing: transforming text into data in R.
Comparing corpora — Comparing word usage in contrast corpora.
Document-level modeling — Vector space model. Document classification.
Co-occurrence — Distributional semantics.
Dictionary methods — Sentiment.
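
As a rough illustration of the first two topics (counting words and comparing word usage across contrast corpora), here is a minimal sketch. The course itself works in R; this Python version is only an illustration, and the function names, the simple ASCII tokenizer, and the two toy corpora are all assumptions made for the example, not course material.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens.

    Assumption: a deliberately crude tokenizer; punctuation, digits and
    non-ASCII characters are dropped.
    """
    return re.findall(r"[a-z]+", text.lower())


def word_counts(docs):
    """Count word occurrences over a list of documents (one corpus)."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def relative_frequency(counts, word):
    """Share of all tokens in the corpus that are `word` (0.0 if empty)."""
    total = sum(counts.values())
    return counts[word] / total if total else 0.0


# Two toy "contrast corpora" (hypothetical data).
corpus_a = ["Taxes fund schools.", "Schools need teachers."]
corpus_b = ["Taxes hurt business.", "Business creates jobs."]

counts_a = word_counts(corpus_a)
counts_b = word_counts(corpus_b)

# Compare how prominent a word is in each corpus.
for word in ("taxes", "schools", "business"):
    print(word,
          round(relative_frequency(counts_a, word), 3),
          round(relative_frequency(counts_b, word), 3))
```

Relative frequencies are used rather than raw counts so that corpora of different sizes can be compared; real analyses would typically add the statistical caveats mentioned in the learning outcomes.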

Assessment Elements

Homework
Class participation
Final project

Interim Assessment

2021/2022 2nd module
0.5 * Homework + 0.2 * Class participation + 0.3 * Final project

Bibliography

Recommended Core Bibliography

Grimmer, J., & Stewart, B. M. (2013). Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts. Political Analysis, 21(3), 267-297.
Rule, A., Cointet, J.-P., & Bearman, P. S. (2015). Lexical shifts, substantive changes, and continuity in State of the Union discourse, 1790-2014. Proceedings of the National Academy of Sciences of the United States of America, 112(35). https://doi.org/10.1073/pnas.1512221112 (also available at https://hal.inrae.fr/hal-02636957)
Zhai, C., & Aggarwal, C. C. (2012). Mining Text Data. New York: Springer. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=537386

Recommended Additional Bibliography

Grimmer, J. (2010). A Bayesian Hierarchical Topic Model for Political Texts: Measuring Expressed Agendas in Senate Press Releases. Political Analysis, 18(1), 1-35.
Hamilton, W. L., Clark, K., Leskovec, J., & Jurafsky, D. (2016). Inducing Domain-Specific Sentiment Lexicons from Unlabeled Corpora.
Lipton, Z. C. (2016). The Mythos of Model Interpretability. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.E8C74632

Authors

MASLINSKIY KIRILL ALEKSANDROVICH

Master's programme “Data Analysis for State and Society”

Text Mining and Natural Language Processing