Taught by: Department of Sociology

Status: Elective course

When: 4th year, modules 1 and 2

Instructor

Vilkhovenko Aleksandr Aleksandrovich

Abstract

The course is aimed at undergraduate social science students pursuing careers in data analysis or academia. It consists of seminars and covers prediction and classification models (logistic regression and cluster analysis) as well as more advanced data management topics such as web scraping and data imputation. The course also discusses data culture, from data management and coding style to narrating with data and Bayesian statistics. Students also learn the coding tools that help produce tidy analyses. The course is a starting point for students interested in advanced training in research methods or planning to use quantitative methods with categorical outcomes in their own research.

Learning Objectives

The course covers the foundations and popular techniques of categorical data analysis with the goal of training students to be informed producers and consumers of quantitative research.

Expected Learning Outcomes

Students can apply classification techniques, propose hypotheses and choose the methods in categorical data analysis in R, including supervised classification with a binary outcome, and unsupervised classification with clustering techniques of mixed data types.
Students create customized R Markdown reports.
Students create reproducible analysis scripts.
Students define basic terms and identify the purposes of Bayesian inference vs frequentist inference.
Students describe known problems with null hypothesis significance testing and propose known solutions to them.
Students inspect missing data patterns and apply various methods of data imputation.
Students interpret outputs correctly, give reasons for their choice of techniques, and assess the quality of analytical and visualization solutions, models, and data stories.
Students propose and apply tools for reproducible and ethical data analysis.
Students scrape simple web tables and texts with R and convert them into standard data formats.

Course Contents

Coding with Style
Web Scraping
Binary Logistic Regression
Missing Data
Cluster Analysis
Data Culture and Data Acumen

Assessment Elements

Final Exam
The exam is an online test in which students answer questions on all topics covered. Question blocks cover PCA, logistic model diagnostics, web data collection, Bayesian thinking, data culture, and reproducible science.
Data Imputation
The binary logistic regression project is another project to be evaluated separately.
Binary Outcome project
The project includes an optional part for additional points (up to 2 points, counted only if the main task is complete): apply one or more decision tree methods, compare their performance with that of the logistic regression solution, and conclude which performs better here.
Coding reflection paper
This is a non-compulsory task for extra points.
Bayes reaction paper
This is a non-compulsory task for extra points.
Cluster Analysis project
Dimension Reduction project
Web scraping
Rmd Customization

Interim Assessment

2023/2024 2nd module
Binary Outcome project * 0.25 + Cluster Analysis project * 0.2 + Coding reflection paper * 0.05 + Rmd Customization * 0.1 + Web scraping * 0.1 + Bayes reaction paper * 0.1 + Final Exam * 0.1 + Data Imputation * 0.1
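
The formula above is a plain weighted sum, which can be sketched in a few lines of Python. This is only an illustration: the weights are copied from the formula, while the function name `interim_grade`, the 0 to 10 scoring scale, the example scores, and the rule that a missing element counts as 0 are all assumptions made for the example, not course policy.

```python
# Weights copied from the 2023/2024 2nd module formula above.
WEIGHTS = {
    "Binary Outcome project": 0.25,
    "Cluster Analysis project": 0.20,
    "Coding reflection paper": 0.05,
    "Rmd Customization": 0.10,
    "Web scraping": 0.10,
    "Bayes reaction paper": 0.10,
    "Final Exam": 0.10,
    "Data Imputation": 0.10,
}


def interim_grade(scores):
    """Return the weighted sum of element scores.

    Assumption: an element absent from `scores` contributes 0.
    """
    return sum(weight * scores.get(name, 0.0) for name, weight in WEIGHTS.items())


# Hypothetical scores on an assumed 0-10 scale; the optional
# Bayes reaction paper is left out, so it contributes 0 here.
example_scores = {
    "Binary Outcome project": 8,
    "Cluster Analysis project": 7,
    "Coding reflection paper": 10,
    "Rmd Customization": 9,
    "Web scraping": 6,
    "Final Exam": 8,
    "Data Imputation": 7,
}

print(round(interim_grade(example_scores), 2))
```

Because the weights add up to 1, the result stays on the same scale as the individual element scores.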

Bibliography

Recommended Core Bibliography

Baker, M. (2015). Reproducibility crisis: Blame it on the antibodies. Nature, 521(7552), 274–276. https://doi.org/10.1038/521274a
Ledolter, J. (2013). Data Mining and Business Analytics with R. Hoboken, New Jersey: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=587979
Munzert, S. (2014). Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining. Chichester, West Sussex, United Kingdom: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=878670
Upton, G. J. G. (2016). Categorical Data Analysis by Example. Hoboken, New Jersey: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1402878
Wickham, H. (2015). Advanced R, Second Edition. Boca Raton, FL: Chapman and Hall/CRC. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=934735
Wickham, H., & Grolemund, G. (2016). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data (First edition). Sebastopol, CA: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1440131

Recommended Additional Bibliography

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian Data Analysis, Third Edition. Chapman & Hall/CRC Press. ISBN 9781439898208. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&db=nlebk&AN=1763244
McElreath, R. (2015). Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Chapman and Hall/CRC. ISBN 9781482253467. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&db=nlebk&AN=1338291
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. New York, NY: Springer. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1175341
Little, R. J. A., & Rubin, D. B. (2002). Statistical Analysis with Missing Data (Second edition). Hoboken: Wiley-Interscience. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=838162
Mood, C. (2010). Logistic Regression: Why We Cannot Do What We Think We Can Do, and What We Can Do About It. European Sociological Review, 26(1), 67–82. https://doi.org/10.1093/esr/jcp006
vanden Broucke, S., & Baesens, B. (2018). Practical Web Scraping for Data Science: Best Practices and Examples with Python. Apress.

Authors

VILKHOVENKO ALEKSANDR ALEKSANDROVICH
ILYINA MARIA IVANOVNA
SHIROKANOVA ANNA ALEKSANDROVNA

Bachelor's programme "Sociology and Social Informatics"

Advanced Data Analysis