Delivered at:: Department of Sociology

Course type:: Elective course

When:: 4th year, modules 1 and 2

Instructor

Shirokanova, Anna


Abstract

The course is aimed at undergraduate social science students who plan careers in data analysis or academia. The course consists of seminars. It covers specific prediction and classification models (logistic regression and cluster analysis) and more advanced data management topics such as web scraping and data imputation. The course also discusses data culture, ranging from data management and coding style to narrating with data and Bayesian statistics. In addition, students learn coding tools that help produce tidy analysis. The course is also a starting point for students who want advanced training in research methods or plan to use quantitative methods with categorical outcomes in their own research.

Learning Objectives

The course covers the foundations and popular techniques of categorical data analysis with the goal of training students to be informed producers and consumers of quantitative research.

Expected Learning Outcomes

Students propose hypotheses, choose appropriate methods of categorical data analysis, and apply classification techniques in R, including supervised classification with a binary outcome and unsupervised classification (clustering) of mixed data types.
Students create customized R Markdown reports.
Students create reproducible analysis scripts.
Students define basic terms and identify the purposes of Bayesian versus frequentist inference.
Students describe known problems with null hypothesis significance testing and propose known solutions to them.
Students inspect missing data patterns and apply various methods of data imputation.
Students interpret outputs correctly, justify their choice of techniques, and assess the quality of the proposed analytical and visualization solutions, models, and data stories.
Students propose and apply tools for reproducible and ethical data analysis.
Students scrape simple web tables and texts with R and convert them into standard data formats.

Course Contents

Coding with Style
Web Scraping
Binary Logistic Regression
Missing Data
Cluster Analysis
Data Culture and Data Acumen

Assessment Elements

Binary Outcome project
The project includes an optional extra part worth up to 2 additional points, counted only if the main task is complete: use one or more decision tree methods, compare the performance of the logistic regression and decision tree solutions, and conclude which method performs better on these data.
Dimension Reduction project
Cluster Analysis project
Coding reflection paper
This is a non-compulsory task for extra points.
Rmd Customization
Web scraping
Bayes reaction paper
This is a non-compulsory task for extra points.
Final Exam
Data Imputation
This task is graded separately from the Binary Outcome (binary logistic regression) project.

Interim Assessment

2021/2022 2nd module
0.1 * Web scraping + 0.2 * Cluster Analysis project + 0.05 * Coding reflection paper + 0.1 * Dimension Reduction project + 0.05 * Bayes reaction paper + 0.25 * Binary Outcome project + 0.1 * Final Exam + 0.1 * Data Imputation + 0.05 * Rmd Customization
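The interim grade is a weighted sum of the element scores, and the weights add up to 1. Below is a minimal Python sketch of that arithmetic, assuming that every element is scored on the same scale (for example, a 10-point scale), that an element not submitted counts as 0, and that no rounding is applied. The official rounding rules and the handling of extra points are not stated in this syllabus and are left out. The names `WEIGHTS` and `interim_grade` are hypothetical.

```python
# Weights copied from the 2021/2022 2nd-module interim assessment formula.
# Assumption: all element scores use the same scale; no rounding is applied.
WEIGHTS = {
    "Web scraping": 0.10,
    "Cluster Analysis project": 0.20,
    "Coding reflection paper": 0.05,
    "Dimension Reduction project": 0.10,
    "Bayes reaction paper": 0.05,
    "Binary Outcome project": 0.25,
    "Final Exam": 0.10,
    "Data Imputation": 0.10,
    "Rmd Customization": 0.05,
}

# Sanity check: the weights in the formula sum to 1 (up to float rounding).
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9


def interim_grade(scores: dict[str, float]) -> float:
    """Return the weighted sum of element scores.

    Assumption: an element missing from `scores` counts as 0.
    Raises KeyError for element names that are not in the formula.
    """
    unknown = set(scores) - set(WEIGHTS)
    if unknown:
        raise KeyError(f"unknown elements: {sorted(unknown)}")
    return sum(weight * scores.get(name, 0.0) for name, weight in WEIGHTS.items())
```

For example, a score of 8 on every element gives 8.0, while skipping both non-compulsory papers and scoring 8 elsewhere gives 7.2, because those two papers together carry 0.1 of the total weight.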

Bibliography

Recommended Core Bibliography

Ledolter, J. (2013). Data Mining and Business Analytics with R. Hoboken, New Jersey: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=587979
Munzert, S. (2014). Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining. Chichester, West Sussex, United Kingdom: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=878670
Upton, G. J. G. (2016). Categorical Data Analysis by Example. Hoboken, New Jersey: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1402878
Wickham, H., & Grolemund, G. (2016). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data (1st ed.). Sebastopol, CA: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1440131

Recommended Additional Bibliography

Little, R. J. A., & Rubin, D. B. (2002). Statistical Analysis with Missing Data (2nd ed.). Hoboken: Wiley-Interscience. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=838162
McElreath, R. (2016). Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Boca Raton: Chapman and Hall/CRC. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=nlebk&AN=1338291
Mood, C. (2010). Logistic Regression: Why We Cannot Do What We Think We Can Do, and What We Can Do About It. European Sociological Review, 26(1), 67–82. https://doi.org/10.1093/esr/jcp006
vanden Broucke, S., & Baesens, B. (2018). Practical Web Scraping for Data Science: Best Practices and Examples with Python. Apress.

Authors

Shirokanova, Anna Aleksandrovna

Bachelor’s Programme 'Sociology and Social Informatics'


Advanced Data Analysis