Advanced Data Analysis

Academic Year: 2021/2022
Language of instruction: English
ECTS credits: 5
Course type: Elective course
When: 4th year, modules 1 and 2

Instructor

Course Syllabus

Abstract

The course is aimed at undergraduate social science students who plan careers in data analysis or academia. It is taught entirely through seminars. It covers prediction and classification models (logistic regression and cluster analysis) and more advanced data management topics such as web scraping and data imputation. The course also discusses data culture, from data management to coding styles to narrating with data and Bayesian statistics. In addition, students learn coding tools that help produce tidy, reproducible analyses. The course is a starting point for students interested in pursuing advanced training in research methods or planning to use quantitative methods with categorical outcomes in their own research.
Learning Objectives

  • The course covers the foundations and popular techniques of categorical data analysis with the goal of training students to be informed producers and consumers of quantitative research.
Expected Learning Outcomes

  • Students apply classification techniques, propose hypotheses, and choose methods for categorical data analysis in R, including supervised classification with a binary outcome and unsupervised classification with clustering techniques for mixed data types.
  • Students create customized R Markdown reports.
  • Students create reproducible analysis scripts.
  • Students define basic terms and identify the purposes of Bayesian inference vs frequentist inference.
  • Students describe known problems with null hypothesis significance testing and propose known solutions to them.
  • Students inspect missing data patterns and apply various methods of data imputation.
  • Students interpret outputs correctly, give reasons for their choice of techniques, and assess the quality of models, data stories, and proposed analytical and visualization solutions.
  • Students propose and apply tools for reproducible and ethical data analysis.
  • Students scrape simple web tables and texts with R and convert them into standard data formats.
Course Contents

  • Coding with Style
  • Web Scraping
  • Binary Logistic Regression
  • Missing Data
  • Cluster Analysis
  • Data Culture and Data Acumen
Assessment Elements

  • non-blocking Binary Outcome project
    The project includes an optional part for additional points (up to 2 points, counted only if the main task is complete): fit one or more decision tree models, compare their performance with that of the logistic regression solution, and conclude which method performs better on this task.
  • non-blocking Dimension Reduction project
  • non-blocking Cluster Analysis project
  • non-blocking Coding reflection paper
    This is a non-compulsory task for extra points.
  • non-blocking Rmd Customization
  • non-blocking Web scraping
  • non-blocking Bayes reaction paper
    This is a non-compulsory task for extra points.
  • non-blocking Final Exam
  • non-blocking Data Imputation
    The binary logistic regression project is a separate project and is evaluated separately.
Interim Assessment

  • 2021/2022 2nd module
    0.1 * Web scraping + 0.2 * Cluster Analysis project + 0.05 * Coding reflection paper + 0.1 * Dimension Reduction project + 0.05 * Bayes reaction paper + 0.25 * Binary Outcome project + 0.1 * Final Exam + 0.1 * Data Imputation + 0.05 * Rmd Customization
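
The weighted sum above can be checked with a short script. This is a minimal sketch, not an official grade calculator: the weights are copied from the formula, while the function name `interim_grade`, the 0-10 grading scale, the example scores, and the choice to count a missing element as 0 are all assumptions made for illustration. Rounding rules and the extra points mentioned for the Binary Outcome project are not modeled.

```python
# Weights copied from the interim assessment formula (2021/2022, 2nd module).
WEIGHTS = {
    "Web scraping": 0.10,
    "Cluster Analysis project": 0.20,
    "Coding reflection paper": 0.05,
    "Dimension Reduction project": 0.10,
    "Bayes reaction paper": 0.05,
    "Binary Outcome project": 0.25,
    "Final Exam": 0.10,
    "Data Imputation": 0.10,
    "Rmd Customization": 0.05,
}


def interim_grade(scores):
    """Return the weighted sum of assessment element scores.

    Assumption: an element missing from `scores` counts as 0.
    """
    unknown = set(scores) - set(WEIGHTS)
    if unknown:
        raise KeyError(f"unknown assessment elements: {sorted(unknown)}")
    return sum(weight * scores.get(name, 0.0) for name, weight in WEIGHTS.items())


# Hypothetical scores, assuming every element is graded on a 0-10 scale.
example_scores = {name: 8.0 for name in WEIGHTS}
print(round(interim_grade(example_scores), 2))
```

The weights sum to 1, so a student with the same score on every element receives that score as the interim grade.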
Bibliography

Recommended Core Bibliography

  • Ledolter, J. (2013). Data Mining and Business Analytics with R. Hoboken, New Jersey: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=587979
  • Munzert, S. (2014). Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining. Chichester, West Sussex, United Kingdom: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=878670
  • Upton, G. J. G. (2016). Categorical Data Analysis by Example. Hoboken, New Jersey: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1402878
  • Wickham, H., & Grolemund, G. (2016). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data (1st ed.). Sebastopol, CA: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1440131

Recommended Additional Bibliography

  • Little, R. J. A., & Rubin, D. B. (2002). Statistical Analysis with Missing Data (2nd ed.). Hoboken: Wiley-Interscience. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=838162
  • McElreath, R. (2016). Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Boca Raton: Chapman and Hall/CRC. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=nlebk&AN=1338291
  • Mood, C. (2010). Logistic Regression: Why We Cannot Do What We Think We Can Do, and What We Can Do About It. European Sociological Review, 26(1), 67–82. https://doi.org/10.1093/esr/jcp006
  • vanden Broucke, S., & Baesens, B. (2018). Practical Web Scraping for Data Science: Best Practices and Examples with Python. Apress.