Data Analysis in Sociology

Academic Year: 2019/2020
Language of instruction: English (ENG)
ECTS credits: 2
Course type: Compulsory course
When: 4th year, 3rd module

Course Syllabus

Abstract

In the 4th year of the Program, Data Analysis in Sociology focuses on categorical data and covers specific types of prediction and classification models (logistic regression and cluster analysis). The course closes with a discussion of data culture and data acumen, from data management to inference and prediction. It is also a starting point for students who want advanced training in research methods or who plan to use quantitative methods with categorical outcomes in their own research.
Learning Objectives

  • The course covers the foundations and popular techniques of quantitative data analysis with the goal of training students to be informed producers and consumers of quantitative research.
Expected Learning Outcomes

  • Students can carry out statistical analyses of a data set, propose hypotheses and choose the methods needed to reach the goals, interpret the results and assess the quality of proposed solutions. Students provide reasons for their choice of techniques, interpret the outputs correctly, and assess the quality of their own and others’ models.
  • Students can set research goals, propose a research plan based on the results of previous research and social theory, carry out data analysis and report the results.
  • Students can generalize and analyze the materials they read, assess them critically, express their own opinions, and give their own interpretation.
  • Students can apply a theoretical framework to define hypotheses and explain the results of a study; they can apply appropriate statistical models and generalize the results.
Course Contents

  • Topic 18. Binary logistic regression
    Models for categorical outcome variables. Variety of goals of analysis with categorical data. Typical goals of analysis and interpretation of results. Binary logistic regression. Objectives of logistic regression. The logistic curve. Maximum likelihood estimation. Assumptions of logistic regression. Perfect separation. Transforming a probability into odds and logit values. Goodness-of-fit measures for logistic regression. Out-of-sample validation. Classification matrix. Interpretation of results with linear and dichotomous predictors. Stepwise model building. Model diagnostics. Binary logistic regression in R.
  • Topic 19. Cluster analysis
    Objectives of cluster analysis: segmentation, taxonomy description, data simplification, and relationship identification. Conceptual framework for cluster analysis. Distance between objects. Similarity measures. Distance measures for various types of variables. Proximity matrix. Assumptions of cluster analysis. Hierarchical and non-hierarchical clustering algorithms. K-means clustering, DBSCAN clustering. Dendrograms. Measures of overall fit. Cluster profiles. Between- and within-cluster variation. Determining the number of clusters. Interpretation of clusters. Cross-classification from several solutions. Cluster analysis in R.
  • Topic 20. Data culture and data acumen
    Building data acumen: making meaningful, correct and useful judgments about data. Privacy and ethical concerns in data analysis and research. Data culture areas: data life-cycle, data curation, understanding causality, understanding conditional and joint probabilities, false negatives and false positives, critical assessment of popular practices and further use of R functionalities to make sense of the data. The data life-cycle: generation, collection, processing, management, analysis, visualization, interpretation, and delivery.
  • Topic 21. Data management. Revised variable types
    Getting and cleaning data in R. Data curation. Delivering results in applications. Data simulation for hypothesis testing. Stevens’ typology of data and the meaningfulness of scaling. Alternative scale taxonomies. Transforming data values to simplify the structure. Research questions and data types.
  • Topic 22. Understanding causality and prediction
    Computational (predictive) versus statistical (inferential) thinking. Association and causation. Learning from data and effective communication to decision-makers. The description-prediction-prescription framework and its critique. Data modeling culture. The trend toward prediction. Data science language. Problems in current data modeling. The replication crisis: false-negative results, selection bias, p-value misuse.
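As a small illustration of the "Transforming a probability into odds and logit values" item in Topic 18, the following is a minimal sketch in Python. The course itself works in R; the function names `odds`, `logit`, and `inverse_logit` are illustrative choices and are not taken from the course materials.

```python
import math


def odds(p: float) -> float:
    """Convert a probability into odds, p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("odds are defined only for 0 <= p < 1")
    return p / (1.0 - p)


def logit(p: float) -> float:
    """Convert a probability into log-odds (the logit)."""
    if not 0.0 < p < 1.0:
        raise ValueError("the logit is defined only for 0 < p < 1")
    return math.log(odds(p))


def inverse_logit(x: float) -> float:
    """Map a logit back to a probability (the logistic curve)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # this branch avoids overflow for large negative x
    return e / (1.0 + e)


if __name__ == "__main__":
    # Print the three scales side by side for a few probabilities.
    for p in (0.1, 0.5, 0.8):
        print(f"p={p:.2f}  odds={odds(p):.3f}  logit={logit(p):.3f}")
```

A probability of 0.5 corresponds to odds of 1 and a logit of 0. Probabilities above 0.5 give positive logits and probabilities below 0.5 give negative ones, which is why logistic regression coefficients are read on the log-odds scale.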
Assessment Elements

  • non-blocking Project 1
    Two individual projects are due: one on binary logistic regression and one on cluster analysis. Together, the two projects make up the student's portfolio. Specific project requirements are available in the LMS. A student who has a valid reason for missing the project deadline should inform the instructor before the deadline and may then submit later, as agreed with the instructor. Documents confirming the student's absence must be presented no later than two weeks after the original deadline; otherwise, they will not be considered.
  • non-blocking Project 2
    Two individual projects are due: one on binary logistic regression and one on cluster analysis. Together, the two projects make up the student's portfolio. Specific project requirements are available in the LMS. A student who has a valid reason for missing the project deadline should inform the instructor before the deadline and may then submit later, as agreed with the instructor. Documents confirming the student's absence must be presented no later than two weeks after the original deadline; otherwise, they will not be considered.
  • non-blocking Written Exam
    The exam consists of two problems involving the methods covered in this course.
  • non-blocking Test 1
    A student who has a valid reason for missing the test should inform the instructor before the test. Documents confirming the student's absence must be presented no later than two weeks after the test; otherwise, they will not be considered.
  • non-blocking Test 2
    A student who has a valid reason for missing the test should inform the instructor before the test. Documents confirming the student's absence must be presented no later than two weeks after the test; otherwise, they will not be considered.
Interim Assessment

  • Interim assessment (3rd module)
    0.25 * Project 1 + 0.25 * Project 2 + 0.1 * Test 1 + 0.1 * Test 2 + 0.3 * Written Exam
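The weighting above can be checked with a short calculation. The following is a minimal sketch in Python. The dictionary keys, the function name `interim_grade`, and the sample scores are hypothetical, and the syllabus does not state a grading scale or rounding rule, so none is applied here.

```python
# Weights copied from the interim assessment formula; they sum to 1.0.
WEIGHTS = {
    "project_1": 0.25,
    "project_2": 0.25,
    "test_1": 0.10,
    "test_2": 0.10,
    "written_exam": 0.30,
}


def interim_grade(scores: dict) -> float:
    """Return the weighted sum of the five assessment elements.

    `scores` maps each key in WEIGHTS to a numeric score. All scores
    are assumed to be on the same scale (an assumption of this sketch).
    """
    missing = sorted(WEIGHTS.keys() - scores.keys())
    if missing:
        raise KeyError(f"missing scores for: {', '.join(missing)}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    sample = {
        "project_1": 8,
        "project_2": 6,
        "test_1": 10,
        "test_2": 7,
        "written_exam": 9,
    }
    print(round(interim_grade(sample), 1))
```

With the hypothetical scores shown, the weighted sum is about 7.9. The two projects together account for half of the grade, the written exam for 30%, and the two tests for 10% each.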
Bibliography

Recommended Core Bibliography

  • Ledolter, J. (2013). Data Mining and Business Analytics with R. Hoboken, New Jersey: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=587979
  • Upton, G. J. G. (2016). Categorical Data Analysis by Example. Hoboken, New Jersey: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1402878

Recommended Additional Bibliography

  • Mood, C. (2010). Logistic Regression: Why We Cannot Do What We Think We Can Do, and What We Can Do About It. European Sociological Review, 26(1), 67–82. https://doi.org/10.1093/esr/jcp006
  • Amrhein, V., Trafimow, D., & Greenland, S. (2019). Inferential Statistics as Descriptive Statistics: There Is No Replication Crisis if We Don’t Expect Replication. The American Statistician, (S1), 262. https://doi.org/10.1080/00031305.2018.1543137