
Data Analysis in Sociology

Academic Year
Instruction in English
ECTS credits
Course type: Compulsory course
3rd year, modules 3 and 4

Course Syllabus


This course lasts for three years. The 2nd year provides intermediate-to-advanced statistical analysis for quantitative research in sociology. In the 2nd year, the course covers two main topics: factor analysis and statistical prediction, including linear regression and structural equation modelling. We also discuss key issues in statistical analysis, such as creating indices and identifying causality based on the results of the analysis. The course covers the building blocks of quantitative data analysis, with the goal of training students to be informed consumers and producers of quantitative research. It is also the starting point for students who are interested in pursuing advanced methods training or who plan to use quantitative methods in their own research.
Learning Objectives

  • Develop the skills necessary to solve typical problems in analysing social data in the R software environment
Expected Learning Outcomes

  • Conduct statistical analyses in RStudio
  • Choose appropriate methods and techniques for certain types of variables and certain aims of the analysis
  • Give a meaningful interpretation of statistical results: regression coefficients, tables, plots and diagrams (produced in R)
  • Perform data transformations
  • Represent the results of statistical analyses graphically
  • Create analytical reports describing all the stages of analysis and interpreting its results
Course Contents

  • Introduction to GLM
    Covariance and correlation. Basic concepts and logic of linear regression and GLM.
  • Linear regression: OLS. Diagnostics
    OLS estimator of linear regression, interpretation and statistical tests of OLS estimates, fitted values and residuals, R-squared, addressing nonlinearity in the linear regression framework, standardized coefficients, drawing plots, practice in R.
  • Linear regression: Interaction effects
    Main and multiplicative effects in regression models. Interaction effects, additive effects. Interpreting results. Choosing the best model. Practice in R.
  • Exploratory factor analysis
    Dimensionality reduction. Manifest and latent variables. Factors, graphical representation of factors. Exploratory factor analysis. Factor scores, factor space, types of rotation. Optimal number of factors. Interpretation of the results. Creating indices based on factor analysis. Practice in R.
  • Confirmatory factor analysis
    Difference between exploratory and confirmatory factor analyses. Factor structure. Testing your (or somebody else’s) scales. Types of latent variables. Constructing a factor model in the lavaan package. Calculation of degrees of freedom, minimum number of cases. Non-correlated and correlated latent factors. Interpreting results. Model diagnostics. Cronbach’s alpha. Practice in R.
  • Introduction to SEM
    Structural equation modeling as an extension of confirmatory factor analysis. Exogenous and endogenous variables. Testing causal assumptions. Partial correlation, heterogeneous correlations (polychoric, tetrachoric and polyserial correlations). Practice in R.
  • SEM: model specification
    Formulating theory-based causal hypotheses. Causal inference. Specification concepts. Mediation and moderation effects. Measurement error: correlated and uncorrelated. Practice in R.
  • Path analysis
    Concept of “path”. Path analysis: only observed variables. Graphical representation. Identification of the path model. Estimation of the structural equation model. Model fit. Degrees of freedom, number of cases. Meaning of the fit indices. Corrected chi-square measures. Interpreting the results. Practice in R.
  • SEM with latent variables
    Introducing latent factors in the model. Identification of SEM. Estimation of structural equation model. Model fit. Meaning of the fit indices. Model modification. Interpreting the results. Practice in R.
  • Putting it all together
    Applying all the methods to real-life research. Combining factor analysis and regression analysis. Using SEM to test theoretical assumptions about causality. Advantages and disadvantages of the methods.
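
As a small illustration of the OLS topics listed above (estimator, fitted values, residuals, R-squared), here is a minimal sketch of a one-predictor regression. The course itself works in R; plain Python is used here only as a language-neutral illustration, and the function name `ols_simple` and the data are made up for the example.

```python
from statistics import mean


def ols_simple(x, y):
    """Fit y = a + b*x by ordinary least squares.

    Returns the intercept a, the slope b, and R-squared.
    """
    x_bar, y_bar = mean(x), mean(y)

    # Slope: sum of cross-products divided by sum of squares of x.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx

    # Intercept: the fitted line passes through the point of means.
    a = y_bar - b * x_bar

    # Fitted values and residuals.
    fitted = [a + b * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]

    # R-squared: share of the variance of y explained by the model.
    ss_res = sum(e ** 2 for e in residuals)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot

    return a, b, r_squared


# Hypothetical data, invented for this illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope, r2 = ols_simple(x, y)
print(f"intercept={intercept:.3f}, slope={slope:.3f}, R-squared={r2:.4f}")
```

The slope is read as the expected change in y for a one-unit increase in x, and R-squared as the proportion of variance in y that the line accounts for.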
Assessment Elements

  • non-blocking Practical tasks
    After each seminar, students are assigned a practical task, which should be completed by Friday, 12 p.m.
  • non-blocking Project 1
    Three basic features are assessed: correct calculations and correct code (syntax); correct interpretations (students must describe trends properly, assess the significance of the results, and predict values of the dependent variable correctly); and correct graphics, with proper plot types and formatting applied.
  • non-blocking Exam
  • non-blocking DataCamp
  • non-blocking Project 2
    Three basic features are assessed: correct calculations and correct code (syntax); correct interpretations (students must describe trends properly, assess the significance of the results, and predict values of the dependent variable correctly); and correct graphics, with proper plot types and formatting applied.
  • non-blocking Project 3
    A project dedicated to causal modeling (SEM). Students are given one week to prepare and submit the paper.
Interim Assessment

  • Interim assessment (4 module)
    0.2 * DataCamp + 0.1 * Exam + 0.15 * Practical tasks + 0.2 * Project 1 + 0.2 * Project 2 + 0.15 * Project 3
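
The weighting above can be sketched as a short calculation. This is only an illustration: the dictionary keys and the sample scores are hypothetical, and the sketch assumes that every component is graded on the same scale, which the syllabus does not state.

```python
# Weights taken from the interim assessment formula above.
WEIGHTS = {
    "datacamp": 0.20,
    "exam": 0.10,
    "practical_tasks": 0.15,
    "project_1": 0.20,
    "project_2": 0.20,
    "project_3": 0.15,
}


def interim_grade(scores):
    """Return the weighted sum of component scores.

    `scores` must contain one score for each key in WEIGHTS.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, invented for this illustration only.
sample_scores = {
    "datacamp": 10,
    "exam": 6,
    "practical_tasks": 8,
    "project_1": 7,
    "project_2": 9,
    "project_3": 8,
}

print(f"Interim grade: {interim_grade(sample_scores):.2f}")
```

The weights sum to 1, so a student with the same score on every component receives that score as the interim grade.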


Recommended Core Bibliography

  • Agresti, A. (2013). Categorical Data Analysis (3rd ed.). Hoboken, NJ: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=769330
  • Agresti, A., & Finlay, B. (2014). Statistical Methods for the Social Sciences: Pearson New International Edition (4th ed.). Harlow, England: Pearson. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=nlebk&AN=1418314
  • Denis, D. J. (2016). Applied Univariate, Bivariate, and Multivariate Statistics. Hoboken, New Jersey: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1091881
  • Kline, R. B. (2016). Principles and Practice of Structural Equation Modeling (4th ed.). New York: The Guilford Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1078917
  • Stowell, S. (2014). Using R for Statistics. Berkeley, CA: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1174344
  • Tabachnick, B. G., & Fidell, L. S. (2014). Using Multivariate Statistics: Pearson New International Edition (6th ed.). Harlow, Essex: Pearson. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=nlebk&AN=1418064

Recommended Additional Bibliography

  • Beh, E. J., & Lombardo, R. (2014). Correspondence Analysis: Theory, Practice and New Strategies. Chichester, West Sussex: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=842814
  • Brown, T. A. (2015). Confirmatory Factor Analysis for Applied Research (2nd ed.). New York: The Guilford Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=831411
  • Crawley, M. J. (2013). The R Book (2nd ed.). Chichester, West Sussex: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=531630
  • Little, T. D. (2013). The Oxford Handbook of Quantitative Methods. Oxford: Oxford University Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=603942