Bachelor's programme "Sociology and Social Informatics"

Data Analysis in Sociology

Academic year: 2020/2021
Language of instruction: English
Credits: 4
Status: Compulsory course
When taught: 3rd year, modules 3 and 4

Course Syllabus

Abstract

This course lasts for three years. The second year provides intermediate-to-advanced training in statistical analysis for quantitative research in sociology. It covers two main topics: factor analysis and statistical prediction, including linear regression and structural equation modelling. We also discuss key issues in statistical analysis, such as creating indices and identifying causality based on the results of the analysis. The course covers the building blocks of quantitative data analysis, with the goal of training students to be informed consumers and producers of quantitative research. It is also the starting point for students who are interested in advanced methods training or who plan to use quantitative methods in their own research.
Learning Objectives

  • Develop the skills needed to solve typical problems in analysing social data in the R software environment
Expected Learning Outcomes

  • Conduct statistical analyses in RStudio
  • Choose appropriate methods and techniques for given types of variables and given aims of the analysis
  • Give meaningful interpretations of statistical results: regression coefficients, tables, plots and diagrams (produced in R)
  • Perform data transformations
  • Present the results of statistical analyses graphically
  • Create analytical reports describing all the stages of analysis and interpreting its results
Course Contents

  • Introduction to GLM
    Covariance and correlation. Basic concepts and logic of linear regression and GLM.
  • Linear regression: OLS and diagnostics
    OLS estimator of linear regression, interpretation and statistical tests of OLS estimates, fitted values and residuals, R-squared, addressing nonlinearity within the linear regression framework, standardized coefficients, drawing plots, practice in R.
  • Linear regression: interaction effects
    Main and multiplicative effects in regression models. Interaction effects and additive effects. Interpreting results. Choosing the best model. Practice in R.
  • Exploratory factor analysis
    Dimensionality reduction. Manifest and latent variables. Factors, graphical representation of factors. Exploratory factor analysis. Factor scores, factor space, types of rotation. Optimal number of factors. Interpretation of the results. Creating indices based on factor analysis. Practice in R.
  • Confirmatory factor analysis
    Difference between exploratory and confirmatory factor analysis. Factor structure. Testing your own (or somebody else’s) scales. Types of latent variables. Constructing a factor model with the lavaan package. Calculating degrees of freedom and the minimum number of cases. Uncorrelated and correlated latent factors. Interpreting results. Model diagnostics. Cronbach’s alpha. Practice in R.
  • Introduction to SEM
    Structural equation modeling as an extension of confirmatory factor analysis. Exogenous and endogenous variables. Testing causal assumptions. Partial correlation, heterogeneous correlations (polychoric, tetrachoric and polyserial correlations). Practice in R.
  • SEM: model specification
    Formulating theory-based causal hypotheses. Causal inference. Specification concepts. Mediation and moderation effects. Measurement error: correlated and uncorrelated. Practice in R.
  • Path analysis
    The concept of a “path”. Path analysis with observed variables only. Graphical representation. Identification of path models. Estimation of structural equation models. Model fit. Degrees of freedom, number of cases. Meaning of the fit indices. Corrected chi-square measures. Interpreting the results. Practice in R.
  • SEM with latent variables
    Introducing latent factors in the model. Identification of SEM. Estimation of structural equation model. Model fit. Meaning of the fit indices. Model modification. Interpreting the results. Practice in R.
  • Putting it all together
    Applying all the methods to real-life research. Combining factor analysis and regression analysis. Using SEM to test theoretical assumptions about causality. Advantages and disadvantages of the methods.
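The OLS quantities listed in the course contents (estimates, fitted values, residuals, R-squared and standardized coefficients) can be illustrated with a short sketch. The course itself works in R (for example with `lm()`); the NumPy version below is only a language-neutral illustration. The helper names `ols` and `standardized` and the simulated data are hypothetical, and the sketch assumes a model with an intercept, with standardized slopes computed as b_j * sd(x_j) / sd(y).

```python
import numpy as np

def ols(X, y):
    """Fit y = b0 + X b by ordinary least squares (hypothetical helper).

    Returns the coefficients (intercept first), fitted values,
    residuals and R-squared.
    """
    X1 = np.column_stack([np.ones(len(y)), X])      # add an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)   # minimises ||y - X1 b||^2
    fitted = X1 @ beta
    resid = y - fitted
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, fitted, resid, r2

def standardized(beta, X, y):
    """Standardized slopes: b_j * sd(x_j) / sd(y) (intercept dropped)."""
    return beta[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)

# Simulated data, for illustration only: two predictors and a noisy outcome
rng = np.random.default_rng(42)
n = 500
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=1.0, size=n)

beta, fitted, resid, r2 = ols(X, y)
print("coefficients:", beta.round(2))
print("R-squared:", round(r2, 3))
print("standardized slopes:", standardized(beta, X, y).round(2))
```

With an intercept in the model, the residuals sum to zero and R-squared equals the squared correlation between observed and fitted values, which is a quick sanity check when comparing results with R output.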
Assessment Elements

  • non-blocking Practical tasks
    After each seminar, students are assigned a practical task, which must be completed by 12 p.m. on Friday.
  • non-blocking Project 1
    Three basic features are assessed: correct calculations and correct code (syntax); correct interpretations (students must describe trends properly, assess the significance of the results, and predict values of the dependent variable correctly); and correct graphics, with proper types of plots and formatting applied.
  • non-blocking Exam
    The exam is written. It takes place on the MsTeams platform: the exam task is sent to everyone through the "Assignments" module, and the completed work must also be attached in the MsTeams "Assignments" module. If MsTeams fails, the student may instead send the completed work from their corporate email address to the instructor's corporate email address. Students have 2 days to complete the exam. You may start at any time, but plan your time and effort so that you finish before the deadline. The student's computer must meet the following requirements: an internet connection and a recent version of RStudio installed in advance. During the exam, students may not collaborate or complete the task jointly. During the exam, students may use any sources, such as textbooks and the internet. A long-term connection failure is defined as having no internet access, or no access to a computer, for the entire duration of the exam. In the event of a long-term connection failure, the student cannot continue taking the exam. The retake procedure is the same as the exam procedure. Students must report any problems with their connection or computer access to the instructor immediately (as soon as they are able to). If a problem is reported promptly, each case of technical difficulties will be considered separately, and the decision on whether and in what form the student may take the exam will be made individually.
  • non-blocking DataCamp
  • non-blocking Project 2
    Three basic features are assessed: correct calculations and correct code (syntax); correct interpretations (students must describe trends properly, assess the significance of the results, and predict values of the dependent variable correctly); and correct graphics, with proper types of plots and formatting applied.
  • non-blocking Project 3
    A project on causal modeling (SEM). Students have one week to prepare and submit their paper.
Interim Assessment

  • Interim assessment (module 4)
    0.2 * DataCamp + 0.1 * Exam + 0.15 * Practical tasks + 0.15 * Project 3 + 0.2 * Project 1 + 0.2 * Project 2
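The interim assessment formula is a weighted sum whose weights add up to 1. A minimal sketch, assuming every element is scored on the same scale (a 10-point scale is assumed in the example); the function name `interim_grade` and the example scores are hypothetical, and no official rounding rule is applied because none is stated here.

```python
# Weights taken from the interim assessment formula (module 4)
WEIGHTS = {
    "DataCamp": 0.20,
    "Exam": 0.10,
    "Practical tasks": 0.15,
    "Project 3": 0.15,
    "Project 1": 0.20,
    "Project 2": 0.20,
}

def interim_grade(scores):
    """Weighted sum of element scores (hypothetical helper).

    Raises ValueError if any assessment element is missing.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores: {sorted(missing)}")
    return sum(w * scores[name] for name, w in WEIGHTS.items())

# Hypothetical scores, assumed to be on a 10-point scale
example = {"DataCamp": 10, "Exam": 6, "Practical tasks": 8,
           "Project 3": 7, "Project 1": 9, "Project 2": 8}
print(round(interim_grade(example), 2))  # 8.25
```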
Bibliography

Recommended Core Bibliography

  • Agresti, A. (2013). Categorical Data Analysis (3rd ed.). Hoboken, NJ: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=769330
  • Agresti, A., & Finlay, B. (2014). Statistical Methods for the Social Sciences (4th ed., Pearson New International Edition). Harlow, England: Pearson. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=nlebk&AN=1418314
  • Denis, D. J. (2016). Applied Univariate, Bivariate, and Multivariate Statistics. Hoboken, NJ: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1091881
  • Kline, R. B. (2016). Principles and Practice of Structural Equation Modeling (4th ed.). New York: The Guilford Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1078917
  • Stowell, S. (2014). Using R for Statistics. Berkeley, CA: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1174344
  • Tabachnick, B. G., & Fidell, L. S. (2014). Using Multivariate Statistics (6th ed., Pearson New International Edition). Harlow, Essex: Pearson. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=nlebk&AN=1418064

Recommended Additional Bibliography

  • Beh, E. J., & Lombardo, R. (2014). Correspondence Analysis: Theory, Practice and New Strategies. Chichester, West Sussex: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=842814
  • Brown, T. A. (2015). Confirmatory Factor Analysis for Applied Research (2nd ed.). New York: The Guilford Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=831411
  • Crawley, M. J. (2013). The R Book (2nd ed.). Chichester, West Sussex: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=531630
  • Little, T. D. (2013). The Oxford Handbook of Quantitative Methods. Oxford: Oxford University Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=603942