
Bachelor’s programme “Sociology and Social Informatics”

Data Analysis in Sociology

Academic year: 2022/2023
Language of instruction: English (ENG)
Credits: 4
Status: Compulsory course
When taught: 4th year, module 3

Course Syllabus

Abstract

Data Analysis in Sociology, taught in the 4th year of the programme, focuses on categorical data and covers special types of prediction and classification models (logistic regression and cluster analysis). The course concludes with a discussion of data culture and data acumen, from data management to inference and prediction. It is also a starting point for students who want advanced training in research methods or who plan to use quantitative methods with categorical outcomes in their own research.
Learning Objectives

  • The course covers the foundations and popular techniques of quantitative data analysis, with the goal of training students to be informed producers and consumers of quantitative research.
  • Develop the skills needed to solve typical data analysis problems on social data in the R software environment.
Expected Learning Outcomes

  • Students can apply a theoretical framework to formulate hypotheses and explain the results of a study; they can apply appropriate statistical models and generalize the results.
  • Students can carry out statistical analyses of a data set, propose hypotheses, choose the methods needed to reach the research goals, interpret the results and assess the quality of the proposed solutions. They justify their choice of techniques, interpret the outputs correctly, and assess the quality of their own and others’ models.
  • Students can generalize and analyze the materials they read, assess them critically, express their own opinions and give their own interpretations.
  • Students can set research goals, propose a research plan based on the results of previous research and social theory, carry out data analysis and report the results.
  • Choose appropriate methods and techniques for given variable types and analysis aims
  • Conduct statistical analyses in RStudio
  • Create analytical reports that describe all stages of the analysis and interpret its results
  • Give meaningful interpretations of statistical results: regression coefficients, tables, plots and diagrams (produced in R)
  • Perform data transformations
  • Represent the results of statistical analyses graphically
Course Contents

  • Central tendency measures
  • Chi-square
  • Two means comparison
  • One-way ANOVA
  • Linear regression
  • Linear regression with multiple predictors
  • Linear regression: Interaction effects
  • Linear regression: OLS. Diagnostics
  • Introduction to GLM
  • Exploratory factor analysis
  • Confirmatory factor analysis
  • Topic 18. Binary logistic regression
  • Topic 19. Cluster analysis
  • Topic 20. Data culture and data acumen
  • Topic 21. Data management. Revised variable types
  • Topic 22. Understanding causality and prediction
Assessment Elements

  • non-blocking Project 1
    Two individual projects are due: one on binary logistic regression and one on cluster analysis. Together, the two projects make up the student’s portfolio. Specific project requirements are available in the LMS. If a student has a valid reason for missing the project deadline, they should inform the instructor before the deadline and may submit later, as agreed with the instructor. Documents confirming the reason for absence must be presented no later than two weeks after the original deadline; otherwise, they will not be considered.
  • non-blocking Project 2
    Two individual projects are due: one on binary logistic regression and one on cluster analysis. Together, the two projects make up the student’s portfolio. Specific project requirements are available in the LMS. If a student has a valid reason for missing the project deadline, they should inform the instructor before the deadline and may submit later, as agreed with the instructor. Documents confirming the reason for absence must be presented no later than two weeks after the original deadline; otherwise, they will not be considered.
  • non-blocking Written Exam
    The exam consists of two problems involving the methods covered in this course.
  • non-blocking Test 1
    If a student has a valid reason for missing the test, they should inform the instructor before the test. Documents confirming the reason for absence must be presented no later than two weeks after the test; otherwise, they will not be considered.
  • non-blocking Test 2
    If a student has a valid reason for missing the test, they should inform the instructor before the test. Documents confirming the reason for absence must be presented no later than two weeks after the test; otherwise, they will not be considered.
  • non-blocking Practical tasks
    After each seminar, students are assigned a practical task, which must be completed by 12 p.m. on Friday.
  • non-blocking Project1
    Three basic features are assessed: correct calculations and correct code (syntax); correct interpretations (students must describe trends properly, assess the significance of the results, and predict values of the dependent variable correctly); and correct graphics, with appropriate plot types and formatting.
  • non-blocking Exam
    The exam is taken in written form on the MS Teams platform: the exam assignment is sent to all students through the "Assignments" module, and the completed work must also be attached in the "Assignments" module of MS Teams. If MS Teams fails, the student may instead send the completed work from their corporate email to the instructor’s corporate email. Two days are allotted for the exam. You may start at any time, but plan your time and effort so that you finish before the deadline. The student’s computer must meet the following requirements: an internet connection and a recent version of RStudio installed in advance. During the exam, students are not allowed to cooperate or complete the assignment collectively. Students may use any sources during the exam, such as textbooks and the internet. A long-term connection failure is defined as having no internet access for the entire duration of the exam or no access to a computer for the entire exam period. In the event of a long-term connection failure, the student cannot continue taking the exam. The retake procedure is the same as the exam procedure. The student must inform the instructor of any problems with the connection or computer access immediately (as soon as it becomes possible). If a problem is reported promptly, each case of technical difficulties will be considered separately, and a decision on whether and in what form the student may take the exam will be made individually.
  • non-blocking Project2
    Three basic features are assessed: correct calculations and correct code (syntax); correct interpretations (students must describe trends properly, assess the significance of the results, and predict values of the dependent variable correctly); and correct graphics, with appropriate plot types and formatting.
  • non-blocking Projects
    Late submissions are not considered (try us). If you are ill when a project is due, present a medical certificate to have the grading formula adjusted for you. If you miss more than one project, a makeup assignment may be offered. When you submit a project in MS Teams, you must click the "Turn in" button to complete the submission. All projects are first posted to the dedicated channel, where they are peer-reviewed, and then submitted in the Assignments section by each contributing student. If you have any questions about a project, sign up for a consultation.
  • non-blocking In-class activity
  • non-blocking Exam
  • non-blocking Short tests
  • non-blocking MOOC completion
  • non-blocking Mid-Term Test
Interim Assessment

  • 2020/2021 3rd module
  • 2020/2021 4th module
    0.1 * In-class activity + 0.05 * MOOC completion + 0.15 * Mid-Term Test + 0.4 * Projects + 0.1 * Short tests
  • 2021/2022 3rd module
  • 2021/2022 4th module
    0.2 * Exam + 0.4 * Practical tasks + 0.2 * Project1 + 0.2 * Project2
  • 2022/2023 3rd module
    0.25 * Project 1 + 0.25 * Project 2 + 0.1 * Test 1 + 0.1 * Test 2 + 0.3 * Written Exam
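The 2022/2023 formula is a weighted sum of the element grades. The sketch below illustrates that arithmetic under stated assumptions: every element is graded on the same scale (for example 0 to 10), and no rounding is applied because the syllabus does not specify a rounding rule. The helper name `interim_grade` and the example grades are hypothetical, not part of the course materials.

```python
# Weights of the 2022/2023 3rd-module formula, copied from the syllabus.
WEIGHTS_2022_2023 = {
    "Project 1": 0.25,
    "Project 2": 0.25,
    "Test 1": 0.10,
    "Test 2": 0.10,
    "Written Exam": 0.30,
}


def interim_grade(grades: dict[str, float], weights: dict[str, float]) -> float:
    """Return the weighted sum of element grades (hypothetical helper).

    Assumes all elements share one grading scale; no rounding is applied
    because the syllabus does not state a rounding rule.
    """
    missing = set(weights) - set(grades)
    if missing:
        raise ValueError(f"missing grades for: {sorted(missing)}")
    return sum(weights[name] * grades[name] for name in weights)


# The 2022/2023 weights add up to 1, so the result stays on the input scale.
assert abs(sum(WEIGHTS_2022_2023.values()) - 1.0) < 1e-9

# Hypothetical grades on a 10-point scale.
example = {"Project 1": 8, "Project 2": 7, "Test 1": 6, "Test 2": 9, "Written Exam": 7}
print(round(interim_grade(example, WEIGHTS_2022_2023), 2))  # prints 7.35
```

Because the weights sum to 1, a student with the same grade on every element receives exactly that grade as the interim result.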
Bibliography

Recommended Core Bibliography

  • Agresti, A. (2013). Categorical Data Analysis (3rd ed.). Hoboken, NJ: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=769330
  • Agresti, A., & Finlay, B. (2014). Statistical Methods for the Social Sciences (4th ed., Pearson New International Edition). Harlow, England: Pearson. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=nlebk&AN=1418314
  • Denis, D. J. (2016). Applied Univariate, Bivariate, and Multivariate Statistics. Hoboken, New Jersey: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1091881
  • Kline, R. B. (2016). Principles and Practice of Structural Equation Modeling (4th ed.). New York: The Guilford Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1078917
  • Ledolter, J. (2013). Data Mining and Business Analytics with R. Hoboken, New Jersey: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=587979
  • Stowell, S. (2014). Using R for Statistics. Berkeley, CA: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1174344
  • Tabachnick, B. G., & Fidell, L. S. (2014). Using Multivariate Statistics (6th ed., Pearson New International Edition). Harlow, Essex: Pearson. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=nlebk&AN=1418064
  • Upton, G. J. G. (2016). Categorical Data Analysis by Example. Hoboken, New Jersey: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1402878

Recommended Additional Bibliography

  • Beh, E. J., & Lombardo, R. (2014). Correspondence Analysis: Theory, Practice and New Strategies. Chichester, West Sussex: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=842814
  • Brown, T. A. (2015). Confirmatory Factor Analysis for Applied Research (2nd ed.). New York: The Guilford Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=831411
  • Crawley, M. J. (2013). The R Book (2nd ed.). Chichester, West Sussex: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=531630
  • Little, T. D. (2013). The Oxford Handbook of Quantitative Methods. Oxford: Oxford University Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=603942
  • Mood, C. (2010). Logistic Regression: Why We Cannot Do What We Think We Can Do, and What We Can Do About It. European Sociological Review, 26(1), 67–82. https://doi.org/10.1093/esr/jcp006
  • Amrhein, V., Trafimow, D., & Greenland, S. (2019). Inferential Statistics as Descriptive Statistics: There Is No Replication Crisis if We Don’t Expect Replication. The American Statistician, 73(S1), 262–270. https://doi.org/10.1080/00031305.2018.1543137