
Bachelor's Programme "Sociology and Social Informatics"


Data Analysis in Sociology

Academic year
Language of instruction: English
Compulsory course
When taught:
2nd year, modules 3 and 4


Course Syllabus


This course lasts for three years. It progresses from introductory topics (variable types, hypothesis testing, descriptive statistics) to core statistical methods (chi-square tests, t-tests, nonparametric statistics, one-way ANOVA, and linear regression). It is also the starting point for students interested in pursuing advanced methods training or planning to use quantitative methods in their own research.
Learning Objectives

  • The first year is aimed at beginners and develops the skills needed to solve typical problems in analysing social data in the R software environment.
Expected Learning Outcomes

  • Students can formulate research goals, objectives, and methods according to academic standards; process data from international sources; assess the quality of, and analyze, given samples of international research; replicate procedures from international studies; and carry out research projects in international teams
  • Students can carry out statistical analyses of a data set, propose hypotheses and choose the methods needed to reach the goals, interpret the results and assess the quality of proposed solutions. Students provide reasons for their choice of techniques, interpret the outputs correctly, and assess the quality of their own and others’ models
  • Students can set research goals, propose a research plan based on the results of previous research and social theory, carry out data analysis, and report the results
  • Students can apply a theoretical framework to define hypotheses and explain the results of a study; can apply appropriate statistical models and generalize the results
Course Contents

  • Topic 1. Research hypotheses vs. statistical hypotheses. Variable types
    The cycle of research. Data analysis as part of the research process. Goals of data analysis. Theory of sampling recap. Survey data, behavior data. Posing and testing hypotheses. Research hypotheses vs. statistical hypotheses testing. Directed and non-directed hypotheses. Dependent and independent variables. Variable scales: nominal, ordinal, continuous (interval and ratio). Descriptive statistics of a variable depending on its type. Getting to know R and RStudio.
  • Topic 2. Exploratory data analysis
    Central tendency measures. Mean, median, mode. Standard normal distribution and its use. Z-scores. Moments of distributions. Distribution plots and reading them. Sources of bias in data. Interpretation of z-scores. Mean as a data model. Creating objects, types of objects, basic functions in R. Descriptive statistics in R. Tidy data. Plots for univariate and bivariate distributions. Histogram, bar plot, box plot, scatterplot, stacked bar plot.
  • Topic 3. Chi-square and association measures
    Observed and expected frequencies. Measures of association for categorical variables. Reading and interpreting chi-square tests. Assumptions of chi-square. Independence. Standardized residuals. Odds ratio. Chi-square and other association measures in R.
  • Topic 4. Two means comparison
    Independent and paired samples. Assumptions behind the t-test. Two-sample t-tests. Testing for normality and homogeneity of variance. Nonparametric tests for two samples. Reading and interpreting means comparison. Confidence intervals. Weights in surveys. Post-stratification weights and design weights. Means comparison in R.
  • Topic 5. One-way ANOVA
    One-way analysis of variance (ANOVA). Difference from association measures and t-test. Assumptions and usage of ANOVA. Between-group and within-group variance, their ratio. Planned and non-planned comparisons; corrections. Post hoc comparisons for equal and unequal variances. Reading and interpreting ANOVA. Plots of post hoc tests. Non-parametric tests for multiple comparisons. One-way ANOVA in R. Presenting the results of ANOVA. Getting to know RMarkdown: reports and slide shows.
  • Topic 6. Correlation and linear regression
    Idea of correlations. Pearson’s product-moment correlation. Research problems for correlational analysis. Correlation coefficients for different types of data. Correlation matrices. ANOVA, correlation, and regression as linear models. Building a linear regression. Ordinary least squares. Fitting the regression line. Assumptions behind linear regression. Reading and interpreting regression coefficients. Goodness of fit measures. Presenting and interpreting a linear regression. Categorical predictors in a linear regression. Dummy-coding. Linear regression in R. Plotting linear regressions in R (case studies).
  • Topic 7. Multiple linear regression
    Difference between simple and multiple linear regression. The concept of interaction effects for categorical by categorical, categorical by continuous, and continuous by continuous variables. Multicollinearity. Centering. Model selection in multiple regression. Reading and interpreting interaction models in a linear regression. Testing for interactions in R. Graphs for interaction effects. Reporting and interpreting a linear regression with interactions in R.
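
As an illustration of the quantities named in Topic 3 (observed and expected frequencies, residuals, the chi-square statistic, and the odds ratio), here is a minimal hand computation. The course itself works in R; Python is used here only to keep the arithmetic self-contained. The 2x2 table is made-up example data, and the function name `chi_square` is hypothetical, not part of the course materials.

```python
# Illustrative sketch: chi-square test of independence computed by hand.
# The table below is invented example data, not taken from the course.

def chi_square(observed):
    """Return expected counts, Pearson residuals, and the chi-square statistic."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Pearson residual: (observed - expected) / sqrt(expected)
    residuals = [[(o - e) / e ** 0.5 for o, e in zip(o_row, e_row)]
                 for o_row, e_row in zip(observed, expected)]
    # The chi-square statistic is the sum of squared Pearson residuals
    statistic = sum(res ** 2 for row in residuals for res in row)
    return expected, residuals, statistic

observed = [[30, 10],
            [20, 40]]
expected, residuals, statistic = chi_square(observed)

# Degrees of freedom: (rows - 1) * (columns - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Odds ratio for a 2x2 table: (a * d) / (b * c)
odds_ratio = (observed[0][0] * observed[1][1]) / (observed[0][1] * observed[1][0])

print(f"chi-square = {statistic:.2f}, df = {df}, odds ratio = {odds_ratio:.1f}")
```

The residuals show which cells drive the association: cells with large positive or negative residuals deviate most from what independence would predict.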
Assessment Elements

  • non-blocking Projects 1-4 (0.1 * 4)
    Students form teams of 2-3 and work together on their project throughout the course, submitting a part of it for peer review before each computer lab. Final projects are submitted in full and presented in class. Each group selects one country from the European Social Survey, picks a topic of interest within the scope of the available survey questions (e.g. Health, Democracy, Religion), and performs all the tests covered in class on these data. One day before each computer lab, the piece of work that is due must be submitted in LMS, where it is blindly peer-reviewed by two other groups. The instructors assign the reviewers, so students do not know in advance who will review their work. Final projects are presented in two stages. First, the group submits the code with interpretations. Then the group presents its findings and procedures in class. Students are expected to choose and correctly apply methods for analysing and interpreting the data, and to demonstrate their knowledge and skills in presenting the results to an audience. The individual contribution of each student is graded. Projects should be submitted as scripts or RMarkdown documents; in-class presentations should be prepared as slide shows (e.g. Prezi, LibreOffice Impress). Project details are available in LMS.
  • non-blocking Test
    All students fill in a comprehensive paper-and-pencil test covering all previous topics.
  • non-blocking In-class activity
    In-class activity during lectures and seminars. Students are expected to ask questions and participate in discussions, as well as help other students during practice sessions. Small regular tests held at seminars are also part of this grade.
  • non-blocking Exam
    The exam checks the skills students should have acquired during the course. Its structure is close to that of the projects but covers all the topics: standard problems including descriptive statistics, measures of association, comparing two or more means, and linear regression. The exam is held in writing (problem solving) with asynchronous proctoring. It is conducted on the Moodle platform (et.hse.ru), with proctoring on the Examus platform (https://hse.student.examus.net). Students need to connect 15 minutes before the exam starts. System testing is available on the Examus platform. The student's computer must meet the technical requirements (https://elearning.hse.ru/data/2020/05/07/1544135594/Технические%20требования%20к%20ПК%20студента.pdf). To participate in the exam, the student must access the proctoring platform in advance, run a system test, turn on the camera and microphone, and verify their identity. During the exam, students are prohibited from communicating with others (on social networks or with other people in the room) and from copying materials 'as is' from the web. Students use a local installation of R (RStudio) to solve the data analysis problems and may use their hand-written notes and files stored on their computer. A communication interruption of less than 10 minutes is considered a short-term disconnection; an interruption of 10 minutes or more is considered a long-term disconnection. In case of a long-term disconnection, the student cannot continue the exam. The exam retake follows the same procedure.
  • non-blocking MOOC
    If there are medical reasons for not completing the task, students should talk to the instructor before the deadline and arrange a later submission. Medical certificates should be presented no later than two weeks after the deadline; otherwise they are not taken into account.
Interim Assessment

  • Interim assessment (module 4)
    0.2 * Exam + 0.2 * In-class activity + 0.1 * MOOC + 0.4 * Projects 1-4 (0.1 * 4) + 0.1 * Test
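
The formula above is a weighted sum whose weights add up to 1. A minimal sketch of the arithmetic follows. The function name `interim_grade`, the sample grades, and the 10-point scale are assumptions made for illustration; the syllabus does not state a rounding rule, so none is applied.

```python
# Weights copied from the interim assessment formula; they sum to 1.0.
WEIGHTS = {
    "exam": 0.2,
    "in_class": 0.2,
    "mooc": 0.1,
    "project": 0.1,  # applied to each of the four projects (0.1 * 4 = 0.4)
    "test": 0.1,
}

def interim_grade(exam, in_class, mooc, projects, test):
    """Weighted sum of the assessment elements (hypothetical helper).

    `projects` must hold exactly four grades, one per project.
    """
    if len(projects) != 4:
        raise ValueError("expected grades for Projects 1-4")
    return (
        WEIGHTS["exam"] * exam
        + WEIGHTS["in_class"] * in_class
        + WEIGHTS["mooc"] * mooc
        + WEIGHTS["project"] * sum(projects)
        + WEIGHTS["test"] * test
    )

# Invented grades on an assumed 10-point scale.
grade = interim_grade(exam=8, in_class=7, mooc=9, projects=[6, 7, 8, 9], test=10)
print(f"interim grade = {grade:.2f}")
```

With these sample grades the contributions are 1.6 (exam), 1.4 (in-class activity), 0.9 (MOOC), 3.0 (projects), and 1.0 (test).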


Recommended Core Bibliography

  • Stowell, S. (2014). Using R for Statistics. Berkeley, CA: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1174344

Recommended Additional Bibliography

  • Field, A., Miles, J., & Field, Z. (2012). Discovering Statistics Using R. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edswao&AN=edswao.363067604