# Introduction to Statistical Analysis for Social Research

### July 29 – August 9

### Online/In-person

### Language: English

### Through this intensive course students will acquire practical skills in working with statistical data.

The first rule of statistics: correlation is not causation. The second rule of statistics: causation is never absolute. Knowing these two rules, we can give reliable and competent answers to the big questions that our constantly changing world throws at us.

### Course Description

This course is an introduction to quantitative research methods in social science. By the end of the course, students should be able to critically evaluate studies that use quantitative methods of data collection and analysis, and to understand basic statistics and causality. They will also have gained experience in collecting, analysing, visualising and interpreting quantitative data.

### Why Choose This Course?

Through this intensive course, students will acquire practical skills in working with statistical data that are highly valued across academia, industry and public service. By the end of the course, students will have a clear understanding of the logic behind designing a statistical study, the prerequisites for collecting and analysing data, and the procedures for conducting causal empirical analysis. A particular advantage of the course is that it is taught in R, a widely used open-source language for statistical computing.

### Content

**Topic 1. Basic concepts and R basics**

Installing and launching the R statistical environment and RStudio. Basic commands, objects and functions in R. Assignment operator. Ways to enter data in R. Working with loaded data. Creating tables and working with them.

**Topic 2. Descriptive statistics**

Population and sample. Data types. Descriptive statistics: measures of central tendency and measures of dispersion. Normal distribution and the central limit theorem. Ways to calculate descriptive statistics in R.

**Topic 3. Data Visualisation: principles, tools, examples**

The role of data visualisation in the presentation of research results. Principles of data visualisation. Chart types: scatter plot, histogram, box plot, violin plot, bar chart, pie chart. Base R graphics and the ggplot2 package.

**Topic 4. Statistical hypotheses and errors. Comparison of samples**

Statistical hypotheses: null and alternative. Statistical errors: Type I and Type II. Statistical significance. Binomial test. Comparison of samples: an overview of statistical tests (parametric and non-parametric; two-tailed, left-tailed and right-tailed). Parametric tests: t-test for independent and paired samples, ANOVA. Non-parametric tests: Wilcoxon rank-sum (Mann-Whitney) test for independent samples and Wilcoxon signed-rank test for paired samples.

**Topic 5. Correlation**

Correlation and covariance. Pearson's correlation coefficient. Interpretation of the values of the correlation coefficient. Significance of the correlation coefficient. Spearman's correlation coefficient. Correlation matrix.
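For orientation, the sample Pearson coefficient named above is commonly written as follows. The notation ($x_i$, $y_i$ for paired observations, $\bar{x}$, $\bar{y}$ for sample means, $n$ for sample size) is introduced here for illustration and is not taken from the course materials.

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
               \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r_{xy} \le 1
```

The numerator is the sample covariance up to a constant factor, which is why the two concepts are treated together in this topic.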

**Topic 6. Simple (bivariate) linear regression: principle, interpretation, design**

The difference between regression and correlation. Dependent and independent variables. Ordinary least squares (OLS): principle and assumptions. Simple linear regression: regression equation, interpretation of regression output. Coefficient of determination. Formatting regression output in R with the stargazer package. How studies that use regression analysis are structured. Implementation of simple linear regression in R. Visualising how least squares works.
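As a rough sketch of what this topic covers, a one-predictor regression and its least-squares solution are usually written as below. The symbols ($\beta_0$, $\beta_1$, $\varepsilon_i$, and the hats for estimates) are standard textbook notation assumed here, not notation confirmed by the course.

```latex
% Model: one dependent variable y, one independent variable x
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i

% Least squares chooses the line that minimises the sum of squared residuals
(\hat{\beta}_0, \hat{\beta}_1)
  = \arg\min_{\beta_0,\,\beta_1} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2

% Closed-form solution
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
                     {\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}

% Coefficient of determination
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

With a single predictor, $R^2$ equals the square of the Pearson correlation between $x$ and $y$, which is one way to see how regression and correlation are related yet distinct.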

**Topic 7. Multiple OLS regression: principle, interpretation, design**

Multiple linear regression: regression equation, coefficient estimates for the independent variables, F-statistic for the regression model. Comparison of regression models. Nuances of interpreting the coefficient of determination and standardised coefficients for independent variables. The structure of a social science study that uses multiple linear regression as its main method of data analysis.

**Topic 8. Technical problems and assumptions of OLS regression**

Assumptions of OLS regression. Technical problems of regression models: multicollinearity, heteroscedasticity, outliers, influential observations. Diagnostics and ways to solve technical problems of regression models.

**Topic 9. Substantive problems of regression models**

Substantive problems of regression models: endogeneity, exclusion of relevant explanatory variables from the analysis, inclusion of irrelevant explanatory variables in the analysis, and sampling bias (selection of units of analysis on the dependent variable).

**Topic 10. Logistic regression: principle, interpretation, design**

Generalised linear models: principle and types. Logistic regression: principle and types. Binary and ordinal logistic regression equations. Criteria for evaluating logistic models. Logistic regression output and its interpretation. Predicted probabilities and odds ratios.
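To make the terms in this topic concrete, the binary logistic model is typically stated as below. The notation ($p_i$ for the probability that the outcome equals 1, $x_{i1}, \dots, x_{ik}$ for predictors, $\beta_j$ for coefficients) is an assumption made for illustration; the ordinal case is not shown.

```latex
% Log-odds (logit) are linear in the predictors
\log\frac{p_i}{1 - p_i} = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}

% Equivalent form for the predicted probability
p_i = \frac{1}{1 + \exp\!\bigl(-(\beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik})\bigr)}

% Odds ratio for a one-unit increase in x_j, other predictors held fixed
\mathrm{OR}_j = e^{\beta_j}
```

An odds ratio above 1 means the odds of the outcome rise as $x_j$ increases, and a value below 1 means they fall. The change in predicted probability, by contrast, depends on the values of all predictors.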

### Skills and Competence

Collecting, analysing, visualising and interpreting quantitative data; working with the tools of the R statistical environment.

### Teaching Methods

Lectures, workshops.

### Prerequisites

No specific prerequisites are assumed other than a basic understanding of algebra and the ability to use a computer. Students will need R and RStudio installed for the practical classes.

### Final Assessment

Project.

### Final Grade Components

Completion of practical exercises (scripts) and the final project.

### Course is taught by

Aleksei Sorbale

Associate Professor, HSE Campus in St. Petersburg / Saint-Petersburg School of Social Sciences and Area Studies / Department of Political Science and International Affairs

### Recommended Reading List

Kabacoff, R. (2022). R in Action: Data Analysis and Graphics with R and Tidyverse. Simon and Schuster.

Field, A., Miles, J., & Field, Z. (2017). Discovering Statistics Using R. W. Ross MacDonald School Resource Services Library.

Geddes, B. (1990). How the cases you choose affect the answers you get: Selection bias in comparative politics. Political Analysis, 2, 131-150.