Data Analysis in Sociology

Academic Year: 2021/2022
Language of instruction: English
ECTS credits: 4
Course type: Compulsory course
When: 4th year, 3rd module

Course Syllabus

Abstract

The fourth-year Data Analysis in Sociology course focuses on developing a conscious, systematic approach to data analysis and on resolving long-standing problems and anxieties about doing data analysis. The course finishes with a discussion of data culture in day-to-day research. Prerequisites: independent use of R (base R, tidyverse, ggplot2), statistics including linear regression modelling, and intermediate or advanced English. Interviews are possible upon request from prospective students. Grading for external students follows the formula provided in the syllabus. For Sociology and Social Informatics students, this is the third part of a three-part discipline, and a summary formula covering all three years is applied in addition.
Learning Objectives

  • The 1st year is aimed at beginners and serves to develop the skills necessary to solve typical problems in analysing social data in the R software environment.
  • The course covers the foundations and popular techniques of quantitative data analysis with the goal of training students to be informed producers and consumers of quantitative research.
  • Develop the skills necessary to solve typical problems in analysing social data in the R software environment.
  • Two specific goals of this course are to systematize the principles of data analysis for all standard problems and to alleviate long-standing anxieties related to any part of the data analysis cycle.
Expected Learning Outcomes

  • Students can apply a theoretical framework to define hypotheses and explain the results of a study; they can apply appropriate visualization to communicate the results.
  • Students can generalize and analyze the data they have, assess it critically, express their own opinions, and give their interpretation on the best possible decision.
  • Students can set research goals, propose a research plan based on the results of previous research and social theory, carry out data analysis and report the results.
  • Choose appropriate methods and techniques for certain types of variables and certain aims of the analysis
  • Conduct statistical analyses in RStudio
  • Give meaningful interpretation of statistical results: regression coefficients, tables, plots and diagrams (produced in R)
  • Perform data transformations
  • Represent graphically the results of the statistical analyses
  • Students can formulate research goals, objectives and methods according to academic standards; can process data from international sources; can assess the quality of and analyze given samples of international research; can replicate the procedures from international studies; and can carry out research projects in international teams
  • Students can set research goals, propose a research plan based on the results of previous research and social theory, carry out data analysis and report the results
  • Students can apply a theoretical framework to define hypotheses and explain the results of a study; can apply appropriate statistical models and generalize the results
  • Students can carry out statistical analyses of a data set, propose hypotheses and choose the methods needed to reach the goals, interpret the results and assess the quality of proposed solutions. Students provide reasons for their choice of techniques, interpret the outputs correctly, and assess the quality of their own and others’ models
Course Contents

  • Introduction to GLM
  • Topic 1. Research hypotheses vs. statistical hypotheses. Variable types
  • Best Practices in Data Wrangling
  • Linear regression: OLS. Diagnostics
  • Topic 2. Exploratory data analysis
  • Data Management
  • Linear regression: Interaction effects
  • Topic 3. Chi-square and association measures
  • Communicating Data Analysis Results
  • Exploratory factor analysis
  • Topic 4. Two means comparison
  • Confirmatory factor analysis
  • Topic 5. One-way ANOVA
  • Topic 6. Correlation and linear regression
  • Topic 7. Multiple linear regression
Assessment Elements

  • non-blocking Projects 1-4 (0.1 * 4)
    Students form teams of 2-3 and work together on their project throughout the course, submitting and peer-reviewing each part before the corresponding computer lab. Final projects are submitted in full and presented in the classroom. Each group selects one country from the European Social Survey, then picks a topic of interest within the scope of the available survey questions (e.g. Health, Democracy, Religion, etc.) and performs all the tests covered in class on these data. One day before each computer lab, the piece of work that is due must be submitted, and it is blindly peer-reviewed by two other groups in LMS. The instructors assign the reviewers, so students do not know in advance who will review their work next time. Final projects are presented in two steps. First, the group submits the code with interpretations. After this, they present the findings and procedures in class. Students are expected to choose and correctly apply appropriate ways to analyse and interpret the data, as well as to demonstrate their knowledge and skills in presenting these results to an audience. The individual contribution of each student is graded. Projects themselves should be submitted as scripts or RMarkdown objects; in-class presentations should be adapted for slide shows (e.g. Prezi, LibreOffice Impress, etc.). Project details are available in LMS.
  • non-blocking Test
    All students fill in a comprehensive paper-and-pencil test covering all previous topics.
  • non-blocking In-class activity
    In-class activity during lectures and seminars. Students are expected to ask questions and participate in discussions, as well as help other students during practice sessions. Small regular tests held at seminars are also part of this grade.
  • non-blocking Exam
    The exam is aimed at checking the skills students should have obtained during the course. Its structure is close to the structure of the projects but covers all the topics: standard problems including descriptive statistics, measures of association, comparing two or more means, and linear regression. The examination is conducted in writing (solving problems) using asynchronous proctoring. The exam is conducted on the Moodle platform (et.hse.ru), with proctoring on the Examus platform (https://hse.student.examus.net). You need to connect to the exam in 15 minutes. On the Examus platform, system testing is available. The student's computer must meet the following requirements: https://elearning.hse.ru/data/2020/05/07/1544135594/Технические%20требования%20к%20ПК%20студента.pdf. To participate in the exam, the student must access the proctoring platform in advance, run a system test, turn on the camera and microphone, and verify their identity. During the exam, students are prohibited from communicating (on social networks or with other people in the room) and from copying materials 'as is' from the web. During the exam, students use local R (RStudio) to solve the data analysis problems. Students are allowed to use their hand-written notes and files located on their computer. A communication interruption of less than 10 minutes is considered a short-term disconnection. A communication interruption of 10 minutes or more is considered a long-term disconnection. In case of a long-term disconnection, the student cannot continue the exam. The exam retake procedure is similar to the procedure described here.
  • non-blocking MOOC
    If there are medical reasons for not completing the task, students should talk to the instructor before the deadline and arrange a later submission. Medical certificates should be presented no later than two weeks after the deadline, otherwise they are not taken into account.
  • non-blocking Practical tasks
    After each seminar, students are assigned a practical task which should be completed by Friday, 12 p.m.
  • non-blocking Project1
    Three basic features are assessed: correct calculations and correct code (syntax); correct interpretations (students must describe trends properly, assess the significance of the results, and predict values of the dependent variable correctly); and correct graphics, with proper plot types and formatting applied.
  • non-blocking Exam
  • non-blocking DataCamp
  • non-blocking Project2
    Three basic features are assessed: correct calculations and correct code (syntax); correct interpretations (students must describe trends properly, assess the significance of the results, and predict values of the dependent variable correctly); and correct graphics, with proper plot types and formatting applied.
  • non-blocking Project 3
    A project dedicated to the topics of causal modeling (SEM). One week will be given to prepare and submit your paper.
  • non-blocking Dataviz Task
    You can search for original inspirational plots here (https://viz.wtf/) or other public sources.
  • non-blocking Test 1
  • non-blocking Test 2
  • non-blocking Online Track
  • non-blocking Homework Scripts
    If not specified otherwise, individual scripts should be submitted as knitted R Markdown files (HTML) turned in via the MS Teams Assignments section (forgetting to click on the button 'Turn In' means late submission). If students fail to knit their script, the mark for submission is cut by half but they can still get 0.5 points if they submit an R script.
  • non-blocking Written Exam
    The exam consists of problems similar to those solved at practice sessions so that you can prepare for the exam during the course.
Interim Assessment

  • 2019/2020 4th module
    0.4 * Projects 1-4 (0.1 * 4) + 0.1 * MOOC + 0.2 * In-class activity + 0.1 * Test + 0.2 * Exam
  • 2020/2021 4th module
  • 2021/2022 3rd module
    0.1 * Dataviz Task + 0.1 * Test 1 + 0.1 * Test 2 + 0.2 * Written Exam + 0.2 * Online Track + 0.3 * Homework Scripts
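
As an illustration of how the 2021/2022 formula combines the assessment elements, here is a minimal sketch in Python. The weights are taken from the formula above; the 10-point scale, the example scores, and the function name interim_grade are assumptions made for illustration only. The syllabus does not state a rounding rule, so the result is left unrounded apart from display.

```python
# Weights copied from the 2021/2022 interim assessment formula.
WEIGHTS = {
    "Dataviz Task": 0.1,
    "Test 1": 0.1,
    "Test 2": 0.1,
    "Written Exam": 0.2,
    "Online Track": 0.2,
    "Homework Scripts": 0.3,
}


def interim_grade(scores):
    """Return the weighted sum of element scores (hypothetical helper)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up scores on an assumed 10-point scale.
example_scores = {
    "Dataviz Task": 8,
    "Test 1": 7,
    "Test 2": 6,
    "Written Exam": 9,
    "Online Track": 10,
    "Homework Scripts": 8,
}

print(round(interim_grade(example_scores), 2))  # prints 8.3
```

Because the six weights sum to 1, the result stays on the same scale as the individual element scores.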
Bibliography

Recommended Core Bibliography

  • Agresti, A. (2013). Categorical Data Analysis (Vol. Third edition). Hoboken, NJ: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=769330
  • Agresti, A., & Finlay, B. (2014). Statistical Methods for the Social Sciences: Pearson New International Edition (Vol. Pearson new international ed., 4. ed). Harlow England: Pearson. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=nlebk&AN=1418314
  • Denis, D. J. (2016). Applied Univariate, Bivariate, and Multivariate Statistics. Hoboken, New Jersey: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1091881
  • Kline, R. B. (2016). Principles and Practice of Structural Equation Modeling, Fourth Edition (Vol. Fourth edition). New York: The Guilford Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1078917
  • Knaflic, C. N. (2015). Storytelling with Data : A Data Visualization Guide for Business Professionals. Hoboken, New Jersey: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1079665
  • Stowell, S. (2014). Using R for Statistics. Berkeley, CA: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1174344
  • Tabachnick, B. G., & Fidell, L. S. (2014). Using Multivariate Statistics: Pearson New International Edition (Vol. 6th ed). Harlow, Essex: Pearson. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=nlebk&AN=1418064
  • Wickham, H., & Grolemund, G. (2016). R for Data Science : Import, Tidy, Transform, Visualize, and Model Data (Vol. First edition). Sebastopol, CA: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1440131

Recommended Additional Bibliography

  • Beh, E. J., & Lombardo, R. (2014). Correspondence Analysis : Theory, Practice and New Strategies. Chichester, West Sussex: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=842814
  • Brown, T. A. (2015). Confirmatory Factor Analysis for Applied Research, Second Edition (Vol. Second edition). New York: The Guilford Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=831411
  • Crawley, M. J. (2013). The R Book (Vol. Second Edition). Chichester, West Sussex: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=531630
  • Field, A., Miles, J., & Field, Z. (2012). Discovering Statistics Using R. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edswao&AN=edswao.363067604
  • Inter-university Consortium for Political and Social Research. (2012). Guide to Social Science Data Preparation and Archiving: Best Practice Throughout the Data Life Cycle. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.AA22F59E
  • Little, T. D. (2013). The Oxford Handbook of Quantitative Methods. Oxford: Oxford University Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=603942
  • Yau, N. (2013). Data Points : Visualization That Means Something. New York: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=566405