Selected Topics in Data Science

Academic Year: 2021/2022
Language of instruction: English
ECTS credits: 6
Course type: Elective course
When: year 1, module 4

Instructor


Suschevskiy, Vsevolod

Course Syllabus

Abstract

During the course, students will use Python and JupyterLab for Windows as tools for data processing and statistical computation. Students are assumed to be familiar with the high school mathematics curriculum, to have basic computer literacy and some programming experience in R, Python, or Julia, to have completed a university-level statistics course (e.g. Statistics 101), and to be willing to work hard to learn the essentials of data science.
Learning Objectives

  • The goal of this course is to acquaint students with data science methods and terminology, and to teach them how to implement these methods using the Python programming language.
Expected Learning Outcomes

  • understand key terminology from Data Science and its connection to Social Science research methodology
  • choose data science algorithms appropriate to research questions
  • use the Python programming language for data analytics
  • design, present, and explain extract, transform, load (ETL) pipelines
  • know key concepts, approaches, obstacles and limitations of applying Data Science tools to Social Science problems
Course Contents

  • Topic 1: Intro to Data Science
  • Topic 2: Summarization
  • Topic 3: Prediction
  • Topic 4: Inference
  • Topic 5: Causality
Assessment Elements

  • non-blocking In-class Activities
    The aspects of an assignment that most affect the grade are: a) correctness of the answers to the questions in the assignment, b) ability to write correct Python code (where required), c) appropriate use of data language, d) correct interpretation of results. If all of these criteria are met, you can expect an excellent grade (8-10 on a 0-10 scale). Late assignments are graded down by 1 point for each day of delay (but by no more than 3 points in total). All sources and materials used must be cited correctly.
    Recommendations for homework: Homework is designed to help students acquire the skills necessary for conducting independent quantitative research in social and political science. Students will be given five homework assignments before lectures and seminars, covering the relevant topics. Each assignment should be submitted before the beginning of the lecture or seminar covering its topic (e.g. homework on topic 2 of the syllabus must be submitted before the beginning of the lecture or seminar covering topic 2). Students are encouraged to use any additional sources.
  • non-blocking Final Project
    All course participants must complete a short data analysis in which they apply simple machine learning methods to extract meaning from the data and answer research questions. Methods should be used correctly, and the interpretation of model results should be understandable to a customer without a technical background. The final project is an independent task, so students are not expected to collaborate on code writing or conceptualization.
    Format: Jupyter notebook.
    Purpose: the project should demonstrate your ability to produce data-driven research and, ultimately, to pass a test task when applying for your first data-related position.
Interim Assessment

  • 2021/2022 4th module
    0.7 * Final Project + 0.3 * In-class Activities
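
The grading arithmetic can be sketched in Python, the language used in the course. This is only an illustration: the function and constant names are hypothetical, the weights and the late-penalty rule are taken from the text of this syllabus, and two points are assumptions because the syllabus does not specify them: no rounding is applied, and a penalized score is not allowed to fall below 0.

```python
# Weights taken from the interim assessment formula in the syllabus.
FINAL_PROJECT_WEIGHT = 0.7
IN_CLASS_WEIGHT = 0.3

# Late-penalty rule from the assessment description:
# 1 point per day of delay, no more than 3 points in total.
PENALTY_PER_DAY = 1
MAX_PENALTY = 3


def apply_late_penalty(score, days_late):
    """Return the assignment score after the late-submission deduction."""
    penalty = min(max(days_late, 0) * PENALTY_PER_DAY, MAX_PENALTY)
    # Assumption: a penalized score cannot drop below 0.
    return max(score - penalty, 0)


def interim_grade(final_project, in_class_activities):
    """Return the weighted interim grade on the 0-10 scale, unrounded."""
    return (FINAL_PROJECT_WEIGHT * final_project
            + IN_CLASS_WEIGHT * in_class_activities)


if __name__ == "__main__":
    # An assignment worth 9 points submitted five days late loses the
    # maximum of 3 points, not 5.
    late_score = apply_late_penalty(9, days_late=5)
    print(late_score)
    print(interim_grade(final_project=8, in_class_activities=late_score))
```

Because the final project carries 70% of the weight, a one-point change in the project grade moves the interim grade more than twice as much as a one-point change in the in-class activities grade.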
Bibliography

Recommended Core Bibliography

  • Vanderplas, J.T. (2016). Python data science handbook: Essential tools for working with data. Sebastopol, CA: O’Reilly Media, Inc. https://proxylibrary.hse.ru:2119/login.aspx?direct=true&db=nlebk&AN=1425081.

Recommended Additional Bibliography

  • Romano, F. (2015). Learning Python. Birmingham: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=nlebk&AN=1133614

Authors

  • SUSCHEVSKIY VSEVOLOD VYACHESLAVOVICH
  • IVANOVA ANASTASIYA SERGEEVNA