Programming in R

Academic Year: 2022/2023
Language of instruction: English
ECTS credits: 3
Course type: Compulsory course
When: 1 year, 1 module

Course Syllabus

Abstract

R is a programming language designed specifically for advanced statistical analysis. It is highly popular among data scientists because it covers essentially all types of tasks that analysts encounter in real-life industrial or scientific applications: from data preparation and cleaning, to estimation of complicated statistical models, to presentation of results in an effective and human-friendly way. This course introduces the fundamentals of R programming and reviews the most widely used base R commands for exploratory data analysis and visualization. You will also learn how to prepare your data for analysis and then explore it using the tidyverse and ggplot2 libraries, as well as how to communicate your results using R Markdown tools. You are expected to understand some basic statistical concepts, such as variable, distribution, mean, variance, and correlation. If you do not, this should not be a big issue, since you will have an opportunity to learn them while working through real-world examples of applied data analysis with R and the exercises offered throughout the course.
Learning Objectives

  • The key objective of this class is to help students master the basic skills of using R for data manipulation and exploratory data analysis.
Expected Learning Outcomes

  • be able to install R and RStudio on your PC/laptop
  • be able to perform simple and complicated mathematical and logical operations using R
  • be able to understand basic principles of programming in R and recognize key R data types and data classes
  • be able to import external data sets into R
  • be able to clean, recode, transform, subset, and merge your data using base R tools and tidyverse
  • be able to perform exploratory data analysis in R: frequencies, shares, means, variances, correlations, etc.
  • be able to create effective data visualizations using base R and ggplot2
  • be able to summarize outputs of your analysis in tabular forms
  • be able to write your own simple R functions, and to profile and debug your code
  • be able to prepare HTML and PDF reports on your analyses using R Markdown
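
To give a flavour of the exploratory outcomes listed above, here is a minimal base R sketch. It is only an illustration, not course material: it assumes the built-in `mtcars` data set, and the helper `coef_var` is a hypothetical name introduced for this example.

```r
# Illustration only: mtcars ships with base R and is not one of the course data sets.
data(mtcars)

# Frequencies and shares of a categorical variable (number of cylinders)
cyl_counts <- table(mtcars$cyl)
cyl_shares <- prop.table(cyl_counts)

# Mean, variance, and a correlation between two numeric variables
mpg_mean   <- mean(mtcars$mpg)
mpg_var    <- var(mtcars$mpg)
mpg_wt_cor <- cor(mtcars$mpg, mtcars$wt)

# A simple user-written function (hypothetical helper): coefficient of variation
coef_var <- function(x) sd(x) / mean(x)

print(cyl_counts)
print(round(cyl_shares, 2))
print(c(mean = mpg_mean, variance = mpg_var, correlation = mpg_wt_cor))
print(coef_var(mtcars$mpg))
```

The same steps can be written with tidyverse verbs; base R is used here only so that the sketch runs without installing any packages.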
Course Contents

  • Day 1 (September 6): Getting started with R
  • Day 2 (September 13): Exploratory data analysis using base R – 1
  • Day 3 (September 20): Exploratory data analysis using base R – 2
  • Day 4 (September 27): Exploratory data analysis using tidyverse
  • Day 5 (October 4): Data visualization using ggplot2 and ggpubr
  • Day 6 (October 11): Miscellaneous topics
  • Day 7 (October 18): Practical session
Assessment Elements

  • non-blocking Class activity
    Active involvement in discussions, correct responses to my questions and smart questions to me, presentations of your homework, etc. Please note that I will primarily evaluate the quality of your participation, not its frequency (although one smart comment on the final day of the course will definitely not earn you an excellent grade for this component).
  • non-blocking Home assignments
    After some (not all) lectures you will be asked to complete written home assignments. Most of these assignments will take the form “please complete the following programming tasks, report the results, and add short written comments explaining what you have done” or “please answer some questions covering the content of the last lecture”. The assignments will be relatively short and simple (1-2 pages) and must be submitted before the start of the next class day (i.e., 18:10 next Thursday relative to the day of assignment).
  • non-blocking Final exam
    You will be asked to complete several programming and analytical tasks on a real-world data set (see example below). The DEADLINE for submitting your final exam paper is Tuesday, OCTOBER 25, 18:10 (please note that the date is preliminary and may change).
Interim Assessment

  • 2022/2023 1st module
    0.3 * Home assignments + 0.2 * Class activity + 0.5 * Final exam
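
As a rough illustration of how the formula above combines the three components, it can be written as a small R function. The weights come from the formula; the function name `interim_grade`, the example scores, and the 10-point scale are assumptions made for this sketch, and any rounding rule is left out because the syllabus does not state one.

```r
# Weights are taken from the syllabus formula; everything else is hypothetical.
interim_grade <- function(home, activity, exam) {
  0.3 * home + 0.2 * activity + 0.5 * exam
}

# Made-up component scores on an assumed 10-point scale
example_grade <- interim_grade(home = 8, activity = 6, exam = 7)
print(example_grade)
```

With these made-up scores the weighted sum is 2.4 + 1.2 + 3.5 = 7.1, which shows that the final exam carries half of the total weight.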
Bibliography

Recommended Core Bibliography

  • Wickham, H., & Grolemund, G. (2016). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data (1st ed.). Sebastopol, CA: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1440131

Recommended Additional Bibliography

  • Kabacoff, R. I. (2015). R in Action: Data Analysis and Graphics with R (2nd ed.). Manning.
  • Wickham, H. (2015). Advanced R, Second Edition. Boca Raton, FL: Chapman and Hall/CRC. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=934735