
Social Network Analysis

Academic year
Language of instruction: English
Elective course
Delivered in:
2nd year, module 2


Course Syllabus


The course completes the customer analytics track by teaching skills that are especially useful for analyzing social networks (e.g. networks of social media users or buyers): automated data collection from various sources (including the Web), analysis of network data, and text mining. The course has two parts. In the online part, students are given access to 4 courses on the DataCamp platform, which serve as lectures and practical tutorials on collecting and manipulating data from various sources, with special emphasis on scraping web data and using APIs. DataCamp is an interactive learning platform for R, Python and SQL for data science that teaches cutting-edge data analysis tools in an accessible manner; it is used in this course so that students can improve their R coding skills. Academic support for the course is provided via the LMS, where students can find guidelines and recommendations for self-study and sample questions for exam preparation. The exam is also conducted using the LMS testing functionality. Class sessions are organized as follows:
  • For every topic, an introductory tutorial-style lecture is given to familiarize students with the topic
  • A set of case studies is solved in class every week
  • 90% of the time is allocated to practicing R programming skills
Learning Objectives

  • Scrape data from most websites
  • Use APIs to obtain data from websites
  • Present networked data in a format appropriate for quantitative analysis
  • Develop and apply new research methods by combining and modifying existing techniques
  • Solve CRM analytics problems using special methods for analyzing network data and machine learning techniques
Expected Learning Outcomes

  • Importing data from various sources
  • Scraping data from most websites
  • Using APIs to obtain data from websites
  • Building simple Shiny web applications for customer analytics
  • Presenting networked data in a format appropriate for quantitative analysis
  • Solving CRM analytics problems using special methods for analyzing network data and machine learning techniques
  • Processing texts using basic string manipulations, as well as sentiment analysis and topic modeling
Course Contents

  • Advanced aspects of importing data to R
    Importing data from various file formats, including database files, files from statistical packages, and HTML pages
  • Working with Web data in R
    Downloading files and using API clients. Using httr to interact with APIs directly. Handling JSON and XML. Web scraping with XPath. Web scraping with CSS selectors and a final case study.
  • Building Web Applications in R with Shiny
    Essentials of Shiny development. Plotting with Shiny. Interactive exploration of datasets. Creating word clouds.
  • Network Analysis in R
    Fundamental concepts in social network analysis. Identifying important vertices in a network. Characterizing network structures. Identifying special relationships.
  • Predictive Modeling with networked data
    Measures of Homophily. Network Featurization. Building predictive models using network features.
  • Text Mining
    Converting texts to tidy text format. Word frequency analysis. Word clouds.
  • Sentiment Analysis
    Sentiment analysis using various dictionaries
  • String Manipulations in R
    String manipulations for data cleaning using stringr package. Concatenation. Substrings. Regular expressions.
  • Topic Modeling
    Latent Dirichlet Allocation algorithm for topic modeling and its implementation in R.
Assessment Elements

  • non-blocking Kahoot (tests)
    Weekly tests on the Kahoot.it platform cover the material studied in previous weeks. To compute Grade_Kahoot, each student's Kahoot points are summed, and the sum is converted to a percentile from 0 to 100 using the corresponding Excel formula. As an alternative, the student's sum is also expressed as a percentage of the score achieved by the top performer (% of max). Grade_Kahoot = max(percentile, % of max)
  • non-blocking Midterm assessment
    Each student should take a few DataCamp courses specified by the instructor (up to 4 courses). Free access will be granted to students of this course. The grading is based NOT on the DataCamp score, but on the student's performance on the test given by the instructor. The test checks how well students have mastered the material studied both in class and on DataCamp. Grade_Midterm is the score from 0% to 100% displayed by the LMS.
  • non-blocking Empirical case studies solved in class
    75-minute tests given in class every week. Each problem set consists of 2-5 problems. The total number of case studies equals the number of tutorials (around 10-12 case studies). For each case study a student can get one of the following scores: 0 (absent or everything is incorrect); 1 (present, but mostly incorrect solution); 2 (some mistakes); 3 (no mistakes). Grade_Cases is computed as the student's total score as a percentage of the maximum achievable score.
  • non-blocking Exam
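
The Grade_Kahoot and Grade_Cases rules described above can be illustrated with a minimal Python sketch. It is not the instructor's actual spreadsheet: the syllabus does not name the Excel formula used for the percentile, so the sketch assumes the percentile is the share of the other students with a strictly lower total, scaled to 0-100. The function names and sample data are hypothetical.

```python
def percentile_rank(totals, student):
    """Assumed percentile (0-100): share of the other students with a strictly lower total."""
    others = [pts for name, pts in totals.items() if name != student]
    if not others:
        return 100.0
    below = sum(1 for pts in others if pts < totals[student])
    return 100.0 * below / len(others)


def grade_kahoot(totals, student):
    """Grade_Kahoot = max(percentile, % of max), as stated in the syllabus."""
    top = max(totals.values())
    pct_of_max = 100.0 * totals[student] / top if top > 0 else 0.0
    return max(percentile_rank(totals, student), pct_of_max)


def grade_cases(case_scores, max_per_case=3):
    """Grade_Cases: total score as a percentage of the maximum achievable score.

    Each entry is 0, 1, 2 or 3; an absence counts as 0 and stays in the denominator.
    """
    return 100.0 * sum(case_scores) / (max_per_case * len(case_scores))


# Hypothetical data: summed Kahoot points for three students.
kahoot_totals = {"student_a": 900, "student_b": 600, "student_c": 300}
print(grade_kahoot(kahoot_totals, "student_b"))  # % of max (about 66.7) beats the assumed percentile (50)
print(grade_cases([3, 2, 0, 3]))                 # 8 of 12 points, about 66.7
```

Taking the maximum of the two measures means a student is never penalized by the ranking alone: a total close to the top performer's yields a high grade even in a strong group.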
Interim Assessment

  • Interim assessment (module 2)
    0.25 * Empirical case studies solved in class + 0.25 * Exam + 0.25 * Kahoot (tests) + 0.25 * Midterm assessment
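
As a worked example of the weighting above, here is a minimal Python sketch that assumes all four components are expressed on the same 0-100 scale. The syllabus does not say how the result is converted to the final grading scale, so no conversion is attempted. The function name, dictionary keys, and sample scores are hypothetical.

```python
# Weights taken from the interim assessment formula: four equal components.
WEIGHTS = {"cases": 0.25, "exam": 0.25, "kahoot": 0.25, "midterm": 0.25}


def interim_grade(components):
    """Weighted sum of the four assessment elements, each assumed to be on a 0-100 scale."""
    return sum(weight * components[name] for name, weight in WEIGHTS.items())


# Hypothetical student: 80 on case studies, 70 on the exam, 90 on Kahoot, 60 on the midterm.
example = {"cases": 80, "exam": 70, "kahoot": 90, "midterm": 60}
print(interim_grade(example))  # 75.0
```

Because the weights are equal and every element is non-blocking, a weak result in one element lowers the total by at most a quarter of the missing points.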


Recommended Core Bibliography

  • Luke, D. A. (2015). A User’s Guide to Network Analysis in R. Cham: Springer. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1114415
  • Munzert S. Automated data collection with R: a practical guide to Web scraping and text mining. Chichester, West Sussex, United Kingdom: Wiley, 2014. 1 p.

Recommended Additional Bibliography

  • Kolaczyk E. D., Csárdi G. Statistical analysis of network data with R. – New York : Springer, 2014. – 207 pp.