Delivered at: Department of Sociology

Course type: Elective course

When: 3rd year, modules 1 and 2

Instructors

Musabirov, Ilya

Suvorova, Alena

Shirokanova, Anna

Full Syllabus

Abstract

The course is aimed at undergraduate social science students who plan careers in business-oriented roles in marketing, sales, and service analytics. It consists of lectures and seminars. The lectures give a gentle introduction to several fundamental concepts in business analytics (lifetime value, churn, segmentation, what-if analysis, business processes) and analytical techniques (choice models, complexity reduction), while guiding students through tailored cases relevant to the data economy. The seminars are blended with MOOCs that introduce specific data analysis techniques as tools for solving typical problems in business analytics. We will discuss the business context of each case, apply analytical techniques that solve the tasks at hand, and discuss how to deliver results effectively. This is a rather intense course that requires motivation and genuine interest in business analytics, teamwork, and a large amount of independent study. To succeed in the course, participants are expected to be familiar with linear regression fundamentals and data management techniques in R. Familiarity with machine learning techniques is not required but will be an asset.

Learning Objectives

To encourage students to apply the methods and concepts they learned in courses on data analysis and research methods to practical tasks in business and marketing analytics.

Expected Learning Outcomes

Students solve analytical problems using data analysis techniques suitable for the task; develop policies and scenarios using several methods of analysis
Students perform various statistical analyses, use them appropriately, and develop suggestions (recommendations, policies, scenarios) for the task (churn prevention, segmentation, etc.)
Students work individually and in teams to interpret the results and develop policies and scenarios for the company
Students collect information on the business context of a given case and evaluate possible solutions to a given task
Students select data features to be used in segmentation procedures; as a team, students develop analytical pipelines covering necessary tasks and combining individual results into a summary report
Students can plan the analytical data cycle, from formulating requests for data collection, to data cleaning and dimension reduction, to data analysis and reporting the recommendations

Course Contents

Introduction to Data Analysis for Consumer Behavior and Client Analytics. Customer Lifetime Value
Basic concepts of consumer behavior and client analytics. Differences and similarities between typical academic research and business analytics pipelines. Customer acquisition, conversion, churn, segmentation, consumer behavior. Market basket analysis. Association rules. Classification of consumer behavior models. Generic marketing strategies. Types of business models. Statistical methods for client analytics. Life-time value (CLV, LTV). Net profit. Predicting future margin with current sales data. Predicting customer lifetime value with linear regression in R. Omitted variable problem. Multicollinearity. Model validation. Risk of overfitting: use of statistics (AIC), automatic model selection, out-of-sample validation. Adjusted R-squared.
Customer Churn. Churn Prevention
How to predict customer churn? How to detect and prevent customer churn? Factors of churn: expectations, performance, disconfirmation (disappointment based on perceived quality), satisfaction, churn intention/switching decisions. The push-pull-mooring paradigm for churn and service switching. Measurement of latent variables: satisfaction and expectation disconfirmation. Models of satisfaction, expectation disconfirmation, performance. Sources of data: experts, logs, surveys. Case: Yandex Music vs. Spotify. Predicting customer churn with logistic regression in R. The meaning of the p-value. Interpretation of logistic regression coefficients. Model selection based on significance vs. theory. Inspecting the results of automatic model selection. In-sample model fit for logistic regression: pseudo-R-squared (interpretation of reasonable, good, and very good fit); accuracy calculation. The rule of “garbage in, garbage out”. Accuracy. Confusion matrix. Finding the optimal threshold: a table of potential payoffs. Composing a payoff matrix. Dealing with overfitting: out-of-sample validation and cross-validation. Splitting the sample in R. Fitting the model on the training subsample and predicting on the test subsample. K-fold methods of cross-validation. Accuracy for out-of-sample validation vs. cross-validation. Addressing churn using segmentation and advertisement. Naive Bayes in predicting churn. Description of the task and data for the project.
Predicting Customer’s Time to Churn
Predicting time until the next purchase with survival analysis. Addressing churn using segmentation and advertisement. Survival function. Censored data problem. Survival analysis models: pros and cons. Applications of survival models in customer analytics. Types of data censoring (left, interval, right, type I, type II, random). Assumptions of survival analysis. Kaplan-Meier survival curve analysis. Survival function and cumulative hazard function. Cumulative risk. Hazard rate. Kaplan-Meier estimation with a categorical covariate. Cox proportional hazards (CPH) model for multiple covariates. Assumptions of CPH. Interpretation of coefficients for categorical and continuous predictors. Survival plot. Visualization of CPH estimates. When assumptions are violated: stratified Cox model, models with time-dependent coefficients. Prediction of the survival curve for new customers. CPH model interpretation, calculation of customer lifetime value.
Customer Segmentation and Cohort Analysis
Factors of customer segmentation: demographics, technology, geography, lifestyles, behavior, new/returning contract, time from last purchase, frequency and value of spending, etc. Reducing the complexity of extensive correlated data. Differences between the goals of LTV models and segmentation techniques. Business-related criteria for segmentation: RFM (recency, frequency, monetary) analysis. Analytical techniques for customer segmentation: PCA, cluster analysis (k-means, DBSCAN, agglomerative algorithms). Applications of PCA for exploration in customer analytics. Reducing multicollinearity, building an index, visualizing multidimensional data. Visualizing correlations. Standardizing variances (scaling). Loadings of principal components. Interpretation of principal components. PCA model specification. Kaiser-Guttman criterion. Scree plot. Biplot of variables and components. Further analysis: fitting loadings to linear regression. Clustering algorithms. Distances between data points. Linkage criteria. Dendrogram plot. Applications of cluster analysis for customer analytics in R.
What-If Analysis
The analytical pipeline: database, model, dashboard, what-if analysis. Use of simulations in business for decision making. Scenarios as ways to construct predictions from data. From scenario, to simulation model, to prediction. What-if analysis vs. the Extraction, Transformation and Loading (ETL) approach. Source variables and scenario parameters. Seven stages of what-if analysis: goal analysis, business modeling, data source analysis, multidimensional modeling, simulation modeling, data design and implementation, and validation (if validation fails, repeat stages 4-7). Activity diagram (scenario diagram). Case: productivity of branches. Stating the assumptions required to perform what-if analysis of models. Grouping assumptions into scenarios that describe different customer reactions to the policies. Building what-if models for each policy under each scenario. Comparing and reflecting on the results of scenario models. Reactive programming. Functions in R.
Consumer Preferences
Introduction to consumer preference theory. Utility analysis. Cardinal utility, ordinal utility. Indifference curves show combinations that give equal utility. Marginal rate of substitution (MRS). Constraints: income, price, time. Uses of choice models in marketing and business analytics. Modeling customers' choice by product features. A/B testing and preference testing. A/A tests and A/B tests. Power analysis. Multiple A/B testing. Backward A/B testing. Bootstrapping and its use in A/B testing. Checklists and common pitfalls in A/B testing. Common metrics and reporting the results. A/B testing vs(?) qualitative customer research.
Customer Satisfaction
Customer feedback surveys. Net promoter score (NPS) for measuring loyalty. Promoters, passives and detractors. Customer satisfaction survey (CSAT) for meeting expectations. Post-purchase surveys, product/service development surveys, usability surveys. Expectation disconfirmation theory of post-purchase satisfaction. Key constructs: expectations, perceived performance, disconfirmation of beliefs and satisfaction. Inputs to expectations of value. Measuring the perceived performance: overall quality, interaction, service experience, value for money, social status. Problems of customer satisfaction surveys. Self-selection, overdelivering, expectation adjustment. Combining survey and behavior data.
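As an illustration of the promoters, passives and detractors split mentioned above, the following is a minimal sketch of the Net Promoter Score calculation. It assumes the conventional 0-10 "how likely are you to recommend" scale, with 9-10 counted as promoters, 7-8 as passives and 0-6 as detractors. The sketch is in Python for brevity, although the course itself works in R; the function name and the sample ratings are hypothetical.

```python
def net_promoter_score(ratings):
    """Return NPS as a number between -100 and 100.

    Assumes each rating is an integer on the conventional 0-10 scale:
    9-10 = promoter, 7-8 = passive, 0-6 = detractor.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    # Passives count toward the total but not toward either share.
    return 100.0 * (promoters - detractors) / len(ratings)


# Hypothetical survey responses, invented for illustration only.
sample_ratings = [10, 9, 9, 8, 7, 6, 3, 10, 5, 9]
print(net_promoter_score(sample_ratings))
```

With these invented responses, five of ten respondents are promoters and three are detractors, so the score is 50% minus 30%, that is, 20.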

Assessment Elements

project 1
project 2 (ind)
project 2 (team)
MOOC
in-class activity

Interim Assessment

Interim assessment (module 2)
0.25 * in-class activity + 0.25 * MOOC + 0.15 * project 1 + 0.15 * project 2 (ind) + 0.2 * project 2 (team)
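The weighting above can be checked with a short worked example. This is a sketch only: the weights are taken from the formula, while the element scores and the 10-point scale are assumptions made for illustration, and no rounding rule is implied. The sketch is in Python for brevity, although the course itself works in R.

```python
# Weights copied from the interim assessment formula.
WEIGHTS = {
    "in-class activity": 0.25,
    "MOOC": 0.25,
    "project 1": 0.15,
    "project 2 (ind)": 0.15,
    "project 2 (team)": 0.20,
}


def interim_grade(scores):
    """Weighted sum of the assessment elements; every element must be present."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores on an assumed 10-point scale.
example_scores = {
    "in-class activity": 8,
    "MOOC": 10,
    "project 1": 7,
    "project 2 (ind)": 6,
    "project 2 (team)": 9,
}
print(round(interim_grade(example_scores), 2))
```

For these hypothetical scores the terms are 2.0 + 2.5 + 1.05 + 0.9 + 1.8, giving 8.25 before any rounding.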

Bibliography

Recommended Core Bibliography

Chapman, C., & Feit, E. M. (2015). R for Marketing Research and Analytics. Cham: Springer. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=964737

Recommended Additional Bibliography

Struhl, S. M. (2017). Artificial Intelligence Marketing and Predicting Consumer Choice: An Overview of Tools and Techniques. London: Kogan Page. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1494508

Bachelor’s Programme 'Sociology and Social Informatics'

Business Analytics