- to stimulate students to apply the methods and concepts they learned in courses on data analysis and research methods to solving practical tasks in business and marketing analytics.
- Introduction to Data Analysis for Consumer Behavior and Client Analytics. Customer Lifetime Value: Basic concepts of consumer behavior and client analytics. Differences and similarities between typical academic research and business analytics pipelines. Customer acquisition, conversion, churn, segmentation, consumer behavior. Market basket analysis. Association rules. Classification of consumer behavior models. Generic marketing strategies. Types of business models. Statistical methods for client analytics. Lifetime value (CLV, LTV). Net profit. Predicting future margin with current sales data. Predicting customer lifetime value with linear regression in R. Omitted variable problem. Multicollinearity. Model validation. Risk of overfitting: use of statistics (AIC), automatic model selection, out-of-sample validation. Adjusted R-squared.
- Customer Segmentation and Unit Economics: Factors of customer segmentation: demographics, technology, geography, lifestyles, behavior, new/returning contract, time from last purchase, frequency and value of spending, etc. Reducing the complexity of extensive correlated data. Differences between the goals of LTV models and segmentation techniques. Business-related criteria for segmentation: RFM (recency, frequency, monetary) analysis. Analytical techniques for customer segmentation: PCA, cluster analysis (k-means, DBSCAN, agglomerative algorithms). Applications of PCA for exploration in customer analytics. Reducing multicollinearity, building an index, visualizing multidimensional data. Visualizing correlations. Standardizing variances (scaling). Loadings of principal components. Interpretation of principal components. PCA model specification. Kaiser-Guttman criterion. Scree plot. Biplot of variables and components. Further analysis: fitting loadings to linear regression. Clustering algorithms. Distances between data points. Linkage criteria. Dendrogram plot. Applications of cluster analysis for customer analytics in R.
- Customer Churn. Churn Prevention: How to predict customer churn? How to detect and prevent customer churn? Factors of churn: expectations, performance, disconfirmation (disappointment based on perceived quality), satisfaction, churn intention/switching decisions. Push-pull-mooring paradigm for churn and service switching. Measurement of latent variables: satisfaction and expectation disconfirmation. Models of satisfaction, expectation disconfirmation, performance. Sources of data: experts, logs, surveys. Case: Yandex Music vs. Spotify. Predicting client churn with logistic regression in R. The meaning of p-value. Interpretation of logistic regression coefficients. Model selection based on significance vs. theory. Inspecting the results of automatic model selection. In-sample model fit for logistic regression: pseudo-R-squared (interpretation of reasonable, good, and very good fit); accuracy calculation. The rule of “garbage in, garbage out”. Accuracy. Confusion matrix. Finding the optimal threshold: a table of potential payoffs. Composing a payoff matrix. Dealing with overfitting: out-of-sample validation and cross-validation. Splitting the sample in R. Fitting the model on the training subsample and predicting on the test subsample. K-fold methods of cross-validation. Accuracy for out-of-sample validation vs. cross-validation. Addressing churn using segmentation and advertisement. Naive Bayes in predicting churn. Description of task and data for the project.
- Predicting Customer’s Time to Churn: Predicting time till next purchase with survival analysis. Addressing churn using segmentation and advertisement. Survival function. Censored data problem. Survival analysis models: pros and cons. Applications of survival models in customer analytics. Types of data censoring (left, interval, right, type I, type II, random). Assumptions of survival analysis. Kaplan-Meier survival curve analysis. Survival function and cumulative hazard function. Cumulative risk. Hazard rate. Kaplan-Meier estimation with a categorical covariate. Cox proportional hazards (CPH) model for multiple covariates. Assumptions of CPH. Interpretation of coefficients for categorical and continuous predictors. Survival plot. Visualization of CPH estimates. When assumptions are violated: stratified Cox model, models with time-dependent coefficients. Prediction of survival curve for new customers. CPH model interpretation, calculation of customer lifetime value.
- What-If Analysis: The analytical pipeline: database, model, dashboard, what-if analysis. Use of simulations in business for decision making. Scenarios as a way to construct predictions from data. From scenario, to simulation model, to prediction. What-if analysis vs. Extraction, Transformation and Loading (ETL) approach. Source variables and scenario parameters. Seven stages of what-if analysis: goal analysis, business modeling, data source analysis, multidimensional modeling, simulation modeling, data design and implementation, and validation (if validation fails, repeat stages 4-7). Activity diagram (scenario diagram). Case: productivity of branches. Stating the assumptions required to perform what-if analysis of models. Grouping assumptions into scenarios describing different ways customers may react to the policies. Building what-if models for each policy for each scenario. Comparing and reflecting on the results of scenario models. Reactive programming. Functions in R.
- Consumer Preferences and Choice Modeling: Introduction to consumer preference theory. Utility analysis. Cardinal utility, ordinal utility. Indifference curves show combinations that give equal utility. Marginal rate of substitution (MRS). Constraints: income, price, time. Uses of choice models in marketing and business analytics. Modeling customers' choice by product features. Multinomial logit models for choice vs. conjoint analysis: when and where. Choice-based and metric conjoint. Sample size for a conjoint survey. Preparing the data for choice modeling. Managing and summarizing choice data. Selecting the features for modeling. Building choice models. Modeling different preferences for different groups of customers with hierarchical models (mixed logit models). Reporting choice models: choice share predictions, willingness-to-pay metric. Preference share vs. market share forecast. The “red bus/blue bus problem” in multinomial logistic models.
- Customer Satisfaction: Customer feedback surveys. Net promoter score (NPS) for measuring loyalty. Promoters, passives and detractors. Customer satisfaction survey (CSAT) for meeting expectations. Post-purchase surveys, product/service development surveys, usability surveys. Expectation disconfirmation theory of post-purchase satisfaction. Key constructs: expectations, perceived performance, disconfirmation of beliefs and satisfaction. Inputs to expectations of value. Measuring the perceived performance: overall quality, interaction, service experience, value for money, social status. Problems of customer satisfaction surveys. Self-selection, overdelivering, expectation adjustment. Combining survey and behavior data. Discovering patterns with Bayesian networks (Bayes nets). Trimming groups of variables, defining the importance of predictors.
- Introduction to Business Process Analytics: Process mining. Business process data: extraction, processing and analysis. Process as a control flow, process as performance, process as the organizational background. Association rule mining. Markov chain models and sequential association rules. Identifying the process and process stakeholders. Collecting process information. Event data processing: event log objects, exploratory and descriptive analysis, conditional process analysis, process visualizations and process dashboards. Case study: order-to-cash process.
- Interim assessment (module 2): 0.25 * in-class activity + 0.25 * MOOC + 0.15 * project 1 + 0.15 * project 2 (individual) + 0.2 * project 2 (team)
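
As a rough illustration of how the interim assessment formula combines its components, the following Python sketch computes the weighted sum. The weights are taken from the formula above. Everything else is an assumption made for the example: the identifier names, the reading of the two "project 2" parts as individual and team components, the 10-point scale, and the sample scores. The syllabus does not state a rounding rule, so none is applied.

```python
# Weights copied from the interim assessment formula in the syllabus.
# The dictionary keys are hypothetical identifiers chosen for this sketch.
WEIGHTS = {
    "in_class_activity": 0.25,
    "mooc": 0.25,
    "project_1": 0.15,
    "project_2_individual": 0.15,  # assumed reading of "project 2 (ind)"
    "project_2_team": 0.20,
}


def interim_grade(scores):
    """Return the weighted sum of the component scores.

    `scores` maps each component name in WEIGHTS to a numeric score.
    A missing component raises ValueError instead of silently counting as zero.
    """
    missing = sorted(set(WEIGHTS) - set(scores))
    if missing:
        raise ValueError("missing component scores: " + ", ".join(missing))
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, assuming every component is graded on a 10-point scale.
example_scores = {
    "in_class_activity": 8,
    "mooc": 9,
    "project_1": 7,
    "project_2_individual": 6,
    "project_2_team": 10,
}

grade = interim_grade(example_scores)
print(f"Interim grade: {grade:.2f}")
```

With these hypothetical scores the components contribute 2.0, 2.25, 1.05, 0.9 and 2.0 points, for a total of 8.2. Because the weights sum to 1, the result stays on the same scale as the component scores.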