
Introduction to Data Mining in Finance and Business

Academic year
Language of instruction: English
Course type: bridging course
When delivered: Year 1, Module 1

Course Syllabus


The course is designed for students interested in data analysis problems. It provides theoretical foundations and practical skills related to data analysis, data mining, text mining, graph mining and ontology-based data analysis.
Learning Objectives

  • The course is focused on practical aspects of data mining methods and their applications in finance and business.
Expected Learning Outcomes

  • Students know the theoretical foundations of contemporary methods and algorithms used in data analysis
  • Students know the classification of data analysis problems
  • Students know the advantages and disadvantages of various methods
  • Students are able to choose the right methods of analysis for a given problem
  • Students are able to design and implement typical schemes of analysis
  • Students are able to interpret the results delivered by data mining methods
  • Students are able to use data mining in decision-making problems
  • Students are able to analyze financial data using data mining methods
  • Students know how to develop their theoretical knowledge and practical skills related to data analysis
Course Contents

  • Introduction to data analysis. Data preprocessing. Introduction to R language
    1. Introduction to data analysis
    2. Confirmatory and exploratory approach in data analysis
    3. Data types and sources
    4. Data analysis process
    5. Types of problems
    6. Introduction to R language
    7. Simple data types
    8. Assignment statement
    9. Basic input/output commands
    10. Control statements
  • Complex data structures in R. Scripts and control statements. Data preprocessing and visualization. Linear regression. Logistic regression
    1. Complex data structures (vectors, matrices, data frames, lists)
    2. Data preprocessing
    3. Data visualization
    4. Linear regression
    5. Logistic regression
  • Neural network models
    1. Model of artificial neuron
    2. Taxonomy of neural network models
    3. Multilayer perceptrons
    4. RBF networks
    5. Kohonen networks
    6. Deep learning
  • Decision tree models. Naive Bayes classifier. Support vector machine model. Association rules
    1. Introduction to decision tree models
    2. Measurement of group homogeneity (entropy, Gini coefficient)
    3. CART (Classification and Regression Tree) model
    4. Ensemble models: bagging technique, boosting technique, random forest models
    5. Random forests
    6. Naive Bayes classifier
    7. Support vector machine
    8. Association rules and their evaluation
    9. Apriori and ECLAT algorithms
    10. Market basket analysis and churn analysis
  • Problem of dimensionality reduction. Multidimensional scaling. Correspondence analysis. Genetic algorithms
    1. Dimensionality reduction problem in data analysis
    2. Principal component analysis
    3. Singular value decomposition and its application
    4. Multidimensional scaling
    5. Correspondence analysis
    6. Genetic algorithms and their application in dimensionality reduction
  • Cluster analysis
    1. Hierarchical methods
    2. k-means method
    3. Comparison of clustering process results
    4. Evaluation of clustering quality
    5. Model-based clustering
  • Introduction to text mining
    1. Document preprocessing
    2. Frequency matrix and its analysis
    3. Similarity of documents and words
    4. Latent Semantic Analysis
    5. Latent Dirichlet Allocation
  • Sentiment analysis and ontology-based analysis of text documents
    1. Sentiment analysis
    2. Ontologies in data analysis
    3. Ontology-based similarity
    4. Ontology-based analysis of text documents
  • Network analysis
    1. Network structure analysis
    2. Centrality
    3. Network visualization
    4. Link analysis
    5. Bipartite graphs
    6. Similarity of graphs
    7. Frequent subgraph mining
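
As a small illustration of the group homogeneity measures named under the decision tree topic (entropy and the Gini measure), the sketch below computes both for a list of class labels. It is written in Python rather than the R used in the course. The function names and the churn/stay labels are hypothetical, chosen only for this example. The Gini measure is taken here to be the Gini impurity used in CART, which is an assumption about what the syllabus means by "Gini coefficient".

```python
import math
from collections import Counter


def class_proportions(labels):
    """Return the share of each distinct class in a non-empty list of labels."""
    if not labels:
        raise ValueError("labels must not be empty")
    total = len(labels)
    return [count / total for count in Counter(labels).values()]


def entropy(labels):
    """Shannon entropy in bits: 0 for a pure group, higher for mixed groups."""
    return sum(-p * math.log2(p) for p in class_proportions(labels))


def gini_impurity(labels):
    """Gini impurity (assumed CART definition): 1 minus the sum of squared shares."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


# Hypothetical groups of customers, labelled by outcome.
pure = ["churn", "churn", "churn", "churn"]
mixed = ["churn", "stay", "churn", "stay"]

for name, group in [("pure", pure), ("mixed", mixed)]:
    print(name, entropy(group), gini_impurity(group))
```

Both measures are zero for a group with a single class and grow as the classes become more evenly mixed, which is why a tree-building procedure can use either one to compare candidate splits.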
Assessment Elements

  • non-blocking Test 1
  • non-blocking Test 2
  • non-blocking Final exam
Interim Assessment

  • Interim assessment (1 module)
    0.6 * Final exam + 0.2 * Test 1 + 0.2 * Test 2
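
The weighting above can be checked with a short calculation. The sketch below is in Python and applies the stated weights directly. The function name and the example marks are hypothetical, and the syllabus does not state a grading scale or a rounding rule, so the result is left unrounded.

```python
import math

# Weights taken from the interim assessment formula in the syllabus.
WEIGHTS = {"final_exam": 0.6, "test_1": 0.2, "test_2": 0.2}


def interim_grade(final_exam, test_1, test_2):
    """Weighted sum: 0.6 * Final exam + 0.2 * Test 1 + 0.2 * Test 2."""
    return (
        WEIGHTS["final_exam"] * final_exam
        + WEIGHTS["test_1"] * test_1
        + WEIGHTS["test_2"] * test_2
    )


# The weights should add up to 1, so equal marks give the same interim grade.
assert math.isclose(sum(WEIGHTS.values()), 1.0)

# Hypothetical marks, for illustration only.
example = interim_grade(final_exam=8, test_1=6, test_2=10)
print(round(example, 2))
```

Because the final exam carries 60% of the weight, a one-point change in the exam mark moves the interim grade three times as much as a one-point change in either test.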


Recommended Core Bibliography

  • Venables, W. N., & Smith, D. M. (2015). An introduction to R. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.63A6F894

Recommended Additional Bibliography

  • Bramer, M. (2000). Inducer: a rule induction workbench for data mining, in. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.ABA104A4