
Introduction to Data Mining in Finance and Business

Academic year: 2019/2020
Language of instruction: English
Credits: 3
Status: Bridging course
When taught: Year 1, Module 1

Course Syllabus

Abstract

The course is designed for students interested in data analysis problems. It provides theoretical foundations and practical skills related to data analysis, data mining, text mining, graph mining and ontology-based data analysis.

Learning Objectives

  • The course is focused on practical aspects of data mining methods and their applications in finance and business.

Expected Learning Outcomes

  • Students know the theoretical foundations of contemporary methods and algorithms used in the data analysis area
  • Students know the classification of problems
  • Students know the advantages and disadvantages of various methods
  • Students are able to choose the right analysis methods for a given problem
  • Students are able to design and implement typical schemes of analysis
  • Students are able to interpret the results delivered by data mining methods
  • Students are able to use data mining in decision-making problems
  • Students are able to analyze financial data with the use of data mining methods
  • Students know how to develop their theoretical knowledge and practical skills related to the data analysis area

Course Contents

  • Introduction to data analysis. Data preprocessing. Introduction to R language
    1. Introduction to data analysis 2. Confirmatory and exploratory approach in data analysis 3. Data types and sources 4. Data analysis process 5. Types of problems 6. Introduction to R language 7. Simple data types 8. Assignment statement 9. Basic input/output commands 10. Control statements
  • Complex data structures in R. Scripts and control statements. Data preprocessing and visualization. Linear regression. Logistic regression
    1. Complex data structures (vectors, matrices, data frames, lists) 2. Data preprocessing 3. Data visualization 4. Linear regression 5. Logistic regression
  • Neural network models
    1. Model of artificial neuron 2. Taxonomy of neural network models 3. Multilayer perceptrons 4. RBF networks 5. Kohonen networks 6. Deep learning
  • Decision tree models. Naive Bayes classifier. Support vector machine model. Association rules
    1. Introduction to decision tree models 2. Measurement of group homogeneity (entropy, Gini coefficient) 3. CART (Classification and Regression Tree) model 4. Ensemble models – bagging technique, boosting technique, random forest models 5. Random forests 6. Naive Bayes classifier 7. Support vector machine 8. Association rules and their evaluation 9. Apriori and ECLAT algorithms 10. Market basket analysis and churn analysis
  • Problem of dimensionality reduction. Multidimensional scaling. Correspondence analysis. Genetic algorithms
    1. Dimensionality reduction problem in data analysis 2. Principal component analysis 3. Singular value decomposition and its application 4. Multidimensional scaling 5. Correspondence analysis 6. Genetic algorithms and their application in dimensionality reduction
  • Cluster analysis
    1. Hierarchical methods 2. k-means method 3. Comparison of clustering process results 4. Evaluation of clustering quality 5. Model-based clustering
  • Introduction to text mining
    1. Document preprocessing 2. Frequency matrix and its analysis 3. Similarity of documents and words 4. Latent semantic analysis 5. Latent Dirichlet allocation
  • Sentiment analysis and ontology-based analysis of text documents
    1. Sentiment analysis 2. Ontologies in data analysis 3. Ontology-based similarity 4. Ontology-based analysis of text documents
  • Network analysis
    1. Network structure analysis 2. Centrality 3. Network visualization 4. Link analysis 5. Bipartite graphs 6. Similarity of graphs 7. Frequent subgraph mining
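
The decision tree topic above lists entropy and the Gini coefficient as measures of group homogeneity. The following is a minimal sketch of both measures, written in Python for illustration only (the course itself uses R). It assumes that "Gini coefficient" refers to what is usually called Gini impurity in the decision tree context; the function names `class_proportions`, `entropy`, and `gini` are hypothetical and do not come from the syllabus.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Return the share of each class among the given labels."""
    counts = Counter(labels)
    total = len(labels)
    return [count / total for count in counts.values()]


def entropy(labels):
    """Shannon entropy in bits: 0 for a pure group, higher for mixed groups."""
    return sum(p * math.log2(1 / p) for p in class_proportions(labels))


def gini(labels):
    """Gini impurity (assumed meaning of 'Gini coefficient' here): 1 - sum(p^2)."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


if __name__ == "__main__":
    mixed = ["churn", "churn", "stay", "stay"]
    pure = ["stay", "stay", "stay", "stay"]
    print(entropy(mixed), gini(mixed))
    print(entropy(pure), gini(pure))
```

Both measures are zero for a group containing a single class and largest when the classes are evenly mixed, which is why tree-building methods such as CART prefer splits that lower them.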

Assessment Elements

  • Test 1 (non-blocking)
  • Test 2 (non-blocking)
  • Final exam (non-blocking)

Interim Assessment

  • Interim assessment (Module 1)
    0.6 * Final exam + 0.2 * Test 1 + 0.2 * Test 2
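
As a worked illustration of the weighting formula above, here is a minimal Python sketch. The weights come from the formula; the dictionary keys, the function name `interim_grade`, and the sample scores are hypothetical. The syllabus does not state the grading scale or a rounding rule, so the sketch returns the raw weighted sum.

```python
# Weights taken from the formula: 0.6 * Final exam + 0.2 * Test 1 + 0.2 * Test 2
WEIGHTS = {"final_exam": 0.6, "test_1": 0.2, "test_2": 0.2}


def interim_grade(scores):
    """Return the weighted sum of the three assessment scores (no rounding applied)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Hypothetical scores, assuming all three elements use the same scale.
    sample = {"final_exam": 8, "test_1": 6, "test_2": 10}
    print(round(interim_grade(sample), 2))
```

Because the final exam carries 60% of the weight, a one-point change in the exam score moves the interim grade three times as much as a one-point change in either test.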

Bibliography

Recommended Core Bibliography

  • Venables, W. N., & Smith, D. M. (2015). An introduction to R. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.63A6F894

Recommended Additional Bibliography

  • Max Bramer. (2000). Inducer: a rule induction workbench for data mining, in. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.ABA104A4