Text mining: Advanced Level
- Learn the main algorithms for text data analysis, along with their advantages and limitations
- Obtain skills to work with machine learning software and code
- Be able to work with text data.
- Have skills to analyze textual data
- Analyze data with machine learning tools
- Perform text preprocessing (tokenization and lemmatization)
- Present the resulting project from a machine learning perspective
- Visualize results of the analysis
- Objectives of text analysis: preprocessing, lemmatization, vectorization.
- Overview of classical classifiers such as KNN, Random Forest, and SVM.
- Bayesian classification for sentiment analysis or topic definition.
- Topic modeling (flat models), quality metrics (coherence, perplexity, log-likelihood, stability, Renyi entropy), review of some libraries.
- Topic modeling (hierarchical models, discussion of problems).
- Word embeddings (gensim): what word embeddings are and how to work with them.
- Topic models with embeddings (ETM, GLDAW).
- Introduction to neural networks (TensorFlow, Keras): the basics of working with Keras and an overview of some neural network architectures.
- Preprocessing of text data for neural networks.
- Working with recurrent neural networks for textual analysis.
- Working with LSTM neural networks for textual analysis.
- Model with multiple outputs (heads).
- Presentation of student work.
- Exam: The exam is a competition (hackathon) to develop the best sentiment analysis model for Russian-language text. At the end of the first part of the course, students receive a Russian-language dataset with sentiment scores and must train their classification models on it. A week before the exam, students receive the second part of the dataset, which they must use to test the models they have trained. At the exam, students give a presentation on their models. The grade depends, first, on the quality of the presentation and, second, on the results obtained (how well the models are trained and the number of models).
- Sebastian Raschka & Vahid Mirjalili. (2019). Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2 (3rd ed.). Packt Publishing.
- Miroslav Kubat. (2017). An Introduction to Machine Learning (2nd ed.). Springer.