Speech Technologies

Academic year: 2024/2025
Language of instruction: English
ECTS credits: 6
Course type: Elective course
When: 2nd year, modules 1 and 2

Instructor

Gurkov Ivan Evgenevich

Course Syllabus

Abstract

The course introduces students to the basic principles and methods of speech signal analysis, speech synthesis, and automatic speech recognition. Students learn the acoustics of the speech signal and how to use various tools to process and annotate it. They are also introduced to existing speech recognition and synthesis systems and learn to apply them in practice.
Learning Objectives

  • Familiarization with signal processing methods
  • Familiarization with methods of speech recognition and synthesis
  • Understanding of speech synthesis and recognition systems and models
Expected Learning Outcomes

  • understands the acoustic theory of speech production and uses basic acoustic concepts (frequency, period, amplitude, resonator, spectrum, harmonics, formants, fundamental frequency)
  • has signal processing skills: computing instantaneous spectra and spectrograms, calculating formants, annotating signals in Praat, and manipulating signal properties (amplitude, fundamental frequency)
  • is familiar with the basic methods of speech synthesis (concatenative: sub-phone, allophone, diphone, syllable, macrosynthesis, unit selection; parametric; articulatory)
  • can develop a sound database for concatenative synthesis
  • understands the components of an automatic speech recognition (ASR) system: acoustic model, language model, decoder
  • can extract acoustic features relevant for ASR from the signal using Kaldi or Python
  • understands the principles of creating pronunciation dictionaries and is familiar with methods and tools for developing them
  • can apply ASR systems in practice and evaluate recognition quality
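
As an illustration of the last outcome, recognition quality is commonly reported as word error rate (WER). The syllabus does not name a specific metric, so the choice of WER, the function name `word_error_rate`, and the sample sentences below are assumptions made for this sketch only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER: word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical reference transcript and ASR output.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Real evaluations usually normalize case and punctuation before scoring; this sketch compares whitespace-separated tokens as they are.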
Course Contents

  • Acoustic theory of speech production
  • Acoustic analysis of speech signal
  • History of speech technologies
  • Approaches to speech synthesis
  • Concatenative speech synthesis
  • Automatic transcription and text normalization
  • General information about ASR systems
  • Acoustic modeling in ASR systems
  • Language modeling and dictionaries in ASR systems
  • Decoding: finding the best recognition hypothesis
Assessment Elements

  • Homework (non-blocking)
    Includes practical assignments.
  • Exam (non-blocking)
    The exam is oral. Each exam ticket contains two questions.
Interim Assessment

  • 2024/2025 2nd module
    0.3 * Exam + 0.7 * Homework
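
The weighting above can be sketched as a small calculation. The function name `interim_grade` and the sample scores are hypothetical. The syllabus gives only the weights, so no grading scale or rounding rule is assumed here.

```python
def interim_grade(exam: float, homework: float) -> float:
    """Weighted interim grade: 0.3 * Exam + 0.7 * Homework."""
    return 0.3 * exam + 0.7 * homework


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    print(round(interim_grade(exam=8, homework=6), 2))
```

Because homework carries 70% of the weight, a one-point change in the homework score moves the result more than twice as far as a one-point change in the exam score.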
Bibliography

Recommended Core Bibliography

  • Speech and language processing, Jurafsky, D., 2014

Recommended Additional Bibliography

  • A history of communications : media and society from the evolution of speech to the Internet, Poe, M. T., 2011

Authors

  • Kolmogorova Anastasiia Vladimirovna
  • Korneva Anna Mikhailovna
  • Kessel Kseniia Vitalevna