HSE Scholars to Participate in Creating a New Platform for the Russian National Corpus
The Russian Ministry of Education and Science has announced the results of a grant competition for major research projects. One of the winners is a project in which HSE University is taking part: the creation of a next-generation computational linguistic platform to digitally record the Russian language.
A total of 367 applications were submitted for the competition, and 41 winners were selected. The grants are offered to research institutions and universities that need public support for major research and technology projects in fields prioritized by the Russian Academy of Sciences. The maximum amount of a yearly grant is 100,000,000 roubles. Projects have a duration of three years and may be extended by two years. Information on the competition winners and grants has been published on the Ministry of Education and Science website.
One of the competition winners is a project called ‘Next Generation Computational Linguistic Platform for Digital Recording of the Russian Language: Infrastructure, Resources, and Research,’ which was submitted by a consortium of universities and research institutions. The funding for three years will be 236,000,000 roubles. The principal contractor is the RAS Institute for Information Transmission Problems. The consortium includes the RAS Vinogradov Institute of Russian Language, the RAS Institute of Linguistic Studies, Voronezh State University, and HSE University, which is represented by scholars from the School of Linguistics.
The grant was awarded for a multidisciplinary project to renew the platform of the Russian National Corpus – an information system based on a digital collection of Russian texts of various types and genres. The main goal of the corpus is to support academic research in vocabulary and grammar, and its unique feature is the additional information on the texts – ‘annotations’. In 2004, when the Russian National Corpus was jointly created by the Institute of Russian Language and Yandex, the annotations were unprecedented even when compared to international analogues, and they have remained unique since then.
The volume of the Russian National Corpus has grown considerably over the last 15 years, with new sub-corpora and new functions. Today, new technology solutions are required to support its continued development and effective operation (Corpus 2.0). Remarkably, researchers from various fields (computer science, linguistics, philology, history, etc.) and regions (Moscow, Voronezh Region, and St. Petersburg) will take part in developing these solutions. St. Petersburg houses the RAS Institute of Linguistic Studies, and researchers from the HSE University campus in St. Petersburg are also joining the project. This well-developed cooperation was one of the factors behind the project's success in the competition.
The planned core of the HSE team working on the project includes professors Ekaterina Rakhilina, Valentina Apresyan, Olga Lyashevskaya, Nina Dobrushina, Natalia Slioussar, and Michael Daniel; associate professors Anastasiya Bonch-Osmolovskaya, Dmitry Sichinava, and Alexander Letuchiy; and senior lecturer Maria Kholodilova from St. Petersburg. Among other things, this team will prepare papers for academic journals and take part in thesis defences in this area of research.
Ekaterina Rakhilina, Head of the HSE School of Linguistics
The application for the grant was prepared to a tight deadline during the lockdown. Support from the Faculty of Humanities has been particularly important for us. Its management understands the specific nature of linguistic studies and has created an environment for their development. We see the upcoming work on the grant as a series of projects that will involve not only teachers and researchers, but also students; not only linguists, but also philologists, historians, and our colleagues from the Faculty of Computer Science and the Centre for Language and Brain.