This book has been long in the making, but now that the result is before our eyes, we believe that it was worth the wait. The idea of compiling a volume collecting the experiences of the various constructicon initiatives going on around the world was born in the context of an international collaboration between the universities at Gothenburg in Sweden and Juiz de Fora in Brazil, and of the excellent opportunities to interact and learn from each other's experiences afforded by both research visits and the international FrameNet workshops organized jointly by the Swedish and Brazilian teams, together with the FrameNet group in Berkeley, California: IFNW 2013 in Berkeley, IFNW 2016, collocated with ICCG9 in Juiz de Fora, and the upcoming IFNW 2018 with the special theme Multilingual FrameNets and Constructicons, collocated with LREC in Miyazaki, Japan. Moreover, profitable discussions relevant to the works presented in this book took place in the special sessions Cognitively grounded lexica, constructicons, and metaphor repositories, at ICLC12 in Edmonton, Canada, in 2013, and Constructionist resources - a workshop in honor of Charles J. Fillmore, at ICCG8 in Osnabrück, Germany, in 2014. During these events, most, if not all, authors of the chapters in this volume had the chance to share their points of view, positions and questions on the development of constructionist resources. Beyond the group of authors whose contributions make up this book, we'd like to thank our, and their, interlocutors.
The Russian Constructicon project currently prioritizes multi-word constructions that are not represented in dictionaries and that are especially useful for learners of Russian. The immediate goal is to identify constructions and determine the semantic constraints on their slots. The Russian Constructicon is being built in parallel with the Swedish Constructicon and will ultimately model the entire Russian language in terms of constructions at all levels from morpheme to discourse. The contents of the Russian Constructicon will serve learners of the language, linguists researching both language-internal and typological phenomena, and language technology applications such as spell checkers and automated readability assessment tools.
A new kind of frequency dictionary is a valuable reference for researchers and students of Russian. It shows the grammatical profiles of nouns, adjectives, and verbs, namely the distribution of grammatical forms in the inflectional paradigm. The dictionary is based on data from the Russian National Corpus (RNC) and covers a core vocabulary (the 5,000 most frequently used lexemes). Russian is a morphologically rich language: its noun paradigms harbor two dozen case and number forms, while verb paradigms include up to 160 grammatical forms. The dictionary departs from traditional frequency lexicography in several ways: 1) word forms are arranged in paradigms, so their frequencies can be compared and ranked; 2) the dictionary is focused on the grammatical profiles of individual lexemes, rather than on the overall distribution of grammatical features (e.g., the fact that Future forms are used less frequently than Past forms); 3) the grammatical profiles of lexical units can be compared against the mean scores of their lexico-semantic class; 4) in each part of speech or semantic class, lexemes with certain biases in the grammatical profile can be easily detected (e.g., verbs used mostly in the Imperative or the Past neuter, or nouns often used in the plural); and 5) the distribution of homonymous word forms and grammatical variants can be followed over time and within certain genres and registers. The dictionary will be a source for research in the fields of Russian grammar, paradigm structure, form acquisition, grammatical semantics, and variation of grammatical forms. The main challenge for this initiative is the intra-paradigm and inter-paradigm homonymy of word forms in the corpus data. Manual disambiguation is accurate but covers only approximately five million words in the RNC, so the data may be sparse and possibly unreliable. Automatic disambiguation yields slightly worse results, but the larger corpus it makes available provides more reliable data for rare word forms.
A user can switch between a "basic" version, which is based on a smaller collection of manually disambiguated texts, and an "expanded" version, which is based on the main corpus, a newspaper corpus, a corpus of poetry, and the spoken corpus (320 million words in total). The article addresses some general issues, such as establishing the common basis of comparison, a level of granularity for the grammatical profile, and units of measurement. We suggest certain solutions related to the selection of data, corpus data processing, and maintaining the online version of the frequency dictionary.
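The notion of a grammatical profile described in this abstract can be illustrated with a minimal sketch. It assumes that a profile is simply the relative frequency of each form within a lexeme's paradigm and that a class mean is the form-by-form average of such profiles; the dictionary's actual units of measurement and granularity may differ. The function names, the threshold value, and the toy counts are all hypothetical and are not drawn from the RNC.

```python
def grammatical_profile(form_counts):
    """Return the share of each grammatical form in a lexeme's paradigm."""
    total = sum(form_counts.values())
    if total == 0:
        return {}
    return {form: count / total for form, count in form_counts.items()}


def class_mean_profile(profiles):
    """Average several profiles form by form (a missing form counts as 0)."""
    forms = sorted({form for profile in profiles for form in profile})
    n = len(profiles)
    return {form: sum(p.get(form, 0.0) for p in profiles) / n for form in forms}


def biased_forms(profile, mean_profile, threshold=0.2):
    """List forms whose share exceeds the class mean by more than `threshold`.

    The threshold is an arbitrary illustrative value, not one from the source.
    """
    return sorted(
        form
        for form, share in profile.items()
        if share - mean_profile.get(form, 0.0) > threshold
    )


# Toy counts invented for illustration only; they are not corpus data.
TOY_COUNTS = {
    "verb_a": {"past": 55, "present": 35, "imperative": 10},
    "verb_b": {"past": 50, "present": 40, "imperative": 10},
    "verb_c": {"past": 10, "present": 10, "imperative": 80},
}

profiles = {lexeme: grammatical_profile(c) for lexeme, c in TOY_COUNTS.items()}
mean_profile = class_mean_profile(list(profiles.values()))

for lexeme, profile in profiles.items():
    print(lexeme, biased_forms(profile, mean_profile))
```

With these invented counts, only `verb_c` is flagged, for its Imperative share. A real implementation would also have to decide how to treat homonymous forms before counting, which is the main challenge the abstract identifies.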
The paper provides an overview of the results of the fundamental reconstruction and modernization project of the National Corpus of the Russian Language platform, carried out from 2020 to 2023. The focus of the paper is on the new opportunities that are opening up for linguists and a wider audience. This includes improving the representativeness of existing corpora, creating new corpora, new annotation obtained through the application of neural network models, and new interface solutions. Three notable new components are examined in more detail: a resource-related one, which is the new Social Networks corpus, a search-related one, which is the Panchronic corpus that combines searches across corpora from different periods, and an analytical one, which is the functional complex of statistics and data visualization.