L.A.S.L.A. and Collatinus: a convergence in lexica

2020 
The research group L.A.S.L.A. (Laboratoire d’Analyse Statistique des Langues Anciennes, University of Liège, Belgium) began a project of lemmatization and morphosyntactic tagging of Latin texts in 1961. The project continues, with new texts lemmatized each year (see http://web.philo.ulg.ac.be/lasla/). The resulting files contain approximately 2,500,000 words, whose lemmatization and tagging have been verified by a philologist, and have recently been made available to interested scholars. In the early 2000s, Yves Ouvrard developed Collatinus for teaching. Its goal was to generate a complete lexical aid, with a short translation and the morphological analyses of the forms, for any text that might be given to students (see https://outils.biblissima.fr/fr/collatinus/). Although the two projects look very different, they met a few years ago in the design of a new tool to speed up the lemmatization of Latin texts at L.A.S.L.A. The tool lemmatizes each word concurrently in two ways: by looking the form up among those already analyzed in the L.A.S.L.A. files, and by analyzing it with Collatinus. This lemmatization is followed by a disambiguation step using a second-order hidden Markov model, and the result is presented in a text editor for correction by the philologist.
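The pipeline described above (two concurrent lookups, then second-order HMM disambiguation) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`analyses`, `train`, `disambiguate`), the tag labels, the toy training sentences, and the add-one smoothing are all assumptions made for illustration. "Second-order" is taken here to mean that each tag is conditioned on the two preceding tags, so the Viterbi search runs over pairs of tags.

```python
import math
from collections import Counter

START = "<s>"  # padding tag for the two positions before a sentence


def analyses(form, verified, generated):
    """Union of candidate tags from two sources (both hypothetical dicts):
    forms already verified in a corpus, and forms produced by an analyzer."""
    return sorted(set(verified.get(form, [])) | set(generated.get(form, [])))


def train(tagged_sentences):
    """Count tag trigrams, their bigram contexts, emissions and tags."""
    tri, bi, emit, uni = Counter(), Counter(), Counter(), Counter()
    for sent in tagged_sentences:
        tags = [START, START] + [tag for _, tag in sent]
        for a, b, c in zip(tags, tags[1:], tags[2:]):
            tri[a, b, c] += 1
            bi[a, b] += 1
        for form, tag in sent:
            emit[tag, form] += 1
            uni[tag] += 1
    n_forms = max(len({form for _, form in emit}), 1)
    return {"tri": tri, "bi": bi, "emit": emit, "uni": uni,
            "n_tags": max(len(uni), 1), "n_forms": n_forms}


def score(m, a, b, c, form):
    """Log of P(c | a, b) * P(form | c), both with add-one smoothing."""
    trans = (m["tri"][a, b, c] + 1) / (m["bi"][a, b] + m["n_tags"])
    emis = (m["emit"][c, form] + 1) / (m["uni"][c] + m["n_forms"])
    return math.log(trans) + math.log(emis)


def disambiguate(forms, candidates, m):
    """Viterbi search over (previous tag, current tag) pairs.
    candidates[i] lists the possible tags of forms[i]."""
    best = {(START, START): (0.0, [])}
    for form, options in zip(forms, candidates):
        step = {}
        for (a, b), (logp, path) in best.items():
            for c in options:
                new = logp + score(m, a, b, c, form)
                if (b, c) not in step or new > step[b, c][0]:
                    step[b, c] = (new, path + [c])
        best = step
    return max(best.values(), key=lambda v: v[0])[1]


# Toy "verified" corpus; the tags are invented for this example.
corpus = [
    [("puella", "N.nom"), ("rosam", "N.acc"), ("amat", "V")],
    [("in", "PREP"), ("silva", "N.abl"), ("ambulat", "V")],
    [("puella", "N.nom"), ("in", "PREP"), ("silva", "N.abl"), ("ambulat", "V")],
    [("rosa", "N.nom"), ("floret", "V")],
]
model = train(corpus)

verified = {"rosa": ["N.nom"], "in": ["PREP"], "ambulat": ["V"]}
generated = {"rosa": ["N.nom", "N.abl"], "ambulat": ["V"]}

sentence = ["in", "rosa", "ambulat"]
options = [analyses(f, verified, generated) for f in sentence]
print(disambiguate(sentence, options, model))
```

In this toy setting the ambiguous form "rosa" has two candidate tags, and the trigram context decides between them. In the workflow described above, such a proposal would still be shown to the philologist for correction rather than accepted automatically.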