Składnica: a constituency treebank of Polish harmonised with the Walenty valency dictionary

2021 
This paper reports on developments in three interrelated linguistic resources for Polish. The first is Świgra 2, a rule-based constituency parser for Polish. The second is Składnica, a treebank built using Świgra 2. The third is the valency dictionary Walenty, which became available when work on the first two was already advanced. Because Walenty is much more comprehensive than the ad hoc dictionary previously used with Świgra, a decision was made to switch the parser and the treebank to the new dictionary. The switch required several modifications to the Świgra 2 parser, including the implementation of unlike coordination and the introduction of semantically motivated phrases and non-standard case values. A semi-automated procedure for upgrading previously disambiguated trees in Składnica was required as well. Modifications introduced in the treebank during the upgrade included systematic changes of notation and the resolution of new ambiguities arising from the more detailed distinctions made in the dictionary. Confronting Składnica with the trees generated by the new version of Świgra 2 using Walenty allowed us to check all of these resources for consistency, which resulted in several corrections to both the treebank and the valency dictionary.