Paraphrastic Variance between European and Brazilian Portuguese.

2018 
This paper presents a methodology to extract a paraphrase database for the European and Brazilian varieties of Portuguese, and discusses a set of paraphrastic categories of multiwords and phrasal units, such as the compounds “toda a gente” versus “todo o mundo” ‘everybody’ or the gerundive constructions [estar a + V-Inf] versus [ficar + V-Ger] (e.g., “estive a observar” | “fiquei observando” ‘I was observing’), which are extremely relevant to high quality paraphrasing. The variants were manually aligned in the e-PACT corpus, using the CLUE-Aligner tool. The methodology, inspired in the Logos Model, focuses on a semantico-syntactic analysis of each paraphrastic unit and constitutes a subset of the Gold-CLUE-Paraphrases. The construction of a larger dataset of paraphrastic contrasts among the distinct varieties of the Portuguese language is indispensable for variety adaptation, i.e., for dealing with the cultural, linguistic and stylistic differences between them, making it possible to convert texts (semi-)automatically from one variety into another, a key function in paraphrasing systems. This topic represents an interesting new line of research with valuable applications in language learning, language generation, question-answering, summarization, and machine translation, among others. The paraphrastic units are the first resource of its kind for Portuguese to become available to the scientific community for research purposes.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    18
    References
    2
    Citations
    NaN
    KQI
    []