Corpus ForenUCA: diseño, objetivos y estado actual en el marco del instituto de investigación en lingüística aplicada

Mario Crespo Miguel

2018

One of the most recent linguistic disciplines in the Spanish-speaking world is Forensic Linguistics, characterized by the use of linguistic techniques to investigate crimes. One of its main research areas is the authorship attribution of electronic texts such as emails, social network posts, or mobile messages. The study of the dialectal and sociolectal features of language is essential for characterizing the gender, age, or educational level of the sender of a given text. For Spanish, there is a shortage of corpora of electronic texts linked to different sociolinguistic variables that can provide scientific support for Forensic Linguistics. This paper presents the ForenUCA Corpus, currently under development at the Applied Linguistics Research Institute (Instituto de Investigación en Lingüística Aplicada) of the University of Cádiz, which collects texts from new social communication media: short mobile messaging, email, and social networks. It describes the guidelines, design, and final objectives of the corpus, which currently contains more than 200,000 words.
