Toward the automatic identification of sublanguage vocabulary
1993
Abstract: A sublanguage is the language used in a restricted or specialized domain or field, such as computer science. Information about the vocabulary and structure of a sublanguage is used in any domain-related natural language processing application; however, such information is very time-consuming to gather, and much of it must be found and organized manually. Additionally, information retrieval strategies using lexical information depend on finding the appropriate dictionary entry for general and technical words. The ability to automatically identify terms belonging to a sublanguage could aid in these and other applications. In this paper, a simple but effective method is developed for automatic identification of sublanguage vocabulary words as they occur in abstracts. This procedure may significantly reduce the effort required to extract sublanguage vocabulary for sublanguage analysis and other applications, such as information retrieval. First, the sublanguage vocabulary identification procedures are described using abstracts from computer science and library and information science as the sublanguage sources. The results of the experiments are evaluated using three different criteria. Finally, the practical and theoretical significance of this research is discussed along with plans for further experiments on the vocabulary and structure of sublanguages.