    How similarity influences word recognition
Citations: 0 · References: 0 · Related papers: 10
    Abstract:
    Of the many factors that influence how the word recognition process unfolds, the influence of similarity has received much attention. In this chapter, we consider how visual and auditory word recognition is influenced by phonological and orthographic similarity defined in terms of neighborhood structure. Orthographic neighbors are words that differ by a letter (e.g., bait and bail), whereas phonological neighbors are words that differ by a phoneme (e.g., bait and date). Throughout the chapter, we situate the effect of neighbors within a theoretical framework consisting of interactive activation and competition. We conclude by evaluating recent research indicating that individual differences play a role in the influence of neighbors on word recognition.
    Keywords:
    Similarity (geometry)
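The neighborhood definitions in the abstract are concrete enough to illustrate directly. Below is a minimal sketch, not taken from the chapter, of computing an orthographic neighborhood: words of the same length that differ in exactly one letter position. The tiny word list is invented for illustration; a real neighborhood count would use a full lexicon.

```python
def orthographic_neighbors(word, lexicon):
    """Words in `lexicon` of the same length that differ from `word` by exactly one letter."""
    neighbors = []
    for candidate in lexicon:
        if candidate == word or len(candidate) != len(word):
            continue
        if sum(a != b for a, b in zip(word, candidate)) == 1:
            neighbors.append(candidate)
    return neighbors

# Toy lexicon for illustration only; phonological neighbors (e.g., bait/date)
# would be computed the same way over phoneme strings rather than letters.
lexicon = ["bait", "bail", "gait", "ball", "date"]
print(orthographic_neighbors("bait", lexicon))  # ['bail', 'gait']
```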
We describe a method for automatic word sense disambiguation using a text corpus and a machine-readable dictionary (MRD). The method is based on word similarity and context similarity measures. Words are considered similar if they appear in similar contexts; contexts are similar if they contain similar words. The circularity of this definition is resolved by an iterative, converging process, in which the system learns from the corpus a set of typical usages for each of the senses of the polysemous word listed in the MRD. A new instance of a polysemous word is assigned the sense associated with the typical usage most similar to its context. Experiments show that this method can learn even from very sparse training data, achieving over 92% correct disambiguation performance.
    Similarity (geometry)
    Word Sense Disambiguation
    SemEval
    Polysemy
    Citations (124)
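As a rough illustration of how the circular definition (similar words appear in similar contexts; similar contexts contain similar words) can be resolved iteratively, here is a minimal sketch under my own simplifying assumptions; the toy corpus, the similarity formulas, and the fixed iteration count are placeholders, not the paper's actual algorithm.

```python
from collections import defaultdict

# Toy corpus of tokenized contexts; purely illustrative.
contexts = [
    ["the", "bank", "approved", "the", "loan"],
    ["the", "lender", "approved", "the", "mortgage"],
    ["she", "sat", "on", "the", "river", "bank"],
    ["he", "fished", "from", "the", "shore"],
]
words = sorted({w for c in contexts for w in c})

# Start from identity word similarity, then alternate the two definitions.
word_sim = {(a, b): 1.0 if a == b else 0.0 for a in words for b in words}

def context_sim(c1, c2, wsim):
    """Contexts are similar if they contain similar words (best-match average)."""
    one_way = lambda src, dst: sum(max(wsim[(w, v)] for v in dst) for w in src) / len(src)
    return 0.5 * (one_way(c1, c2) + one_way(c2, c1))

occurrences = defaultdict(list)
for i, c in enumerate(contexts):
    for w in c:
        occurrences[w].append(i)

for _ in range(5):  # a handful of iterations is enough on this toy example
    csim = {(i, j): context_sim(ci, cj, word_sim)
            for i, ci in enumerate(contexts) for j, cj in enumerate(contexts)}
    # Words are similar if they appear in similar contexts (average over occurrences).
    word_sim = {(a, b): sum(csim[(i, j)] for i in occurrences[a] for j in occurrences[b])
                        / (len(occurrences[a]) * len(occurrences[b]))
                for a in words for b in words}

print(round(word_sim[("bank", "lender")], 3))
print(round(word_sim[("bank", "shore")], 3))
```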
Neural Machine Translation (NMT) has become the mainstream technology in machine translation. Supervised NMT models are trained on abundant sentence-level parallel corpora, but for low-resource languages or dialects with no such corpus available it is difficult to achieve good performance. Researchers have therefore begun to focus on unsupervised neural machine translation (UNMT), which uses only monolingual corpora as training data. UNMT needs to construct a language model (LM) that learns semantic information from the monolingual corpus. This paper focuses on the pre-training of the LM in unsupervised machine translation and proposes a pre-training method, NER-MLM (named entity recognition masked language model). By performing NER, the proposed method obtains better semantic information and language model parameters, leading to better training results. In the unsupervised machine translation task, the BLEU scores on the WMT’16 English–French and English–German data sets are 35.30 and 27.30, respectively. To the best of our knowledge, these are the highest results reported so far in the field of UNMT.
    Named Entity Recognition
    Citations (12)
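The core idea, biasing masked language model pre-training toward named entities, can be sketched roughly as follows. This is only an assumed reading of NER-MLM: the gazetteer-style "NER", the mask token, and the masking rate are placeholders, not the paper's actual setup, which pre-trains on WMT monolingual data.

```python
import random

MASK = "[MASK]"
# Placeholder entity list standing in for a real NER tagger.
KNOWN_ENTITIES = {"Paris", "France", "Google"}

def ner_mlm_mask(tokens, entities=KNOWN_ENTITIES, base_rate=0.15, seed=0):
    """Mask every recognised named entity plus a fraction of ordinary tokens.

    Returns (masked_tokens, labels): labels hold the original token at masked
    positions and None elsewhere, in the usual masked-LM fashion.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if tok in entities or rng.random() < base_rate:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

sentence = "Google opened a new office in Paris , France .".split()
masked, labels = ner_mlm_mask(sentence)
print(masked)
print(labels)
```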
In text retrieval, insufficient expression of the client's requirements usually leads to large amounts of inappropriate information, which makes retrieval inconvenient for users. The text similarity computation based on word co-occurrence presented in this paper enables users to delete or retain text collections similar to a given text, in order to improve retrieval efficiency.
    Similarity (geometry)
    Co-occurrence
    Citations (1)
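The abstract does not spell out the similarity measure, so the following is only one plausible reading: represent each text by its word counts, score similarity by cosine overlap, and keep or drop collection texts by comparing their score against a reference text to a threshold. The texts and the threshold are illustrative assumptions.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

reference = "word similarity for text retrieval"
collection = [
    "computing text similarity from word co-occurrence",
    "a recipe for vegetable soup",
]
# Retain only texts sufficiently similar to the reference (threshold is arbitrary).
kept = [t for t in collection if cosine_similarity(reference, t) >= 0.3]
print(kept)  # keeps the co-occurrence text, drops the unrelated one
```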
Distributed word representations are generally evaluated using a word similarity task and/or a word analogy task. There are many datasets readily available for these tasks in English. However, evaluating distributed representations in languages that do not have such resources (e.g., Japanese) is difficult. Therefore, as a first step toward evaluating distributed representations in Japanese, we constructed a Japanese word similarity dataset. To the best of our knowledge, our dataset is the first resource that can be used to evaluate distributed representations in Japanese. Moreover, our dataset contains various parts of speech and includes rare words in addition to common words.
    Similarity (geometry)
    Representation
    Citations (6)
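A dataset of this kind is typically used by correlating human similarity ratings with the cosine similarities of the corresponding embedding vectors (Spearman's ρ). Below is a minimal sketch of that evaluation loop; the romanised word pairs, the ratings, and the random vectors are stand-ins for the actual dataset and trained Japanese embeddings, and NumPy/SciPy are assumed to be available.

```python
import numpy as np
from scipy.stats import spearmanr

# Placeholder (word1, word2, human rating) triples and random embeddings.
rated_pairs = [("neko", "inu", 6.2), ("neko", "kuruma", 1.5),
               ("hon", "zasshi", 7.0), ("inu", "hon", 0.8)]
rng = np.random.default_rng(0)
vocab = {w for a, b, _ in rated_pairs for w in (a, b)}
embeddings = {w: rng.normal(size=50) for w in vocab}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

human = [rating for _, _, rating in rated_pairs]
model = [cosine(embeddings[a], embeddings[b]) for a, b, _ in rated_pairs]
rho, _ = spearmanr(human, model)
print(f"Spearman correlation: {rho:.3f}")
```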
This paper studies Chinese word similarity computation. A 3D model is proposed for representing word meaning from three different points of view: the first is the view of primitives from HowNet, the second is the view of the word's occurrences in sentences from a specific corpus, and the third is the view of well-known background knowledge from online resources. A Chinese content word is represented in this 3D model, and the similarity of two words is computed from it. Experiments on Chinese news show that this method performs better than existing methods based on only one point of view.
    Similarity (geometry)
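How the three views might be combined is not given in the abstract, so the sketch below is only an assumed scheme: each view yields a similarity score in [0, 1] for a word pair, and the scores are merged by a weighted average. The weights and the per-view scores are placeholders.

```python
def combined_similarity(view_scores, weights=(0.4, 0.3, 0.3)):
    """Weighted average of per-view similarity scores.

    view_scores: (HowNet primitive view, corpus occurrence view,
                  background-knowledge view), each assumed to lie in [0, 1].
    """
    assert len(view_scores) == len(weights)
    return sum(w * s for w, s in zip(weights, view_scores)) / sum(weights)

# Hypothetical per-view scores for a single Chinese word pair.
print(round(combined_similarity((0.8, 0.6, 0.7)), 2))  # 0.71
```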
Sentence similarity computation plays a significant role in Chinese language processing. This paper presents a new approach to calculating the semantic similarity of Chinese questions, divided into two steps: the first step is to disambiguate the word senses in the question, and the second is to compute the question's semantic similarity based on those senses. The paper uses HowNet to disambiguate word senses and to calculate sense similarity. The experimental results show that the proposed algorithm works more reasonably in real calculations and outperforms conventional approaches.
    Similarity (geometry)
    Word Sense Disambiguation
    SemEval
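The two-step structure (disambiguate the word senses in each question, then score similarity over those senses) can be sketched as below. The toy sense inventory, the sense-similarity table, and the best-match scoring are invented stand-ins for HowNet, purely to show the shape of the computation.

```python
# Toy sense inventory and pairwise sense similarities (stand-ins for HowNet).
SENSES = {
    "bank": ["bank.finance", "bank.river"],
    "loan": ["loan.finance"],
    "river": ["river.water"],
}
SENSE_SIM = {
    frozenset(["bank.finance", "loan.finance"]): 0.8,
    frozenset(["bank.river", "river.water"]): 0.7,
}

def sense_sim(s1, s2):
    return 1.0 if s1 == s2 else SENSE_SIM.get(frozenset([s1, s2]), 0.0)

def disambiguate_question(question):
    """Step 1: for each word, pick the sense most similar to the senses of the
    other words in the same question."""
    chosen = []
    for i, word in enumerate(question):
        context = [s for j, w in enumerate(question) if j != i
                   for s in SENSES.get(w, [w])]
        candidates = SENSES.get(word, [word])
        chosen.append(max(candidates,
                          key=lambda s: sum(sense_sim(s, c) for c in context)))
    return chosen

def question_similarity(q1, q2):
    """Step 2: average best-match sense similarity between the two questions."""
    s1, s2 = disambiguate_question(q1), disambiguate_question(q2)
    score = lambda a, b: sum(max(sense_sim(x, y) for y in b) for x in a) / len(a)
    return 0.5 * (score(s1, s2) + score(s2, s1))

print(round(question_similarity(["bank", "loan"], ["bank", "loan"]), 3))   # 1.0
print(round(question_similarity(["bank", "loan"], ["bank", "river"]), 3))  # 0.0
```

The second comparison shows why the disambiguation step matters: the two questions share the surface word "bank", but after sense assignment they no longer match.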
    This paper presents POST STSS, a method of determining short-text semantic similarity in which part-of-speech tags are used as indicators of the deeper syntactic information usually extracted by more advanced tools like parsers and semantic role labelers. Our model employs a part-of-speech weighting scheme and is based on a statistical bag-of-words approach. It does not require either hand-crafted knowledge bases or advanced syntactic tools, which makes it easily applicable to languages with limited natural language processing resources. By using a paraphrase recognition test, we demonstrate that our system achieves a higher accuracy than all existing statistical similarity algorithms and solutions of a more structural kind.
    Paraphrase
    Similarity (geometry)
    Citations (24)
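As a rough sketch of the kind of part-of-speech weighting the abstract describes (not the paper's actual POST STSS weights or formula), each token's contribution to a bag-of-words overlap score can be scaled by a weight attached to its POS tag. The weights, the tiny tagged sentences, and the overlap measure are all illustrative assumptions.

```python
# Hypothetical POS weights: content-word tags count more than function-word tags.
POS_WEIGHTS = {"NOUN": 1.0, "VERB": 0.8, "ADJ": 0.6, "DET": 0.1, "ADP": 0.1}

def pos_weighted_similarity(tagged_a, tagged_b):
    """Weighted-overlap similarity over (token, POS) pairs."""
    def weight(item):
        return POS_WEIGHTS.get(item[1], 0.3)
    set_a, set_b = set(tagged_a), set(tagged_b)
    overlap = sum(weight(item) for item in set_a & set_b)
    total = 0.5 * (sum(weight(i) for i in set_a) + sum(weight(i) for i in set_b))
    return overlap / total if total else 0.0

s1 = [("the", "DET"), ("cat", "NOUN"), ("chased", "VERB"), ("the", "DET"), ("mouse", "NOUN")]
s2 = [("a", "DET"), ("cat", "NOUN"), ("caught", "VERB"), ("a", "DET"), ("mouse", "NOUN")]
print(round(pos_weighted_similarity(s1, s2), 3))  # shared nouns dominate the score
```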