Word Similarity for Document Grouping using Soft Computing

2007 
Summary The technology world has provided a more efficient and quicker way of accessing information through the web and databases in organizations that implement information systems in order to achieve a competitive edge. The simplest way of filtering information is to extract keywords in measuring the documents relevance. Nonetheless, getting to the right document is often a problem. Synonymy i.e., two words with the same meaning, for example, taxi and cab is a major problem in information searching. This work uses the soft computing techniques in the area of information retrieval and they encompass both fuzzy set theory and probability theory. We propose an algorithm for computing asymmetric word similarities (AWS) to overcome the synonymy problem. The algorithm is computed using mass assignment based on fuzzy sets of words. A key feature of our algorithm is that it is incremental, i.e. words (and documents) can be added or subtracted without extensive re-computation. AWS produced similarity measures of consistently 10% higher than tf.idf algorithm and performed successful document groupings.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    20
    References
    2
    Citations
    NaN
    KQI
    []