Polynomial filtering in latent semantic indexing for information retrieval
34 Citations · 16 References · 10 Related Papers
Abstract:
Latent Semantic Indexing (LSI) is a well-established and effective framework for conceptual information retrieval. In traditional implementations of LSI, the semantic structure of the collection is projected into the k-dimensional space derived from a rank-k approximation of the original term-by-document matrix. This paper discusses a new way to implement the LSI methodology, based on polynomial filtering. The new framework does not rely on any matrix decomposition, and therefore its computational cost and storage requirements are low relative to traditional implementations of LSI. Additionally, it can be used as an effective information filtering technique when updating LSI models based on user feedback.
Keywords:
Rank (linear algebra)
Implementation
Latent semantic analysis
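As a rough illustration of the idea in the abstract, the sketch below contrasts standard rank-k LSI scoring with a decomposition-free filter applied through matrix-vector products only. It is a minimal sketch on a toy term-by-document matrix, and the simple power filter used here is a crude stand-in for the polynomial approximations of a spectral step function that the paper actually develops.

```python
# Minimal sketch: LSI query scoring via a truncated SVD vs. a matrix-free
# polynomial filter. The toy matrix, query, and power filter are illustrative
# assumptions, not the paper's actual filter construction.
import numpy as np

# Toy term-by-document matrix A (terms x documents) and a query vector q.
A = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
], dtype=float)
q = np.array([1, 1, 0, 0, 0], dtype=float)

# Traditional LSI: project the query onto the rank-k dominant subspace.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
scores_svd = A.T @ (U[:, :k] @ (U[:, :k].T @ q))

# Polynomial filtering: approximate the same projection with phi(A A^T) q,
# where phi damps the small eigenvalues. Only products with A and A^T are
# needed, so no decomposition is ever computed or stored.
def poly_filter(A, q, degree=8):
    lam_max = np.linalg.norm(A, 2) ** 2   # largest eigenvalue of A A^T
    v = q.copy()
    for _ in range(degree):
        v = A @ (A.T @ v) / lam_max       # (A A^T / lam_max)^degree applied to q
    return v

scores_poly = A.T @ poly_filter(A, q)

print(np.argsort(-scores_svd))   # document ranking from truncated SVD
print(np.argsort(-scores_poly))  # ranking from the decomposition-free filter
```

On this toy example the two rankings broadly agree, since the power filter suppresses the directions that the truncated SVD discards.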
This chapter introduces an unsupervised learning method, Latent Semantic Analysis (LSA). It first describes the word vector space model and the topic vector space model, then presents the SVD algorithm for LSA and the non-negative matrix factorization (NMF) algorithm.
Latent semantic analysis
Non-negative matrix factorization
Vector space model
Semantic space
Citations (0)
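To make the two factorizations named in this chapter concrete, the sketch below runs both on a tiny corpus. It is a minimal sketch assuming scikit-learn; the corpus and component counts are placeholders.

```python
# Minimal sketch: SVD-based LSA vs. NMF on the same document-term matrix.
from sklearn.decomposition import NMF, TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock markets fell sharply today",
    "investors sold shares as markets dropped",
]

X = TfidfVectorizer().fit_transform(corpus)   # document-term matrix

# LSA: truncated SVD of the tf-idf weighted matrix; factors may be negative.
lsa = TruncatedSVD(n_components=2, random_state=0)
doc_topics_lsa = lsa.fit_transform(X)

# NMF: non-negative factors, often easier to read as additive topics.
nmf = NMF(n_components=2, init="nndsvd", random_state=0)
doc_topics_nmf = nmf.fit_transform(X)

print(doc_topics_lsa.round(2))
print(doc_topics_nmf.round(2))
```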
Latent semantic analysis (LSA) is a method for analyzing text with mathematical computation, examining the relationships between terms within documents and between documents within a corpus. Various applications in intelligent information retrieval, search engines, and internet news sites require an accurate method of assessing document similarity in order to carry out classification, clustering, summarization, or search tasks. In this paper we therefore study latent semantic analysis based on singular value decomposition. The aim of latent semantic analysis is to exploit the global structure of documents; the emphasis is on finding hidden relationships in documents for a better understanding of the relationships between terms and documents in a dataset. We conducted a study using LSA to find correlations of terms in a dataset consisting of research papers on various natural language processing applications. LSA shows that singular value decomposition collapses multiple terms with the same semantics, can identify terms with multiple meanings, and represents documents in a lower-dimensional conceptual space.
Latent semantic analysis
Explicit semantic analysis
Document Clustering
Semantic compression
Citations (34)
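The sketch below illustrates the term-correlation effect this paper studies: terms that never co-occur can still end up close together in the reduced space. It is a minimal sketch with a made-up count matrix and rank; both are assumptions for illustration.

```python
# Minimal sketch: term-term similarity in a rank-k LSA concept space.
import numpy as np

terms = ["car", "automobile", "engine", "flower", "petal"]
# Rows are terms, columns are documents (toy counts).
A = np.array([
    [2, 0, 0, 0],   # car
    [0, 2, 0, 0],   # automobile
    [1, 1, 0, 0],   # engine
    [0, 0, 2, 1],   # flower
    [0, 0, 1, 2],   # petal
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]          # term coordinates in concept space

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# "car" and "automobile" never co-occur in any document, yet both co-occur
# with "engine", so the reduced space places them close together: the
# synonymy effect that singular value decomposition exposes.
print(cos(term_vecs[0], term_vecs[1]))   # high (close to 1)
print(cos(term_vecs[0], term_vecs[3]))   # low  (close to 0)
```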
We seek insight into Latent Semantic Indexing by establishing a method to identify the optimal number of factors in the reduced matrix for representing a keyword. The method is demonstrated empirically by duplicating all documents containing a term t and inserting new documents in the database that replace t with t'. By examining the number of times term t is identified by a search on term t' (precision) using differing ranges of dimensions, we find that lower-ranked dimensions identify related terms while higher-ranked dimensions discriminate between the synonyms.
Latent semantic analysis
Citations (25)
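A toy version of the probe described above can be set up in a few lines: duplicate every document containing t with t replaced by an artificial synonym, then score a query on the synonym using only a slice of the SVD dimensions. This is a minimal sketch assuming scikit-learn for tokenization; the corpus, terms, and dimension ranges are illustrative, not the paper's experimental setup.

```python
# Minimal sketch: inject a synonym t2 for term t, then query on t2 over
# different SVD dimension ranges and watch how the original t-documents score.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "car engine repair",
    "car dealership prices",
    "flower garden soil",
    "garden soil compost",
]
t, t2 = "car", "auto"
docs += [d.replace(t, t2) for d in docs if t in d]  # duplicated documents

vec = CountVectorizer()
X = vec.fit_transform(docs).T.toarray().astype(float)  # terms x documents
vocab = vec.vocabulary_

U, s, Vt = np.linalg.svd(X, full_matrices=False)

def rank_docs(dim_range, query_term):
    """Score all documents using only the given slice of SVD dimensions."""
    lo, hi = dim_range
    q = np.zeros(X.shape[0])
    q[vocab[query_term]] = 1.0
    return (q @ U[:, lo:hi]) @ np.diag(s[lo:hi]) @ Vt[lo:hi, :]

# In this toy example, low dimensions tend to score the original t-documents
# highly too, while higher dimensions tend to separate t from t2.
print(rank_docs((0, 2), t2).round(2))
print(rank_docs((2, 6), t2).round(2))
```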
By using a small example, an analogy to photographic compression, and a simple visualization using heatmaps, we show that latent semantic analysis (LSA) is able to extract what appears to be semantic meaning of words from a set of documents by blurring the distinctions between the words.
Latent semantic analysis
Citations (1)
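The "photographic compression" analogy is easy to reproduce: plot a raw term-by-document matrix next to its low-rank reconstruction and the blur is visible directly. This is a minimal sketch assuming matplotlib and a toy count matrix, not the paper's actual data.

```python
# Minimal sketch: heatmaps of a count matrix and its rank-2 reconstruction.
# The reconstruction "blurs" weight onto related terms, which is the effect
# the paper visualizes.
import matplotlib.pyplot as plt
import numpy as np

A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k "blurred" matrix

fig, axes = plt.subplots(1, 2)
for ax, M, title in zip(axes, [A, A_k], ["original", f"rank-{k}"]):
    ax.imshow(M, cmap="viridis")
    ax.set_title(title)
plt.show()
```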
In this paper we compare the usefulness of statistical dimensionality-reduction techniques for improving the clustering of documents in Polish. We start with partitional and agglomerative algorithms applied to the Vector Space Model. We then investigate two transformations: Latent Semantic Analysis and Probabilistic Latent Semantic Analysis. The obtained results showed an advantage of the Latent Semantic Analysis technique over the probabilistic model. We also analyse the time and memory consumption of these transformations and present runtime details for an IBM BladeCenter HS21 machine.
Latent semantic analysis
Hierarchical clustering
Citations (7)
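The comparison pipeline has a standard shape: vectorize, optionally reduce dimensionality, then cluster with a partitional and an agglomerative algorithm. Below is a minimal sketch assuming scikit-learn; the English toy corpus and parameters are placeholders (the paper works on Polish documents and also evaluates PLSA, which scikit-learn does not provide).

```python
# Minimal sketch: cluster raw tf-idf vectors and LSA-reduced vectors with
# k-means (partitional) and agglomerative clustering.
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "football match score goal",
    "league goal striker football",
    "parliament vote election law",
    "election campaign vote senate",
]

X = TfidfVectorizer().fit_transform(corpus)
X_lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)

for name, data in [("tf-idf", X.toarray()), ("LSA", X_lsa)]:
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(data)
    ag = AgglomerativeClustering(n_clusters=2).fit_predict(data)
    print(name, "k-means:", km, "agglomerative:", ag)
```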
Probabilistic Latent Semantic Analysis (PLSA) is an information retrieval technique proposed to address problems found in Latent Semantic Analysis (LSA). We have applied both LSA and PLSA in our system for grading essays written in Finnish, called Automatic Essay Assessor (AEA). We report results comparing PLSA and LSA on three essay sets from various subjects. The methods were found to be almost equal in accuracy, measured by the Spearman correlation between the grades given by the system and by a human. Furthermore, we propose methods for improving the use of PLSA in essay grading.
Latent semantic analysis
Grading (engineering)
Citations (55)
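The LSA side of such an evaluation can be sketched compactly: score essays by similarity to a reference answer in a reduced space, then compare system and human grades with Spearman's rho. This is a minimal sketch assuming scikit-learn and scipy; the essays, grades, and dimensions are made up for illustration, and the actual AEA system and its PLSA variant are more involved.

```python
# Minimal sketch: LSA-based essay scoring evaluated with Spearman correlation.
from scipy.stats import spearmanr
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

essays = [
    "photosynthesis converts light energy into chemical energy",
    "plants use sunlight water and carbon dioxide to make glucose",
    "the sun is bright and plants are green",
    "glucose is produced from light water and carbon dioxide",
]
human_grades = [5, 5, 2, 4]
reference = essays[0]          # stand-in for a teacher's model answer

# Embed essays plus the reference in a small LSA space.
X = TfidfVectorizer().fit_transform(essays + [reference])
X_lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)

# System grade: cosine similarity of each essay to the reference.
system_scores = cosine_similarity(X_lsa[:-1], X_lsa[-1:]).ravel()
rho, _ = spearmanr(system_scores, human_grades)
print(rho)   # rank agreement between system scores and human grades
```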
Recently, improvements of latent semantic analysis (LSA), which stems from singular value decomposition to derive latent semantic classes, have been proposed; in particular, the hk-LSA model. The hk-LSA model is based on reducing the dimension of the vector space and on a probabilistic-like relationship between the document-term space and the latent-topic space. This improved model overcomes some shortcomings of standard LSA, such as processing very dense and orthogonal matrices and difficulties in parallelization. This paper deals with feasible ways to set up such a model and with statistical comparisons between the proposed setups, in order to identify a good configuration for the hk-LSA model. Case studies on this subject suggest ways to set up hk-LSA and show relationships between the standard LSA and hk-LSA models.
Latent semantic analysis
Citations (2)