Interactive System Using LDA for Exploratory Visualization to Extract Data Association in a Data Lake

2018 
An interactive system previously developed for exploratory visualization of data associations in a data lake, using a self-organizing structure of schemas, has been improved by incorporating a machine learning function for latent Dirichlet allocation (LDA) and a categorization function. A topic estimated by LDA (i.e., a list of data values and their corresponding appearance probabilities) can be used as a recommendation that indicates a latent data association among co-occurrences in a complex network structure. Experiments using random data demonstrated that a latent data association with a signal strength of 0.20 (Jaccard coefficient) can be detected over noise with a strength of up to 0.24. The detected recommendation can potentially help the user form a hypothesis about a useful pattern in big data.
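The signal and noise strengths above are expressed as Jaccard coefficients. As a minimal sketch of that measure only, the following computes the standard Jaccard coefficient of two sets, J(A, B) = |A ∩ B| / |A ∪ B|. The function name `jaccard` and the example values `v1` to `v10` are hypothetical, and the abstract does not state how the system builds the sets it compares, so this is not the authors' implementation.

```python
def jaccard(a, b):
    """Standard Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention chosen here (an assumption): two empty sets score 0.0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical data values: two sets of six values sharing two of them.
x = {"v1", "v2", "v3", "v4", "v5", "v6"}
y = {"v5", "v6", "v7", "v8", "v9", "v10"}

# 2 shared values out of 10 distinct values in total.
score = jaccard(x, y)
print(score)
```

In this example the two sets overlap in 2 of 10 distinct values, which corresponds to a coefficient of 0.20, the same magnitude as the signal strength reported in the experiments.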