Encoding Web-based Data for Efficient Storage in Machine Learning Applications

2019 
With the advent of the information era, the amount of data produced each year has grown enormously, driven primarily by the Internet and its billions of users worldwide. The Internet is a storehouse of all kinds of data: text, videos, and images. Most of this data, however, is not directly suitable for learning algorithms; it must be processed before it can be applied to them. Deep learning algorithms based on neural networks require properly prepared datasets to yield better results for predictive analysis. Because big data from the Internet must be handled, efficient storage is a challenge, and encoding the dataset in a convenient, compact form before storing it is of immense importance. Here, we compare various encoding algorithms for storing pre-processed text data. For a random sample of 8000 English words, simulation results using Huffman encoding indicate that the storage space (memory) requirement dropped to just 0.1% of that required by the more traditional one-hot encoding technique.
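The storage gap the abstract reports can be illustrated with a small sketch. The code below is not the authors' implementation; it is a toy comparison, on a hypothetical word-frequency table, of the total bits needed to store a word sequence under Huffman coding (frequent words get short codes) versus one-hot encoding (every word occupies a vector of vocabulary-size bits).

```python
import heapq
from collections import Counter

def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code built
    from the given symbol -> frequency mapping (at least two symbols)."""
    # Heap items: (weight, tiebreak, {symbol: depth}); the tiebreak
    # integer keeps tuple comparison away from the dicts.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

# Hypothetical toy vocabulary with word occurrence counts (not from the paper).
vocab = Counter({"the": 50, "of": 30, "data": 20,
                 "huffman": 5, "encoding": 5, "storage": 2})
total_words = sum(vocab.values())

lengths = huffman_code_lengths(vocab)

# Huffman: each occurrence of a word costs its code length in bits.
huffman_bits = sum(vocab[w] * lengths[w] for w in vocab)
# One-hot: each occurrence is a vector with one bit per vocabulary entry.
onehot_bits = total_words * len(vocab)

print(f"Huffman: {huffman_bits} bits, one-hot: {onehot_bits} bits")
```

With a vocabulary of only six words the gap is modest; with thousands of distinct words, as in the 8000-word sample the paper describes, the one-hot cost grows with the vocabulary size while Huffman code lengths grow only logarithmically, which is consistent with the large reduction reported.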