Hardware accelerated algorithms for semantic processing of document streams

2006 
There is a need within the intelligence communities to analyze massive streams of multilingual unstructured data. Mathematical transformation algorithms have proven effective at interpreting multilingual, unstructured data, but high computational requirements of such algorithms prevent their widespread use. The rate of computation can be vastly increased with field programmable gate array (FPGA) hardware. To experiment with this approach, we developed a system with FPGAs that ingests content over a network at high data rates. The system extracts basewords, counts words, scores documents, and discovers concepts on data that are carried in TCP/IP network flows as packets over a Gigabit Ethernet link or in cells transported over an OC48 link. These algorithms, as implemented in FPGA hardware, introduce certain constraints on the complexity and richness of the semantic processing algorithms. To understand the implications of these constraints and to benchmark the performance of the system, we have performed a series of experiments processing multilingual documents. In these experiments, we compare techniques to generate basewords for our semantic concepts, score documents, and discover concepts across a variety of processing operational scenarios.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    17
    References
    24
    Citations
    NaN
    KQI
    []