High Speed Document Clustering in Reconfigurable Hardware

2006 
High-performance document clustering systems enable similar documents to be automatically organized into groups. In the past, the large amount of computational time needed to cluster documents prevented practical use of such systems with a large number of documents. A full hardware implementation of the K-means clustering algorithm has been designed and implemented in reconfigurable hardware that clusters 512k documents rapidly. This implementation, uses four parallel cosine distance metrics to cluster document vectors that each have 4000 dimensions. The synthesized hardware runs on the Field Programmable Port Extender (FPX) platform at a clock rate of 80 MHz. Although the clock rate on the Xilinx VirtexE 2000 is slower than a CPU, the implementation runs 26 times faster than an algorithmically equivalent software algorithm running on an Intel 3.60 GHz Xeon. The same architecture was used to synthesize a faster and larger design for the Xilinx Virtex4 LX200. This larger implementation can contain up to 25 parallel cosine distance metrics. The implementation synthesized with a clock rate of 250 Mhz and outperforms the equivalent software by a factor of 328.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    12
    References
    6
    Citations
    NaN
    KQI
    []