Mathematical Expression Retrieval in PDFs from the Web Using Mathematical Term Queries

2020 
Because mathematical expressions on the web are not annotated with natural language, they are difficult to find with conventional search engines. Our method performs a web search using a mathematical term as the query and extracts the expressions related to that term from the retrieved PDF files. We convert each PDF to TeX, render the mathematical descriptions in the TeX as images, and compute image features from them. A support vector machine (SVM) then classifies the expressions using these features. Our experimental results show that eliminating slide-derived PDF files improves the F-measure, and that the mean reciprocal rank (MRR) is highest when both PDFs and HTML are used.
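The two evaluation measures named above can be illustrated with a minimal sketch. It assumes the standard textbook definitions of the balanced F-measure (F1) and of MRR; the abstract does not state the exact evaluation protocol, so the function names, the boolean label encoding, and the choice to score a query with no relevant result as zero are all assumptions made for illustration.

```python
from typing import Sequence


def f_measure(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Balanced F-measure (F1) for binary labels.

    predicted[i] is True when the classifier marks expression i as related
    to the query term; actual[i] is the ground-truth label (assumed encoding).
    Returns 0.0 when there are no true positives.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if a and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_reciprocal_rank(rankings: Sequence[Sequence[bool]]) -> float:
    """Mean reciprocal rank over a set of queries.

    Each inner sequence holds relevance flags in ranked order for one query.
    A query with no relevant result contributes 0 (an assumption here).
    """
    if not rankings:
        return 0.0
    total = 0.0
    for ranking in rankings:
        for rank, relevant in enumerate(ranking, start=1):
            if relevant:
                total += 1.0 / rank
                break
    return total / len(rankings)
```

For example, a query whose first relevant expression appears at rank 2 contributes 1/2 to the MRR sum, so comparing PDF-only, HTML-only, and combined sources amounts to calling `mean_reciprocal_rank` once per configuration on the same set of queries.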