Automatic Patents Classification Using Supervised Machine Learning

2020 
Every year, approximately one million patent documents are issued, each with a unique patent number or symbol. To find relevant patent documents, users query International Patent Classification (IPC) documents using IPC symbols, so there is a need for automatic classification and ranking of patent documents with respect to a user query. Automatic classification is only possible through supervised machine learning techniques. In this paper, we classify patent documents using common classifiers. Using a web crawler, we collected 1625 patent documents, in unstructured text form and belonging to eight different classes, from the IPC website. We used 90% of the patents for training and the remaining 10% for testing. We built a feature matrix using the tf-idf, SMART notation and BM25 weighting schemes. This feature matrix is given as input to each classifier, whose output consists of correctly and incorrectly classified instances. Finally, we evaluated the accuracy of each classifier using precision, recall and F-measure. Our comparative analysis of the classifiers shows that adding more features to each classifier can improve its accuracy.
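The abstract names the weighting schemes and evaluation measures but not their exact variants, so the following is only a minimal sketch under stated assumptions. It assumes raw-count tf-idf with idf = log(N / df), the SMART `ltc` scheme (logarithmic tf, idf, cosine normalisation) as one example of the SMART notation, Okapi BM25 with assumed parameters k1 = 1.2 and b = 0.75 and a non-negative idf variant, a shuffled 90/10 split, and per-class F-measure taken as F1. All function names (`tokenize`, `bm25_vector`, `precision_recall_f1`, and so on) are hypothetical, and the paper's classifiers themselves are not reproduced.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Hypothetical preprocessing: lowercase and split on whitespace.
    return text.lower().split()


def document_frequencies(docs):
    # df[t] = number of training documents containing term t at least once.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def tfidf_vector(tokens, df, n_docs):
    # Plain tf-idf: raw term count times log(N / df); terms unseen in training are skipped.
    tf = Counter(tokens)
    return {t: c * math.log(n_docs / df[t]) for t, c in tf.items() if df[t]}


def smart_ltc_vector(tokens, df, n_docs):
    # SMART "ltc": logarithmic tf (l), idf (t), cosine normalisation (c).
    tf = Counter(tokens)
    w = {t: (1 + math.log(c)) * math.log(n_docs / df[t])
         for t, c in tf.items() if df[t]}
    norm = math.sqrt(sum(v * v for v in w.values()))
    return {t: v / norm for t, v in w.items()} if norm else w


def bm25_vector(tokens, df, n_docs, avgdl, k1=1.2, b=0.75):
    # Okapi BM25 term weights; k1, b and the "+ 1" non-negative idf are assumptions.
    tf = Counter(tokens)
    dl = len(tokens)
    weights = {}
    for t, c in tf.items():
        if not df[t]:
            continue
        idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1)
        weights[t] = idf * c * (k1 + 1) / (c + k1 * (1 - b + b * dl / avgdl))
    return weights


def train_test_split(items, train_frac=0.9, seed=0):
    # Shuffle a copy, keep the first 90% for training and the rest for testing.
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_frac)
    return items[:cut], items[cut:]


def precision_recall_f1(y_true, y_pred, label):
    # Per-class precision, recall and F-measure (F1, i.e. beta = 1).
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy corpus standing in for crawled patent text (illustrative only).
    corpus = ["wireless signal antenna", "antenna array design", "drug compound dosage"]
    train_docs = [tokenize(d) for d in corpus]
    df = document_frequencies(train_docs)
    avgdl = sum(len(d) for d in train_docs) / len(train_docs)
    for doc in train_docs:
        print(bm25_vector(doc, df, len(train_docs), avgdl))
```

In this sketch `df` and `avgdl` are computed from the training split only, and test terms that never occur in training are dropped, so test documents do not influence the weights. Each weight dictionary can then be laid out over a fixed vocabulary as one row of the feature matrix and passed to whichever classifier is being compared.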