Implementing Graph Based Rank on Online News Media Keyword Extraction

2019 
Commonly used keyword extraction methods usually need a huge collection of articles to separate important words from unimportant ones. For example, TF-IDF, one of the most popular term-weighting schemes, needs a huge collection of documents for stop-word filtering. Beyond that, TF-IDF can be used to rank words by their significance in an article, with less significant words ranked last. In this paper, a new approach to word ranking is applied in the keyword extraction process. This approach uses a single document to determine its keywords. The data are online news articles written in Bahasa Indonesia and taken from the TEMPO.CO website. Online news media is chosen because it has a semi-formal language structure and comes with additional information, such as tags or keywords. TEMPO.CO is regarded as a reliable online news outlet with good-quality news. In addition, a pre-evaluation showed that the keywords given with TEMPO.CO articles are more closely related to the article content than those of articles taken from other online news portals. The weight of each word is calculated using a graph-based method, and every word is then ranked by its weight. Words in the top five and top ten percent of the ranking are taken as keywords. Precision and recall are estimated by comparing the keywords found with the keywords given in the article. The evaluation shows that graph-based ranking gives better results than TF-IDF. The difference ranges from ten to twenty percent. Although there is a clear difference between the two methods, the average precision and recall of both have a high standard deviation. This is due to the lack of a clear pattern in keyword selection.
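The pipeline described above (weight words from a single document with a graph, rank them, keep the top percentile, then score against the given keywords) can be sketched as follows. This is only an illustrative sketch: the abstract does not specify the graph construction or weighting formula, so the co-occurrence window, the PageRank-style update, the damping factor of 0.85, and all function names here are assumptions, not the authors' method.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Link each word to the next `window` tokens (assumed graph construction)."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window + 1]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return dict(graph)


def rank_words(graph, damping=0.85, iterations=50):
    """PageRank-style weights on an undirected, unweighted graph (assumed update rule)."""
    scores = {word: 1.0 for word in graph}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(graph[n]) for n in graph[word])
            for word in graph
        }
    return scores


def top_percentile(scores, percent):
    """Return the highest-weighted `percent` of words (at least one word)."""
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    k = max(1, round(len(ranked) * percent / 100))
    return ranked[:k]


def precision_recall(found, given):
    """Compare extracted keywords with the keywords supplied with the article."""
    found, given = set(found), set(given)
    hits = len(found & given)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(given) if given else 0.0
    return precision, recall
```

A typical call would tokenize one article, pass the tokens through `build_cooccurrence_graph` and `rank_words`, take `top_percentile(scores, 5)` or `top_percentile(scores, 10)`, and compare the result with the article's own tags using `precision_recall`. Tokenization and stop-word handling for Bahasa Indonesia are left out because the abstract does not describe them.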