Dataset-aware multi-task learning approaches for biomedical named entity recognition.

2020 
MOTIVATION: Named entity recognition (NER) is a critical and fundamental task in biomedical text mining. Recently, researchers have focused on exploiting deep neural networks for biomedical named entity recognition (Bio-NER). The performance of a deep neural network on a single dataset depends mostly on data quality and quantity, yet high-quality data tends to be limited in size. To alleviate this task-specific data limitation, some studies explored multi-task learning for Bio-NER and achieved state-of-the-art performance. However, these multi-task learning methods did not make full use of the information in the various Bio-NER datasets, and the performance of the state-of-the-art multi-task learning method was significantly limited by the number of training datasets.

RESULTS: We propose two dataset-aware multi-task learning (MTL) approaches for Bio-NER that jointly train the models for numerous Bio-NER datasets, so that each model can discriminatively exploit information from all related training datasets. Both approaches achieve substantially better performance than the state-of-the-art multi-task learning method on 14 out of 15 Bio-NER datasets. Furthermore, we implemented our approaches on a combination of Bio-NER and biomedical part-of-speech (POS) tagging datasets. The results verify that Bio-NER and POS tagging can significantly enhance one another.

AVAILABILITY: Our source code is available at https://github.com/zmmzGitHub/MTL-BC-LBC-BioNER and all datasets are publicly available at https://github.com/cambridgeltl/MTL-Bioinformatics-2016.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.