In a spoken multiple-choice question answering (SMCQA) task, given a passage, a question, and multiple choices all in the form of speech, the machine needs to pick the correct choice to answer the question. A common strategy is to employ an automatic speech recognition (ASR) system to convert the speech content into auto-transcribed text, thereby reducing the SMCQA task to a classic MCQA task. Under this strategy, bidirectional encoder representations from transformers (BERT) can achieve a certain level of performance despite ASR errors. However, previous studies have shown that acoustic-level statistics can compensate for text inaccuracies caused by ASR systems, thereby improving the performance of an SMCQA system. Accordingly, we concentrate on designing a BERT-based SMCQA framework that not only inherits the advantages of the contextualized language representations learned by BERT but also integrates acoustic-level information with text-level information in a systematic and principled way. Considering the temporal characteristics of speech, we first formulate multi-turn audio-extracter hierarchical convolutional neural networks (MA-HCNNs), which encode acoustic-level features over various temporal scopes. Based on MA-HCNNs, we propose a multi-turn audio-extracter BERT-based (MA-BERT) framework for the SMCQA task. A series of experiments demonstrates remarkable improvements in accuracy over selected baselines and SOTA systems on a published Chinese SMCQA dataset.
In recent years, multilingual question answering has emerged as a research topic attracting much attention. Although systems for English and other rich-resource languages, built on various advanced deep-learning techniques, are highly developed, most systems for low-resource languages remain impractical due to data insufficiency. Accordingly, many studies have attempted to improve performance on low-resource languages in a zero-shot or few-shot manner based on multilingual bidirectional encoder representations from transformers (mBERT), transferring knowledge learned from rich-resource languages to low-resource languages. Most such methods require either a large amount of unlabeled data or a small set of labeled data in the low-resource languages. In Wikipedia, 169 languages have fewer than 10,000 articles, and 48 languages have fewer than 1,000 articles. This observation motivates us to tackle zero-shot multilingual question answering under a zero-resource scenario. Thus, this study proposes a framework that fine-tunes the original mBERT using data from rich-resource languages only; the resulting model can then be applied to low-resource languages in a zero-shot, zero-resource manner. Compared to several baseline systems, which require millions of unlabeled examples in the low-resource languages, our proposed framework is not only highly competitive on unseen languages but also performs better on the languages used in training.
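The zero-shot transfer pattern described above can be illustrated with a toy sketch: a classifier is trained only on rich-resource-language examples, but because inputs are mapped into a shared multilingual representation space (as mBERT provides), the same model can score low-resource-language inputs without any target-language data. The tiny vocabulary, hand-made embeddings, and nearest-centroid classifier below are hypothetical stand-ins for the actual mBERT encoder and fine-tuned classification head.

```python
# Hypothetical shared embedding space: translation pairs get nearby vectors,
# mimicking what a multilingual encoder like mBERT learns.
SHARED_EMBEDDINGS = {
    "good": (0.9, 0.1), "bad": (0.1, 0.9),      # English (rich-resource)
    "bueno": (0.85, 0.15), "malo": (0.15, 0.85),  # Spanish (low-resource)
}

def encode(sentence):
    """Average word vectors -- a crude stand-in for a pooled encoder output."""
    vecs = [SHARED_EMBEDDINGS[w] for w in sentence.split() if w in SHARED_EMBEDDINGS]
    n = len(vecs)
    return tuple(sum(v[d] for v in vecs) / n for d in range(2))

def train_centroids(labeled_examples):
    """'Fine-tune' on rich-resource data only: one centroid per label."""
    sums, counts = {}, {}
    for text, label in labeled_examples:
        v = encode(text)
        s = sums.setdefault(label, [0.0, 0.0])
        s[0] += v[0]
        s[1] += v[1]
        counts[label] = counts.get(label, 0) + 1
    return {lab: (s[0] / counts[lab], s[1] / counts[lab]) for lab, s in sums.items()}

def predict(centroids, text):
    """Classify any language's input by nearest centroid in the shared space."""
    v = encode(text)
    return min(centroids,
               key=lambda lab: (v[0] - centroids[lab][0]) ** 2
                             + (v[1] - centroids[lab][1]) ** 2)

# Train only on English; Spanish inputs are handled zero-shot, zero-resource.
centroids = train_centroids([("good", "pos"), ("bad", "neg")])
```

Because the shared space places translation equivalents close together, the centroids learned from English generalize to Spanish inputs the model never saw during training.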
Spoken multiple-choice question answering (SMCQA) requires machines to select the correct choice to answer the question by referring to the passage, where the passage, the question, and the multiple choices are all in the form of speech. While the audio could contain useful cues for SMCQA, usually only the auto-transcribed text is utilized in model development. Thanks to large-scale pre-trained language representation models, such as bidirectional encoder representations from Transformers (BERT), systems with only auto-transcribed text can still achieve a certain level of performance. However, previous studies have evidenced that acoustic-level statistics can offset text inaccuracies caused by automatic speech recognition systems, or representation inadequacy lurking in word embedding generators, thereby making the SMCQA system robust. Along this line of research, this study proposes an audio-aware SMCQA framework. Two different mechanisms are introduced to distill the useful cues from speech, and then a BERT-based SMCQA framework is presented. In other words, the proposed SMCQA framework not only inherits the advantages of the contextualized language representations learned by BERT but also integrates the complementary acoustic-level information distilled from audio with the text-level information. A series of experiments demonstrates remarkable improvements in accuracy over selected baselines and SOTA systems on a published Chinese SMCQA dataset.
We externally validated Fujimoto's post-transplant lymphoproliferative disorder (PTLD) scoring system for risk prediction using the Taiwan Blood and Marrow Transplant Registry Database (TBMTRD) and aimed to create a superior scoring system using machine learning methods. Consecutive allogeneic hematopoietic cell transplant (HCT) recipients registered in the TBMTRD from 2009 to 2018 were included in this study. The Fujimoto PTLD score was calculated for each patient. A machine learning algorithm, the least absolute shrinkage and selection operator (LASSO), was used to construct a new scoring system, which was validated using fivefold cross-validation. We identified 2,148 allogeneic HCT recipients in the TBMTRD, of whom 57 (2.65%) developed PTLD. In this population, the probabilities of PTLD development at 5 years by Fujimoto score for patients in the low-, intermediate-, high-, and very-high-risk groups were 1.15%, 3.06%, 4.09%, and 8.97%, respectively. The score model had acceptable discrimination, with a C-statistic of 0.65, and good calibration (Hosmer-Lemeshow test p = .81). Using LASSO regression analysis, a four-risk-group model was constructed, and the new model showed better discrimination in the validation cohort than the Fujimoto PTLD score (C-statistic: 0.75 vs. 0.65). Our study produced a more comprehensive model than Fujimoto's PTLD scoring system, incorporating additional predictors identified through machine learning that enhanced discrimination. The widespread use of this promising tool for risk stratification of patients receiving HCT would allow identification of high-risk patients who may benefit from preemptive treatment for PTLD. This study validated the Fujimoto score for the prediction of post-transplant lymphoproliferative disorder (PTLD) development following hematopoietic cell transplant (HCT) in an external, independent, and nationally representative population.
This study also developed a more comprehensive model with enhanced discrimination for better risk stratification of patients receiving HCT, potentially changing clinical management in certain risk groups. Previously unreported risk factors associated with the development of PTLD after HCT were identified using the least absolute shrinkage and selection operator, including a pre-HCT medical history of mechanical ventilation and the chemotherapy agents used in the conditioning regimen.
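The LASSO variable selection described above works by adding an L1 penalty that shrinks the coefficients of uninformative predictors exactly to zero. A minimal sketch via coordinate descent with the soft-thresholding operator is shown below; the tiny dataset and penalty value are hypothetical illustrations, not the registry data or the fitted clinical model.

```python
# Minimal sketch of LASSO via coordinate descent: minimize
# (1/2n) * ||y - Xw||^2 + lam * ||w||_1.

def soft_threshold(rho, lam):
    """Soft-thresholding operator: shrinks small coefficients to exactly 0."""
    if rho < -lam:
        return rho + lam
    if rho > lam:
        return rho - lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Cyclic coordinate descent over the p coefficients."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_others = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_others)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return w

# Hypothetical tiny dataset: feature 0 predicts y; feature 1 is pure noise.
X = [[1, 0.1], [2, -0.1], [3, 0.1], [4, -0.1]]
y = [1, 2, 3, 4]
w = lasso_coordinate_descent(X, y, lam=0.1)
# The noise feature's coefficient is shrunk exactly to zero, which is why
# LASSO is useful for selecting a sparse set of risk predictors.
```

In practice one would use a vetted implementation (e.g., scikit-learn's `Lasso`) and choose the penalty by cross-validation, as the study does with fivefold cross-validation.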
In a spoken multiple-choice question answering (SMCQA) task, given a passage, a question, and multiple choices all in the form of speech, the machine needs to pick the correct choice to answer the question. While the audio could contain useful cues for SMCQA, usually only the auto-transcribed text is utilized in system development. Thanks to large-scale pre-trained language representation models, such as bidirectional encoder representations from transformers (BERT), systems with only auto-transcribed text can still achieve a certain level of performance. However, previous studies have evidenced that acoustic-level statistics can offset text inaccuracies caused by automatic speech recognition systems, or representation inadequacy lurking in word embedding generators, thereby making the SMCQA system robust. Along this line of research, this study concentrates on designing a BERT-based SMCQA framework that not only inherits the advantages of the contextualized language representations learned by BERT but also integrates the complementary acoustic-level information distilled from audio with the text-level information. Consequently, an audio-enriched BERT-based SMCQA framework is proposed. A series of experiments demonstrates remarkable improvements in accuracy over selected baselines and SOTA systems on a published Chinese SMCQA dataset.
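The multiple-choice formulation used by BERT-style MCQA systems can be sketched as follows: each (passage, question, choice) triple is scored independently, and the highest-scoring choice is selected. Here a simple word-overlap heuristic stands in for the BERT classification head; in the actual framework the score would come from BERT, optionally fused with the acoustic-level features described above.

```python
# Sketch of the argmax-over-choices pattern in multiple-choice QA.
# The scoring function is a hypothetical placeholder, not the paper's model.

def score(passage, question, choice):
    """Toy relevance score: fraction of choice words found in the passage."""
    p_words = set(passage.lower().split())
    c_words = set(choice.lower().split())
    return len(p_words & c_words) / max(len(c_words), 1)

def pick_choice(passage, question, choices):
    """Score each choice independently and return the index of the best one."""
    return max(range(len(choices)),
               key=lambda i: score(passage, question, choices[i]))

best = pick_choice("the cat sat on the mat",
                   "where did the cat sit",
                   ["on the mat", "in the box"])
```

A real BERT-based system would replace `score` with a forward pass over the concatenated sequence `[CLS] passage question [SEP] choice [SEP]`, but the argmax-over-choices structure is the same.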
This paper presents a framework for answering questions that require various kinds of inference mechanisms (such as Extraction, Entailment-Judgement, and Summarization). Most previous approaches adopt a rigid framework that handles only one inference mechanism. Only a few adopt several answer generation modules to provide different mechanisms; however, they either lack an aggregation mechanism to merge the answers from the various modules, or are too complicated to be implemented with neural networks. To alleviate these problems, we propose a divide-and-conquer framework consisting of a set of answer generation modules, a dispatch module, and an aggregation module. The answer generation modules are designed to provide different inference mechanisms, the dispatch module selects a few appropriate answer generation modules to generate answer candidates, and the aggregation module selects the final answer. We test our framework on the 2020 Formosa Grand Challenge Contest dataset. Experiments show that the proposed framework outperforms the state-of-the-art RoBERTa-large model by about 11.4%.
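The dispatch-then-aggregate pattern described above can be sketched in a few lines: a routing step selects a subset of answer-generation modules, each produces a candidate with a confidence, and the aggregation step picks the final answer. The modules, routing rule, and confidences below are hypothetical placeholders, not the paper's trained neural components.

```python
# Divide-and-conquer QA sketch: dispatch -> generate candidates -> aggregate.

def extraction_module(question, passage):
    # Placeholder: a span-extraction answer with a made-up confidence.
    return ("extracted span", 0.6)

def entailment_module(question, passage):
    # Placeholder: a yes/no judgement with a made-up confidence.
    return ("yes", 0.8)

def dispatch(question):
    """Toy routing rule: yes/no-style questions go to the entailment module."""
    if question.rstrip("?").lower().startswith(("is", "does", "can")):
        return [entailment_module]
    return [extraction_module, entailment_module]

def aggregate(candidates):
    """Select the candidate with the highest confidence as the final answer."""
    return max(candidates, key=lambda c: c[1])[0]

def answer_question(question, passage):
    modules = dispatch(question)
    candidates = [m(question, passage) for m in modules]
    return aggregate(candidates)
```

In the proposed framework both the dispatch and aggregation steps are learned modules rather than hand-written rules, but the control flow follows this shape.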