Think Beyond the Word: Understanding the Implied Textual Meaning by Digesting Context, Local, and Noise

2020 
Implied semantics is a complex language act that can appear anywhere in cyberspace. The prevalence of implied spam texts, such as implied pornography, sarcasm, and abuse hidden within novels, tweets, microblogs, or reviews, can be extremely harmful to the physical and mental health of teenagers. The non-literal interpretation of implied text is hard for machine models to understand because of its high context-sensitivity and heavy use of figurative language. In this study, inspired by human reading comprehension, we propose a novel, simple, and effective deep neural framework, called the Skim and Intensive Reading Model (SIRM), for figuring out implied textual meaning. The proposed SIRM consists of three main components: a skim reading component, an intensive reading component, and an adversarial training component. The skim reading component, a combination of several convolutional neural networks, quickly extracts n-gram features as skim (entire) information. The intensive reading component performs a hierarchical investigation at both the sentence level and the paragraph level, encapsulating the current (local) embedding and the contextual information (context) with a dense connection; the contextual information includes both the near-neighbor information and the skim information mentioned above. Finally, besides the common training loss function, we employ an adversarial loss function as a penalty over the skim reading component to eliminate noisy information (noise) arising from special figurative words in the training data. To verify the effectiveness, robustness, and efficiency of the proposed architecture, we conduct extensive comparative experiments on an industrial novel dataset involving implied pornography and on three sarcasm benchmarks.
Experimental results indicate that (1) the proposed model, which benefits from context and local modeling and from explicit consideration of figurative language (noise), outperforms existing state-of-the-art solutions at a comparable parameter scale and running speed; (2) the SIRM yields superior robustness in terms of parameter-size sensitivity; and (3) compared with ablation and addition variants of the SIRM, the final framework is sufficiently efficient.
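As a rough illustration of the skim reading idea described above (not the authors' implementation), the sketch below extracts n-gram features with convolutions of several widths followed by ReLU and max-over-time pooling, in the style of a standard text CNN. All function names, filter widths, and dimensions are illustrative assumptions.

```python
# Hedged sketch of a skim-reading-style n-gram feature extractor.
# Assumption: one filter per n-gram width; max-over-time pooling yields
# one scalar feature per filter, forming the "skim (entire) information".
import numpy as np

def skim_read(embeddings, filters):
    """embeddings: (seq_len, emb_dim) token embeddings.
    filters: list of (width, emb_dim) convolution filters.
    Returns one pooled feature per filter."""
    seq_len, _ = embeddings.shape
    feats = []
    for w in filters:
        width = w.shape[0]
        # Slide the filter over every width-sized token window:
        # elementwise product, sum, ReLU, then max-pool over positions.
        scores = [
            max(0.0, float(np.sum(embeddings[i:i + width] * w)))
            for i in range(seq_len - width + 1)
        ]
        feats.append(max(scores))
    return np.array(feats)

rng = np.random.default_rng(0)
emb = rng.standard_normal((10, 8))                           # 10 tokens, 8-dim
filters = [rng.standard_normal((n, 8)) for n in (2, 3, 4)]   # bi/tri/4-gram
print(skim_read(emb, filters))  # one non-negative feature per n-gram width
```

In a full model, such pooled features would be concatenated and fed onward as contextual input to the intensive reading component, with the adversarial penalty applied on top of this skim representation.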