LexDivPara: A Measure of Paraphrase Quality with Integrated Sentential Lexical Complexity

Thanh Thieu, Ha Do, Thanh Duong, Shi Pu, Sathyanarayanan N. Aakur, Saad Khan

2021

We present a novel method that automatically measures the quality of sentential paraphrasing. Our method balances two conflicting criteria: semantic similarity and lexical diversity. Using a diverse annotated corpus, we built learning-to-rank models on edit distance, BLEU, ROUGE, and cosine similarity features. Extrinsic evaluation on the STS Benchmark and ParaBank Evaluation datasets yielded a model ensemble of moderate to high quality. We applied our method to both small benchmark datasets and large-scale datasets, providing the results as resources for the community.
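
To make the feature set concrete, the following is a minimal sketch of how two of the named features, edit distance and cosine similarity, might be computed for a sentence and its paraphrase. It is an illustration under stated assumptions, not the authors' implementation: the whitespace tokenizer, the word-level edit distance normalized by the longer sentence, the bag-of-words cosine similarity, and all function names are hypothetical choices. BLEU, ROUGE, and the learning-to-rank models themselves are not shown.

```python
import math
from collections import Counter


def tokenize(sentence):
    """Lowercase and split on whitespace (an assumed, simplistic tokenizer)."""
    return sentence.lower().split()


def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cosine_similarity(a, b):
    """Cosine similarity of bag-of-words count vectors (assumed representation)."""
    counts_a, counts_b = Counter(a), Counter(b)
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def pair_features(source, paraphrase):
    """Hypothetical feature dictionary for one (source, paraphrase) pair."""
    s, p = tokenize(source), tokenize(paraphrase)
    longest = max(len(s), len(p)) or 1  # avoid division by zero on empty input
    return {
        "norm_edit_distance": edit_distance(s, p) / longest,
        "cosine_similarity": cosine_similarity(s, p),
    }
```

Under these assumptions, a high normalized edit distance points toward lexical diversity while a high cosine similarity points toward surface overlap, which is one way to see why the two criteria pull against each other and why a learned ranking over several such features could be useful.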

Keywords:

Computer science
Semantic similarity
Natural language processing
Benchmark (computing)
Cosine similarity
Learning to rank
Edit distance
Paraphrase
Artificial intelligence
Benchmarking
Lexical diversity
