Question pair matching using text similarity algorithms
Abstract
As the amount of text on the internet grows rapidly, text similarity has become critical to many important tasks, such as information retrieval, text classification, document clustering, and machine translation.
To address these problems, this thesis reviews and critically analyzes approaches from the field of natural language processing: traditional methods, such as string-based and corpus-based measures, and modern methods, such as recurrent neural networks, long short-term memory networks, and Siamese networks.
The final result is a comparison of these methods in terms of time and accuracy. The thesis concludes with proposed approaches for improving current methods and ways to apply them to real problems, such as detecting duplicates in a text database.
Keywords: review of text similarity algorithms; recurrent neural network; long short-term memory.