NLP techniques for Fake News detection
In this project we aim to investigate which NLP techniques are most effective at detecting fake news online. We will survey the current state of the art in NLP and claim verification and identify areas for further investigation. Previous work in this area includes knowledge-graph techniques supported by string matching against large knowledge bases such as Wikipedia, and further research is required at every stage of the process.

Fake news detection itself is not the full story: some researchers try to predict whether an article contains false information, while others verify individual claims. We will look into the whole fact-checking and fake news process, including interviews with journalists and professional fact-checkers.

Several researchers have examined how to construct large fake news datasets with the right kind and amount of metadata. Further work has investigated which machine learning models should process this data, ranging from Naive Bayes classifiers with no stop-word removal or stemming stages to deep convolutional neural networks built from stacks of modules. We will look at how different kinds of data interact with different kinds of models.

Extracting the relevant information from input data is another step that requires further research, and one that the open challenges and datasets tackle differently. The FEVER dataset release paper recommends splitting claim verification into document retrieval, sentence selection, and claim verification stages, so that a model assembles relevant evidence before trying to verify a claim. In contrast, the Fake News Challenge frames the task as stance detection: deciding whether, and how, a given headline relates to the article body text. We aim to investigate whether these are the definitive steps needed for fact-checking, or whether some other conceptual step is needed along the way.
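To make the simpler end of the modelling spectrum concrete, the following is a minimal sketch of the kind of Naive Bayes baseline described above: raw lowercased whitespace tokens, no stop-word removal, no stemming. The toy corpus and labels are invented purely for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(texts, labels):
    """Fit a multinomial Naive Bayes model with add-one smoothing."""
    word_counts = defaultdict(Counter)   # label -> token counts
    label_counts = Counter(labels)
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = text.lower().split()    # no stemming, no stop-word list
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def predict_nb(model, text):
    """Return the label with the highest log posterior for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)  # log prior
        total = sum(word_counts[label].values())
        for token in text.lower().split():
            # Laplace (add-one) smoothing over the shared vocabulary
            score += math.log((word_counts[label][token] + 1)
                              / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy training data, for illustration only
texts = [
    "scientists confirm water is wet in new study",
    "shocking secret cure doctors hate",
    "council approves new library budget",
    "celebrity spotted with alien on moon base",
]
labels = ["real", "fake", "real", "fake"]

model = train_nb(texts, labels)
print(predict_nb(model, "shocking alien cure"))  # → fake
```

Even this deliberately crude baseline illustrates why preprocessing choices matter: every surface form of a word is a separate feature, which is exactly the design decision the stemming and stop-word stages exist to revisit.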
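The three FEVER stages can be sketched as a pipeline of interchangeable components. The word-overlap retrieval, the sentence selector, and the trivial overlap-threshold verifier below are placeholder assumptions for illustration, not the actual FEVER baseline models; the point is the shape of the pipeline, in which evidence is assembled before verification.

```python
def retrieve_documents(claim, corpus, k=2):
    """Stage 1: rank documents by word overlap with the claim (placeholder)."""
    claim_words = set(claim.lower().split())
    scored = sorted(corpus.items(),
                    key=lambda kv: -len(claim_words & set(kv[1].lower().split())))
    return [doc_id for doc_id, _ in scored[:k]]

def select_sentences(claim, corpus, doc_ids, k=2):
    """Stage 2: pick the sentences most similar to the claim (placeholder)."""
    claim_words = set(claim.lower().split())
    sentences = [s.strip() for d in doc_ids
                 for s in corpus[d].split(".") if s.strip()]
    return sorted(sentences,
                  key=lambda s: -len(claim_words & set(s.lower().split())))[:k]

def verify_claim(claim, evidence):
    """Stage 3: trivial verifier — SUPPORTS iff enough words overlap (placeholder)."""
    claim_words = set(claim.lower().split())
    overlap = max((len(claim_words & set(e.lower().split()))
                   for e in evidence), default=0)
    return "SUPPORTS" if overlap >= 2 else "NOT ENOUGH INFO"

# Invented two-document corpus, for illustration only
corpus = {
    "water": "Water is a transparent fluid. Water covers most of Earth",
    "moon": "The Moon orbits Earth. The Moon has no atmosphere",
}
claim = "Water covers most of Earth"
docs = retrieve_documents(claim, corpus)
evidence = select_sentences(claim, corpus, docs)
print(verify_claim(claim, evidence))  # → SUPPORTS
```

Because each stage only consumes the previous stage's output, any one component can be swapped for a learned model without touching the others, which is one reason the staged decomposition is attractive for research.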
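The Fake News Challenge setup can likewise be sketched in a few lines. The Jaccard-overlap score, the threshold value, and the example headline-body pairs below are invented assumptions; real FNC systems use much richer features and a four-way stance label, but the related/unrelated split shown here is the first decision such a system makes.

```python
def relatedness(headline, body):
    """Jaccard word overlap between headline and body (placeholder feature)."""
    h = set(headline.lower().split())
    b = set(body.lower().split())
    return len(h & b) / len(h | b) if h | b else 0.0

def stance_stub(headline, body, threshold=0.1):
    # First decide related vs unrelated; a fuller system would then
    # classify related pairs as agree / disagree / discuss.
    return "related" if relatedness(headline, body) >= threshold else "unrelated"

# Invented headline-body pairs, for illustration only
print(stance_stub("Robot dog runs marathon",
                  "A robot dog completed a marathon in record time"))  # → related
print(stance_stub("Robot dog runs marathon",
                  "Stock markets closed higher today"))  # → unrelated
```

Contrasting this with the FEVER decomposition highlights the open question posed above: the two tasks slice the fact-checking process into quite different conceptual steps.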