Provenance Chain Fact Validation in Neural Knowledge Graphs
Recent years have brought a proliferation of information to the public. Social networks serve up billions of bite-size chunks of "information," which we as humans process in the context of our world view and experience. But even with our wealth of "knowledge" about the world, it can be very difficult to infer the veracity or intent of the information presented. The potential for harm cannot be overstated: the effects of mis- and disinformation on society, whether in politics, public health, or climate change, are already evident.

The application of modern Machine Learning, and in particular Deep Learning, to this problem is constantly evolving and improving. However, classifying information based solely on its linguistic content can only get us so far. We would like to explore the use of Knowledge Graphs (KGs) as additional context for identifying false information. In particular, we would like to explore provenance (to which graph structures naturally lend themselves) as an indicator of the probability that an item is or is not "true" (a term that requires a much more in-depth definition, beyond the scope of this introduction). In addition, we are interested in the extent to which sources are biased, as a possible proxy for intent. We also believe it is not enough to provide a model with high precision: the model must be explainable. We therefore consider it important to provide a provenance chain with credibility and bias indicators at each step.

There is currently a great deal of manual effort in this arena; FactCheck.org, PolitiFact, Snopes.com, and Hoax Slayer are a few examples. We would like our model to be at least as insightful as these efforts.

To build our model, we will use existing datasets, which will need to be converted to a KG using NLP. This KG would be augmented by existing KGs such as DBpedia (leveraging the Semantic Web), or by a proprietary solution such as Diffbot. To build an ontology for the fact validation model, we can use a framework like PROV-O.
We can then combine the ontology and the knowledge graph to train a neural network that builds and checks the provenance chain. To validate our solution, we will compare it against baselines such as DeFacto (http://aksw.org/Projects/DeFacto.html), to see whether it improves results or enables streaming or real-time validation of facts, and against Microsoft's early-detection model, which claims to beat the existing state of the art (SOTA).