Injecting Structured Knowledge into Pretrained Language Models
Pre-trained language models such as BERT (Devlin et al., 2018) and XLNet (Yang et al., 2019) have greatly improved performance on many NLP tasks. These models capture rich patterns from large-scale corpora and learn good representations for text. However, such models have shortcomings: they underperform on complicated, noisy text (Xiong et al., 2019) and on texts whose understanding requires inference and external knowledge. Niven and Kao (2019) found that BERT's strong performance on an argument-reasoning task comes from exploiting spurious statistical cues in the dataset, highlighting its limited capability to truly understand natural language. Liu et al. (2019) proposed incorporating knowledge graphs into BERT to aid understanding. We build on this work and conjecture that incorporating structured knowledge, such as entity relations or linguistic information, can improve such models' performance on some NLP tasks.
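As a minimal sketch of one possible injection scheme (a fusion layer in the spirit of entity-enhanced models; all array names and dimensions here are illustrative assumptions, not taken from any cited system), knowledge-graph entity embeddings aligned to tokens can be fused with contextual token representations and projected back to the model dimension:

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, d_model, d_ent = 8, 16, 12  # illustrative sizes (assumptions)

# Contextual token representations, e.g. from a pretrained encoder.
tokens = rng.normal(size=(seq_len, d_model))

# Entity embeddings from a knowledge graph, aligned to tokens;
# zero rows mark tokens with no linked entity.
entities = np.zeros((seq_len, d_ent))
entities[2] = rng.normal(size=d_ent)  # token 2 links to an entity
entities[5] = rng.normal(size=d_ent)  # token 5 links to an entity

# Fusion layer: project the concatenated [token; entity] vector back
# to d_model, so downstream transformer layers are unchanged.
W = rng.normal(size=(d_model + d_ent, d_model)) * 0.1
b = np.zeros(d_model)
fused = np.tanh(np.concatenate([tokens, entities], axis=-1) @ W + b)

assert fused.shape == (seq_len, d_model)
```

Because the fused output keeps the original hidden size, such a layer could in principle be inserted into a pretrained encoder without altering the rest of the architecture.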
Specifically, we aim to explore how to inject structured knowledge into large-scale pre-trained models, focusing on two tasks: Question Answering and Sentiment Analysis. Our main focus will be English, but, resources permitting, we will also explore other languages such as Chinese, as well as cross-lingual representations. Two side questions we also expect to address are 1) whether incorporating structured knowledge can reduce model size, and 2) understanding the role of particular pretraining objectives.