Customizing ViT Tokenizers for Improved Performance and Explainability in Medical Imaging Tasks
The success of transformers in natural language processing (NLP) has led to their adoption in computer vision and image processing. Tokenizers are a key component of this pipeline: they break raw data, such as text or images, into representative tokens and encode them in a format that transformer architectures can process efficiently. However, tokenization techniques have largely been developed and optimized for NLP tasks, and adapting them to medical images poses unique challenges due to the complexity and specificity of these images. Medical imaging modalities such as radiographs, CT scans, and MRIs possess distinct characteristics that call for specialized tokenization strategies. This project therefore aims to develop customized tokenizers for general image use and, more specifically, for medical imaging tasks. The resulting tokenizers should capture clinically relevant features of complex medical images and encode them into tokens suitable for vision transformers, with the goal of improving both performance and explainability in medical imaging applications.
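As a point of reference for what "tokenizing" an image means in a standard vision transformer, the sketch below shows the usual baseline the project would customize: splitting an image into fixed-size, non-overlapping patches and flattening each patch into a token vector. This is a minimal illustration using NumPy, not the project's proposed tokenizer; the function name and patch size are illustrative (16x16 patches on a 224x224 image, as in ViT-Base).

```python
import numpy as np

def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C).
    This is the standard ViT tokenization step; each row is one token
    before the learned linear projection and position embeddings.
    """
    H, W, C = image.shape
    assert H % patch_size == 0 and W % patch_size == 0, \
        "image dimensions must be divisible by the patch size"
    ph, pw = H // patch_size, W // patch_size
    # Group pixel rows/columns into patch blocks, then bring the two
    # patch-grid axes to the front.
    patches = image.reshape(ph, patch_size, pw, patch_size, C)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(ph * pw, patch_size * patch_size * C)

rng = np.random.default_rng(0)
img = rng.random((224, 224, 3))   # dummy 3-channel image
tokens = patchify(img, 16)        # 16x16 patches -> 14x14 grid
print(tokens.shape)               # (196, 768)
```

A customized medical-imaging tokenizer would replace this uniform grid with a scheme sensitive to modality-specific structure (e.g. anatomically informed or content-adaptive patching), but the output contract is the same: a sequence of token vectors for the transformer encoder.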