Research on Calibration of Deep Learning Models
We work on new methods for improving the calibration, and hence the reliability, of deep learning models. We address the problem of uncalibrated models, which can be overconfident or underconfident in their predictions and can lead to unreliable outcomes in real-world applications. We review existing methods for calibrating deep learning models, including temperature scaling, label smoothing, Bayesian neural networks, subnetwork ensembling, and data augmentation.
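Temperature scaling, the simplest post-hoc method above, rescales a trained model's logits by a single scalar T fitted on held-out data. A minimal sketch (the function names and the grid-search fitting strategy are illustrative choices, not a specific implementation from the cited works):

```python
import math

def softmax(logits, T=1.0):
    """Softmax with temperature T; T > 1 softens probabilities, T < 1 sharpens them."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def fit_temperature(logits_list, labels, grid=None):
    """Pick T minimizing negative log-likelihood on a held-out set.
    A grid search stands in for the usual gradient-based fit (illustrative)."""
    if grid is None:
        grid = [0.5 + 0.1 * i for i in range(50)]  # candidate T in [0.5, 5.4]
    best_T, best_nll = 1.0, float("inf")
    for T in grid:
        nll = 0.0
        for logits, y in zip(logits_list, labels):
            p = softmax(logits, T)[y]
            nll -= math.log(max(p, 1e-12))
        if nll < best_nll:
            best_T, best_nll = T, nll
    return best_T
```

For an overconfident model (held-out accuracy lower than its confidence), the fitted T comes out above 1, softening the predicted probabilities without changing the predicted class.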
One work presents a survey of methods for calibrating deep learning models, organized into three categories: post-hoc methods, in-training methods, and Bayesian methods. It also discusses open challenges in calibration research, such as evaluation metrics, uncertainty quantification, adversarial robustness, and out-of-distribution detection.
Another study compares methods for improving the confidence calibration of deep learning models, defined as the agreement between the predicted probability and the true class prevalence. It evaluates methods such as temperature scaling and label smoothing on natural image classification and lung cancer risk estimation tasks, with both balanced and imbalanced training sets.
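Agreement between predicted probability and observed frequency is commonly measured with the expected calibration error (ECE). A minimal sketch (the binning scheme and function name are illustrative, not taken from the cited study):

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence, then take the bin-size-weighted
    average of |accuracy - mean confidence| over the bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0 into last bin
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece
```

A perfectly calibrated bin (80% confidence, 80% accuracy) contributes zero; an overconfident bin (90% confidence, 50% accuracy) contributes its full gap.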
A third study examines the performance of deep learning models for class-imbalanced medical image classification. It investigates the degree of imbalance in the training dataset, calibration methods, and two classification thresholds: the default threshold of 0.5 and the optimal threshold derived from precision-recall curves. It concludes that, across varying degrees of imbalance, calibrated probabilities significantly outperform uncalibrated ones at the default threshold of 0.5; at the PR-guided threshold, however, these gains are not significant.
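A PR-guided threshold can be obtained by sweeping candidate thresholds and keeping the one that maximizes F1 (a common criterion for picking an operating point on the precision-recall curve; the function below is an illustrative sketch, not the cited study's procedure):

```python
def pr_threshold(probs, labels):
    """Pick the classification threshold maximizing F1 over candidate
    thresholds taken from the predicted probabilities themselves."""
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        fn = sum(1 for p, y in zip(probs, labels) if p < t and y == 1)
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t
```

On imbalanced data the F1-optimal threshold typically lands well below the default 0.5, which is why the choice of threshold interacts with calibration in the study's findings.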
Another work investigates how well deep learning models predict the probabilities of different outcomes for classification problems in mechanics. It compares several methods for improving calibration, such as ensemble averaging and temperature scaling.
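Ensemble averaging here means averaging the per-class probabilities of independently trained members, which tends to reduce overconfidence. A minimal sketch (the function name is an illustrative choice):

```python
def ensemble_average(member_probs):
    """Average per-class probability vectors across ensemble members.
    member_probs: list of probability vectors, one per trained model."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(m[c] for m in member_probs) / n_members for c in range(n_classes)]
```

Averaging a confident member with a less confident one yields an intermediate, typically better-calibrated probability vector.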
A further study examines the effect of combining ensembles with data augmentation in multi-class image classification, and concludes that subnetwork ensembling with data augmentation improves model calibration and robustness. It also suggests that combining subnetwork ensembles with MixUp or CutMix improves accuracy without harming calibration.
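MixUp augments training data by taking convex combinations of pairs of inputs and their one-hot labels, with the mixing weight drawn from a Beta distribution. A minimal sketch (list-based for self-containment; real implementations operate on tensors):

```python
import random

def mixup(x1, y1, x2, y2, alpha=0.2):
    """MixUp: convex combination of two inputs and their one-hot labels.
    lam ~ Beta(alpha, alpha), drawn here via random.betavariate."""
    lam = random.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

Because the mixed labels are soft rather than one-hot, training on them discourages extreme confidence, which is one explanation for MixUp's calibration benefits.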
Some possible directions for continuing the research are: