Dr Georgiana Ifrim
What do shapes, sound, motion and COVID-19 have in common?
They can be described using measurements collectively known as time series data.
Time series data is numeric data collected over time and stored as a sequence of numbers with additional annotations, such as the time-stamp for each data point and possibly an associated class label. For example, a human motion sensor can monitor your movement over time, such as the number of steps you take each hour of the day, or the acceleration of your movement during your daily exercise. The data can be easily collected using a mobile phone or a sensor placed on the body. Over the past two years, time series databases have been the fastest-growing type of database according to DB-Engines.
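As a rough illustration, a labelled time series can be represented as a sequence of values paired with timestamps and an optional class label. The container and field names below are hypothetical, invented for this sketch rather than taken from any particular library:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LabeledTimeSeries:
    """A numeric sequence with per-point timestamps and an optional label."""
    timestamps: List[int]          # time-stamp for each data point
    values: List[float]            # the numeric measurements
    label: Optional[str] = None    # e.g., a class such as "correct"/"incorrect"


# Hypothetical example: hourly step counts from a motion sensor over one day.
steps = LabeledTimeSeries(
    timestamps=list(range(24)),    # hour of the day, 0..23
    values=[0, 0, 0, 0, 0, 120, 800, 950, 300, 200, 150, 600,
            700, 250, 180, 220, 900, 1100, 400, 300, 150, 80, 20, 0],
    label="active_day",
)

# Every data point has a matching timestamp.
assert len(steps.timestamps) == len(steps.values)
```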
In many applications we need to classify time series data into different classes: for example, to determine whether a physical exercise is performed correctly (i.e., predicting the class correct or incorrect) based on the time series capturing the person's movement.
Recent time series classification methods (a type of machine learning algorithm for time series) can automatically classify time series data with high prediction accuracy, but very little work has been done on providing an explanation for the prediction. Besides predicting a class, we often need an explanation to guide the user so they can take remedial action, e.g., the hand was lifted too high during the exercise, thus the movement falls into the class “incorrect”.
There is much recent interest in developing explanation methods for machine learning algorithms. The focus so far has been on tabular data, where the features have a clear meaning, or image data, where the important concepts are easy for a human user to understand. There is much less work on explanations for time series data.
Additionally, recent studies on explaining time series classifiers have focused almost exclusively on qualitative studies to assess the usefulness of various explanation methods.
In our work at ML-Labs we focus on explanation methods for time series classification, with the aim of quantitatively assessing and ranking different explanation methods based on their informativeness. Comparing explanations is a non-trivial task: it is often unclear whether the output of a given explanation method is informative at all (i.e., relevant for the classification task), and it is also unclear how to compare explanation methods side-by-side.

We study saliency-based explanations, a type of explanation that associates each time series point with an explanation weight; the higher the explanation weight, the higher the importance of that point for the classifier. We propose a new model-agnostic approach for quantifying and comparing different saliency-based explanations for time series classification. In our framework, we first extract importance weights for each point in the time series based on the explanation, then use these weights to perturb specific parts of the time series and measure the impact on the classification accuracy. If the explanation is indeed informative, distorting the time series data as guided by the explanation should affect the classification accuracy more. Through this explanation-induced accuracy loss, we can objectively quantify and rank different explanations. We provide a quantitative and qualitative analysis and discussion for a few well-known time series classification methods and datasets.
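The evaluation loop described above can be sketched in a few lines of Python. This is a minimal illustration of the general idea, not the actual ML-Labs implementation: the function names, the Gaussian-noise perturbation, and the fixed fraction of perturbed points are all assumptions made for this example, and `classify` stands in for any trained time series classifier.

```python
import numpy as np


def perturb_by_saliency(series, weights, fraction=0.2, rng=None):
    """Replace the `fraction` highest-weight points with Gaussian noise.

    `weights` are the per-point explanation weights; a higher weight means
    the explanation claims that point matters more to the classifier.
    (Gaussian noise is just one possible perturbation scheme.)
    """
    rng = np.random.default_rng(rng)
    series = np.asarray(series, dtype=float).copy()
    k = max(1, int(fraction * len(series)))
    top = np.argsort(weights)[-k:]  # indices claimed most important
    series[top] = rng.normal(series.mean(), series.std() or 1.0, size=k)
    return series


def explanation_accuracy_loss(classify, X, y, W, fraction=0.2):
    """Accuracy drop after perturbing the points an explanation flags.

    X: list of 1-D series, y: their labels, W: per-series saliency weights.
    A larger drop suggests a more informative explanation.
    """
    base = np.mean([classify(x) == yi for x, yi in zip(X, y)])
    X_pert = [perturb_by_saliency(x, w, fraction) for x, w in zip(X, W)]
    pert = np.mean([classify(x) == yi for x, yi in zip(X_pert, y)])
    return base - pert
```

Running several explanation methods through `explanation_accuracy_loss` on the same classifier and dataset yields one score per method, so the methods can be ranked objectively by how much accuracy their flagged points account for.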