Causal Machine Learning Based on Time Series Data
Time series data, a sequence of observations recorded at successive points in time, offers valuable insights across many domains. When causality is involved, understanding the “why” behind changes in a time series becomes essential. In many fields of science, learning the causal structure of dynamic systems from time series data plays an important role in scientific discovery. Causal inference provides the tools to identify causal relations from data and to estimate the effect of an intervention. Conventional approaches in the time series literature are largely restricted to low-dimensional series, linear methods and short horizons. The big data revolution is instead shifting the focus to problems (e.g. arising from IoT technology and cloud-based applications) characterized by very high dimensionality, nonlinearity and long forecasting horizons. This research project aims to leverage Machine Learning (ML) to investigate causal relationships and causal structure in time series datasets, providing deeper insights and better decision-making capabilities.
Unlike cross-sectional data, where observations are typically assumed to be independent, time series data is marked by temporal dependency: each data point does not stand alone but is often influenced by preceding ones. This poses a significant challenge for causality detection, because one must discern whether a change is a genuine outcome of a prior event or just a coincidental fluctuation. As more variables are tracked over time, the complexity of the data grows and we run into the so-called ‘curse of dimensionality’: each added dimension can increase the complexity of the problem exponentially, which makes causality detection very hard. As with any real-world data, time series are also noisy and can be volatile, which makes it difficult to separate genuine causal relationships from spurious correlations. Finally, the statistical properties of a time series can change over time, a phenomenon known as ‘non-stationarity’, which further complicates the application of many traditional ML algorithms, since these often assume that the data distribution is fixed.
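The interplay between non-stationarity and spurious correlation can be illustrated with a small simulation. The sketch below is not part of the project's methodology; it assumes NumPy is available, and the function name mean_abs_correlation, the sample sizes and the seed are all hypothetical choices made for illustration. It generates pairs of independent random walks (non-stationary series with no causal link) and compares their average absolute correlation in levels with the same quantity after first differencing, which makes each series stationary.

```python
import numpy as np


def mean_abs_correlation(n_pairs=200, n_steps=500, difference=False, seed=0):
    """Average |Pearson correlation| over pairs of independent random walks.

    All parameter values are illustrative assumptions, not taken from the text.
    """
    rng = np.random.default_rng(seed)
    correlations = []
    for _ in range(n_pairs):
        # Two random walks built from independent Gaussian noise:
        # by construction, neither series causes the other.
        x = np.cumsum(rng.standard_normal(n_steps))
        y = np.cumsum(rng.standard_normal(n_steps))
        if difference:
            # First differencing recovers the stationary noise increments.
            x, y = np.diff(x), np.diff(y)
        correlations.append(abs(np.corrcoef(x, y)[0, 1]))
    return float(np.mean(correlations))


levels = mean_abs_correlation(difference=False)
diffs = mean_abs_correlation(difference=True)
print(f"mean |corr| in levels:      {levels:.3f}")
print(f"mean |corr| in differences: {diffs:.3f}")
```

In runs of this kind, the series in levels typically show sizeable correlations even though they are independent, while the differenced series show correlations close to zero. This is one reason why correlation computed on raw non-stationary series is a poor starting point for causal claims.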