Anyone engaged in time series forecasting and outlier detection should be aware of change point detection (CPD). This article dives into CPD to help you understand what change point detection is, how it works, its implications for time series forecasting, and the most common methods for detecting change points.
What is Change Point Detection
A change point is a point in time at which the statistical properties of a time series change. It divides the series into two segments, each with its own statistical characteristics (mean, variance, and so on).
Change point detection (CPD) is used across a variety of different fields. In medical condition monitoring, for example, CPD helps to monitor the health condition of a patient. For speech recognition, it is used to detect changes in vocal frequency. In weather forecasting, it helps monitor changes in temperature to signal potential storms.
CPD is especially useful for:
- Detecting anomalous sequences/states in a time series
- Detecting the average velocity of unique states in a time series
- Detecting a sudden change in a time series state in real time
Let’s take a closer look at why we actually need CPD.
A good example of CPD is a smartwatch, such as an Apple Watch, monitoring a person’s heart rate. A person runs for a quarter mile, walks for fifteen minutes, and then runs for another quarter mile. Accordingly, the heart rate data will show a cluster of higher values, then a cluster of lower values, and then higher values again.
The changes in the time series reveal the changes in the person’s physical activity. Data analysts see these changes and can analyze them to get a more complete picture of an individual’s well-being while they are physically active.
In the Intensive Care Unit, the heart rate of individuals is monitored the same way. CPD helps to quickly detect any changes in heart rate and instantly informs medical professionals should the need arise.
Different types of change points
There are four main types of change points. Let’s take a closer look at each of them.
Change in mean
This is the most common type of change point. It is also the easiest to identify visually. It occurs when a time series can be divided into piecewise-constant segments with different mean values. The CUSUM (cumulative sum) algorithm is one of the earliest methods for detecting changes in mean. It is widely used for quality control in manufacturing.
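As a rough illustration, a one-sided CUSUM detector for an upward shift in mean can be sketched as follows. This is a minimal sketch, assuming the mean before the change (`target`) is known in advance; `slack` and `threshold` are illustrative tuning parameters chosen for this toy signal, not recommended values.

```python
def cusum_upward(signal, target, slack, threshold):
    """Return the index of the first alarm for an upward mean shift, or None."""
    s = 0.0
    for i, x in enumerate(signal):
        # Accumulate deviations above the target, discounting small ones
        # by `slack`, and reset to zero whenever the sum would go negative.
        s = max(0.0, s + (x - target - slack))
        if s > threshold:
            return i
    return None

# Toy signal: mean around 0 for 20 samples, then mean around 2 for 20 samples.
signal = [0.1, -0.2, 0.0, 0.2, -0.1] * 4 + [2.1, 1.8, 2.0, 2.2, 1.9] * 4
alarm = cusum_upward(signal, target=0.0, slack=0.5, threshold=4.0)
```

Note that the alarm is raised a few samples after the true change at index 20, because the cumulative sum needs time to cross the threshold. A lower threshold reacts faster but produces more false alarms on noisy data.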
Change in variance
With a change in variance, the mean value of the signal remains constant, but different segments have different variance values. One can detect changes in the mean and variance by comparing the statistical properties of the signal across segments.
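One simple way to compare these properties is to measure the variance just before and just after each candidate point. The sketch below is an assumption-laden illustration: the function name, the window `width`, and the log-ratio score are choices made for this example, not a standard algorithm.

```python
import math
import statistics

def variance_change_point(signal, width):
    """Return the index where the variances of the `width` samples
    before and after it differ the most (by absolute log-ratio)."""
    best_index, best_score = None, 0.0
    for i in range(width, len(signal) - width + 1):
        before = statistics.pvariance(signal[i - width:i])
        after = statistics.pvariance(signal[i:i + width])
        # The absolute log-ratio treats "variance doubled" and
        # "variance halved" alike; the small constant avoids log(0).
        score = abs(math.log((after + 1e-12) / (before + 1e-12)))
        if score > best_score:
            best_index, best_score = i, score
    return best_index

# Toy signal: constant mean of 0, small swings for 30 samples, then large swings.
signal = [0.1, -0.1] * 15 + [1.0, -1.0] * 15
change = variance_change_point(signal, width=10)
```

On this toy signal the score peaks exactly where the amplitude of the swings changes, even though the mean stays at zero throughout.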
Change in pattern
Changes in pattern can occur in, for example, electrocardiogram signals. One way to detect them is to use Wasserstein distances between empirical distributions. At this point, it becomes evident that change point detection is related to anomaly detection.
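To make the Wasserstein idea concrete, here is a minimal sketch that compares the empirical distributions of the windows before and after each candidate point. It assumes one-dimensional data and equal-sized windows, in which case the Wasserstein-1 distance reduces to the mean absolute difference between sorted samples. The function names and the window `width` are hypothetical; in practice a library routine such as `scipy.stats.wasserstein_distance` would typically be used instead.

```python
def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-sized 1-D samples."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-sized samples")
    # For equal-sized empirical distributions, the optimal transport
    # plan matches the sorted values one to one.
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

def pattern_change_scores(signal, width):
    """Distance between the windows before and after each candidate index."""
    return {
        i: wasserstein_1d(signal[i - width:i], signal[i:i + width])
        for i in range(width, len(signal) - width + 1)
    }

# Toy signal: a triangle-like pattern followed by a square-like pattern.
# Both patterns have the same mean, but their value distributions differ.
signal = [0, 1, 2, 1] * 6 + [0, 0, 2, 2] * 6
scores = pattern_change_scores(signal, width=8)
best = max(scores, key=scores.get)  # first index with the highest score
```

Within either pattern the score is zero, because every window of two full periods contains the same values. The score rises only when the two windows straddle the point where the pattern changes.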
Change in periodicity (change in frequency)
Change in periodicity, or frequency, applies to time series with cyclic properties (for example, a signal recorded from a machine in a given operating regime). It occurs when the frequency changes suddenly. This kind of change can be detected in the frequency domain.
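As a sketch of what detection in the frequency domain can look like, the code below computes the dominant frequency of consecutive windows with a plain discrete Fourier transform. The function names and the window `width` are assumptions made for this example. The direct DFT is written out only to keep the example self-contained; an FFT routine would normally be used for real data.

```python
import cmath
import math

def dominant_frequency(window):
    """Return the DFT bin (cycles per window) with the largest
    magnitude, ignoring the constant (zero-frequency) term."""
    n = len(window)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(window)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin

def frequency_per_window(signal, width):
    """Dominant frequency of each non-overlapping window."""
    return [
        dominant_frequency(signal[i:i + width])
        for i in range(0, len(signal) - width + 1, width)
    ]

# Toy signal: a sine with period 16 for 64 samples, then period 8 for 64 samples.
signal = [math.sin(2 * math.pi * t / 16) for t in range(64)]
signal += [math.sin(2 * math.pi * t / 8) for t in range(64)]
freqs = frequency_per_window(signal, width=32)
```

A jump in the dominant frequency between two neighboring windows points to a change in periodicity somewhere near their shared boundary.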
There are many other types of change points, depending on the underlying structure of the signal. Usually, the more complex the signal, the more difficult it is to detect the change point.
How to Detect Change Points
There is a vast number of methods for change point detection. Several packages implementing them are available in R and Python. Most of these packages expose hyperparameters that help tune change point detection. Still, many packages are not standardized. Some of them are able to calculate the costs but cannot identify actual change points. Others are simply not well maintained.
Some of the most popular, well-established, and well-maintained packages are:
- In R, the following packages are dedicated to change point detection: changepoint, kcpRS, or bcp.
- In Python, the ruptures package is completely dedicated to change point detection. Other packages such as prophet, luminaire, and scikit-multiflow include, among other features, change point or drift detection.
One of the most common methods for change point detection is the sliding window method. The basic idea is to walk through a signal with a fixed-size window. At each step, a cost function measures how poorly the data in the current window fits a single homogeneous segment. This yields a cost value for each position in the signal, which indicates how likely a change is at that position. Typically, costs are “low” if there are no changes in the window and “high” if such changes occur. For instance, if the cost exceeds a predefined threshold, the point is marked as a change point.
To detect changes in the mean, you can use the standard deviation as a cost function. If the signal is constant, the standard deviation is low. If there is a jump in the signal, the standard deviation will rise accordingly.
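The sliding window idea with the standard deviation as the cost can be sketched in a few lines. The function names, the window `width`, and the `threshold` are illustrative assumptions chosen to suit this toy signal.

```python
import statistics

def sliding_std_cost(signal, width):
    """Standard deviation of each window; costs[i] covers signal[i:i + width]."""
    return [
        statistics.pstdev(signal[i:i + width])
        for i in range(len(signal) - width + 1)
    ]

def flag_windows(costs, threshold):
    """Start indices of the windows whose cost exceeds the threshold."""
    return [i for i, cost in enumerate(costs) if cost > threshold]

# Toy signal: constant at 1.0 for 20 samples, then a jump to 5.0.
signal = [1.0] * 20 + [5.0] * 20
width = 6
costs = sliding_std_cost(signal, width)
flagged = flag_windows(costs, threshold=1.0)

# The cost peaks when the jump sits in the middle of the window,
# so the change is estimated at the center of the highest-cost window.
peak = max(range(len(costs)), key=costs.__getitem__)
estimate = peak + width // 2
```

Every window that contains the jump is flagged, so a single change produces a run of flagged windows rather than a single point. Taking the peak of the cost curve is one simple way to reduce that run to one estimate.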
The window approach can have various extensions. For example, you can use two adjacent windows, one covering the past and one covering the future. You can then locate the change point by comparing the costs of the two windows with the cost of both windows combined. The same idea underlies tests based on the generalized likelihood ratio.
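One common way to set up the two-window comparison is to compute, at each index, how much cheaper it is to describe the past and future windows separately than to describe them as one segment. The sketch below assumes a squared-error cost around the window mean; the function names and the window `width` are hypothetical.

```python
import statistics

def squared_error_cost(segment):
    """Sum of squared deviations from the segment mean."""
    mean = statistics.mean(segment)
    return sum((x - mean) ** 2 for x in segment)

def two_window_scores(signal, width):
    """Discrepancy between the past and future windows at each index:
    cost of the joined window minus the costs of its two halves."""
    scores = {}
    for i in range(width, len(signal) - width + 1):
        past = signal[i - width:i]
        future = signal[i:i + width]
        scores[i] = (
            squared_error_cost(past + future)
            - squared_error_cost(past)
            - squared_error_cost(future)
        )
    return scores

# Toy signal: constant at 1.0 for 20 samples, then a jump to 5.0.
signal = [1.0] * 20 + [5.0] * 20
scores = two_window_scores(signal, width=5)
best = max(scores, key=scores.get)
```

The score is zero wherever both windows come from the same constant segment and peaks at the index where the past and future windows are cleanly separated by the jump.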
The question of how to choose the right change point detection method is key, and the answer depends on many factors. Since there are many approaches and methods, we’ve gone through some of the most popular ones to help you come to a reasonable conclusion.
However, in order to avoid confusion and find the best option for change point detection, you’ll need the help of professionals. Unicsoft is always here to take you through all the innovative and relevant technological developments.