Documentation of function sequence_trend_moving_window

Last Update: 2023/12/21

◆Purpose

Calculate the slope and intercept of the straight line that best fits time series data within a moving time window.

◆Format

#include <sequence/statistics.h>
inline struct sequence *sequence_trend_moving_window
(struct sequence seq, const double length, const char *time_ref)

◆Arguments

seq	A structure representing the time series data used for the computation.
length	The length of the time window used for the computation. This value must be an integer multiple of the sampling interval of the time series data.
time_ref	The definition of time in the return value. Choose one of the following.
	start	Use the start time of each time window.
	middle	Use the central time of each time window.
	end	Use the end time of each time window.
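
The requirement on length can be checked on the caller's side before the function is called. The following is a minimal sketch under stated assumptions: is_integer_multiple and its tolerance argument tol are hypothetical names introduced here for illustration, and this manual does not say how (or whether) the library itself validates length.

```c
#include <assert.h>

/* Hypothetical caller-side check, not part of the library.
   Returns 1 if length is, within a relative tolerance tol, a positive
   integer multiple of the sampling interval dt, and 0 otherwise. */
int is_integer_multiple(double length, double dt, double tol)
{
    double ratio, diff;
    long k;

    assert(dt > 0.0 && tol >= 0.0);
    if (length <= 0.0) return 0;

    ratio = length / dt;
    k = (long)(ratio + 0.5);      /* nearest integer, valid for ratio > 0 */
    if (k < 1) return 0;

    diff = ratio - (double)k;     /* distance from the nearest integer */
    if (diff < 0.0) diff = -diff;
    return diff <= tol * (double)k;
}
```

A tolerance is used because sampling intervals such as 0.1 are not exactly representable in binary floating point, so an exact comparison could reject a valid length.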

◆Return value

An array consisting of the slope and the intercept of the time series data. Because the time window moves, the slope and the intercept are each themselves time series data. The intercept is defined as the value of the fitted straight line at time \(t=0\), regardless of the time range covered by the time series data.
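
The intercept convention can be illustrated with a small worked example. The numbers are made up for illustration and are not taken from the library. Suppose one window contains the three samples \((t,f)=(10,21),(11,23),(12,25)\), which lie exactly on the line \(f=2t+1\). The sums used by the least squares formulas in the Computation method section are
\[\begin{eqnarray} \sum t_i = 33, \quad \sum t_i^2 = 365, \quad \sum f_i = 69, \quad \sum t_if_i = 763 \nonumber \end{eqnarray}\]
so that, with \(M=3\),
\[\begin{eqnarray} a &=& \frac{3\cdot 763-33\cdot 69}{3\cdot 365-33^2} = \frac{12}{6} = 2 \nonumber \\ b &=& \frac{365\cdot 69-33\cdot 763}{3\cdot 365-33^2} = \frac{6}{6} = 1 \nonumber \end{eqnarray}\]
The intercept is therefore 1, the value of the line at \(t=0\), and not 21, the value of the line at the start of the window.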

◆Example

struct sequence seq;
struct sequence *trend = sequence_trend_moving_window(seq, 2.0, "middle");

In this example, trend[0] and trend[1] are the time series data representing the slope and the intercept, respectively. The central time of each time window is used as the time in these time series data.

◆Computation method

Let the time series data be defined at times \(t=t_0,t_1,\cdots,t_{N-1}\), and let the values of the data at these times be \(f_0,f_1,\cdots,f_{N-1}\). Let \(M\) be an integer (\(\leq N\)) and \(j=0,1,\cdots,N-M\). For each \(j\), let the straight line that best explains the time series data in the time range \(t_j \leq t \leq t_{j+M-1}\) be \(f=a_jt+b_j\). A least squares fit gives the slope \(a_j\) and the intercept \(b_j\) as
\[\begin{eqnarray} a_j &=& \frac{M\sum_{i=j}^{j+M-1}t_if_i -\left(\sum_{i=j}^{j+M-1}t_i\right) \left(\sum_{i=j}^{j+M-1}f_i\right)} {M\sum_{i=j}^{j+M-1}t_i^2 -\left(\sum_{i=j}^{j+M-1}t_i\right)^2} \nonumber \\ &=& \frac{M S_j^{(tf)} -S_j^{(t)}S_j^{(f)}} {M S_j^{(tt)} -\left[S_j^{(t)}\right]^2} \label{eq.a_j} \end{eqnarray}\]
\[\begin{eqnarray} b_j &=& \frac{\left(\sum_{i=j}^{j+M-1}t_i^2\right) \left(\sum_{i=j}^{j+M-1}f_i\right) -\left(\sum_{i=j}^{j+M-1}t_i\right) \left(\sum_{i=j}^{j+M-1}t_if_i\right)} {M\sum_{i=j}^{j+M-1}t_i^2 -\left(\sum_{i=j}^{j+M-1}t_i\right)^2} \nonumber \\ &=& \frac{S_j^{(tt)} S_j^{(f)} -S_j^{(t)} S_j^{(tf)}} {M S_j^{(tt)} -\left[S_j^{(t)}\right]^2} \label{eq.b_j} \end{eqnarray}\]
(see the documentation of function sequence_trend for the derivation of this solution), where
\[\begin{equation} S_j^{(t)} \equiv \sum_{i=j}^{j+M-1}t_i \label{eq.S_j_t} \end{equation}\]
\[\begin{equation} S_j^{(tt)} \equiv \sum_{i=j}^{j+M-1}t_i^2 \label{eq.S_j_tt} \end{equation}\]
\[\begin{equation} S_j^{(f)} \equiv \sum_{i=j}^{j+M-1}f_i \label{eq.S_j_f} \end{equation}\]
\[\begin{equation} S_j^{(tf)} \equiv \sum_{i=j}^{j+M-1}t_if_i \label{eq.S_j_tf} \end{equation}\]
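
The least squares solution for a single window can be sketched in C as follows. This is an illustration under stated assumptions, not the library's implementation: fit_window is a hypothetical helper, and plain arrays t and f stand in for struct sequence, whose members are not described in this manual. The sketch requires at least two samples per window, because with a single sample the common denominator of \(a_j\) and \(b_j\) is zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the library.
   Fits f = a*t + b to the m samples t[j..j+m-1], f[j..j+m-1] by
   accumulating the four sums S^(t), S^(tt), S^(f), S^(tf) directly. */
void fit_window(const double *t, const double *f,
                size_t j, size_t m, double *a, double *b)
{
    double s_t = 0.0, s_tt = 0.0, s_f = 0.0, s_tf = 0.0;
    double denom;
    size_t i;

    assert(m >= 2);  /* with one sample the denominator is zero */

    for (i = j; i < j + m; i++) {
        s_t  += t[i];
        s_tt += t[i] * t[i];
        s_f  += f[i];
        s_tf += t[i] * f[i];
    }

    /* common denominator of the slope and the intercept */
    denom = (double)m * s_tt - s_t * s_t;
    *a = ((double)m * s_tf - s_t * s_f) / denom;
    *b = (s_tt * s_f - s_t * s_tf) / denom;
}
```

Calling this helper for every \(j\) costs on the order of \(M\) operations per window, which is what the summation and recursive formulas in the remainder of this section avoid.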

\(S_j^{(t)}\) and \(S_j^{(tt)}\) can be computed in closed form using summation formulas:
\[\begin{eqnarray} S_j^{(t)} &=& \sum_{i=j}^{j+M-1}[t_j+(i-j)\Delta t] \nonumber \\ &=& \sum_{i=0}^{M-1}[t_j+i\Delta t] \nonumber \\ &=& Mt_j+\Delta t\sum_{i=0}^{M-1}i \nonumber \\ &=& Mt_j+\Delta t\frac{(M-1)M}{2} \label{eq.S_j_t.formula} \end{eqnarray}\]
\[\begin{eqnarray} S_j^{(tt)} &=& \sum_{i=j}^{j+M-1}[t_j+(i-j)\Delta t]^2 \nonumber \\ &=& \sum_{i=0}^{M-1}[t_j+i\Delta t]^2 \nonumber \\ &=& \sum_{i=0}^{M-1}[t_j^2+2t_j i\Delta t+i^2(\Delta t)^2] \nonumber \\ &=& Mt_j^2+2t_j\Delta t\sum_{i=0}^{M-1}i+(\Delta t)^2\sum_{i=0}^{M-1}i^2 \nonumber \\ &=& Mt_j^2+2t_j\Delta t\frac{(M-1)M}{2} +(\Delta t)^2\frac{(M-1)M(2M-1)}{6} \label{eq.S_j_tt.formula} \end{eqnarray}\]
where \(\Delta t\) is the sampling interval of the time series data, so that \(t_i=t_j+(i-j)\Delta t\). \(S_j^{(f)}\), on the other hand, can be computed for \(j \geq 1\) using the recursive formula
\[\begin{eqnarray} S_j^{(f)} &=& \sum_{i=j}^{j+M-1}f_i \nonumber \\ &=& \sum_{i=j-1}^{j+M-2}f_i +f_{j+M-1}-f_{j-1} \nonumber \\ &=& S_{j-1}^{(f)}+f_{j+M-1}-f_{j-1} \label{eq.S_j_f.recursive} \end{eqnarray}\]
starting from \(S_0^{(f)}\), which is evaluated directly from Eq. (\ref{eq.S_j_f}). When \(S_j^{(f)}\) is computed for many \(j\), Eq. (\ref{eq.S_j_f.recursive}) requires less computation than applying Eq. (\ref{eq.S_j_f}) directly, because each step adds one sample and removes one sample instead of summing \(M\) samples. Similarly, \(S_j^{(tf)}\) can be computed using the recursive formula
\[\begin{equation} S_j^{(tf)}=S_{j-1}^{(tf)}+t_{j+M-1}f_{j+M-1}-t_{j-1}f_{j-1} \label{eq.S_j_tf.recursive} \end{equation}\]
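
The summation formulas and the recursive updates can be combined into a single pass over the data. The sketch below is an illustration under stated assumptions, not the library's implementation: trend_moving_window is a hypothetical helper, the samples are assumed to lie at \(t_i=t_0+i\Delta t\), and plain arrays stand in for struct sequence. Outputs are indexed by the window start \(j\), so the choice of time_ref is not modelled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the library's implementation.
   f[0..n-1] is sampled at t_i = t0 + i*dt. For each window start
   j = 0..n-m, writes the slope a[j] and the intercept b[j] (the value of
   the fitted line at t = 0). a and b must hold n-m+1 elements. */
void trend_moving_window(const double *f, size_t n, double t0, double dt,
                         size_t m, double *a, double *b)
{
    const double M = (double)m;
    /* sums of i and i^2 over i = 0..m-1 */
    const double sum_i  = (M - 1.0) * M / 2.0;
    const double sum_ii = (M - 1.0) * M * (2.0 * M - 1.0) / 6.0;
    double s_f = 0.0, s_tf = 0.0;
    size_t i, j;

    assert(m >= 2 && m <= n);

    /* j = 0: evaluate S^(f) and S^(tf) directly from their definitions */
    for (i = 0; i < m; i++) {
        s_f  += f[i];
        s_tf += (t0 + (double)i * dt) * f[i];
    }

    for (j = 0; j + m <= n; j++) {
        const double tj = t0 + (double)j * dt;
        double s_t, s_tt, denom;

        if (j > 0) {
            /* recursive update: add the newest sample, drop the oldest */
            const double t_new = t0 + (double)(j + m - 1) * dt;
            const double t_old = t0 + (double)(j - 1) * dt;
            s_f  += f[j + m - 1] - f[j - 1];
            s_tf += t_new * f[j + m - 1] - t_old * f[j - 1];
        }

        /* closed-form sums of t and t^2 over the window */
        s_t  = M * tj + dt * sum_i;
        s_tt = M * tj * tj + 2.0 * tj * dt * sum_i + dt * dt * sum_ii;

        denom = M * s_tt - s_t * s_t;
        a[j] = (M * s_tf - s_t * s_f) / denom;
        b[j] = (s_tt * s_f - s_t * s_tf) / denom;
    }
}
```

For example, with f = {0, 1, 2, 2, 2}, t0 = 0, dt = 1, and m = 3, the three windows give slopes 1, 0.5, and 0 and intercepts 0, 2/3, and 2. Because the running sums are updated incrementally, rounding errors can accumulate over very long series; recomputing the sums directly at intervals is one possible safeguard.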