Formulas used in the machine_learning header file package

4. Derivative of the cross entropy error \(E\) with respect to the model parameters \(W_{j,i}^{(m)}\)

The formulas below were obtained in the previous sections.

Eq. (1) of section 2:
\[\begin{equation} E=-\frac{1}{N}\sum_{n=0}^{N-1}\sum_{i=0}^{J^{(M+1)}-1} t_i^{(n)}\ln x_i^{(M+1,n)} \label{eq.E} \end{equation}\]
Eq. (2) of section 2:
\[\begin{eqnarray} y_j^{(m,n)} &=& \sum_{i=0}^{J^{(m)}-1}W_{j,i}^{(m)}x_i^{(m,n)}+W_{j,J^{(m)}}^{(m)} \nonumber \\ & & \left(m=0,\cdots,M; n=0,\cdots,N-1; j=0,\cdots,J^{(m+1)}-1\right) \label{eq.x2y} \end{eqnarray}\]
Eq. (3) of section 2:
\[\begin{eqnarray} x_j^{(m+1,n)} &=& f_j^{(m)}\left(y_0^{(m,n)},\cdots,y_{J^{(m+1)}-1}^{(m,n)}\right) \nonumber \\ & & \left(m=0,\cdots,M; n=0,\cdots,N-1; j=0,\cdots,J^{(m+1)}-1\right) \label{eq.y2x} \end{eqnarray}\]
Eq. (3) of section 3.3:
\[\begin{equation} W_{J^{(M+1)}-1,i}^{(M)}=-\sum_{j=0}^{J^{(M+1)}-2}W_{j,i}^{(M)} \hspace{2em} (i=0,\cdots,J^{(M)}) \label{eq.W.constraint} \end{equation}\]
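The four formulas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the activation functions \(f_j^{(m)}\) are not fixed in this section, so the sketch assumes tanh for the hidden layers and softmax for the output layer. The function names (`affine`, `forward`, `cross_entropy`, `apply_constraint`) and the nested-list storage of \(W^{(m)}\), with the bias \(W_{j,J^{(m)}}^{(m)}\) in the last column, are hypothetical and not taken from the package.

```python
import math

def affine(W, x):
    """Eq. (2) of section 2: y_j = sum_i W[j][i] * x[i] + W[j][J] (last column = bias)."""
    J = len(x)
    return [sum(row[i] * x[i] for i in range(J)) + row[J] for row in W]

def softmax(y):
    """ASSUMED output activation; each output depends on all y_j, as Eq. (3) allows."""
    top = max(y)  # subtract the maximum for numerical stability
    e = [math.exp(v - top) for v in y]
    s = sum(e)
    return [v / s for v in e]

def forward(Ws, x0):
    """Apply Eqs. (2) and (3) for m = 0, ..., M and return x^{(M+1)}."""
    x = x0
    for m, W in enumerate(Ws):
        y = affine(W, x)
        if m == len(Ws) - 1:
            x = softmax(y)                     # assumed f^{(M)}
        else:
            x = [math.tanh(v) for v in y]      # assumed f^{(m)} for m < M
    return x

def cross_entropy(Ws, xs, ts):
    """Eq. (1) of section 2: E = -(1/N) sum_n sum_i t_i^{(n)} ln x_i^{(M+1,n)}."""
    total = 0.0
    for x0, t in zip(xs, ts):
        out = forward(Ws, x0)
        total += sum(t_i * math.log(o_i) for t_i, o_i in zip(t, out))
    return -total / len(xs)

def apply_constraint(W_last):
    """Eq. (3) of section 3.3: the last row equals minus the sum of the other rows."""
    free = W_last[:-1]
    cols = len(W_last[0])
    return free + [[-sum(row[i] for row in free) for i in range(cols)]]
```

With a single layer whose weights are all zero and two output units, the assumed softmax returns equal probabilities, so `cross_entropy` evaluates to \(\ln 2\) for a one-hot target.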

In the definition of the cross entropy error, Eq. (\ref{eq.E}), \(t_i^{(n)}\) is the supplied teaching data itself and therefore does not depend on the model parameters. In contrast, \(x_{i}^{(M+1,n)}\) is computed by repeatedly applying Eqs. (\ref{eq.x2y}) and (\ref{eq.y2x}), so it depends on the model parameters \(W_{j,i}^{(m)}\) (\(m=0,\cdots,M\); \(i=0,\cdots,J^{(m)}\); \(j=0,\cdots,J^{(m+1)}-1\)). For a given \(m\), in the sequence that starts from \(x_{i}^{(0,n)}\) and computes \(y_{i}^{(0,n)}\), \(x_{i}^{(1,n)}\), \(y_{i}^{(1,n)}\), \(x_{i}^{(2,n)}\), \(\cdots\), in turn by Eqs. (\ref{eq.x2y}) and (\ref{eq.y2x}), the parameters \(W_{j,i}^{(m)}\) first appear in the conversion from \(x_i^{(m,n)}\) to \(y_i^{(m,n)}\), which by Eq. (\ref{eq.x2y}) can be written with primed indices as \[\begin{eqnarray} y_{j'}^{(m,n)} &=& \sum_{i'=0}^{J^{(m)}-1} W_{j',i'}^{(m)}x_{i'}^{(m,n)} +W_{j',J^{(m)}}^{(m)} \nonumber \\ & & \left(m=0,\cdots,M; n=0,\cdots,N-1; j'=0,\cdots,J^{(m+1)}-1\right) \label{eq.x2y.dash} \end{eqnarray}\] In what follows, we take Eq. (\ref{eq.x2y.dash}) as the starting point for differentiating the cross entropy error. Because \(W_{J^{(M+1)}-1,i}^{(M)}\) depends on \(W_{j,i}^{(M)}\) \((j=0,\cdots,J^{(M+1)}-2)\) through Eq. (\ref{eq.W.constraint}), the cases \(m<M\) and \(m=M\) must be treated separately.
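Why the case \(m=M\) needs separate treatment can be checked numerically. The sketch below is not the derivation used in this document. It is a finite-difference illustration for a single layer (\(M=0\)) and a single sample (\(N=1\)), assuming a softmax output activation. All names (`error`, `constrained`, `bump`, `central_diff`) and the numerical values are hypothetical. It compares the derivative obtained when every row of \(W^{(M)}\) is treated as independent with the derivative obtained when the last row is recomputed from Eq. (\ref{eq.W.constraint}).

```python
import math

def softmax(y):
    """ASSUMED output activation (not specified in this section)."""
    top = max(y)
    e = [math.exp(v - top) for v in y]
    s = sum(e)
    return [v / s for v in e]

def error(W, x, t):
    """Cross entropy for one sample and one layer; last column of W is the bias."""
    J = len(x)
    y = [sum(row[i] * x[i] for i in range(J)) + row[J] for row in W]
    return -sum(ti * math.log(oi) for ti, oi in zip(t, softmax(y)))

def constrained(W_free):
    """Append the dependent last row given by Eq. (3) of section 3.3."""
    cols = len(W_free[0])
    return W_free + [[-sum(r[i] for r in W_free) for i in range(cols)]]

def bump(W, row, col, d):
    """Return a copy of W with W[row][col] increased by d."""
    W2 = [r[:] for r in W]
    W2[row][col] += d
    return W2

def central_diff(f, h=1e-6):
    """Central finite-difference estimate of df/dd at d = 0."""
    return (f(h) - f(-h)) / (2 * h)

x = [0.5, -1.0]                                  # illustrative input
t = [0.0, 1.0, 0.0]                              # illustrative one-hot target
W_free = [[0.2, -0.1, 0.3], [0.4, 0.1, -0.2]]    # rows j = 0, 1 (row 2 is dependent)
W_full = constrained(W_free)
j, i, last = 0, 0, 2

# (a) Every row treated as an independent parameter.
dE_j = central_diff(lambda d: error(bump(W_full, j, i, d), x, t))
dE_last = central_diff(lambda d: error(bump(W_full, last, i, d), x, t))

# (b) Only the free rows are perturbed; the last row follows from the constraint.
dE_constrained = central_diff(lambda d: error(constrained(bump(W_free, j, i, d)), x, t))

print(dE_j, dE_last, dE_constrained)
```

By the chain rule applied to the constraint, the value in (b) should equal `dE_j - dE_last` up to finite-difference error, which differs from `dE_j` alone whenever `dE_last` is nonzero.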

4.1. For \(m<M\)
4.2. For \(m=M\)
4.3. Unified formula for both cases
4.4. Calculation of \(D_{i',j}^{(m,n)}\)
4.5. Simpler formula
4.6. Cases where \(f_j^{(m)}\) depends only on \(y_j^{(m,n)}\)