Formulas used in the machine_learning header file package

5. Normalization of the input variables \(x_i^{(0,n)}\)

The model parameters are easier to estimate when the input variables \(x_i^{(0,n)}\) are mapped to a common range such as \([-1,1]\). Let us derive the formula for this normalization. Let \(x_i^{(0,n;U)}\) be the input variables of the teaching data before normalization, and let \(x_{i,min}^{(0;U)}\) and \(x_{i,max}^{(0;U)}\) be their minimum and maximum over \(n\), respectively; the superscript \(U\) means "un-normalized". We normalize the input variables so that their minimum becomes \(x_{min}^{(0;N)}\) and their maximum becomes \(x_{max}^{(0;N)}\), where the superscript \(N\) means "normalized", and we write the normalized values as \(x_i^{(0,n;N)}\). Note that the normalization is carried out independently for each \(i\). That is, \(x_{min}^{(0;N)}\) and \(x_{max}^{(0;N)}\) are not the overall minimum and maximum of \(x_i^{(0,n;N)}\) taken over all \(n\) and \(i\); instead, for every \(i\), the minimum and maximum of \(x_i^{(0,n;N)}\) with respect to \(n\) are \(x_{min}^{(0;N)}\) and \(x_{max}^{(0;N)}\). In other words, after the normalization the variables \(x_i^{(0,n;N)}\) lie in the same range \([x_{min}^{(0;N)},x_{max}^{(0;N)}]\) for every \(i\). The following relation then holds between \(x_i^{(0,n;U)}\) and \(x_i^{(0,n;N)}\):
\[\begin{eqnarray} x_i^{(0,n;N)} &=& x_{min}^{(0;N)}+\left(x_i^{(0,n;U)}-x_{i,min}^{(0;U)}\right) \frac{x_{max}^{(0;N)}-x_{min}^{(0;N)}} {x_{i,max}^{(0;U)}-x_{i,min}^{(0;U)}} \nonumber \\ & & \left(n=0,\cdots,N-1;i=0,\cdots,J^{(0)}-1\right) \label{eq.x0.normalize} \end{eqnarray}\]
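As an illustration of Eq. (\ref{eq.x0.normalize}), the following is a minimal Python sketch of the per-component normalization. It is not code from the machine_learning package; the function name `normalize_inputs`, the list-of-lists layout `x_u[n][i]`, and the default target range \([-1,1]\) are assumptions made for this example. The sketch also assumes that a component whose values are all identical should be rejected, because the denominator \(x_{i,max}^{(0;U)}-x_{i,min}^{(0;U)}\) is then zero.

```python
def normalize_inputs(x_u, n_min=-1.0, n_max=1.0):
    """Scale each input component i independently over the samples n.

    x_u[n][i] holds the un-normalized inputs (hypothetical layout).
    Returns the normalized inputs plus the per-component minima and
    maxima, which are needed later to convert the model parameters.
    """
    num_inputs = len(x_u[0])
    # Minimum and maximum of each component i, taken over the samples n.
    mins = [min(sample[i] for sample in x_u) for i in range(num_inputs)]
    maxs = [max(sample[i] for sample in x_u) for i in range(num_inputs)]
    for i in range(num_inputs):
        if maxs[i] == mins[i]:
            # Assumption of this sketch: constant components are an error.
            raise ValueError(f"input {i} is constant; the scale is undefined")
    x_n = [
        [
            n_min + (sample[i] - mins[i]) * (n_max - n_min) / (maxs[i] - mins[i])
            for i in range(num_inputs)
        ]
        for sample in x_u
    ]
    return x_n, mins, maxs


# Three samples (n) with two input components (i) on very different scales.
x_u = [[0.0, 10.0], [5.0, 30.0], [10.0, 20.0]]
x_n, mins, maxs = normalize_inputs(x_u)
print(x_n)
```

Each column of `x_n` spans the full target range on its own, which is the per-\(i\) behavior described above, rather than sharing one global minimum and maximum across all components.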

The normalization also changes the model parameters \(W_{j,i}^{(m)}\). Let \(W_{j,i}^{(m;U)}\) be the model parameters corresponding to the un-normalized input variables \(x_i^{(0,n;U)}\), and let \(W_{j,i}^{(m;N)}\) be those corresponding to the normalized input variables \(x_i^{(0,n;N)}\). The linear combinations \(y_j^{(0,n)}\) of the input-layer variables can then be written in two ways. Using the variables and model parameters before the normalization,
\[\begin{equation} y_j^{(0,n)} =\sum_{i=0}^{J^{(0)}-1}W_{j,i}^{(0;U)}x_i^{(0,n;U)}+W_{j,J^{(0)}}^{(0;U)} \hspace{1em} \left(n=0,\cdots,N-1; j=0,\cdots,J^{(1)}-1\right) \label{eq.y0.unnormalized} \end{equation}\]
and using those after the normalization,
\[\begin{equation} y_j^{(0,n)} =\sum_{i=0}^{J^{(0)}-1}W_{j,i}^{(0;N)}x_i^{(0,n;N)}+W_{j,J^{(0)}}^{(0;N)} \hspace{1em} \left(n=0,\cdots,N-1; j=0,\cdots,J^{(1)}-1\right) \label{eq.y0.normalized} \end{equation}\]
If the relation between \(W_{j,i}^{(0;U)}\) and \(W_{j,i}^{(0;N)}\) is chosen so that these two expressions are equal, the subsequent variables are not affected by the normalization. Substituting Eq. (\ref{eq.x0.normalize}) into Eq. (\ref{eq.y0.normalized}) and collecting the terms proportional to \(x_i^{(0,n;U)}\) gives
\[\begin{eqnarray} y_j^{(0,n)} &=& \sum_{i=0}^{J^{(0)}-1}W_{j,i}^{(0;N)}\left[ x_{min}^{(0;N)}+\left(x_i^{(0,n;U)}-x_{i,min}^{(0;U)}\right) \frac{x_{max}^{(0;N)}-x_{min}^{(0;N)}}{x_{i,max}^{(0;U)}-x_{i,min}^{(0;U)}} \right]+W_{j,J^{(0)}}^{(0;N)} \nonumber \\ &=& \sum_{i=0}^{J^{(0)}-1}W_{j,i}^{(0;N)} \frac{x_{max}^{(0;N)}-x_{min}^{(0;N)}} {x_{i,max}^{(0;U)}-x_{i,min}^{(0;U)}} x_i^{(0,n;U)} \nonumber \\ & & +\sum_{i=0}^{J^{(0)}-1}W_{j,i}^{(0;N)}\left[ x_{min}^{(0;N)}-x_{i,min}^{(0;U)} \frac{x_{max}^{(0;N)}-x_{min}^{(0;N)}} {x_{i,max}^{(0;U)}-x_{i,min}^{(0;U)}} \right]+W_{j,J^{(0)}}^{(0;N)} \nonumber \\ & & \left(n=0,\cdots,N-1; j=0,\cdots,J^{(1)}-1\right) \label{eq.normalized.arranged} \end{eqnarray}\]
Requiring that Eq. (\ref{eq.normalized.arranged}) agree with Eq. (\ref{eq.y0.unnormalized}) for every sample, we match the coefficients of \(x_i^{(0,n;U)}\) and the constant terms separately and obtain
\[\begin{equation} W_{j,i}^{(0;U)}= W_{j,i}^{(0;N)} \frac{x_{max}^{(0;N)}-x_{min}^{(0;N)}}{x_{i,max}^{(0;U)}-x_{i,min}^{(0;U)}} \hspace{1em} \left(i=0,\cdots,J^{(0)}-1;j=0,\cdots,J^{(1)}-1\right) \label{eq.W0.unnormalize} \end{equation}\]
\[\begin{eqnarray} W_{j,J^{(0)}}^{(0;U)} &=& \sum_{i=0}^{J^{(0)}-1}W_{j,i}^{(0;N)}\left[ x_{min}^{(0;N)}-x_{i,min}^{(0;U)} \frac{x_{max}^{(0;N)}-x_{min}^{(0;N)}} {x_{i,max}^{(0;U)}-x_{i,min}^{(0;U)}} \right]+W_{j,J^{(0)}}^{(0;N)} \nonumber \\ & & \left(j=0,\cdots,J^{(1)}-1\right) \label{eq.WJ0.unnormalize} \end{eqnarray}\]
With the relation between \(W_{j,i}^{(0;U)}\) and \(W_{j,i}^{(0;N)}\) set in this way, \(y_j^{(0,n)}\) and all variables that follow it are unaffected by the normalization, so the parameters of the remaining layers are unchanged:
\[\begin{equation} W_{j,i}^{(m;U)}=W_{j,i}^{(m;N)} \hspace{1em} \left(m=1,\cdots,M;i=0,\cdots,J^{(m)};j=0,\cdots,J^{(m+1)}-1\right) \label{eq.W.unnormalize} \end{equation}\]
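The conversion in Eqs. (\ref{eq.W0.unnormalize}) and (\ref{eq.WJ0.unnormalize}) can be checked numerically. The following Python sketch is an illustration only and is not taken from the machine_learning package; the names `unnormalize_first_layer` and `first_layer_output`, the row layout (the \(J^{(0)}\) weights of output \(j\) followed by the bias \(W_{j,J^{(0)}}^{(0)}\)), and the sample values are all assumptions made for this example.

```python
def unnormalize_first_layer(w_n, mins, maxs, n_min=-1.0, n_max=1.0):
    """Convert first-layer parameters for normalized inputs into the
    equivalent parameters for un-normalized inputs.

    w_n[j] is assumed to hold the J0 weights of output j followed by
    its bias. mins[i] and maxs[i] are the un-normalized minimum and
    maximum of input component i.
    """
    w_u = []
    for row in w_n:
        num_inputs = len(row) - 1
        # Scale factor (x_max_N - x_min_N) / (x_i_max_U - x_i_min_U) per i.
        scales = [(n_max - n_min) / (maxs[i] - mins[i]) for i in range(num_inputs)]
        # Weights: each normalized weight is multiplied by its scale factor.
        weights = [row[i] * scales[i] for i in range(num_inputs)]
        # Bias: the normalized bias plus the constant terms left over
        # after the substitution of the normalization formula.
        bias = row[num_inputs] + sum(
            row[i] * (n_min - mins[i] * scales[i]) for i in range(num_inputs)
        )
        w_u.append(weights + [bias])
    return w_u


def first_layer_output(w, x):
    """Linear combination y_j = sum_i w[j][i] * x[i] + bias_j."""
    return [sum(row[i] * x[i] for i in range(len(x))) + row[len(x)] for row in w]


# Hypothetical values: two inputs with ranges [0, 10] and [10, 30].
mins, maxs = [0.0, 10.0], [10.0, 30.0]
w_n = [[0.5, -2.0, 0.25]]
w_u = unnormalize_first_layer(w_n, mins, maxs)

# The same sample, un-normalized and normalized to [-1, 1].
x_u = [5.0, 20.0]
x_n = [-1.0 + (x_u[i] - mins[i]) * 2.0 / (maxs[i] - mins[i]) for i in range(2)]
print(first_layer_output(w_u, x_u), first_layer_output(w_n, x_n))
```

The two printed values should agree up to rounding error, which is the equality of Eqs. (\ref{eq.y0.unnormalized}) and (\ref{eq.y0.normalized}) that the derivation requires. In practice this means a model can be trained on normalized inputs and then converted once, so that it can be applied directly to raw inputs.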