Documentation of function calculate_stddev_remove_outliers

Last Update: 2025/8/7

◆Purpose

Calculate the standard deviation of a list of real numbers after excluding a given fraction of the samples from each of the largest and smallest sides. The denominator is the number of retained samples, not that number minus one.

Let \(d_1,\cdots,d_N\) be \(N\) real numbers, and let \(\hat{d}_1,\cdots,\hat{d}_N\) be the same numbers sorted in descending order. When \(N_{cut}\) samples are removed from each of the largest and smallest sides, the standard deviation is \[\begin{equation} \sigma_d=\sqrt{\frac{1}{N-2N_{cut}} \sum_{i=N_{cut}+1}^{N-N_{cut}}\left(\hat{d}_i-\bar{d}\right)^2} \label{eq.stddev} \end{equation}\] where \(\bar{d}\) is the average calculated with the same data as the standard deviation: \[\begin{equation} \bar{d}=\frac{1}{N-2N_{cut}}\sum_{i=N_{cut}+1}^{N-N_{cut}}\hat{d}_i \label{eq.average} \end{equation}\] In this function, \(N_{cut}\) is not specified directly. Instead, the ratio \(r\) of samples to be removed from each of the largest and smallest sides is given, and \(N_{cut}\) is obtained by rounding \(rN\) to the nearest integer.

◆Format

#include <statistics.h>
inline double calculate_stddev_remove_outliers
(const int N,const double *d,const double ratio,const double average)

◆Arguments

N	The number of data samples \(N\).
d	An array holding the real numbers \(d_1,\cdots,d_N\).
ratio	The ratio \(r\) of data samples excluded from each of the largest and smallest sides of the data when calculating the standard deviation. Because this ratio is removed from each side, a total of \(2r\) is excluded. The value must therefore satisfy \(0\leq r < 0.5\), and at least one data sample must remain after the exclusion (i.e., \(N-2N_{cut}\geq 1\)).
average	The average calculated with eq. (\ref{eq.average}). Pass the return value of function calculate_average_remove_outliers.

◆Return value

The standard deviation calculated with eq. (\ref{eq.stddev}).

◆Example

const int N=10;
const double d[]={1.2,3.4,5.6,7.8,9.1,2.3,4.5,6.7,8.9,0.1};
double a=calculate_average_remove_outliers(N,d,0.2);
double sigma=calculate_stddev_remove_outliers(N,d,0.2,a);

In this example, sorting the real numbers in “d” in descending order gives:

9.1
8.9
7.8
6.7
5.6
4.5
3.4
2.3
1.2
0.1

Removing a ratio of \(r=0.2\) (i.e., 20%, or 2 samples) from each of the largest and smallest sides leaves:

7.8
6.7
5.6
4.5
3.4
2.3

The average and standard deviation of this list are a=5.05 and sigma=1.878608, respectively.
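As a check on these values, the arithmetic for the six retained samples can be written out from eqs. (\ref{eq.average}) and (\ref{eq.stddev}) with \(N=10\) and \(N_{cut}=2\):

```latex
\[ \bar{d}=\frac{7.8+6.7+5.6+4.5+3.4+2.3}{6}=\frac{30.3}{6}=5.05 \]
\[ \sigma_d=\sqrt{\frac{2.75^2+1.65^2+0.55^2+(-0.55)^2+(-1.65)^2+(-2.75)^2}{6}}
          =\sqrt{\frac{21.175}{6}}\approx 1.878608 \]
```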

◆Important note

No check is performed on the average passed as the 4th argument. To avoid incorrect results, always pass the return value of function calculate_average_remove_outliers, called with the same N, d, and ratio, as in the example above.

◆Validation

The calculation in the “Example” above was carried out with this function and yielded the correct result (1.878608).

◆Additional notes

The samples in the ratio \(r\) on each of the largest and smallest sides are merely excluded from the calculation of the standard deviation; the elements of array “d” are unchanged by the function call.