Documentation of the function read_and_decompose_line

Last Update: 2021/12/1

◆Purpose

Read one line from a file, check that it has the expected number of columns, and split it into columns. Intended for tab-separated lines.

◆Format

#include <file.h>
inline char **read_and_decompose_line
(FILE *fp,const int correct_number_of_column)

◆Arguments

fp	File pointer of the file to read.
correct_number_of_column	The correct (expected) number of columns in the line to read.

◆Return value

An array of the strings obtained by reading one line from the current position of fp and splitting it at the tabs. The 1st array element holds the string in the 1st column, the 2nd element holds the string in the 2nd column, and so on. NULL is returned if the end of the file has been reached.
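
The layout of the returned array can be pictured with a small sketch. The code below is not the implementation in file.h; it is a minimal illustration under stated assumptions. The helper name split_tabs_sketch is hypothetical, it works on a line already held in memory instead of reading from fp, and it returns NULL on a column-count mismatch, which may differ from how the real function reports that case.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of file.h.
   Splits a tab-separated line in place and returns a malloc'ed array of
   pointers into `line` (element 0 = 1st column, element 1 = 2nd column, ...).
   Returns NULL if the column count differs from expected_columns or if
   memory allocation fails. The caller releases the array with free(). */
char **split_tabs_sketch(char *line, int expected_columns)
{
    assert(line != NULL && expected_columns > 0);

    /* Drop a trailing newline (LF or CRLF), if any. */
    line[strcspn(line, "\r\n")] = '\0';

    /* A line with k tabs has k+1 columns. */
    int count = 1;
    for (const char *p = line; *p != '\0'; p++)
        if (*p == '\t') count++;
    if (count != expected_columns) return NULL;

    char **columns = malloc((size_t)count * sizeof *columns);
    if (columns == NULL) return NULL;

    /* Replace each tab with a terminator and record where each column starts. */
    int i = 0;
    columns[i++] = line;
    for (char *p = line; *p != '\0'; p++) {
        if (*p == '\t') {
            *p = '\0';
            columns[i++] = p + 1;
        }
    }
    return columns;
}
```

Storing pointers into the original buffer keeps the sketch short; a function that returns independently allocated strings would copy each column instead.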

◆Example

FILE *fp=fopen("data.dat","r");
char **linecontents_decomposed=read_and_decompose_line(fp,3);

◆Additional remarks

When reading a file with a fixed format, it is important to read it one line at a time and to check the number of columns in each line, in order to avoid careless mistakes. This function was written to make that style of reading easy.

For example, consider reading a file such as the one below, in which the frequency (1st column), the real part of the Fourier spectrum (2nd column), and the imaginary part of the Fourier spectrum (3rd column) are listed. The first line gives the number of data samples.

1024
0.000000	1.000000	0.000000
0.001000	0.999998	0.002345
0.002000	0.999995	0.006789
0.003000	0.999991	0.000123
•	•	•
•	•	•
•	•	•

When reading such data, relying only on the order in which the values appear in the file, without explicitly specifying which column of which line is being read, invites mistakes. An example of such bad code is shown below.

int N=1024;
int n;
double frequency[N],spectrum_r[N],spectrum_i[N];
FILE *fp=fopen("data.dat","r");
for(n=0;n<N;n++){
fscanf(fp,"%lf%lf%lf", &(frequency[n]),&(spectrum_r[n]),&(spectrum_i[n]));
}
fclose(fp);

In the program above, the 1st line of the data file (the number of samples) has been forgotten. Executing this program reads every value shifted by one place, producing the following wrong data:

for a frequency of 1024 Hz, the real part is 0.000000 and the imaginary part is 1.000000;
for a frequency of 0.000000 Hz, the real part is 0.001000 and the imaginary part is 0.999998;
for a frequency of 0.002345 Hz, the real part is 0.002000 and the imaginary part is 0.999995; and
for a frequency of 0.006789 Hz, the real part is 0.003000 and the imaginary part is 0.999991.

The important point is that this incorrect reading does not cause any error. As a result, it is likely that a wrong analysis is carried out without anyone noticing the mistake.
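
The silent shift can be reproduced in isolation. The sketch below assumes a hypothetical helper read_rows_by_order (not part of file.h) that reads values purely by their order in the file. It even checks the return value of fscanf, yet the check still passes when a header line is present, because fscanf skips newlines like any other whitespace before a %lf conversion.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of file.h.
   Reads up to n rows of three numbers by order of appearance and returns
   the number of rows for which all three conversions succeeded. */
int read_rows_by_order(FILE *fp, int n,
                       double *frequency, double *spectrum_r, double *spectrum_i)
{
    assert(fp != NULL && n >= 0);

    int rows = 0;
    /* fscanf does not care where line breaks are, so a one-column header
       line is consumed as if it were the first value of the first row. */
    while (rows < n &&
           fscanf(fp, "%lf%lf%lf",
                  &frequency[rows], &spectrum_r[rows], &spectrum_i[rows]) == 3)
        rows++;
    return rows;
}
```

Given a file whose first line is the sample count 3 followed by three data rows, this helper reports three complete rows and stores 3 as the first frequency, so checking the return value alone does not reveal the mistake.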

This is where the function read_and_decompose_line comes in. With it, the code above can be rewritten as follows.

int N=1024;
int n;
double frequency[N],spectrum_r[N],spectrum_i[N];
char **data_tmp;
FILE *fp=fopen("data.dat","r");
for(n=0;n<N;n++){
    data_tmp=read_and_decompose_line(fp,3);
    frequency[n]=atof(data_tmp[0]);
    spectrum_r[n]=atof(data_tmp[1]);
    spectrum_i[n]=atof(data_tmp[2]);
    free2(&data_tmp,3);
}
fclose(fp);

This code also forgets that the 1st line of the data file holds the number of samples. However, the contents of each line are now read into a temporary variable data_tmp before being converted to real numbers, and during that read the number of columns is checked against 3. Because the 1st line of the actual data has only one column, an error occurs when that line is read and the program stops. This makes the mistake noticeable.
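
The per-line column check that exposes the mistake can be illustrated on its own. The sketch below is an assumption-laden stand-in, not the code in file.h: the helper name first_line_with_wrong_columns is hypothetical, lines are assumed to be shorter than 1024 bytes, and instead of stopping the program it returns the number of the offending line.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of file.h.
   Returns the 1-based number of the first line whose tab-separated column
   count differs from expected_columns, or 0 if every line matches.
   Assumes each line fits in the 1024-byte buffer. */
long first_line_with_wrong_columns(FILE *fp, int expected_columns)
{
    assert(fp != NULL && expected_columns > 0);

    char line[1024];
    long line_number = 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        int columns = 1;            /* k tabs separate k+1 columns */
        line_number++;
        for (const char *p = line; *p != '\0'; p++)
            if (*p == '\t') columns++;
        if (columns != expected_columns) return line_number;
    }
    return 0;
}
```

For a file laid out like the example data, calling this helper with expected_columns set to 3 returns 1, pointing directly at the forgotten header line.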