HDF5

ZhuYuanxiang 2023-02-14 11:47:31

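Introduction to HDF5 Data Files

HDF5 (Hierarchical Data Format) was developed at the University of Illinois at Urbana-Champaign (UIUC). It is a widely used cross-platform data storage format that can hold many kinds of image and numerical data, can be transferred between different types of machines, and comes with a function library for handling the format in a uniform way.

HDF5 Structure

HDF5 files usually carry the extension .h5 or .hdf5, and previewing their contents requires dedicated software. The HDF5 file structure has two main kinds of objects: Groups and Datasets.

Each dataset can be divided into two parts: the raw data values and the metadata (a collection of data that describes the raw data and gives information about it).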

+-- Dataset
| +-- (Raw) Data Values (eg: a 4 x 5 x 6 matrix)
| +-- Metadata
| | +-- Dataspace (eg: Rank = 3, Dimensions = {4, 5, 6})
| | +-- Datatype (eg: Integer)
| | +-- Properties (eg: Chunked, Compressed)
| | +-- Attributes (eg: attr1 = 32.4, attr2 = "hello", ...)
|

As the structure above shows, the metadata consists of four parts: Dataspace, Datatype, Properties, and Attributes.
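The four metadata parts in the diagram can be inspected from Python. The following is only a minimal sketch, assuming h5py and NumPy are installed. The file name, dataset name, chunk shape, and attribute values are made up for illustration, and the file lives in memory (h5py's "core" driver with no backing store), so nothing is written to disk.

```python
import h5py
import numpy as np

# In-memory file: the "core" driver with backing_store=False never touches disk.
with h5py.File("metadata_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "matrix",                                  # hypothetical dataset name
        data=np.zeros((4, 5, 6), dtype=np.int32),  # raw data values
        chunks=(2, 5, 6),                          # chunked storage layout
        compression="gzip",                        # compressed storage
    )
    dset.attrs["attr1"] = 32.4
    dset.attrs["attr2"] = "hello"

    dataspace = (dset.ndim, dset.shape)            # rank and dimensions
    datatype = dset.dtype                          # element type
    properties = (dset.chunks, dset.compression)   # storage properties
    attributes = dict(dset.attrs)                  # user-defined key/value pairs

print("Dataspace :", dataspace)
print("Datatype  :", datatype)
print("Properties:", properties)
print("Attributes:", attributes)
```

Everything is read inside the `with` block because a dataset handle is no longer usable once its file has been closed.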

The structure of a complete HDF5 file looks like this:

+-- /
| +-- group_1
| | +-- dataset_1_1
| | | +-- attribute_1_1_1
| | | +-- attribute_1_1_2
| | | +-- ...
| | |
| | +-- dataset_1_2
| | | +-- attribute_1_2_1
| | | +-- attribute_1_2_2
| | | +-- ...
| | |
| | +-- ...
| |
| +-- group_2
| | +-- dataset_2_1
| | | +-- attribute_2_1_1
| | | +-- attribute_2_1_2
| | | +-- ...
| | |
| | +-- dataset_2_2
| | | +-- attribute_2_2_1
| | | +-- attribute_2_2_2
| | | +-- ...
| | |
| | +-- ...
| |
| +-- ...
|
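A hierarchy like the one above can be listed programmatically. The sketch below assumes h5py and NumPy are available and builds a tiny in-memory file whose names mirror the diagram. The helper `list_objects` is a hypothetical name introduced here, not part of h5py; it relies on h5py's `visititems`, which calls a function for every group and dataset below the starting point.

```python
import h5py
import numpy as np


def list_objects(h5file):
    """Return (path, kind) pairs for every group and dataset below the root."""
    found = []

    def visitor(name, obj):
        # visititems passes the path relative to the starting group.
        kind = "group" if isinstance(obj, h5py.Group) else "dataset"
        found.append(("/" + name, kind))

    h5file.visititems(visitor)
    return found


with h5py.File("tree_demo.h5", "w", driver="core", backing_store=False) as f:
    g1 = f.create_group("group_1")
    d = g1.create_dataset("dataset_1_1", data=np.arange(3))
    d.attrs["attribute_1_1_1"] = 1
    f.create_group("group_2").create_dataset("dataset_2_1", data=np.arange(3))
    objects = list_objects(f)

for path, kind in objects:
    print(kind, path)
```

Attributes are not visited as separate objects; they are reached through the `attrs` property of the group or dataset that owns them.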

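Downloading and Installing HDF5

Once HDF5 is downloaded and installed, the h5dump command can be used in a terminal to view the contents of an HDF5 file. The official site also provides **HDFView**, an HDF5 data visualization tool written in Java that runs on all platforms. Note, however, that the path of the file being opened should not contain Chinese characters.

Reading and Writing HDF5 Files in Python

h5py, the Python interface to HDF5, is fairly easy to use. Here is a simple example: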

/HDF5/h5py_example.py

#!/usr/bin/python
# -*- coding: UTF-8 -*-
#
# Created by WW on Jan. 26, 2020
# All rights reserved.
#

import h5py
import numpy as np


def main():
    #===========================================================================
    # Create an HDF5 file.
    f = h5py.File("h5py_example.hdf5", "w")    # mode = {'w', 'r', 'a'}

    # Create two groups under root '/'.
    g1 = f.create_group("bar1")
    g2 = f.create_group("bar2")

    # Create a dataset under root '/'.
    d = f.create_dataset("dset", data=np.arange(16).reshape([4, 4]))

    # Add two attributes to dataset 'dset'.
    d.attrs["myAttr1"] = [100, 200]
    d.attrs["myAttr2"] = "Hello, world!"

    # Create a group and a dataset under group "bar1".
    c1 = g1.create_group("car1")
    d1 = g1.create_dataset("dset1", data=np.arange(10))

    # Create a group and a dataset under group "bar2".
    c2 = g2.create_group("car2")
    d2 = g2.create_dataset("dset2", data=np.arange(10))

    # Save and exit the file.
    f.close()

    # h5py_example.hdf5 file structure
    # +-- '/'
    # |   +-- group "bar1"
    # |   |   +-- group "car1"
    # |   |   |   +-- None
    # |   |   |
    # |   |   +-- dataset "dset1"
    # |   |
    # |   +-- group "bar2"
    # |   |   +-- group "car2"
    # |   |   |   +-- None
    # |   |   |
    # |   |   +-- dataset "dset2"
    # |   |
    # |   +-- dataset "dset"
    # |   |   +-- attribute "myAttr1"
    # |   |   +-- attribute "myAttr2"
    # |   |
    # |

    #===========================================================================
    # Read the HDF5 file.
    f = h5py.File("h5py_example.hdf5", "r")    # mode = {'w', 'r', 'a'}

    # Print the keys of groups and datasets under '/'.
    print(f.filename, ":")
    print(list(f.keys()), "\n")

    #===================================================
    # Read dataset 'dset' under '/'.
    d = f["dset"]

    # Print the data of 'dset'.
    print(d.name, ":")
    print(d[:])

    # Print the attributes of dataset 'dset'.
    for key in d.attrs.keys():
        print(key, ":", d.attrs[key])

    print()

    #===================================================
    # Read group 'bar1'.
    g = f["bar1"]

    # Print the keys of groups and datasets under group 'bar1'.
    print(list(g.keys()))

    # Three ways to print the data of 'dset1'.
    print(f["/bar1/dset1"][:])      # 1. absolute path

    print(f["bar1"]["dset1"][:])    # 2. relative path: file[][]

    print(g["dset1"][:])            # 3. relative path: group[]

    # Delete a dataset.
    # Notice: the file must be opened with mode 'a' (not 'r') for this to work.
    # del g["dset1"]

    # Close the file.
    f.close()


if __name__ == "__main__":
    main()
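The deletion mentioned in the comments of the script above can be tried on its own. The following sketch assumes h5py and NumPy are installed; it uses a throwaway file in a temporary directory (the file name is made up) and `with` blocks, so each file is closed automatically even if an error occurs.

```python
import os
import tempfile

import h5py
import numpy as np

# Throwaway location so the example does not disturb h5py_example.hdf5.
path = os.path.join(tempfile.mkdtemp(), "delete_demo.hdf5")

# Build a small file to work on.
with h5py.File(path, "w") as f:
    f.create_group("bar1").create_dataset("dset1", data=np.arange(10))

# Reopen in append mode ('a'): read/write access to the existing file.
with h5py.File(path, "a") as f:
    g = f["bar1"]
    del g["dset1"]                  # unlink the dataset from its group
    remaining = list(g.keys())

print(remaining)
```

Deleting removes the link to the dataset; the space it occupied inside the file is generally not returned to the file system by this operation alone.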

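Reading and Writing HDF5 Files in C++

Reading and writing HDF5 files in C++ is more involved. Following the Examples on the official site, here are one example that creates an HDF5 file and one that reads and edits an HDF5 file: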

  1. /HDF5/CPP/h5cpp_creating.cpp
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
 * Copyright © 2020 Wei Wang.
 * Created by WW on 2020/01/26.
 * All rights reserved.
 *
 * This example illustrates how to create a dataset that is a 4 x 6 array.
 * Reference: HDF5 Tutorial (https://portal.hdfgroup.org/display/HDF5/HDF5)
 * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */

//
//  h5cpp_creating.cpp
//  CPP
//

#include <iostream>
#include <string>
#include "H5Cpp.h"

#ifndef _H5_NO_NAMESPACE_
using namespace H5;
#ifndef _H5_NO_STD_
using std::cout;
using std::endl;
#endif /* _H5_NO_STD_ */
#endif /* _H5_NO_NAMESPACE_ */

#define PI 3.1415926535

/*
 * Define the names of the HDF5 file, groups, datasets, and attributes.
 * Use H5std_string (an alias for std::string) for name strings.
 */
const H5std_string FILE_NAME("h5cpp_example.hdf5");
const H5std_string GROUP_NAME("group1");
const H5std_string DATASET_NAME("dset");
const H5std_string ATTR_NAME1("myAttr1");
const H5std_string ATTR_NAME2("myAttr2");

const int DIM0 = 4;    // dataset dimensions
const int DIM1 = 6;
const int RANK = 2;

int main (int argc, char **argv)
{
    // Try block to detect exceptions raised by any of the calls inside it.
    try
    {
        /*
         * Turn off the auto-printing when failure occurs so that we can
         * handle the errors appropriately.
         */
        Exception::dontPrint();

        /* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */

        double data[DIM0][DIM1];    // buffer for data to write

        for (int i = 0; i < DIM0; i++)
            for (int j = 0; j < DIM1; j++)
                data[i][j] = (i + 1) * PI + j;

        // Create a new file using the default property lists.
        // H5F_ACC_TRUNC: create a new file or overwrite an existing file.
        H5File file(FILE_NAME, H5F_ACC_TRUNC);

        // Create a group under root '/'.
        Group group(file.createGroup(GROUP_NAME));

        // Use hsize_t (an unsigned integer type) for dimensions.
        hsize_t dims[RANK];    // dataset dimensions
        dims[0] = DIM0;
        dims[1] = DIM1;

        // Create the dataspace for a dataset first.
        DataSpace dataspace(RANK, dims);

        // Create the dataset under the group with the specified dataspace.
        DataSet dataset = group.createDataSet(DATASET_NAME, PredType::NATIVE_DOUBLE, dataspace);

        // Write data in buffer to dataset.
        dataset.write(data, PredType::NATIVE_DOUBLE);

        /* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */

        int attr1_data[2] = {100, 200};    // buffer for attribute data to write

        hsize_t attr1_dims[1] = {2};       // attribute dimension, rank = 1

        // Create the dataspace for an attribute first.
        DataSpace attr1_dataspace(1, attr1_dims);    // rank = 1

        // Create the attribute of the dataset with the specified dataspace.
        Attribute attribute1 = dataset.createAttribute(ATTR_NAME1, PredType::STD_I32BE, attr1_dataspace);

        // Write data in buffer to attribute.
        attribute1.write(PredType::NATIVE_INT, attr1_data);

        /* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */

        /* String Data */

        // Buffer for attribute data to write. The text is 30 characters long,
        // so the array holds 31 bytes including the terminating '\0'
        // (a 30-byte buffer filled with sprintf would overflow by one byte).
        const char attr2_data[] = "hello, world!\nAuthor: Wei Wang";

        // Attribute dimension, rank = 1; the terminator is stored as well.
        hsize_t attr2_dims[1] = {sizeof(attr2_data)};

        // Create the dataspace for an attribute first.
        DataSpace attr2_dataspace(1, attr2_dims);    // rank = 1

        // Create the attribute of the dataset with the specified dataspace.
        Attribute attribute2 = dataset.createAttribute(ATTR_NAME2, PredType::NATIVE_CHAR, attr2_dataspace);

        // Write data in buffer to attribute.
        attribute2.write(PredType::NATIVE_CHAR, attr2_data);

        /* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */

        // Save and exit the group.
        group.close();
        // Save and exit the file.
        file.close();

        /* h5cpp_example.hdf5 file structure
         * +-- '/'
         * |   +-- group 'group1'
         * |   |   +-- dataset 'dset'
         * |   |   |   +-- attribute 'myAttr1'
         * |   |   |   +-- attribute 'myAttr2'
         * |   |   |
         * |   |
         * |
         */

    }  // end of try block

    // Catch failure caused by the H5File operations.
    catch (FileIException &error)
    {
        error.printErrorStack();
        return -1;
    }

    // Catch failure caused by the DataSet operations.
    catch (DataSetIException &error)
    {
        error.printErrorStack();
        return -1;
    }

    // Catch failure caused by the DataSpace operations.
    catch (DataSpaceIException &error)
    {
        error.printErrorStack();
        return -1;
    }

    // Catch any other HDF5 failure (e.g. Group or Attribute operations).
    catch (Exception &error)
    {
        error.printErrorStack();
        return -1;
    }

    return 0;  // successfully terminated

}

  2. /HDF5/CPP/h5cpp_reading.cpp

/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
 * Copyright © 2020 Wei Wang.
 * Created by WW on 2020/01/26.
 * All rights reserved.
 *
 * This example illustrates how to read and edit an existing dataset.
 * Reference: HDF5 Tutorial (https://portal.hdfgroup.org/display/HDF5/HDF5)
 * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */

//
//  h5cpp_reading.cpp
//  CPP
//

#include <iostream>
#include <string>
#include <vector>
#include "H5Cpp.h"

#ifndef _H5_NO_NAMESPACE_
using namespace H5;
#ifndef _H5_NO_STD_
using std::cout;
using std::endl;
#endif /* _H5_NO_STD_ */
#endif /* _H5_NO_NAMESPACE_ */

/*
 * Define the names of the HDF5 file, groups, datasets, and attributes.
 * Use H5std_string (an alias for std::string) for name strings.
 */
const H5std_string FILE_NAME("h5cpp_example.hdf5");
const H5std_string GROUP_NAME("group1");
const H5std_string DATASET_NAME("dset");
const H5std_string ATTR_NAME("myAttr2");

int main (int argc, char **argv)
{

    // Try block to detect exceptions raised by any of the calls inside it.
    try
    {
        /*
         * Turn off the auto-printing when failure occurs so that we can
         * handle the errors appropriately.
         */
        Exception::dontPrint();

        /* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */

        /* HOW TO DELETE A DATASET */

        /*

        // Open an existing file.
        // H5F_ACC_RDWR: read or edit an existing file.
        H5File file_d(FILE_NAME, H5F_ACC_RDWR);

        // Open an existing group.
        Group group_d = file_d.openGroup(GROUP_NAME);

        // Use the C API function H5Ldelete to delete an existing dataset.
        int result = H5Ldelete(group_d.getId(), DATASET_NAME.c_str(), H5P_DEFAULT);
        // string::c_str() converts a "string" to a "const char *".

        cout << result << endl;
        // Non-negative: deleted successfully;
        // Otherwise: failed.

        // Save and exit the group.
        group_d.close();
        // Save and exit the file.
        file_d.close();
        // Important! The two close()s above can't be omitted!
        // Otherwise, the deletion won't be saved to the file.

        */

        /* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */

        // Open an existing file.
        // H5F_ACC_RDWR: read or edit an existing file.
        H5File file(FILE_NAME, H5F_ACC_RDWR);

        // Open an existing group of the file.
        Group group = file.openGroup(GROUP_NAME);

        // Open an existing dataset of the group.
        DataSet dataset = group.openDataSet(DATASET_NAME);

        // Get the dataspace of the dataset.
        DataSpace filespace = dataset.getSpace();

        // Get the rank of the dataset.
        int rank = filespace.getSimpleExtentNdims();

        // Use hsize_t (an unsigned integer type) for dimensions.
        // std::vector replaces the original variable-length arrays,
        // which are not standard C++.
        std::vector<hsize_t> dims(rank);    // dataset dimensions

        // Get the dimensions of the dataset.
        rank = filespace.getSimpleExtentDims(dims.data());

        // The dataset written by h5cpp_creating.cpp has rank 2.
        cout << DATASET_NAME << " rank = " << rank << ", dimensions "
             << dims[0] << " x "
             << dims[1] << endl;

        // Dataspace for data read from file.
        DataSpace myspace(rank, dims.data());

        // Buffer for data read from file, stored row by row.
        std::vector<double> data_out(dims[0] * dims[1]);

        // Read data from file to buffer.
        dataset.read(data_out.data(), PredType::NATIVE_DOUBLE, myspace, filespace);

        for (hsize_t i = 0; i < dims[0]; i++)
        {
            for (hsize_t j = 0; j < dims[1]; j++)
                cout << data_out[i * dims[1] + j] << " ";
            cout << endl;
        }

        /* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
        // Read the attribute of the dataset.
        cout << endl;

        // Open an existing attribute of the dataset.
        Attribute attr = dataset.openAttribute(ATTR_NAME);

        // Get the dataspace of the attribute.
        DataSpace attr_space = attr.getSpace();

        // Get the rank of the attribute.
        int attr_rank = attr_space.getSimpleExtentNdims();

        // Use hsize_t (an unsigned integer type) for dimensions.
        std::vector<hsize_t> attr_dims(attr_rank);    // attribute dimensions

        // Get the dimension of the attribute.
        attr_rank = attr_space.getSimpleExtentDims(attr_dims.data());

        cout << ATTR_NAME << " rank = " << attr_rank << ", dimensions " << attr_dims[0] << endl;

        // Buffer for attribute data read from file. One extra zero byte
        // guarantees the text is null-terminated before it is printed.
        std::vector<char> attr_data_out(attr_dims[0] + 1, '\0');

        // Read attribute data from file to buffer.
        attr.read(PredType::NATIVE_CHAR, attr_data_out.data());

        cout << attr_data_out.data() << endl;

        /* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */

        // Save and exit the group.
        group.close();
        // Save and exit the file.
        file.close();

    }  // end of try block

    // Catch failure caused by the H5File operations.
    catch (FileIException &error)
    {
        error.printErrorStack();
        return -1;
    }

    // Catch failure caused by the DataSet operations.
    catch (DataSetIException &error)
    {
        error.printErrorStack();
        return -1;
    }

    // Catch failure caused by the DataSpace operations.
    catch (DataSpaceIException &error)
    {
        error.printErrorStack();
        return -1;
    }

    // Catch any other HDF5 failure (e.g. Group or Attribute operations).
    catch (Exception &error)
    {
        error.printErrorStack();
        return -1;
    }

    return 0;  // successfully terminated

}

Summary

For more advanced APIs (Application Programming Interfaces), such as Subset, Hyperslab, Chunk, Compress, Single-Writer/Multiple-Reader (SWMR), Parallel HDF5 (parallel reading and writing through MPI, the Message Passing Interface), and Virtual Dataset (VDS), consult the Documentation on the official site.
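As a small taste of the chunking, compression, and subsetting features named above, here is a hedged h5py sketch, assuming h5py and NumPy are installed. The dataset name, chunk shape, and gzip level are arbitrary choices for illustration, and the file is kept in memory. The slices select rectangular and strided regions of the dataset, which is the kind of selection HDF5 calls a hyperslab.

```python
import h5py
import numpy as np

data = np.arange(100 * 100, dtype=np.float64).reshape(100, 100)

with h5py.File("chunk_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "grid",                  # hypothetical dataset name
        data=data,
        chunks=(10, 100),        # stored in blocks of 10 full rows
        compression="gzip",      # each chunk is compressed separately
        compression_opts=4,      # gzip level, chosen arbitrarily
    )
    block = dset[20:30, 5:8]       # rectangular subset: 10 rows, 3 columns
    strided = dset[0:100:25, 0]    # every 25th row of column 0

print(block.shape)
print(strided)
```

Only the chunks that overlap a requested region need to be read and decompressed, which is why the chunk shape is usually chosen to match the expected access pattern.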

Besides numerical data, HDF5 files can also be used to store images, PDF files, and even Excel files.
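One simple way to keep an arbitrary file inside an HDF5 file is to store its bytes as a one-dimensional array of unsigned 8-bit integers. The sketch below assumes h5py and NumPy are installed. The byte string stands in for the contents of a real file, and the dataset path and attribute name are hypothetical. This is one possible approach, not the only one.

```python
import h5py
import numpy as np

# Stand-in for the bytes of a real file, e.g. open("report.pdf", "rb").read().
raw = b"%PDF-1.4 example payload \x00\x01\x02"

with h5py.File("blob_demo.h5", "w", driver="core", backing_store=False) as f:
    # Intermediate groups in the path ("files") are created automatically.
    dset = f.create_dataset(
        "files/report.pdf",
        data=np.frombuffer(raw, dtype=np.uint8),
    )
    dset.attrs["original_name"] = "report.pdf"   # keep the name as metadata

    # Read the bytes back and turn them into a bytes object again.
    restored = f["files/report.pdf"][:].tobytes()

print(restored == raw)
```

Writing `restored` to disk in binary mode would reproduce the original file byte for byte.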