HDF5#

Reading and writing |HDF5| files is supported with format='hdf5' (this requires |h5py| to be installed). However, the .hdf5 file extension is automatically recognized when writing files, and HDF5 files are automatically identified when reading (even with a different extension, using the first few bytes of the file to identify the format), so in most cases you will not need to specify format='hdf5' explicitly.
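
If automatic identification does not apply (for example, when writing to a filename that does not end in .hdf5), the format can be given explicitly. Here output.dat is a hypothetical filename and t an existing table:

>>> t.write('output.dat', format='hdf5', path='data')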

Since HDF5 files can contain multiple tables, the full path to the table should be specified via the path= argument when reading and writing.
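If you do not know which paths a file contains, one way to list them is to use |h5py| directly. This is a minimal sketch, using the observations.hdf5 filename from the examples below:

>>> import h5py
>>> with h5py.File('observations.hdf5', 'r') as f:
...     f.visit(print)  # print the path of every group and dataset in the file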

Examples#

To read a table called data from an HDF5 file named observations.hdf5, you can do:

>>> from astropy.table import QTable
>>> t = QTable.read('observations.hdf5', path='data')

To read a table nested in a group in the HDF5 file, you can do:

>>> t = QTable.read('observations.hdf5', path='group/data')

To write a table to a new file, the path should also be specified:

>>> t.write('new_file.hdf5', path='updated_data')

It is also possible to write a table to an existing file using append=True:

>>> t.write('observations.hdf5', path='updated_data', append=True)

As with other formats, the overwrite=True argument is supported for overwriting existing files. To overwrite only a single table within an HDF5 file that has multiple datasets, use both the overwrite=True and append=True arguments.
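
For example, to replace only the updated_data table inside observations.hdf5 while leaving its other datasets intact:

>>> t.write('observations.hdf5', path='updated_data', append=True, overwrite=True)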

Finally, when writing to HDF5 files, the compression= argument can be used to ensure that the data is compressed on disk:

>>> t.write('new_file.hdf5', path='updated_data', compression=True)
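
The compression value is passed on to |h5py|, so a specific compression filter can also be requested by name. This assumes the named codec (here 'gzip') is available in your h5py installation:

>>> t.write('new_file.hdf5', path='updated_data', overwrite=True, compression='gzip')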

Metadata and Mixin Columns#

astropy tables can contain metadata, both in the table meta attribute (an ordered dictionary of arbitrary key/value pairs) and within the columns, each of which has unit, format, description, and meta attributes.
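
For example (the key and column names here are purely illustrative, assuming t has a column named 'flux'):

>>> t.meta['exposure_time'] = 30.0                 # simple table metadata
>>> t['flux'].info.description = 'Measured flux'   # column metadata; see below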

By default, when writing a table to HDF5 the code will attempt to store each key/value pair within the table meta as HDF5 attributes of the table dataset. This will fail if the values within meta are not objects that can be stored as HDF5 attributes. In addition, if the table columns being stored have defined values for any of the above-listed column attributes, these metadata will not be stored and a warning will be issued.

serialize_meta#

To enable storing all table and column metadata to the HDF5 file, call the write() method with serialize_meta=True. This will store metadata in a separate HDF5 dataset, contained in the same file, which is named <path>.__table_column_meta__, where <path> is the path argument provided in the call to write():

>>> t.write('observations.hdf5', path='data', serialize_meta=True)

The table metadata are stored as a dataset of strings by serializing the metadata to YAML following the ECSV header format definition. Since YAML parsers exist for most common languages, the table metadata can easily be accessed and used when reading the HDF5 file in a non-astropy application.
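
As a minimal sketch of how this might look with |h5py| and a plain YAML parser (assuming the table was written to path 'data' as above; the exact YAML content and string encoding depend on the table):

>>> import h5py
>>> import yaml
>>> with h5py.File('observations.hdf5', 'r') as f:
...     lines = [line.decode('utf-8') for line in f['data.__table_column_meta__']]
>>> header = yaml.safe_load('\n'.join(lines))  # table and column metadata as plain Python objects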

Specifying serialize_meta=True also makes it possible to write tables that contain Mixin Columns, such as Time or SkyCoord columns, to HDF5.
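
For example, a table with a Time column can be written like this (the column values and filename are illustrative):

>>> from astropy.time import Time
>>> t2 = QTable()
>>> t2['obstime'] = Time(['2021-01-01', '2021-01-02'])
>>> t2.write('times.hdf5', path='data', serialize_meta=True)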

Note

Certain kinds of metadata (e.g., numpy object arrays) cannot be serialized correctly using YAML.