Text (CSV, fixed-width, HTML, and specialized)#
The read() and write() methods can be used to read and write text-based table data in a wide variety of supported formats. In addition to common formats like CSV and fixed-width, the unified interface also supports specialized formats like LaTeX tables and the AAS MRT format.
Most of the formats are provided by astropy.io.ascii, which is a flexible and powerful interface for reading and writing text tables. In addition, the interface provides wrappers around select I/O functions in the pandas library for additional flexibility and performance.
Note
For reading large CSV files, the astropy PyArrow CSV reader is a good option to consider since it can be up to 15 times faster than other readers.
Supported Formats#
Character-delimited Formats#
These formats use a character delimiter to separate columns. This is most commonly a comma (CSV) or a whitespace character like space or tab.
Format | Write | Suffix | Description |
---|---|---|---|
ascii | Yes | | ASCII table in most supported formats (uses guessing) |
ascii.basic | Yes | | |
ascii.commented_header | Yes | | |
ascii.csv | Yes | .csv | |
ascii.ecsv | Yes | .ecsv | |
ascii.no_header | Yes | | |
ascii.rdb | Yes | .rdb | |
ascii.tab | Yes | | |
ascii.tdat | Yes | .tdat | |
pandas.csv | Yes | | |
pyarrow.csv | No | | |
Fixed-width Formats#
These formats use fixed-width columns, where each column has a fixed width in characters. This can be useful for tables that are intended to also be read by humans.
Format | Write | Suffix | Description |
---|---|---|---|
ascii.fixed_width | Yes | | |
ascii.fixed_width_no_header | Yes | | |
ascii.fixed_width_two_line | Yes | | |
pandas.fwf | No | | |
HTML and JSON Formats#
Format | Write | Suffix | Description |
---|---|---|---|
ascii.html | Yes | .html | |
jsviewer | Yes | | JavaScript viewer format (write-only) |
pandas.html | Yes | | |
pandas.json | Yes | | |
Specialized Formats#
Format | Write | Suffix | Description |
---|---|---|---|
ascii.aastex | Yes | | |
ascii.cds | No | | |
ascii.daophot | No | | |
ascii.ipac | Yes | | |
ascii.latex | Yes | .tex | |
ascii.mrt | Yes | | |
ascii.qdp | Yes | .qdp | |
ascii.rst | Yes | .rst | |
ascii.sextractor | No | | |
astropy.io.ascii#
The astropy.io.ascii sub-package provides read and write support for many different formats, including astronomy-specific formats like AAS Machine-Readable Tables (MRT).
We strongly recommend using the unified interface for reading and writing tables via the astropy.io.ascii sub-package. This is done by prefixing the format name with the ascii. prefix. For example, to read a DAOphot table use:
>>> from astropy.table import Table
>>> t = Table.read('photometry.dat', format='ascii.daophot')
Use format='ascii' to read a table and guess the table format by successively trying most of the available formats in a specific order. This can be slow and is not recommended for large tables.
>>> t = Table.read('astropy/io/ascii/tests/t/latex1.tex', format='ascii')
>>> print(t)
cola colb colc
---- ---- ----
a 1 2
b 3 4
When writing a table with format='ascii' the output is a basic space-delimited file with a single header line containing the column names.
All additional arguments are passed to the astropy.io.ascii read() and write() functions. Further details are available in the sections on Parameters for read() and Parameters for write(). For example, to change the column delimiter and the output format for the colc column use:
>>> import sys
>>> t.write(sys.stdout, format='ascii', delimiter='|', formats={'colc': '%0.2f'})
cola|colb|colc
a|1|2.00
b|3|4.00
Attention
ECSV is recommended
For writing and reading tables to text in a way that fully reproduces the table data, types, and metadata (i.e., the table will “round-trip”), we highly recommend using the ECSV Format with format="ascii.ecsv". This writes the actual data in a space- or comma-delimited format that most text table readers can parse, but also includes metadata encoded in a comment block that allows full reconstruction of the original columns. This includes support for Mixin Columns (such as SkyCoord or Time) and Multidimensional Columns.
Pandas#
astropy Table supports the ability to read or write tables using some of the I/O methods available within pandas. This interface thus provides convenient wrappers to the following functions / methods:
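Format name | Data Description | Reader | Writer |
---|---|---|---|
pandas.csv | CSV | read_csv() | to_csv() |
pandas.json | JSON | read_json() | to_json() |
pandas.html | HTML | read_html() | to_html() |
pandas.fwf | Fixed Width | read_fwf() | |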
Notes:
- This is subject to the limitations discussed in Astropy Table and DataFrames.
- There is no fixed-width writer in pandas.
- Reading HTML requires BeautifulSoup4 and html5lib to be installed.
When reading or writing a table, any keyword arguments apart from the format and file name are passed through to pandas, for instance:
>>> t.write('data.csv', format='pandas.csv', sep=' ', header=False)
>>> t2 = Table.read('data.csv', format='pandas.csv', sep=' ', names=['a', 'b', 'c'])
PyArrow CSV#
The pyarrow library provides a highly-performant CSV reader that can be used in Astropy with Table.read(input_file, format="pyarrow.csv", ...). This can be up to 15 times faster and more memory-efficient than the astropy.io.ascii fast reader or the default pandas.csv reader. The best performance is achieved for files with only numeric data types, but even for files with mixed data types, the performance is still better than the standard astropy.io.ascii fast CSV reader.
This reader uses the read_csv() function, which in turn uses the PyArrow CSV reader and sets the various options to pyarrow.csv.read_csv() appropriately. The interface is designed to be similar to the io.ascii read interface where possible, but there are differences, most notably:
- Input can only be a string file name, pathlib.Path, or a binary file-like object.
- Whitespace in string data fields and header column names is preserved.
- Use dtypes instead of converters to specify the column data types.
- Use null_values instead of fill_values to specify the null (missing) values.
- No guess parameter and no guessing of the table format (e.g., delimiter).
- No data_end parameter.
- No exclude_names parameter.
- Columns consisting of only string values True and False are parsed as boolean data.
- Columns with ISO 8601 date/time strings are parsed as shown below:
  - 12:13:14.123456 : object[datetime.time]
  - 2025-01-01 : np.datetime64[D]
  - 2025-01-01T01:02:03 : np.datetime64[s]
  - 2025-01-01T01:02:03.123456 : np.datetime64[ns]
- Timestamp parsing behavior can be customized with the timestamp_parsers parameter.
Using the PyArrow CSV reader directly#
The astropy.io.misc.pyarrow.csv module also provides the convert_pa_table_to_astropy_table() function to allow converting a pyarrow.Table to an astropy.table.Table. This allows using the PyArrow CSV reader directly with custom options that are not available in the astropy interface.