read_csv

astropy.io.misc.pyarrow.csv.read_csv(input_file: PathLike[str] | str | BinaryIO, *, delimiter: str = ',', quotechar: str | Literal[False] = '"', doublequote: bool = True, escapechar: str | bool = False, header_start: int | None = 0, data_start: int | None = None, names: list[str] | None = None, include_names: list[str] | None = None, dtypes: dict[str, dtype[Any] | None | type[Any] | _SupportsDType[dtype[Any]] | str | tuple[Any, int] | tuple[Any, SupportsIndex | Sequence[SupportsIndex]] | list[Any] | _DTypeDict | tuple[Any, Any]] | None = None, comment: str | None = None, null_values: list[str] | None = None, encoding: str = 'utf-8', newlines_in_values: bool = False, timestamp_parsers: list[str] | None = None) → Table

Read a CSV file into an astropy Table using PyArrow.

This function reads text CSV files into an astropy Table with high performance using PyArrow. Performance is best for files containing only numeric data types, but even for files with mixed data types it is still better than that of the standard astropy.io.ascii fast CSV reader.
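A minimal usage sketch, assuming an astropy version that ships astropy.io.misc.pyarrow.csv and an installed PyArrow. The file contents, column names, and the helper name read_example are made up for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical sample data: a header line followed by two numeric rows.
CSV_TEXT = "id,flux,err\n1,10.5,0.1\n2,11.2,0.2\n"


def read_example():
    """Write the sample CSV to a temporary file and read it into a Table."""
    # Imported here so the module still loads if astropy/pyarrow are missing.
    from astropy.io.misc.pyarrow.csv import read_csv

    with tempfile.TemporaryDirectory() as tmpdir:
        path = Path(tmpdir) / "example.csv"
        path.write_text(CSV_TEXT, encoding="utf-8")
        # Every argument after input_file is keyword-only.
        return read_csv(path, delimiter=",")


if __name__ == "__main__":
    try:
        table = read_example()
    except ImportError as exc:
        print(f"astropy with PyArrow support is required: {exc}")
    else:
        print(table.colnames)
        print(table)
```

The path is passed as a pathlib.Path here; per the signature, a plain string path or a binary file-like object should work as well.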

By default, empty values (zero-length string “”) in the CSV file are read as masked values in the Table. This can be changed by using the null_values parameter to specify a list of strings to interpret as null (masked) values.
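The null handling described above might be overridden as in the following sketch. It assumes astropy with PyArrow support is installed; the sample data, the "N/A" marker, and the helper name read_with_nulls are hypothetical.

```python
import tempfile
from pathlib import Path

# Hypothetical data: "N/A" and an empty cell both stand for a missing flux.
CSV_WITH_GAPS = "id,flux\n1,10.5\n2,N/A\n3,\n"


def read_with_nulls(null_values):
    """Read the sample CSV, treating the given strings as null (masked)."""
    # Imported here so the module still loads if astropy/pyarrow are missing.
    from astropy.io.misc.pyarrow.csv import read_csv

    with tempfile.TemporaryDirectory() as tmpdir:
        path = Path(tmpdir) / "gaps.csv"
        path.write_text(CSV_WITH_GAPS, encoding="utf-8")
        return read_csv(path, null_values=null_values)


if __name__ == "__main__":
    try:
        # Keep "" in the list so that empty cells are still treated as null.
        table = read_with_nulls(["", "N/A"])
    except ImportError as exc:
        print(f"astropy with PyArrow support is required: {exc}")
    else:
        print(table)
```

Because the supplied list replaces the default rather than extending it, the empty string is listed explicitly alongside "N/A".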

Entirely empty lines in the CSV file are ignored.

Columns consisting of only string values True and False are parsed as boolean data.

Columns with ISO 8601 date/time strings are parsed as shown below:

- 12:13:14.123456: object[datetime.time]
- 2025-01-01: np.datetime64[D]
- 2025-01-01T01:02:03: np.datetime64[s]
- 2025-01-01T01:02:03.123456: np.datetime64[ns]

Support for ignoring comment lines in the CSV file is provided by the comment parameter. If this is set to a string, any line starting with optional whitespace followed by this string is ignored. This is done by reading the entire file and scanning for comment lines. If the comment lines are all at the beginning of the file and neither header_start nor data_start is specified, then the file is read efficiently by setting header_start to the first line after the comments. Otherwise the entire file is read into memory and the comment lines are removed before the data are passed to the PyArrow CSV reader. Any values of header_start and data_start apply to the line counts after the comment lines have been removed.
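The comment rule described above can be illustrated with a small pure-Python sketch. This is not astropy's actual implementation; the helper name strip_comment_lines and the sample lines are assumptions made for the example.

```python
def strip_comment_lines(lines, comment):
    """Drop lines whose first non-whitespace text is the comment string."""
    return [line for line in lines if not line.lstrip().startswith(comment)]


raw_lines = [
    "# survey export",
    "  # indented comment",
    "id,flux",
    "1,10.5",
    "# comment between data rows",
    "2,11.2",
]

kept = strip_comment_lines(raw_lines, "#")

# header_start and data_start count lines in `kept`, not in `raw_lines`.
header_start = 0
print(kept[header_start])  # id,flux
print(kept)                # ['id,flux', '1,10.5', '2,11.2']
```

In this sample a comment sits between data rows, which according to the description above is the case where the whole file has to be read and filtered before parsing.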

Parameters:
input_file: str, path-like object, or file-like object

File path or binary file-like object to read from.

delimiter: 1-character str, optional (default “,”)

Character delimiting individual cells in the CSV data.

quotechar: 1-character str or False, optional (default '"')

Character used optionally for quoting CSV values (False if quoting is not allowed).

doublequote: bool, optional (default True)

Whether two quotes in a quoted CSV value denote a single quote in the data.

escapechar: 1-character str or False, optional (default False)

Character used optionally for escaping special characters (False if escaping is not allowed).

header_start: int, None, optional (default 0)

Line index for the header line with column names. If None this implies that there is no header line and the column names are taken from names or generated automatically (“f0”, “f1”, …).

data_start: int, None, optional (default None)

Line index for the start of data. If None, then data starts one line after the header line, or on the first line if there is no header.

names: list, None, optional (default None)

List of names for input data columns when there is no header line. If supplied, then header_start must be None.

include_names: list, None, optional (default None)

List of column names to include in output. If None, all columns are included.

dtypes: dict[str, Any], None, optional (default None)

If provided, this is a dictionary of data types for output columns. Each key is a column name and the value is either a PyArrow data type or a data type specifier that is accepted as an argument to numpy.dtype. Examples include pyarrow.int32(), pyarrow.time32("s"), int, np.float32, np.dtype('f4') or "float32". The default is to infer the data types.

comment: 1-character str or None, optional (default None)

Character used to indicate the start of a comment. Any line starting with optional whitespace and then this character is ignored. Using this option will cause the parser to be slower and potentially use more memory as it uses Python code to strip comments.
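To make the header_start and data_start line indexing described above concrete, here is a pure-Python sketch of the documented defaults. It is an illustration only: the helper split_header_and_data is hypothetical, it splits the header on a plain comma, and it does not reproduce the actual PyArrow-based parsing.

```python
def split_header_and_data(lines, header_start=0, data_start=None):
    """Return (column_names, data_lines) using 0-based line indices."""
    if header_start is None:
        names = None  # caller supplies names or gets f0, f1, ...
        first_data = 0 if data_start is None else data_start
    else:
        names = lines[header_start].split(",")
        first_data = header_start + 1 if data_start is None else data_start
    return names, lines[first_data:]


lines = ["exported 2025-01-01", "id,flux", "units: -,Jy", "1,10.5", "2,11.2"]

# Header on line 1, data starting on line 3 (skipping the units line).
names, data = split_header_and_data(lines, header_start=1, data_start=3)
print(names)  # ['id', 'flux']
print(data)   # ['1,10.5', '2,11.2']
```

With the defaults (header_start=0, data_start=None) the first line supplies the column names and the data begin on the line after it.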

Returns:
astropy.table.Table

An astropy Table containing the data from the CSV file.

Other Parameters:
null_values: list, optional (default None)

List of strings to interpret as null values. By default, only empty strings are considered as null values (equivalent to null_values=[""]). Set to [] to disable null value handling.

encoding: str, optional (default ‘utf-8’)

Encoding of the input data.

newlines_in_values: bool, optional (default False)

Whether newline characters are allowed in CSV values. Setting this to True reduces the performance of multi-threaded CSV reading.

timestamp_parsers: list, optional

A sequence of strptime()-compatible format strings, tried in order when attempting to infer or convert timestamp values. The default is the special value pyarrow.csv.ISO8601, which uses the optimized internal ISO 8601 parser.
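The "tried in order" behaviour of timestamp_parsers can be pictured with the following sketch built on Python's own datetime.strptime. This is an analogy under stated assumptions: PyArrow applies the formats internally and may differ from Python's strptime in details, and the format list and helper name parse_timestamp are hypothetical.

```python
from datetime import datetime


def parse_timestamp(text, formats):
    """Return the first successful strptime() parse, or None if all fail."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


# Hypothetical format list: day-first dates, then a compact date-time form.
formats = ["%d/%m/%Y %H:%M:%S", "%Y%m%dT%H%M%S"]

print(parse_timestamp("31/12/2024 23:59:58", formats))  # 2024-12-31 23:59:58
print(parse_timestamp("20250101T010203", formats))      # 2025-01-01 01:02:03
print(parse_timestamp("not a date", formats))           # None
```

A list like this one would be passed as timestamp_parsers=formats when the file uses timestamp layouts that the default ISO 8601 parser does not recognize.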