read_csv
- astropy.io.misc.pyarrow.csv.read_csv(input_file: PathLike[str] | str | BinaryIO, *, delimiter: str = ',', quotechar: str | Literal[False] = '"', doublequote: bool = True, escapechar: str | bool = False, header_start: int | None = 0, data_start: int | None = None, names: list[str] | None = None, include_names: list[str] | None = None, dtypes: dict[str, dtype[Any] | None | type[Any] | _SupportsDType[dtype[Any]] | str | tuple[Any, int] | tuple[Any, SupportsIndex | Sequence[SupportsIndex]] | list[Any] | _DTypeDict | tuple[Any, Any]] | None = None, comment: str | None = None, null_values: list[str] | None = None, encoding: str = 'utf-8', newlines_in_values: bool = False, timestamp_parsers: list[str] | None = None) -> Table
Read a CSV file into an astropy Table using PyArrow.
This function allows highly performant reading of text CSV files into an astropy Table using PyArrow. The best performance is achieved for files with only numeric data types, but even for files with mixed data types, the performance is still better than the standard astropy.io.ascii fast CSV reader.
By default, empty values (zero-length string "") in the CSV file are read as masked values in the Table. This can be changed by using the null_values parameter to specify a list of strings to interpret as null (masked) values.
Entirely empty lines in the CSV file are ignored.
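
As a minimal sketch of the null handling described above, the helper below reads a small in-memory CSV and passes a custom null_values list. The SAMPLE data and the read_sample helper are illustrative assumptions, not part of astropy. Running the helper requires pyarrow and an astropy version that ships this reader.

```python
import io

# Illustrative CSV bytes: row 2 has an empty flux cell, row 3 has "N/A".
SAMPLE = b"name,flux,err\nsrc1,1.5,0.1\nsrc2,,0.2\nsrc3,N/A,0.3\n"


def read_sample(null_values=None):
    """Read SAMPLE with the PyArrow-based reader (hypothetical helper).

    null_values=None keeps the documented default, where only empty
    strings are treated as null (masked) values.
    """
    # Imported lazily so that defining this helper does not require
    # the optional dependencies to be installed.
    from astropy.io.misc.pyarrow.csv import read_csv

    return read_csv(io.BytesIO(SAMPLE), null_values=null_values)
```

Calling read_sample(["", "N/A"]) asks the reader to treat both the empty cell and the "N/A" cell as masked, while read_sample([]) disables null handling altogether.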
Columns consisting of only the string values True and False are parsed as boolean data.
Columns with ISO 8601 date/time strings are parsed as shown below:
- 12:13:14.123456 : object[datetime.time]
- 2025-01-01 : np.datetime64[D]
- 2025-01-01T01:02:03 : np.datetime64[s]
- 2025-01-01T01:02:03.123456 : np.datetime64[ns]
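
One way to check the inferred types listed above is to read a tiny table and inspect the column dtypes. This is a sketch only: the TYPED data and both helper names are hypothetical, and running read_typed requires pyarrow and an astropy version that ships this reader.

```python
import io

# Illustrative data: a True/False column and an ISO 8601 date column.
TYPED = b"flag,date\nTrue,2025-01-01\nFalse,2025-06-30\n"


def read_typed():
    """Read TYPED with default type inference (hypothetical helper)."""
    # Lazy import keeps the optional dependencies out of module import.
    from astropy.io.misc.pyarrow.csv import read_csv

    return read_csv(io.BytesIO(TYPED))


def column_dtypes(table):
    """Map each column name to its NumPy dtype for quick inspection."""
    return {name: table[name].dtype for name in table.colnames}
```

According to the rules above, column_dtypes(read_typed()) would be expected to report a boolean dtype for flag and a day-resolution datetime64 dtype for date.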
Support for ignoring comment lines in the CSV file is provided by the comment parameter. If this is set to a string, any line starting with optional whitespace and then this string is ignored. This is done by reading the entire file and scanning for comment lines. If the comment lines are all at the beginning of the file and neither header_start nor data_start is specified, then the file is read efficiently by setting header_start to the first line after the comments. Otherwise the entire file is read into memory and the comment lines are removed before passing to the PyArrow CSV reader. Any values of header_start and data_start apply to the line counts after the comment lines have been removed.
- Parameters:
- input_file: str, path-like object, or file-like object
  File path or binary file-like object to read from.
- delimiter: 1-character str, optional (default ",")
  Character delimiting individual cells in the CSV data.
- quotechar: 1-character str or False, optional (default '"')
  Character used optionally for quoting CSV values (False if quoting is not allowed).
- doublequote: bool, optional (default True)
  Whether two quotes in a quoted CSV value denote a single quote in the data.
- escapechar: 1-character str or False, optional (default False)
  Character used optionally for escaping special characters (False if escaping is not allowed).
- header_start: int, None, optional (default 0)
  Line index for the header line with column names. If None, this implies that there is no header line and the column names are taken from names or generated automatically ("f0", "f1", …).
- data_start: int, None, optional (default None)
  Line index for the start of data. If None, then data starts one line after the header line, or on the first line if there is no header.
- names: list, None, optional (default None)
  List of names for input data columns when there is no header line. If supplied, then header_start must be None.
- include_names: list, None, optional (default None)
  List of column names to include in output. If None, all columns are included.
- dtypes: dict[str, Any], None, optional (default None)
  If provided, this is a dictionary of data types for output columns. Each key is a column name and the value is either a PyArrow data type or a data type specifier that is accepted as an argument to numpy.dtype. Examples include pyarrow.int32(), pyarrow.time32("s"), int, np.float32, np.dtype('f4') or "float32". Default is to infer the data types.
- comment: 1-character str or None, optional (default None)
  Character used to indicate the start of a comment. Any line starting with optional whitespace and then this character is ignored. Using this option will cause the parser to be slower and potentially use more memory as it uses Python code to strip comments.
- Returns:
astropy.table.Table
An astropy Table containing the data from the CSV file.
- Other Parameters:
- null_values: list, optional (default None)
  List of strings to interpret as null values. By default, only empty strings are considered as null values (equivalent to null_values=[""]). Set to [] to disable null value handling.
- encoding: str, optional (default 'utf-8')
  Encoding of the input data.
- newlines_in_values: bool, optional (default False)
  Whether newline characters are allowed in CSV values. Setting this to True reduces the performance of multi-threaded CSV reading.
- timestamp_parsers: list, optional
  A sequence of strptime()-compatible format strings, tried in order when attempting to infer or convert timestamp values. The default is the special value pyarrow.csv.ISO8601, which uses the optimized internal ISO8601 parser.
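
The header_start, names, include_names and dtypes parameters described under Parameters can be combined to read a file that has no header line. The following is a hedged sketch: the RAW data, the column names and the read_headerless helper are all hypothetical, and running the helper requires pyarrow and an astropy version that ships this reader.

```python
import io

# Illustrative headerless CSV bytes; column names must come from the caller.
RAW = b"1,2.5,a\n2,3.5,b\n"


def read_headerless():
    """Read RAW, naming the columns explicitly (hypothetical helper)."""
    # Lazy import keeps the optional dependencies out of module import.
    from astropy.io.misc.pyarrow.csv import read_csv

    return read_csv(
        io.BytesIO(RAW),
        header_start=None,             # no header line in the data
        names=["id", "flux", "tag"],   # names for every input column
        include_names=["id", "flux"],  # keep only these columns in the output
        dtypes={"flux": "float32"},    # any numpy.dtype-compatible specifier
    )
```

Because names is supplied, header_start is set to None as the names description requires; data_start is left at None so that data begins on the first line.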
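
The comment handling described for the comment parameter can be illustrated in plain Python. This is a sketch of the documented behavior under simplifying assumptions (input already split into text lines), not astropy's actual implementation; drop_comment_lines and leading_comment_count are hypothetical names.

```python
import re


def _comment_pattern(comment):
    # Optional leading whitespace followed by the literal comment string.
    return re.compile(r"\s*" + re.escape(comment))


def drop_comment_lines(lines, comment):
    """Return only the lines that are not comment lines."""
    pattern = _comment_pattern(comment)
    return [line for line in lines if not pattern.match(line)]


def leading_comment_count(lines, comment):
    """Count comment lines at the top of the file.

    Returns None if a comment line also appears after the first
    non-comment line, in which case every line must be filtered.
    """
    pattern = _comment_pattern(comment)
    flags = [bool(pattern.match(line)) for line in lines]
    n_leading = 0
    for is_comment in flags:
        if not is_comment:
            break
        n_leading += 1
    if any(flags[n_leading:]):
        return None
    return n_leading
```

When leading_comment_count returns an integer n, the fast path described above corresponds to reading the file directly with header_start set to n. When it returns None, the slower path applies and the output of drop_comment_lines is what would be handed to the CSV parser.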