Table element#
VOTable files can contain RESOURCE
elements, each of
which may contain one or more TABLE
elements. The TABLE
elements contain the arrays of data.
To get at the TABLE
elements, you can write a loop over the
resources in the VOTABLE
file:
for resource in votable.resources:
for table in resource.tables:
# ... do something with the table ...
pass
However, if the nested structure of the resources is not important,
you can use iter_tables
to
return a flat list of all tables:
for table in votable.iter_tables():
# ... do something with the table ...
pass
Finally, if you expect only one table in the file, it might be most convenient
to use get_first_table
:
table = votable.get_first_table()
Alternatively, there is a convenience method to parse a VOTable file and return the first table all in one step:
from astropy.io.votable import parse_single_table
table = parse_single_table("votable.xml")
From a TableElement
object, you can get the data itself
in the array
member variable:
data = table.array
This data is a numpy
record array.
The columns get their names from both the ID
and name
attributes of the FIELD
elements in the VOTABLE
file.
Suppose we had a FIELD
specified as follows:
<FIELD ID="Dec" name="dec_targ" datatype="char" ucd="POS_EQ_DEC_MAIN"
unit="deg">
<DESCRIPTION>
representing the ICRS declination of the center of the image.
</DESCRIPTION>
</FIELD>
Note
The mapping from VOTable name
and ID
attributes to numpy
dtype names
and titles
is highly confusing.
In VOTable, ID
is guaranteed to be unique, but is not
required. name
is not guaranteed to be unique, but is
required.
In numpy
record dtypes, names
are required to be unique and
are required. titles
are not required, and are not required
to be unique.
Therefore, VOTable’s ID
most closely maps to numpy
’s
names
, and VOTable’s name
most closely maps to numpy
’s
titles
. However, in some cases where a VOTable ID
is not
provided, a numpy
name
will be generated based on the VOTable
name
. Unfortunately, VOTable fields do not have an attribute
that is both unique and required, which would be the most
convenient mechanism to uniquely identify a column.
When converting from an astropy.io.votable.tree.TableElement
object to
an astropy.table.Table
object, you can specify whether to give
preference to name
or ID
attributes when naming the
columns. By default, ID
is given preference. To give
name
preference, pass the keyword argument
use_names_over_ids=True
:
>>> votable.get_first_table().to_table(use_names_over_ids=True)
This column of data can be extracted from the record array using:
>>> table.array['dec_targ']
array([17.15153360566, 17.15153360566, 17.15153360566, 17.1516686826,
17.1516686826, 17.1516686826, 17.1536197136, 17.1536197136,
17.1536197136, 17.15375479055, 17.15375479055, 17.15375479055,
17.1553884541, 17.15539736932, 17.15539752176,
17.25736014763,
# ...
17.2765703], dtype=object)
or equivalently:
>>> table.array['Dec']
array([17.15153360566, 17.15153360566, 17.15153360566, 17.1516686826,
17.1516686826, 17.1516686826, 17.1536197136, 17.1536197136,
17.1536197136, 17.15375479055, 17.15375479055, 17.15375479055,
17.1553884541, 17.15539736932, 17.15539752176,
17.25736014763,
# ...
17.2765703], dtype=object)
Datatype Mappings#
The datatype specified by a FIELD
element is mapped to a numpy
type according to the following table:
VOTABLE type
NumPy type
boolean
b1
bit
b1
unsignedByte
u1
char (variable length)
O - A
bytes()
object.char (fixed length)
S
unicodeChar (variable length)
O - A
str
objectunicodeChar (fixed length)
U
short
i2
int
i4
long
i8
float
f4
double
f8
floatComplex
c8
doubleComplex
c16
If the field is a fixed-size array, the data is stored as a numpy
fixed-size array.
If the field is a variable-size array (that is, arraysize
contains
a ‘*’), the cell will contain a Python list of numpy
values. Each
value may be either an array or scalar depending on the arraysize
specifier.
Examining Field Types#
To look up more information about a field in a table, you can use the
get_field_by_id
method, which returns
the Field
object with the given ID.
To look up more information about a field:
>>> field = table.get_field_by_id('Dec')
>>> field.datatype
'char'
>>> field.unit
'deg'
Note
Field descriptors should not be mutated. To change the set of
columns, convert the Table to an astropy.table.Table
, make the
changes, and then convert it back.
Building a New Table from Scratch#
It is also possible to build a new table, define some field datatypes, and populate it with data.
To build a new table from a VOTable file:
from astropy.io.votable.tree import VOTableFile, Resource, TableElement, Field
# Create a new VOTable file...
votable = VOTableFile()
# ...with one resource...
resource = Resource()
votable.resources.append(resource)
# ... with one table
table = TableElement(votable)
resource.tables.append(table)
# Define some fields
table.fields.extend([
Field(votable, name="filename", datatype="char", arraysize="*"),
Field(votable, name="matrix", datatype="double", arraysize="2x2")])
# Now, use those field definitions to create the numpy record arrays, with
# the given number of rows
table.create_arrays(2)
# Now table.array can be filled with data
table.array[0] = ('test1.xml', [[1, 0], [0, 1]])
table.array[1] = ('test2.xml', [[0.5, 0.3], [0.2, 0.1]])
# Now write the whole thing to a file.
# Note, we have to use the top-level votable file object
votable.to_xml("new_votable.xml")
Missing Values#
Any value in the table may be “missing”. astropy.io.votable
stores
a numpy
masked array in each TableElement
instance. This behaves like an ordinary numpy
masked array, except
for variable-length fields. For those fields, the datatype of the
column is “object” and another numpy
masked array is stored there.
Therefore, operations on variable-length columns will not work — this
is because variable-length columns are not directly supported
by numpy
masked arrays.