Introduction#

Extract basic provenance information from VOTable header. The information is described in DataOrigin IVOA note: https://www.ivoa.net/documents/DataOrigin/.

DataOrigin includes both the query information (such as publisher, contact, versions, etc.) and the Dataset origin (such as Creator, bibliographic links, URL, etc.)

This API retrieves Metadata from INFO in VOTable.

Getting Started#

To extract DataOrigin from VOTable

Example: VizieR catalogue J/AJ/167/18

>>> from astropy.io.votable import parse
>>> from astropy.io.votable.dataorigin import extract_data_origin
>>> votable = parse("https://vizier.cds.unistra.fr/viz-bin/conesearch/J/AJ/167/18/table4?RA=265.51&DEC=-22.71&SR=0.1")
>>> data_origin = extract_data_origin(votable)
>>> print(data_origin)  
publisher: CDS
server_software: 7.4.5
service_protocol: ivo://ivoa.net/std/ConeSearch/v1.03
request_date: 2025-03-05T14:18:05
contact: cds-question@unistra.fr
publisher: CDS

ivoid: ivo://cds.vizier/j/aj/167/18
citation: doi:10.26093/cds/vizier.51670018
reference_url: https://cdsarc.cds.unistra.fr/viz-bin/cat/J/AJ/167/18
rights_uri: https://cds.unistra.fr/vizier-org/licences_vizier.html
creator: Hong K.
...

Contents and metadata#

astropy.io.votable.dataorigin.extract_data_origin returns a astropy.io.votable.dataorigin.DataOrigin (class) container which is made of:

  • a astropy.io.votable.dataorigin.QueryOrigin (class) container describing the request. QueryOrigin is considered to be unique for the whole VOTable. It includes metadata like the publisher, the contact, date of execution, query, etc.

  • a list of astropy.io.votable.dataorigin.DatasetOrigin (class) container for each Element having DataOrigin information. DataSetOrigin is a basic provenance of the datasets queried. Each attribute is a list. It includes metadata like authors, ivoid, landing pages, ….

Examples#

Get the (Data Center) publisher and the Creator of the dataset

>>> print(data_origin.query.publisher)
CDS
>>> print(data_origin.origin[0].creator)
['Hong K.']

Other capabilities#

DataOrigin container includes VO Elements:

  • Extract list of astropy.io.votable.tree.Info

    >>> # get DataOrigin with the description of each INFO
    >>> for dataset_origin in data_origin.origin:
    ...    for info in dataset_origin.infos:
    ...        print(f"{info.name}: {info.value} ({info.content})")
    ivoid: ivo://cds.vizier/j/aj/167/18 (IVOID of underlying data collection)
    creator: Hong K. (First author or institution)
    cites: bibcode:2024AJ....167...18H (Article or Data origin sources)
    editor: Astronomical Journal (AAS) (Editor name (article))
    original_date: 2024 (Year of the article publication)
    ...
    
  • Extract tree node astropy.io.votable.tree.Element

The following example extracts the citation from the header (in APA style).

>>> # get the Title retrieved in Element
>>> origin = data_origin.origin[0]
>>> vo_elt = origin.get_votable_element()
>>> title = vo_elt.description if vo_elt else ""
>>> print(f"APA: {','.join(origin.creator)} ({origin.publication_date[0]}). {title} [Dataset]. {data_origin.query.publisher}. {origin.citation[0]}")
APA: Hong K. (2024-11-06). Period variations of 32 contact binaries (Hong+, 2024) [Dataset]. CDS. doi:10.26093/cds/vizier.51670018
  • Add Data Origin INFO into VOTable:

>>> votable = parse("votable.xml")
>>> dataorigin.add_data_origin_info(votable, "query", "Data center name")
>>> dataorigin.add_data_origin_info(votable.resources[0], "creator", "Author name")