Class Gff3Codec
- All Implemented Interfaces:
FeatureCodec<Gff3Feature, LineIterator>
Codec for parsing GFF3 files, as defined in https://github.com/The-Sequence-Ontology/Specifications/blob/31f62ad469b31769b43af42e0903448db1826925/gff3.md
Note that although the specification states that all feature types must be defined in the Sequence Ontology, this implementation does not check feature types and accepts any string as a feature type.
Each feature line in the GFF3 file is emitted as a separate feature. Features linked together through the "Parent" attribute are linked through
Gff3Feature.getParents(), Gff3Feature.getChildren(),
Gff3Feature.getAncestors(), Gff3Feature.getDescendents(), and Gff3Feature.flatten(). This linking is not guaranteed to be complete when only the features overlapping a particular
region are read, using a tribble index. In that case, a feature is linked only to those of its related features in the input file that also overlap the given region.
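The Parent-based linking described above can be pictured with a small, self-contained sketch. It is not htsjdk's implementation and does not use the Gff3Feature API: `ParentLinkSketch`, `Node`, `parseAttributes`, and `parseAndLink` are hypothetical names. The sketch assumes unescaped attribute values and unique `ID`s (real GFF3 allows one ID to span several lines, e.g. a multi-line CDS).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not htsjdk code): one Node per GFF3 feature line,
// linked to its parents and children through the ID and Parent attributes.
final class ParentLinkSketch {

    static final class Node {
        final String type;                     // column 3
        final Map<String, String> attributes;  // column 9, parsed
        final List<Node> parents = new ArrayList<>();
        final List<Node> children = new ArrayList<>();

        Node(String type, Map<String, String> attributes) {
            this.type = type;
            this.attributes = attributes;
        }
    }

    // Parses "key=value;key=value" into an insertion-ordered map.
    // Assumption: values contain no percent-encoded characters.
    static Map<String, String> parseAttributes(String column) {
        Map<String, String> attrs = new LinkedHashMap<>();
        for (String pair : column.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                attrs.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return attrs;
    }

    // Emits one Node per feature line (comments and directives are skipped),
    // then links each Node to every feature named in its comma-separated Parent value.
    static List<Node> parseAndLink(List<String> lines) {
        List<Node> nodes = new ArrayList<>();
        Map<String, Node> byId = new HashMap<>(); // assumption: IDs are unique
        for (String line : lines) {
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] cols = line.split("\t", -1);
            if (cols.length != 9) {
                throw new IllegalArgumentException("Expected 9 columns: " + line);
            }
            Node node = new Node(cols[2], parseAttributes(cols[8]));
            nodes.add(node);
            String id = node.attributes.get("ID");
            if (id != null) {
                byId.put(id, node);
            }
        }
        for (Node node : nodes) {
            String parentIds = node.attributes.get("Parent");
            if (parentIds == null) {
                continue;
            }
            for (String parentId : parentIds.split(",")) {
                Node parent = byId.get(parentId);
                if (parent != null) { // parent not seen, e.g. outside a queried region
                    node.parents.add(parent);
                    parent.children.add(node);
                }
            }
        }
        return nodes;
    }

    public static void main(String[] args) {
        List<Node> nodes = parseAndLink(List.of(
                "##gff-version 3",
                "chr1\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene1",
                "chr1\t.\tmRNA\t1050\t9000\t.\t+\t.\tID=mrna1;Parent=gene1",
                "chr1\t.\texon\t1050\t1500\t.\t+\t.\tParent=mrna1"));
        for (Node n : nodes) {
            System.out.println(n.type + ": " + n.parents.size() + " parent(s), "
                    + n.children.size() + " child(ren)");
        }
    }
}
```

Because the sketch silently skips a `Parent` value naming a feature it never saw, it mirrors the partial linking noted above for indexed region queries: a child whose parent lies outside the lines that were read simply ends up with fewer links.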
Nested Class Summary
Nested Classes
- static enum Gff3Codec.DecodeDepth
- static enum: Enum for parsing directive lines.
Constructor Summary
Constructors
- Gff3Codec()
- Gff3Codec(Gff3Codec.DecodeDepth decodeDepth)
- Gff3Codec(Gff3Codec.DecodeDepth decodeDepth, Predicate<String> filterOutAttribute)
Method Summary
- boolean canDecode(String inputFilePath): This function returns true iff the file at inputFilePath can be parsed by this codec.
- void close(LineIterator lineIterator): Adapter method that closes the provided SOURCE.
- decode(LineIterator lineIterator): Decode a single Feature from the SOURCE.
- decodeLoc(LineIterator lineIterator): Decode a line to obtain just its FeatureLoc for indexing: contig, start, and stop.
- getCommentsWithLineNumbers(): Gets a map from line number to the comment found on that line.
- getCommentTexts(): Gets the list of comments parsed by the codec.
- getSequenceRegions(): Gets the list of sequence regions parsed by the codec.
- getTabixFormat(): Define the tabix format for the feature, used for indexing.
- boolean isDone(LineIterator lineIterator): Adapter method that assesses whether the provided SOURCE has more data.
- makeIndexableSourceFromStream(InputStream bufferedInputStream): Return a SOURCE for this FeatureCodec that implements LocationAware, and is thus suitable for use during indexing.
- makeSourceFromStream(InputStream bufferedInputStream): Generates a reader of type SOURCE appropriate for use by this codec from the generic input stream.
- readHeader(LineIterator lineIterator): Read and return the header, or null if there is no header.
Methods inherited from class AbstractFeatureCodec
getFeatureType
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface FeatureCodec
getPathToDataFile
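Constructor Details
Gff3Codec
public Gff3Codec()
Gff3Codec
public Gff3Codec(Gff3Codec.DecodeDepth decodeDepth)
Gff3Codec
public Gff3Codec(Gff3Codec.DecodeDepth decodeDepth, Predicate<String> filterOutAttribute)
- Parameters:
decodeDepth - a value from DecodeDepth
filterOutAttribute - a filter on attribute keys; keys it matches are removed from the EXTRA_FIELDS column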
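How a `Predicate<String>` such as `filterOutAttribute` can be applied to attribute keys is sketched below. This is an illustration, not the codec's code: `AttributeFilterSketch` and `dropFilteredKeys` are hypothetical names, and the sketch assumes that keys for which the predicate returns `true` are the ones removed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: copies an attribute map, dropping every key the filter matches.
// Assumption: "true" from the predicate means "filter this key out".
final class AttributeFilterSketch {

    static Map<String, String> dropFilteredKeys(Map<String, String> attributes,
                                                Predicate<String> filterOutAttribute) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            if (!filterOutAttribute.test(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept; // the input map is left unchanged
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("ID", "gene1");
        attrs.put("Note", "long free-text note");
        attrs.put("Dbxref", "GeneID:123");
        // Drop bulky keys that downstream code does not need.
        Predicate<String> filter = key -> key.equals("Note") || key.equals("Dbxref");
        System.out.println(dropFilteredKeys(attrs, filter)); // prints {ID=gene1}
    }
}
```

Filtering at parse time like this keeps memory down when large free-text attributes are irrelevant to the caller.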
Method Details
decode
Description copied from interface: FeatureCodec
Decode a single Feature from the SOURCE, reading no further in the underlying source than the end of that feature.
- Parameters:
lineIterator - the input stream from which to decode the next record
- Returns:
the Feature encoded by the line, or null if the line does not represent a feature (e.g. is a comment)
- Throws:
IOException
getSequenceRegions
Get list of sequence regions parsed by the codec.
- Returns:
list of sequence regions
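In GFF3, a sequence region is declared by a `##sequence-region seqid start end` directive. A minimal sketch of parsing one such line follows; `SequenceRegionSketch`, `Region`, and `parse` are hypothetical names, and this is not the class or logic htsjdk uses.

```java
// Hypothetical sketch: parses a "##sequence-region seqid start end" directive.
// Returns null for any other line; throws if the directive is malformed.
final class SequenceRegionSketch {

    static final class Region {
        final String seqId;
        final long start;
        final long end;

        Region(String seqId, long start, long end) {
            this.seqId = seqId;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return seqId + ":" + start + "-" + end;
        }
    }

    static Region parse(String line) {
        String[] fields = line.trim().split("\\s+");
        if (!fields[0].equals("##sequence-region")) {
            return null; // not a sequence-region directive
        }
        if (fields.length != 4) {
            throw new IllegalArgumentException("Malformed sequence-region directive: " + line);
        }
        return new Region(fields[1], Long.parseLong(fields[2]), Long.parseLong(fields[3]));
    }

    public static void main(String[] args) {
        System.out.println(parse("##sequence-region ctg123 1 1497228")); // prints ctg123:1-1497228
    }
}
```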
getCommentsWithLineNumbers
getCommentTexts
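getCommentsWithLineNumbers and getCommentTexts expose the same comments in two shapes: keyed by line number, and as a plain list. The sketch below collects both. It is hypothetical (`CommentCollectorSketch` is not an htsjdk class) and assumes 1-based line numbers and comment text with the leading `#` stripped; lines starting with `##` are GFF3 directives and are not treated as comments.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: records comment lines keyed by (assumed 1-based) line number.
final class CommentCollectorSketch {
    private final Map<Integer, String> commentsByLine = new LinkedHashMap<>();
    private int lineNumber = 0;

    // Call once per input line, in file order.
    void accept(String line) {
        lineNumber++;
        if (line.startsWith("#") && !line.startsWith("##")) { // "##" marks a directive
            commentsByLine.put(lineNumber, line.substring(1).trim());
        }
    }

    Map<Integer, String> commentsWithLineNumbers() {
        return Collections.unmodifiableMap(commentsByLine);
    }

    List<String> commentTexts() {
        return new ArrayList<>(commentsByLine.values()); // file order preserved
    }

    public static void main(String[] args) {
        CommentCollectorSketch c = new CommentCollectorSketch();
        for (String line : new String[] {"##gff-version 3", "# assembly v2",
                "chr1\t.\tgene\t1\t10\t.\t+\t.\tID=g1"}) {
            c.accept(line);
        }
        System.out.println(c.commentsWithLineNumbers()); // prints {2=assembly v2}
    }
}
```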
decodeLoc
Description copied from interface: FeatureCodec
Decode a line to obtain just its FeatureLoc for indexing: contig, start, and stop.
- Specified by:
decodeLoc in interface FeatureCodec<Gff3Feature, LineIterator>
- Overrides:
decodeLoc in class AbstractFeatureCodec<Gff3Feature, LineIterator>
- Parameters:
lineIterator - the input stream from which to decode the next record
- Returns:
the FeatureLoc encoded by the line, or null if the line does not represent a feature (e.g. is a comment)
- Throws:
IOException
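A location-only decode needs just three GFF3 columns: seqid (column 1), start (column 4), and end (column 5), with 1-based inclusive coordinates. The sketch below extracts them and returns null for comments and directives, mirroring the contract above. `LocationSketch`, `Loc`, and `decodeLocation` are hypothetical names, not htsjdk API.

```java
// Hypothetical sketch: extracts contig, start and end from a GFF3 feature line.
// Returns null for blank, comment, or directive lines.
final class LocationSketch {

    static final class Loc {
        final String contig;
        final int start; // 1-based, inclusive (column 4)
        final int end;   // 1-based, inclusive (column 5)

        Loc(String contig, int start, int end) {
            this.contig = contig;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return contig + ":" + start + "-" + end;
        }
    }

    static Loc decodeLocation(String line) {
        if (line.isEmpty() || line.startsWith("#")) {
            return null;
        }
        // Limit of 6: columns 6..9 stay unsplit in the last element, so the
        // attributes column is never parsed. Only checks that columns 1..6 exist.
        String[] cols = line.split("\t", 6);
        if (cols.length < 6) {
            throw new IllegalArgumentException("Too few tab-separated columns: " + line);
        }
        return new Loc(cols[0], Integer.parseInt(cols[3]), Integer.parseInt(cols[4]));
    }

    public static void main(String[] args) {
        System.out.println(decodeLocation("chr1\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene1")); // prints chr1:1000-9000
    }
}
```

Skipping attribute parsing is the point of a location-only decode: during indexing, only positions are recorded, so the costliest column can be left untouched.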
canDecode
Description copied from interface: FeatureCodec
This function returns true iff the file at inputFilePath can be parsed by this codec. Note that checking the file's extension is a perfectly acceptable implementation of this method; file contents only rarely need to be checked.
It is assumed that two different codecs never return true for the same file. If that does occur, the recommendation is to raise an error.
This function must never throw an error: all errors should be trapped and false returned.
- Parameters:
inputFilePath - the file to test for parsability with this codec
- Returns:
true if the file can be parsed, false otherwise
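The contract above (an extension check is acceptable; the method must never throw) could be met with something like the sketch below. `CanDecodeSketch` and `canDecodePath` are hypothetical names, and the accepted suffixes `.gff3` and `.gff3.gz` are an assumption for this illustration, not the list htsjdk actually uses.

```java
import java.util.Locale;

// Hypothetical sketch of an extension-based, never-throwing canDecode-style check.
final class CanDecodeSketch {
    // Assumed suffixes; the real codec's accepted extensions may differ.
    private static final String[] SUFFIXES = {".gff3", ".gff3.gz"};

    static boolean canDecodePath(String inputFilePath) {
        if (inputFilePath == null) {
            return false; // report "cannot decode" instead of throwing
        }
        String lower = inputFilePath.toLowerCase(Locale.ROOT);
        for (String suffix : SUFFIXES) {
            if (lower.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(canDecodePath("annotations.gff3.gz")); // prints true
        System.out.println(canDecodePath("calls.vcf"));           // prints false
    }
}
```

Accepting a bare `.gff` suffix as well would risk overlap with codecs for other GFF versions, which is exactly the "two codecs return true for the same file" situation the contract warns against.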
readHeader
Description copied from interface: FeatureCodec
Read and return the header, or null if there is no header. Note: implementers of this method must be careful to read exactly as much from the SOURCE as needed to parse the header, and no more. Otherwise, data that should be parsed into a Feature may be lost.
- Parameters:
lineIterator - the source from which to decode the header
- Returns:
header object
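Reading "exactly as much as needed" is easiest with a peekable source: inspect the next line and consume it only if it belongs to the header. The sketch below handles just the `##gff-version` line and returns the version string rather than a header object. `HeaderSketch`, `PeekingLines`, and `readVersionHeader` are hypothetical names, not htsjdk API.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: reads only the "##gff-version" header line, peeking first
// so that a feature line is never consumed by header parsing.
final class HeaderSketch {

    // Minimal peekable wrapper over an Iterator<String>.
    static final class PeekingLines {
        private final Iterator<String> it;
        private String lookahead;

        PeekingLines(Iterator<String> it) {
            this.it = it;
            this.lookahead = it.hasNext() ? it.next() : null;
        }

        String peek() {
            return lookahead;
        }

        String next() {
            String current = lookahead;
            lookahead = it.hasNext() ? it.next() : null;
            return current;
        }
    }

    // Returns the version (e.g. "3"), or null if the first line is not a version directive.
    static String readVersionHeader(PeekingLines lines) {
        String first = lines.peek();
        if (first == null || !first.startsWith("##gff-version")) {
            return null; // no header: leave the line for feature decoding
        }
        lines.next(); // consume exactly the header line
        return first.substring("##gff-version".length()).trim();
    }

    public static void main(String[] args) {
        PeekingLines lines = new PeekingLines(List.of(
                "##gff-version 3",
                "chr1\t.\tgene\t1\t10\t.\t+\t.\tID=g1").iterator());
        System.out.println(readVersionHeader(lines));           // prints 3
        System.out.println(lines.peek().startsWith("chr1"));    // prints true
    }
}
```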
makeSourceFromStream
Description copied from interface: FeatureCodec
Generates a reader of type SOURCE appropriate for use by this codec from the generic input stream. Implementers should assume the stream is buffered.
makeIndexableSourceFromStream
Description copied from interface: FeatureCodec
Return a SOURCE for this FeatureCodec that implements LocationAware, and is thus suitable for use during indexing. Like FeatureCodec.makeSourceFromStream(java.io.InputStream), except that LocationAware compatibility is required for creating indexes. Implementers of this method must return a type that is both LocationAware and SOURCE. This requirement cannot be enforced via the method signature due to limitations in Java's generic typing system; instead, consumers should cast the call result into a SOURCE when applicable. NOTE: during indexing, the indexer passes the SOURCE to the codec to consume Features from the underlying SOURCE one at a time, recording each Feature's location via the SOURCE's LocationAware interface. It is therefore essential that the SOURCE implementation and the FeatureCodec.readHeader(SOURCE) and FeatureCodec.decodeLoc(SOURCE) methods, which are used during indexing, not introduce any buffering that would advance the SOURCE by more than a single feature (or by more than the size of the header, in the case of FeatureCodec.readHeader(SOURCE)).
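Why extra buffering breaks indexing can be seen in a reader that tracks its own byte offset. The sketch below reads one byte at a time, so it never advances the stream past the current line, and its position is exactly the offset where the next feature starts. It is only modeled on the idea of a location-aware source: it does not implement htsjdk's LocationAware interface, and `PositionTrackingLineReader`, `getPosition`, and `readLine` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a line reader with no read-ahead, so getPosition() is
// always the byte offset of the first unread byte (the start of the next line).
final class PositionTrackingLineReader {
    private final InputStream in; // assumed already buffered by the caller
    private long position = 0;

    PositionTrackingLineReader(InputStream in) {
        this.in = in;
    }

    long getPosition() {
        return position;
    }

    // Returns the next line without its terminator, or null at end of stream.
    String readLine() {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            boolean sawAny = false;
            int b;
            while ((b = in.read()) != -1) {
                position++;
                sawAny = true;
                if (b == '\n') {
                    break; // stop exactly at the end of this line
                }
                buf.write(b);
            }
            if (!sawAny) {
                return null;
            }
            String line = new String(buf.toByteArray(), StandardCharsets.UTF_8);
            return line.endsWith("\r") ? line.substring(0, line.length() - 1) : line;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "##gff-version 3\nchr1\t.\tgene\t1\t10\t.\t+\t.\tID=g1\n"
                .getBytes(StandardCharsets.UTF_8);
        PositionTrackingLineReader reader =
                new PositionTrackingLineReader(new ByteArrayInputStream(data));
        reader.readLine(); // header line
        System.out.println(reader.getPosition()); // prints 16: where the feature line starts
    }
}
```

Reading byte by byte is cheap here only because the underlying stream is assumed to be buffered, as the makeSourceFromStream description states; the reader itself adds no read-ahead that could skew recorded positions.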
isDone
Description copied from interface: FeatureCodec
Adapter method that assesses whether the provided SOURCE has more data. Returns true if the SOURCE has no more data (decoding is done), false otherwise.
close
Description copied from interface: FeatureCodec
Adapter method that closes the provided SOURCE.
getTabixFormat
Description copied from interface: FeatureCodec
Define the tabix format for the feature, used for indexing. The default implementation throws an exception. Note that only AsciiFeatureCodec could read tabix files as defined in AbstractFeatureReader.getFeatureReader(String, String, FeatureCodec, boolean, java.util.function.Function, java.util.function.Function).
- Returns:
the format to use with tabix