Class OpenNLPTokenizer
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.Tokenizer
org.apache.lucene.analysis.util.SegmentingTokenizerBase
org.apache.lucene.analysis.opennlp.OpenNLPTokenizer
- All Implemented Interfaces:
Closeable, AutoCloseable
Runs the OpenNLP SentenceDetector and Tokenizer. The index of each sentence is stored in
SentenceAttribute.
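The description above can be illustrated with a minimal sketch that uses only the JDK. It is not the Lucene implementation: `java.text.BreakIterator` stands in for the OpenNLP sentence detector, a whitespace split stands in for the OpenNLP tokenizer, and the names `SentenceIndexSketch`, `IndexedToken`, and `tokenize` are hypothetical. It only shows the idea of tagging every token with the index of the sentence it came from.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; it is not part of Lucene or OpenNLP.
class SentenceIndexSketch {

    // A token paired with the zero-based index of the sentence it came from.
    static final class IndexedToken {
        final String text;
        final int sentenceIndex;

        IndexedToken(String text, int sentenceIndex) {
            this.text = text;
            this.sentenceIndex = sentenceIndex;
        }

        @Override
        public String toString() {
            return text + "@" + sentenceIndex;
        }
    }

    static List<IndexedToken> tokenize(String text) {
        List<IndexedToken> tokens = new ArrayList<>();
        // Assumption: the JDK sentence BreakIterator replaces the OpenNLP sentence detector.
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        sentences.setText(text);
        int sentenceIndex = 0;
        int start = sentences.first();
        for (int end = sentences.next(); end != BreakIterator.DONE; start = end, end = sentences.next()) {
            boolean emitted = false;
            // Assumption: a whitespace split replaces the OpenNLP tokenizer.
            for (String word : text.substring(start, end).trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    tokens.add(new IndexedToken(word, sentenceIndex));
                    emitted = true;
                }
            }
            // Only sentences that produced at least one token consume an index.
            if (emitted) {
                sentenceIndex++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello world. Bye now."));
    }
}
```

In the real class the sentence index is exposed through SentenceAttribute on the token stream rather than through a separate value object.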
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.State
-
Field Summary
Fields inherited from class org.apache.lucene.analysis.util.SegmentingTokenizerBase
buffer, BUFFERMAX, offset
Fields inherited from class org.apache.lucene.analysis.TokenStream
DEFAULT_TOKEN_ATTRIBUTE_FACTORY
-
Constructor Summary
Constructors
OpenNLPTokenizer(AttributeFactory factory, NLPSentenceDetectorOp sentenceOp, NLPTokenizerOp tokenizerOp)
-
Method Summary
Modifier and Type, Method
void close()
protected boolean incrementWord()
void reset()
protected void setNextSentence(int sentenceStart, int sentenceEnd)
Methods inherited from class org.apache.lucene.analysis.util.SegmentingTokenizerBase
end, incrementToken, isSafeEnd
Methods inherited from class org.apache.lucene.analysis.Tokenizer
correctOffset, setReader, setReaderTestPoint
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
-
Constructor Details
-
OpenNLPTokenizer
public OpenNLPTokenizer(AttributeFactory factory, NLPSentenceDetectorOp sentenceOp, NLPTokenizerOp tokenizerOp) throws IOException
- Throws:
IOException
-
-
Method Details
-
close
public void close() throws IOException
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Overrides:
close in class Tokenizer
- Throws:
IOException
-
setNextSentence
protected void setNextSentence(int sentenceStart, int sentenceEnd)
- Specified by:
setNextSentence in class SegmentingTokenizerBase
-
incrementWord
protected boolean incrementWord()
- Specified by:
incrementWord in class SegmentingTokenizerBase
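The two hooks above work as a pair: the base class hands the subclass one sentence span through setNextSentence, then calls incrementWord repeatedly until it returns false. The following is a minimal JDK-only sketch of that calling pattern, not Lucene code. The classes `SegmenterContractSketch`, `Base`, and `WhitespaceWords`, the `drain` driver, and the `currentWord` accessor are all hypothetical, and sentence spans are supplied by hand instead of being detected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the sentence-then-word contract; not Lucene code.
class SegmenterContractSketch {

    abstract static class Base {
        protected final char[] buffer;

        Base(String text) {
            this.buffer = text.toCharArray();
        }

        // Receives the [sentenceStart, sentenceEnd) span of the next sentence in buffer.
        protected abstract void setNextSentence(int sentenceStart, int sentenceEnd);

        // Advances to the next word of the current sentence; false when none remain.
        protected abstract boolean incrementWord();

        // Hypothetical accessor standing in for token attributes.
        protected abstract String currentWord();

        // Driver: hand over each sentence span, then drain its words.
        final List<String> drain(int[][] sentenceSpans) {
            List<String> words = new ArrayList<>();
            for (int[] span : sentenceSpans) {
                setNextSentence(span[0], span[1]);
                while (incrementWord()) {
                    words.add(currentWord());
                }
            }
            return words;
        }
    }

    // Splits each sentence on whitespace (assumption: replaces the OpenNLP tokenizer).
    static final class WhitespaceWords extends Base {
        private int pos;
        private int end;
        private int wordStart;
        private int wordEnd;

        WhitespaceWords(String text) {
            super(text);
        }

        @Override
        protected void setNextSentence(int sentenceStart, int sentenceEnd) {
            pos = sentenceStart;
            end = sentenceEnd;
        }

        @Override
        protected boolean incrementWord() {
            while (pos < end && Character.isWhitespace(buffer[pos])) {
                pos++;
            }
            if (pos >= end) {
                return false;
            }
            wordStart = pos;
            while (pos < end && !Character.isWhitespace(buffer[pos])) {
                pos++;
            }
            wordEnd = pos;
            return true;
        }

        @Override
        protected String currentWord() {
            return new String(buffer, wordStart, wordEnd - wordStart);
        }
    }

    public static void main(String[] args) {
        // Hand-written spans for "One two. " and "Three."
        int[][] spans = {{0, 9}, {9, 15}};
        System.out.println(new WhitespaceWords("One two. Three.").drain(spans));
    }
}
```

Keeping the per-sentence state (here `pos` and `end`) inside the subclass means the driver never needs to know how words are found, which is the same split of responsibilities the two protected methods suggest.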
-
reset
public void reset() throws IOException
- Overrides:
reset in class SegmentingTokenizerBase
- Throws:
IOException
-