Class ConcatenateGraphFilter
java.lang.Object
  org.apache.lucene.util.AttributeSource
    org.apache.lucene.analysis.TokenStream
      org.apache.lucene.analysis.miscellaneous.ConcatenateGraphFilter
- All Implemented Interfaces:
Closeable, AutoCloseable
public final class ConcatenateGraphFilter extends TokenStream
Concatenates/joins every incoming token with a separator into one output token for every path through the token stream (which is a graph). In simple cases this yields one token, but when any token has a zero position increment (e.g. synonyms) it yields more. This filter uses the token bytes, position increment, and position length of the incoming stream. Other attributes are not used or manipulated.
WARNING: This API is experimental and might change in incompatible ways in the next release.
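The path behaviour described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not Lucene's implementation: the class TokenGraphPaths, its Token type, and the visible '_' separator are hypothetical names chosen for illustration, every position increment is assumed to be 0 or 1 (no missing positions), and every position length is assumed to be at least 1. The order of the returned strings is an artifact of the sketch and says nothing about the order the real filter emits.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of "one output string per path through the token graph". */
final class TokenGraphPaths {

    /** Hypothetical token: term text, position increment, position length. */
    static final class Token {
        final String term;
        final int posInc;
        final int posLen;

        Token(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** Returns one joined string for every path from the first position to the last. */
    static List<String> concatenatePaths(List<Token> tokens, char sep) {
        // Turn position increments into absolute start positions and find the end position.
        int[] start = new int[tokens.size()];
        int pos = -1;
        int end = 0;
        for (int i = 0; i < tokens.size(); i++) {
            pos += tokens.get(i).posInc;
            start[i] = pos;
            end = Math.max(end, pos + tokens.get(i).posLen);
        }
        List<String> out = new ArrayList<>();
        walk(tokens, start, 0, end, "", sep, out);
        return out;
    }

    /** Depth-first walk: follow every token that starts at the current position. */
    private static void walk(List<Token> tokens, int[] start, int at, int end,
                             String prefix, char sep, List<String> out) {
        if (at == end) {
            out.add(prefix);
            return;
        }
        for (int i = 0; i < tokens.size(); i++) {
            if (start[i] == at) {
                Token t = tokens.get(i);
                String next = (at == 0) ? t.term : prefix + sep + t.term;
                walk(tokens, start, at + t.posLen, end, next, sep, out);
            }
        }
    }

    public static void main(String[] args) {
        // "quick" is a synonym stacked on "fast" (position increment 0).
        List<Token> tokens = List.of(
                new Token("fast", 1, 1),
                new Token("quick", 0, 1),
                new Token("car", 1, 1));
        System.out.println(concatenatePaths(tokens, '_')); // prints [fast_car, quick_car]
    }
}
```

Without the stacked synonym the same input has a single path and the model returns a single string, which matches the "simple cases" sentence above.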
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static interface | ConcatenateGraphFilter.BytesRefBuilderTermAttribute | Attribute providing access to the term builder and UTF-16 conversion
static class | ConcatenateGraphFilter.BytesRefBuilderTermAttributeImpl | Implementation of ConcatenateGraphFilter.BytesRefBuilderTermAttribute
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.State
-
-
Field Summary
Fields
Modifier and Type | Field | Description
static int | DEFAULT_MAX_GRAPH_EXPANSIONS |
static boolean | DEFAULT_PRESERVE_POSITION_INCREMENTS |
static boolean | DEFAULT_PRESERVE_SEP |
static Character | DEFAULT_TOKEN_SEPARATOR |
static int | SEP_LABEL | Represents the default separator between tokens.
Fields inherited from class org.apache.lucene.analysis.TokenStream
DEFAULT_TOKEN_ATTRIBUTE_FACTORY
-
-
Constructor Summary
Constructors
Constructor | Description
ConcatenateGraphFilter(TokenStream inputTokenStream) | Creates a token stream to convert input to a token stream of accepted strings by its token stream graph.
ConcatenateGraphFilter(TokenStream inputTokenStream, boolean preserveSep, boolean preservePositionIncrements, int maxGraphExpansions) |
ConcatenateGraphFilter(TokenStream inputTokenStream, Character tokenSeparator, boolean preservePositionIncrements, int maxGraphExpansions) | Creates a token stream to convert input to a token stream of accepted strings by its token stream graph.
-
Method Summary
Modifier and Type | Method | Description
void | close() |
void | end() |
boolean | incrementToken() |
void | reset() |
Automaton | toAutomaton() | Converts the tokenStream to an automaton, treating the transition labels as UTF-8.
Automaton | toAutomaton(boolean unicodeAware) | Converts the tokenStream to an automaton.
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
Field Detail
-
SEP_LABEL
public static final int SEP_LABEL
Represents the default separator between tokens.
- See Also:
- Constant Field Values
-
DEFAULT_MAX_GRAPH_EXPANSIONS
public static final int DEFAULT_MAX_GRAPH_EXPANSIONS
- See Also:
- Constant Field Values
-
DEFAULT_TOKEN_SEPARATOR
public static final Character DEFAULT_TOKEN_SEPARATOR
-
DEFAULT_PRESERVE_SEP
public static final boolean DEFAULT_PRESERVE_SEP
- See Also:
- Constant Field Values
-
DEFAULT_PRESERVE_POSITION_INCREMENTS
public static final boolean DEFAULT_PRESERVE_POSITION_INCREMENTS
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
ConcatenateGraphFilter
public ConcatenateGraphFilter(TokenStream inputTokenStream)
Creates a token stream to convert input to a token stream of accepted strings by its token stream graph.
This constructor uses the default settings of the constants in this class.
-
ConcatenateGraphFilter
public ConcatenateGraphFilter(TokenStream inputTokenStream, Character tokenSeparator, boolean preservePositionIncrements, int maxGraphExpansions)
Creates a token stream to convert input to a token stream of accepted strings by its token stream graph.
- Parameters:
inputTokenStream - The input/incoming TokenStream
tokenSeparator - Separator to use for concatenation. Can be null, in which case tokens will be concatenated without any separators.
preservePositionIncrements - Whether to add an empty token for missing positions. The effect is a consecutive SEP_LABEL. When false, it's as if there were no missing positions (we pretend the surrounding tokens were adjacent).
maxGraphExpansions - If the tokenStream graph has more than this many possible paths through, then we'll throw TooComplexToDeterminizeException to preserve the stability and memory of the machine.
- Throws:
TooComplexToDeterminizeException - if the tokenStream graph has more than maxGraphExpansions expansions
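The effect of tokenSeparator and preservePositionIncrements on a simple linear stream can be sketched with a small standalone model. This is an illustration of the parameter descriptions above, not Lucene's implementation: the class SeparatorModel, its Token type, and the visible '_' separator are hypothetical, the first token is assumed to have a position increment of 1, and every later token is assumed to have a position increment of at least 1 (no stacked tokens).

```java
import java.util.List;

/** Hypothetical model of separator handling for a linear (single-path) token stream. */
final class SeparatorModel {

    /** Hypothetical token: term text and position increment. */
    static final class Token {
        final String term;
        final int posInc;

        Token(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    static String join(List<Token> tokens, Character sep, boolean preservePositionIncrements) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            Token t = tokens.get(i);
            if (i > 0 && sep != null) {
                // Each missing position is modelled as an empty token, so a gap of
                // posInc positions contributes posInc separators when preserved.
                int separators = preservePositionIncrements ? t.posInc : 1;
                for (int s = 0; s < separators; s++) {
                    sb.append(sep.charValue());
                }
            }
            sb.append(t.term);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "fox" has increment 2, as if one token between "quick" and "fox" had been removed.
        List<Token> tokens = List.of(new Token("quick", 1), new Token("fox", 2));
        System.out.println(join(tokens, '_', true));   // prints quick__fox
        System.out.println(join(tokens, '_', false));  // prints quick_fox
        System.out.println(join(tokens, null, true));  // prints quickfox
    }
}
```

With a null separator the model inserts nothing at all, so missing positions leave no trace in the joined string.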
-
ConcatenateGraphFilter
public ConcatenateGraphFilter(TokenStream inputTokenStream, boolean preserveSep, boolean preservePositionIncrements, int maxGraphExpansions)
Calls ConcatenateGraphFilter(org.apache.lucene.analysis.TokenStream, java.lang.Character, boolean, int)
- Parameters:
preserveSep - Whether SEP_LABEL should separate the input tokens in the concatenated token
-
-
Method Detail
-
reset
public void reset() throws IOException
- Overrides:
reset in class TokenStream
- Throws:
IOException
-
incrementToken
public boolean incrementToken() throws IOException
- Specified by:
incrementToken in class TokenStream
- Throws:
IOException
-
end
public void end() throws IOException
- Overrides:
end in class TokenStream
- Throws:
IOException
-
close
public void close() throws IOException
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Overrides:
close in class TokenStream
- Throws:
IOException
-
toAutomaton
public Automaton toAutomaton() throws IOException
Converts the tokenStream to an automaton, treating the transition labels as UTF-8. Does *not* close it.
- Throws:
IOException
-
toAutomaton
public Automaton toAutomaton(boolean unicodeAware) throws IOException
Converts the tokenStream to an automaton. Does *not* close it.
- Throws:
IOException
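The difference between the two toAutomaton variants comes down to what a transition label stands for. The sketch below is a standalone illustration and makes an assumption this page does not state: that unicodeAware = true selects Unicode code points as labels, while the no-argument variant uses UTF-8 bytes as described above. The class LabelModel and both method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical model of the two label alphabets for a single term. */
final class LabelModel {

    /** One label per UTF-8 byte, as an unsigned value in 0..255. */
    static int[] utf8Labels(String term) {
        byte[] bytes = term.getBytes(StandardCharsets.UTF_8);
        int[] labels = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            labels[i] = bytes[i] & 0xFF;
        }
        return labels;
    }

    /** One label per Unicode code point (assumed meaning of unicodeAware = true). */
    static int[] codePointLabels(String term) {
        return term.codePoints().toArray();
    }

    public static void main(String[] args) {
        String term = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE
        System.out.println(Arrays.toString(utf8Labels(term)));      // prints [195, 169]
        System.out.println(Arrays.toString(codePointLabels(term))); // prints [233]
    }
}
```

For pure ASCII terms both alphabets produce the same labels, so the choice only matters once terms contain non-ASCII characters.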