Package org.apache.lucene.util.automaton
Class Automata
java.lang.Object
org.apache.lucene.util.automaton.Automata
Construction of basic automata.
- WARNING: This API is experimental and might change in incompatible ways in the next release.
Field Summary
| Modifier and Type | Field | Description |
| --- | --- | --- |
| static final int | MAX_STRING_UNION_TERM_LENGTH | makeStringUnion(Iterable) limits terms to this maximum length so that the stack does not overflow while building, since the algorithm currently relies on recursion. |
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| static int | appendAnyChar(Automaton a, int state) | Accepts any single character starting from the specified state, returning the new state. |
| static int | appendChar(Automaton a, int state, int c) | Appends the specified character to the specified state, returning a new state. |
| static Automaton | makeAnyBinary() | Returns a new (deterministic) automaton that accepts all binary terms. |
| static Automaton | makeAnyChar() | Returns a new (deterministic) automaton that accepts any single codepoint. |
| static Automaton | makeAnyString() | Returns a new (deterministic) automaton that accepts all strings. |
| static Automaton | makeBinary(BytesRef term) | Returns a new (deterministic) automaton that accepts the single given binary term. |
| static Automaton | makeBinaryInterval(BytesRef min, boolean minInclusive, BytesRef max, boolean maxInclusive) | Creates a new deterministic, minimal automaton accepting all binary terms in the specified interval. |
| static Automaton | makeBinaryStringUnion(Iterable<BytesRef> utf8Strings) | Returns a new (deterministic and minimal) automaton that accepts the union of the given collection of BytesRefs representing UTF-8 encoded strings. |
| static Automaton | makeBinaryStringUnion(BytesRefIterator utf8Strings) | Returns a new (deterministic and minimal) automaton that accepts the union of the given iterator of BytesRefs representing UTF-8 encoded strings. |
| static Automaton | makeChar(int c) | Returns a new (deterministic) automaton that accepts a single codepoint of the given value. |
| static Automaton | makeCharClass(int[] starts, int[] ends) | Returns a new minimal automaton that accepts any of the codepoint ranges. |
| static Automaton | makeCharRange(int min, int max) | Returns a new (deterministic) automaton that accepts a single codepoint whose value is in the given interval (including both end points). |
| static Automaton | makeCharSet(int[] codepoints) | Returns a new minimal automaton that accepts any of the provided codepoints. |
| static Automaton | makeDecimalInterval(int min, int max, int digits) | Returns a new automaton that accepts strings representing decimal (base 10) non-negative integers in the given interval. |
| static Automaton | makeEmpty() | Returns a new (deterministic) automaton with the empty language. |
| static Automaton | makeEmptyString() | Returns a new (deterministic) automaton that accepts only the empty string. |
| static Automaton | makeNonEmptyBinary() | Returns a new (deterministic) automaton that accepts all binary terms except the empty string. |
| static Automaton | makeString(int[] word, int offset, int length) | Returns a new (deterministic) automaton that accepts the single given string from the specified Unicode code points. |
| static Automaton | makeString(String s) | Returns a new (deterministic) automaton that accepts the single given string. |
| static Automaton | makeStringUnion(Iterable<BytesRef> utf8Strings) | Returns a new (deterministic and minimal) automaton that accepts the union of the given collection of BytesRefs representing UTF-8 encoded strings. |
| static Automaton | makeStringUnion(BytesRefIterator utf8Strings) | Returns a new (deterministic and minimal) automaton that accepts the union of the given iterator of BytesRefs representing UTF-8 encoded strings. |
Field Details
MAX_STRING_UNION_TERM_LENGTH
public static final int MAX_STRING_UNION_TERM_LENGTH
makeStringUnion(Iterable) limits terms to this maximum length so that the stack does not overflow while building, since the algorithm currently relies on recursion.
Method Details
makeEmpty
public static Automaton makeEmpty()
Returns a new (deterministic) automaton with the empty language.
makeEmptyString
public static Automaton makeEmptyString()
Returns a new (deterministic) automaton that accepts only the empty string.
makeAnyString
public static Automaton makeAnyString()
Returns a new (deterministic) automaton that accepts all strings.
makeAnyBinary
public static Automaton makeAnyBinary()
Returns a new (deterministic) automaton that accepts all binary terms.
makeNonEmptyBinary
public static Automaton makeNonEmptyBinary()
Returns a new (deterministic) automaton that accepts all binary terms except the empty string.
makeAnyChar
public static Automaton makeAnyChar()
Returns a new (deterministic) automaton that accepts any single codepoint.
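appendAnyChar
public static int appendAnyChar(Automaton a, int state)
Accepts any single character starting from the specified state, returning the new state.
makeChar
public static Automaton makeChar(int c)
Returns a new (deterministic) automaton that accepts a single codepoint of the given value.
appendChar
public static int appendChar(Automaton a, int state, int c)
Appends the specified character to the specified state, returning a new state.
makeCharRange
public static Automaton makeCharRange(int min, int max)
Returns a new (deterministic) automaton that accepts a single codepoint whose value is in the given interval (including both end points).
makeCharSet
public static Automaton makeCharSet(int[] codepoints)
Returns a new minimal automaton that accepts any of the provided codepoints.
makeCharClass
public static Automaton makeCharClass(int[] starts, int[] ends)
Returns a new minimal automaton that accepts any of the codepoint ranges.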
makeBinaryInterval
public static Automaton makeBinaryInterval(BytesRef min, boolean minInclusive, BytesRef max, boolean maxInclusive)
Creates a new deterministic, minimal automaton accepting all binary terms in the specified interval. Note that unlike makeDecimalInterval(int, int, int), the returned automaton is infinite, because terms behave like floating point numbers leading with a decimal point. However, in the special case where min == max and both are inclusive, the automaton is finite and accepts exactly one term.
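The remark about terms behaving like fractions after a decimal point can be illustrated without Lucene. The following is a minimal plain-Java sketch, not Lucene's implementation: the class `BinaryIntervalSketch` and the method `inInterval` are hypothetical names, and it assumes terms are ordered by unsigned byte-wise lexicographic comparison. It shows that between the one-byte terms 0x61 and 0x62 there are terms of every length, so the accepted language has no longest term.

```java
import java.util.Arrays;

public class BinaryIntervalSketch {

    // Hypothetical helper: true if term lies in the interval under unsigned
    // byte-wise lexicographic order (an assumption about how terms compare).
    public static boolean inInterval(byte[] term, byte[] min, boolean minInclusive,
                                     byte[] max, boolean maxInclusive) {
        int lo = Arrays.compareUnsigned(term, min);
        int hi = Arrays.compareUnsigned(term, max);
        boolean aboveMin = minInclusive ? lo >= 0 : lo > 0;
        boolean belowMax = maxInclusive ? hi <= 0 : hi < 0;
        return aboveMin && belowMax;
    }

    public static void main(String[] args) {
        byte[] min = {0x61};
        byte[] max = {0x62};
        byte[] term = {0x61};
        // Keep appending a 0x00 byte: each longer term still sorts
        // after min and before max, much like 0.1, 0.10, 0.100, ...
        for (int i = 0; i < 5; i++) {
            term = Arrays.copyOf(term, term.length + 1);
            System.out.println(term.length + " bytes, in interval: "
                    + inInterval(term, min, true, max, false));
        }
        // Special case from the text: min == max, both inclusive.
        System.out.println("min itself: " + inInterval(min, min, true, min, true));
        System.out.println("min + 0x00: " + inInterval(term, min, true, min, true));
    }
}
```

In the degenerate interval where both bounds are the same inclusive term, only that term passes the check, which matches the finite special case described above.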
makeDecimalInterval
public static Automaton makeDecimalInterval(int min, int max, int digits) throws IllegalArgumentException
Returns a new automaton that accepts strings representing decimal (base 10) non-negative integers in the given interval.
Parameters:
min - minimal value of interval
max - maximal value of interval (both end points are included in the interval)
digits - if > 0, use fixed number of digits (strings must be prefixed by 0's to obtain the right length); otherwise, the number of digits is not fixed (any number of leading 0s is accepted)
Throws:
IllegalArgumentException - if min > max or if numbers in the interval cannot be expressed with the given fixed number of digits
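The effect of the digits parameter is easiest to see on concrete strings. The sketch below is a plain-Java reference predicate for the contract as documented above; it does not call Lucene and is not how the automaton is built. The class `DecimalIntervalSketch` and method `accepts` are hypothetical names, and rejecting the empty string is an assumption, since the description does not mention it.

```java
public class DecimalIntervalSketch {

    // Hypothetical reference predicate: would a string be in the language
    // described for the interval [min, max] with the given digits setting?
    public static boolean accepts(String s, int min, int max, int digits) {
        if (min > max) {
            throw new IllegalArgumentException("min > max");
        }
        if (digits > 0 && Integer.toString(max).length() > digits) {
            throw new IllegalArgumentException("max does not fit in " + digits + " digits");
        }
        if (s.isEmpty()) {
            return false; // assumption: the empty string is not a number
        }
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch < '0' || ch > '9') {
                return false;
            }
        }
        // Fixed width: the string must be zero-padded to exactly `digits` characters.
        if (digits > 0 && s.length() != digits) {
            return false;
        }
        // Strip leading zeros but keep at least one digit.
        String stripped = s.replaceFirst("^0+(?=.)", "");
        if (stripped.length() > 10) {
            return false; // more digits than any int value has
        }
        long value = Long.parseLong(stripped);
        return value >= min && value <= max;
    }

    public static void main(String[] args) {
        System.out.println(accepts("007", 5, 123, 3));  // fixed width, padded
        System.out.println(accepts("7", 5, 123, 3));    // fixed width, too short
        System.out.println(accepts("0007", 5, 123, 0)); // free width, leading zeros
        System.out.println(accepts("124", 5, 123, 0));  // outside the interval
    }
}
```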
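makeString
public static Automaton makeString(String s)
Returns a new (deterministic) automaton that accepts the single given string.
makeBinary
public static Automaton makeBinary(BytesRef term)
Returns a new (deterministic) automaton that accepts the single given binary term.
makeString
public static Automaton makeString(int[] word, int offset, int length)
Returns a new (deterministic) automaton that accepts the single given string from the specified Unicode code points.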
makeStringUnion
public static Automaton makeStringUnion(Iterable<BytesRef> utf8Strings)
Returns a new (deterministic and minimal) automaton that accepts the union of the given collection of BytesRefs representing UTF-8 encoded strings.
Parameters:
utf8Strings - The input strings, UTF-8 encoded. The collection must be in sorted order.
Returns:
An Automaton accepting all input strings. The resulting automaton is codepoint based (full Unicode codepoints on transitions).
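The union builders require sorted input. A minimal plain-Java sketch of preparing such input follows; it does not call Lucene. It assumes that "sorted order" means unsigned byte-wise order of the UTF-8 bytes, which can differ from `String.compareTo` (UTF-16 code unit order) once supplementary characters are involved. The class `SortedUtf8Sketch` and method `sortedUtf8` are hypothetical names; in real code the resulting byte arrays would be wrapped in BytesRef instances before being passed on.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortedUtf8Sketch {

    // Hypothetical helper: encode each term as UTF-8, then sort the encoded
    // forms in unsigned byte-wise order (assumed to be the required order).
    public static List<byte[]> sortedUtf8(List<String> terms) {
        List<byte[]> encoded = new ArrayList<>();
        for (String t : terms) {
            encoded.add(t.getBytes(StandardCharsets.UTF_8));
        }
        encoded.sort((a, b) -> Arrays.compareUnsigned(a, b));
        return encoded;
    }

    public static void main(String[] args) {
        // U+1F600 is a surrogate pair in UTF-16 and sorts before U+FF5E there,
        // but its UTF-8 form starts with 0xF0 and sorts after 0xEF (U+FF5E).
        List<String> terms = Arrays.asList("\uD83D\uDE00", "\uFF5E", "a");
        for (byte[] term : sortedUtf8(terms)) {
            StringBuilder hex = new StringBuilder();
            for (byte b : term) {
                hex.append(String.format("%02X ", b & 0xFF));
            }
            System.out.println(hex.toString().trim());
        }
    }
}
```

Sorting the Java strings first and encoding afterwards would put the two non-ASCII terms in the opposite order, which is why the sketch sorts the encoded bytes instead.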
makeBinaryStringUnion
public static Automaton makeBinaryStringUnion(Iterable<BytesRef> utf8Strings)
Returns a new (deterministic and minimal) automaton that accepts the union of the given collection of BytesRefs representing UTF-8 encoded strings. The resulting automaton will be built in a binary representation.
Parameters:
utf8Strings - The input strings, UTF-8 encoded. The collection must be in sorted order.
Returns:
An Automaton accepting all input strings. The resulting automaton is binary based (UTF-8 encoded byte transition labels).
makeStringUnion
public static Automaton makeStringUnion(BytesRefIterator utf8Strings) throws IOException
Returns a new (deterministic and minimal) automaton that accepts the union of the given iterator of BytesRefs representing UTF-8 encoded strings.
Parameters:
utf8Strings - The input strings, UTF-8 encoded. The iterator must be in sorted order.
Returns:
An Automaton accepting all input strings. The resulting automaton is codepoint based (full Unicode codepoints on transitions).
Throws:
IOException
makeBinaryStringUnion
public static Automaton makeBinaryStringUnion(BytesRefIterator utf8Strings) throws IOException
Returns a new (deterministic and minimal) automaton that accepts the union of the given iterator of BytesRefs representing UTF-8 encoded strings. The resulting automaton will be built in a binary representation.
Parameters:
utf8Strings - The input strings, UTF-8 encoded. The iterator must be in sorted order.
Returns:
An Automaton accepting all input strings. The resulting automaton is binary based (UTF-8 encoded byte transition labels).
Throws:
IOException