Class RegExp
- java.lang.Object
  - org.apache.lucene.util.automaton.RegExp
public class RegExp extends Object
Regular Expression extension to Automaton.

Regular expressions are built from the following abstract syntax:

regexp       ::=  unionexp
              |
unionexp     ::=  interexp | unionexp      (union)
              |   interexp
interexp     ::=  concatexp & interexp     (intersection) [OPTIONAL]
              |   concatexp
concatexp    ::=  repeatexp concatexp      (concatenation)
              |   repeatexp
repeatexp    ::=  repeatexp ?              (zero or one occurrence)
              |   repeatexp *              (zero or more occurrences)
              |   repeatexp +              (one or more occurrences)
              |   repeatexp {n}            (n occurrences)
              |   repeatexp {n,}           (n or more occurrences)
              |   repeatexp {n,m}          (n to m occurrences, including both)
              |   complexp
complexp     ::=  ~complexp                (complement) [OPTIONAL]
              |   charclassexp
charclassexp ::=  [charclasses]            (character class)
              |   [^charclasses]           (negated character class)
              |   simpleexp
charclasses  ::=  charclass charclasses
              |   charclass
charclass    ::=  charexp - charexp        (character range, including end-points)
              |   charexp
simpleexp    ::=  charexp
              |   .                        (any single character)
              |   #                        (the empty language) [OPTIONAL]
              |   @                        (any string) [OPTIONAL]
              |   "<Unicode string without double-quotes>"   (a string)
              |   ()                       (the empty string)
              |   (unionexp)               (precedence override)
              |   <<identifier>>           (named automaton) [OPTIONAL]
              |   <n-m>                    (numerical interval) [OPTIONAL]
charexp      ::=  <Unicode character>      (a single non-reserved character)
              |   \d                       (a digit [0-9])
              |   \D                       (a non-digit [^0-9])
              |   \s                       (whitespace [ \t\n\r])
              |   \S                       (non whitespace [^\s])
              |   \w                       (a word character [a-zA-Z_0-9])
              |   \W                       (a non word character [^\w])
              |   \<Unicode character>     (a single character)

The productions marked [OPTIONAL] are only allowed if specified by the syntax flags passed to the RegExp constructor. The reserved characters used in the (enabled) syntax must be escaped with backslash (\) or double-quotes ("..."). (In contrast to other regexp syntaxes, this is required also in character classes.) Be aware that dash (-) has a special meaning in charclass expressions. An identifier is a string not containing right angle bracket (>) or dash (-). Numerical intervals are specified by non-negative decimal integers and include both end points, and if n and m have the same number of digits, then the conforming strings must have that length (i.e. prefixed by 0's).

WARNING: This API is experimental and might change in incompatible ways in the next release.
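The escaping rule in the class description can be illustrated with a small helper. This is a minimal sketch and not part of Lucene: the class name RegExpEscaper and the method escape are hypothetical. It assumes that prefixing any character other than an ASCII letter or digit with a backslash is always safe, which follows from the \<Unicode character> production. Letters are left alone because \d, \D, \s, \S, \w and \W are character classes rather than literals.

```java
final class RegExpEscaper {
    private RegExpEscaper() {}

    /**
     * Returns a copy of the input in which every code point other than an
     * ASCII letter or digit is prefixed with a backslash. Hypothetical helper
     * for illustration; it is deliberately conservative and escapes more
     * characters than the enabled syntax may strictly require.
     */
    static String escape(String literal) {
        StringBuilder out = new StringBuilder(literal.length() * 2);
        literal.codePoints().forEach(cp -> {
            boolean asciiAlnum = (cp >= 'a' && cp <= 'z')
                    || (cp >= 'A' && cp <= 'Z')
                    || (cp >= '0' && cp <= '9');
            if (!asciiAlnum) {
                out.append('\\'); // backslash makes the next character literal
            }
            out.appendCodePoint(cp);
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // The dots and the dash would otherwise be read as regexp syntax.
        System.out.println(escape("3.14-2.71"));
    }
}
```

The escaped string can then be embedded in a larger pattern before it is passed to the RegExp constructor. Wrapping the text in double-quotes is the alternative the description mentions, but that form cannot contain a double-quote itself.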
-
-
Nested Class Summary
Nested Classes
Modifier and Type    Class          Description
static class         RegExp.Kind    The type of expression represented by a RegExp node.
-
Field Summary
Fields
Modifier and Type    Field                     Description
static int           ALL                       Syntax flag, enables all optional regexp syntax.
static int           ANYSTRING                 Syntax flag, enables anystring (@).
static int           ASCII_CASE_INSENSITIVE    Allows case insensitive matching of ASCII characters.
static int           AUTOMATON                 Syntax flag, enables named automata (<identifier>).
int                  c                         Character expression
static int           COMPLEMENT                Syntax flag, enables complement (~).
int                  digits                    Limits for repeatable type expressions
static int           EMPTY                     Syntax flag, enables empty language (#).
RegExp               exp1                      Child expressions held by a container type expression
RegExp               exp2                      Child expressions held by a container type expression
int                  from                      Extents for range type expressions
static int           INTERSECTION              Syntax flag, enables intersection (&).
static int           INTERVAL                  Syntax flag, enables numerical intervals (<n-m>).
RegExp.Kind          kind                      The type of expression
int                  max                       Limits for repeatable type expressions
int                  min                       Limits for repeatable type expressions
static int           NONE                      Syntax flag, enables no optional regexp syntax.
String               s                         String expression
int                  to                        Extents for range type expressions
-
Method Summary
Modifier and Type    Method    Description
Set<String>          getIdentifiers()    Returns set of automaton identifiers that occur in this regular expression.
String               getOriginalString()    The string that was used to construct the regex.
Automaton            toAutomaton()    Constructs new Automaton from this RegExp.
Automaton            toAutomaton(int determinizeWorkLimit)    Constructs new Automaton from this RegExp.
Automaton            toAutomaton(Map<String,Automaton> automata, int determinizeWorkLimit)    Constructs new Automaton from this RegExp.
Automaton            toAutomaton(AutomatonProvider automaton_provider, int determinizeWorkLimit)    Constructs new Automaton from this RegExp.
String               toString()    Constructs string from parsed regular expression.
String               toStringTree()    Like toString, but more verbose (shows the hierarchy more clearly).
-
-
-
Field Detail
-
INTERSECTION
public static final int INTERSECTION
Syntax flag, enables intersection (&).
See Also: Constant Field Values
-
COMPLEMENT
public static final int COMPLEMENT
Syntax flag, enables complement (~).
See Also: Constant Field Values
-
EMPTY
public static final int EMPTY
Syntax flag, enables empty language (#).
See Also: Constant Field Values
-
ANYSTRING
public static final int ANYSTRING
Syntax flag, enables anystring (@).
See Also: Constant Field Values
-
AUTOMATON
public static final int AUTOMATON
Syntax flag, enables named automata (<identifier>).
See Also: Constant Field Values
-
INTERVAL
public static final int INTERVAL
Syntax flag, enables numerical intervals (<n-m>).
See Also: Constant Field Values
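The length rule for numerical intervals stated in the class description can be sketched in plain Java. This is an illustration of the rule only, not Lucene's implementation: the class IntervalRule and its method matches are hypothetical, and the sketch assumes n is not greater than m. The behavior when n and m have different digit counts (any number of leading zeros accepted) is also an assumption, since the description above only covers the equal-length case.

```java
import java.math.BigInteger;

final class IntervalRule {
    private IntervalRule() {}

    /**
     * Returns true if the candidate would conform to the interval <n-m> under
     * the documented rule: both end points are included, and when n and m have
     * the same number of digits the candidate must have exactly that length.
     */
    static boolean matches(String candidate, String n, String m) {
        // Only non-empty strings of decimal digits can conform.
        if (candidate.isEmpty()
                || !candidate.chars().allMatch(c -> c >= '0' && c <= '9')) {
            return false;
        }
        // Equal digit counts fix the length, so shorter values need zero padding.
        if (n.length() == m.length() && candidate.length() != n.length()) {
            return false;
        }
        BigInteger value = new BigInteger(candidate);
        BigInteger low = new BigInteger(n);
        BigInteger high = new BigInteger(m);
        return value.compareTo(low) >= 0 && value.compareTo(high) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(matches("07", "01", "12")); // padded to two digits
        System.out.println(matches("7", "01", "12"));  // wrong length
    }
}
```

Under this reading, <01-12> suits two-digit fields such as month numbers, because "7" is rejected while "07" is accepted.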
-
ALL
public static final int ALL
Syntax flag, enables all optional regexp syntax.
See Also: Constant Field Values
-
NONE
public static final int NONE
Syntax flag, enables no optional regexp syntax.
See Also: Constant Field Values
-
ASCII_CASE_INSENSITIVE
public static final int ASCII_CASE_INSENSITIVE
Allows case insensitive matching of ASCII characters.
See Also: Constant Field Values
-
kind
public final RegExp.Kind kind
The type of expression
-
exp1
public final RegExp exp1
Child expressions held by a container type expression
-
exp2
public final RegExp exp2
Child expressions held by a container type expression
-
s
public final String s
String expression
-
c
public final int c
Character expression
-
min
public final int min
Limits for repeatable type expressions
-
max
public final int max
Limits for repeatable type expressions
-
digits
public final int digits
Limits for repeatable type expressions
-
from
public final int from
Extents for range type expressions
-
to
public final int to
Extents for range type expressions
-
-
Constructor Detail
-
RegExp
public RegExp(String s) throws IllegalArgumentException
Constructs new RegExp from a string. Same as RegExp(s, ALL).
Parameters:
s - regexp string
Throws:
IllegalArgumentException - if an error occurred while parsing the regular expression
-
RegExp
public RegExp(String s, int syntax_flags) throws IllegalArgumentException
Constructs new RegExp from a string.
Parameters:
s - regexp string
syntax_flags - bitwise OR of the syntax flags for the optional syntax constructs to be enabled
Throws:
IllegalArgumentException - if an error occurred while parsing the regular expression
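The way syntax flags combine into a single int can be sketched with a stand-alone model. The class SyntaxFlagsSketch is hypothetical, and the numeric values below are placeholders chosen for illustration, not the values Lucene defines (those are listed under Constant Field Values). The sketch only assumes that each flag occupies its own bit, which is what combining flags with OR implies.

```java
final class SyntaxFlagsSketch {
    private SyntaxFlagsSketch() {}

    // Placeholder bit values for illustration only; NOT Lucene's constants.
    static final int NONE = 0;
    static final int INTERSECTION = 1 << 0;
    static final int COMPLEMENT = 1 << 1;
    static final int INTERVAL = 1 << 2;

    /** Returns true if the given single-bit flag is set in the combined flags. */
    static boolean isEnabled(int flags, int flag) {
        return (flags & flag) != 0;
    }

    public static void main(String[] args) {
        // Enable two optional constructs and leave the rest disabled.
        int flags = INTERSECTION | INTERVAL;
        System.out.println(isEnabled(flags, INTERSECTION));
        System.out.println(isEnabled(flags, COMPLEMENT));
    }
}
```

With the real class, the same pattern would read RegExp.INTERSECTION | RegExp.INTERVAL as the syntax_flags argument, so that only productions marked [OPTIONAL] for those two flags are accepted by the parser.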
-
RegExp
public RegExp(String s, int syntax_flags, int match_flags) throws IllegalArgumentException
Constructs new RegExp from a string.
Parameters:
s - regexp string
syntax_flags - bitwise OR of the syntax flags for the optional syntax constructs to be enabled
match_flags - bitwise OR of match behavior options such as case insensitivity
Throws:
IllegalArgumentException - if an error occurred while parsing the regular expression
-
-
Method Detail
-
toAutomaton
public Automaton toAutomaton()
Constructs new Automaton from this RegExp. Same as toAutomaton(null) (empty automaton map).
-
toAutomaton
public Automaton toAutomaton(int determinizeWorkLimit) throws IllegalArgumentException, TooComplexToDeterminizeException
Constructs new Automaton from this RegExp. The constructed automaton is minimal and deterministic and has no transitions to dead states.
Parameters:
determinizeWorkLimit - maximum effort to spend while determinizing the automata. If determinizing the automata would require more than this effort, TooComplexToDeterminizeException is thrown. Higher numbers require more space but can process more complex regexes. Use Operations.DEFAULT_DETERMINIZE_WORK_LIMIT as a decent default if you don't otherwise know what to specify.
Throws:
IllegalArgumentException - if this regular expression uses a named identifier that is not available from the automaton provider
TooComplexToDeterminizeException - if determinizing this regexp requires more effort than determinizeWorkLimit
-
toAutomaton
public Automaton toAutomaton(AutomatonProvider automaton_provider, int determinizeWorkLimit) throws IllegalArgumentException, TooComplexToDeterminizeException
Constructs new Automaton from this RegExp. The constructed automaton is minimal and deterministic and has no transitions to dead states.
Parameters:
automaton_provider - provider of automata for named identifiers
determinizeWorkLimit - maximum effort to spend while determinizing the automata. If determinizing the automata would require more than this effort, TooComplexToDeterminizeException is thrown. Higher numbers require more space but can process more complex regexes. Use Operations.DEFAULT_DETERMINIZE_WORK_LIMIT as a decent default if you don't otherwise know what to specify.
Throws:
IllegalArgumentException - if this regular expression uses a named identifier that is not available from the automaton provider
TooComplexToDeterminizeException - if determinizing this regexp requires more effort than determinizeWorkLimit
-
toAutomaton
public Automaton toAutomaton(Map<String,Automaton> automata, int determinizeWorkLimit) throws IllegalArgumentException, TooComplexToDeterminizeException
Constructs new Automaton from this RegExp. The constructed automaton is minimal and deterministic and has no transitions to dead states.
Parameters:
automata - a map from automaton identifiers to automata (of type Automaton).
determinizeWorkLimit - maximum effort to spend while determinizing the automata. If determinizing the automata would require more than this effort, TooComplexToDeterminizeException is thrown. Higher numbers require more space but can process more complex regexes.
Throws:
IllegalArgumentException - if this regular expression uses a named identifier that does not occur in the automaton map
TooComplexToDeterminizeException - if determinizing this regexp requires more effort than determinizeWorkLimit
-
getOriginalString
public String getOriginalString()
The string that was used to construct the regex. Compare to toString.
-
toString
public String toString()
Constructs string from parsed regular expression.
-
toStringTree
public String toStringTree()
Like toString, but more verbose (shows the hierarchy more clearly).
-
-