Is there a list of Java regular expression errors, causes, and solutions?

0 votes
2.7K views
asked Oct 16, 2018 by rich-c-2789 (16,180 points)
I was looking for a list of sample regular expressions that fail
to compile, specifically Java regular expressions that cause a
PatternSyntaxException when calling
java.util.regex.Pattern.compile(String regex). I wanted something that I
could use to quickly analyze the types of syntax errors that might be
encountered, along with suggestions to correct them.
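
For anyone who wants to reproduce entries like the ones in the answers, here is a minimal sketch of how the error details can be collected. The class name RegexErrorProbe and the helper probe are made up for illustration; getDescription() and getIndex() are the standard accessors on PatternSyntaxException. The exact wording and index of a message can differ between Java versions, so treat the printed text as informational.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class, not part of any library.
final class RegexErrorProbe {

    // Returns "OK" if the regex compiles, otherwise the error description and index.
    static String probe(String regex) {
        try {
            Pattern.compile(regex);
            return "OK";
        } catch (PatternSyntaxException e) {
            // getDescription() is the short text, getIndex() the approximate position.
            return e.getDescription() + " (index " + e.getIndex() + ")";
        }
    }

    public static void main(String[] args) {
        String[] samples = {"*", "[", "a+b"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + probe(sample));
        }
    }
}
```

e.getMessage() returns the full multi-line form (description, pattern, and caret line), which is the layout used in the list below.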

2 Answers

0 votes

I could not find one, so here is my list.  I was looking for such a list to figure out whether the messages returned were useful while implementing the regex search feature in the Qvera interface engine.  I am adding it to the KB because I think it will be useful to others as well.  Sometimes the best way to learn is to learn what not to do. ;-)  The regular expressions in this list are not very useful on their own.  I tried to keep them short to show the cause of the error rather than get lost in the details of some wannabe fancy pattern.
Does anyone know of a better list?

RegEx (Java string) / Error message / Hint/Solutions
*J,M.* Dangling meta character '*' near index 0
*J,M.*
^
Leading quantifiers do not make sense.

A character or character class should precede the quantifier.
or
Escape it if looking for the literal character.
* Dangling meta character '*' near index 0
*
^
Leading quantifiers do not make sense.

A character or character class should precede the quantifier.
or
Escape it if looking for the literal character.
? Dangling meta character '?' near index 0
?
^
Leading quantifiers do not make sense.

A character or character class should precede the quantifier.
or
Escape it if looking for the literal character.
+ Dangling meta character '+' near index 0
+
^
Leading quantifiers do not make sense.

A character or character class should precede the quantifier.
or
Escape it if looking for the literal character.
\\ Unexpected internal error near index 1
\
 ^
The backslash is a reserved character used to introduce escape, back-reference, and quotation constructs.

Follow the backslash with the intended construct
or
Escape the \ if looking for a backslash
[ Unclosed character class near index 0
[
^
Brackets are reserved to define character classes.

Provide character class details and close the class with ]
or
Escape the [ if looking for an open bracket
( Unclosed group near index 1
(
 ^
Parentheses are reserved to define capturing, named, and non-capturing groups.

Provide the group details and close the group with )
or
Escape the ( if looking for an open parenthesis
{ Illegal repetition
{
Braces are reserved to define quantifier repetitions and to introduce named character classes.

Provide the quantifier repetition detail and close the repetition
or
Provide the character class name and close the name
or
Escape the { if looking for an open brace
[_-.] Illegal character range near index 3
[_-.]
   ^
The dash is used in a character class to denote a range.

Provide a valid character range
or
If looking for a dash in the character class, try moving the dash to the beginning, the end, or escape it.
[[\\]?*+|{}()@.] Unclosed character class near index 14
[[\]?*+|{}()@.]
              ^
The character class is not closed.
Possible solutions:
Escape the second open bracket to look for the enclosed special characters
}),({ Unmatched closing ')' near index 0
}),({
^
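Found a closing parenthesis without a matching open parenthesis.

Add the missing open parenthesis before it
or
Escape the ) if looking for a closing parenthesis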
localhost\\kindness\\example \k is not followed by '<' for named capturing group near index 11
localhost\kindness\example
           ^
Backslash k (\k) is used to refer to named capturing groups.

Escape the \ with another \ if looking for \k
or
Follow \k with <, the group name, and >
(?<>) Unknown look-behind group near index 3
(?<>)
   ^
Expected to find the name of a group but nothing was specified.

Add the group name between < and >
\\k<> named capturing group is missing trailing '>' near index 4
\k<>
    ^
Expected to find the name of a group but nothing was specified.

Add the group name between < and >
(?<.>) Unknown look-behind group near index 4
(?<.>)
    ^
Could not find group name called "."

Replace "." with a valid group name
\\k<.> (named capturing group <.> does not exit near index 4
\k<.>
    ^
Could not find group name called "."

Replace "." with a valid group name
(?<a.>) named capturing group is missing trailing '>' near index 4
(?<a.>)
    ^
Found an invalid group name containing a special character.

Replace with a valid group name (a group name must start with a letter and may contain only letters and digits)
\\k<a.> named capturing group is missing trailing '>' near index 4
\k<a.>
    ^
Found an invalid group name containing a special character.

Replace with a valid group name (a group name must start with a letter and may contain only letters and digits)
(?<gold>)(?<gold>) Named capturing group <gold> is already defined near index 16
(?<gold>)(?<gold>)
                ^
The group names must be unique and should only be defined once.

Remove second group name declaration
or
Rename second group name declaration
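
To make a few of the hints above concrete, here is a small sketch of corrected patterns. The class and member names (Part1Fixes, containsLiteral, SEPARATORS, REPEATED_WORD) are invented for this example. It assumes the intent behind the failing samples was to match literal text, to match a literal dash inside a character class, and to define a named group once and refer back to it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical examples of patterns that compile after applying the hints.
final class Part1Fixes {

    // Pattern.quote treats every character of the literal as plain text,
    // so *, ., \ and similar metacharacters need no manual escaping.
    static boolean containsLiteral(String input, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(input).find();
    }

    // Dash moved to the end of the class, so it is a literal dash and not a range.
    static final Pattern SEPARATORS = Pattern.compile("[_.-]");

    // Group "word" is defined once and referenced with \k<word>.
    static final Pattern REPEATED_WORD = Pattern.compile("(?<word>\\w+) \\k<word>");

    public static void main(String[] args) {
        System.out.println(containsLiteral("x*J,M.*y", "*J,M.*"));
        System.out.println(SEPARATORS.matcher("a-b").find());
        Matcher m = REPEATED_WORD.matcher("it is is fine");
        if (m.find()) {
            System.out.println(m.group("word"));
        }
    }
}
```

Pattern.quote is usually the simplest choice when the whole search term is meant literally, such as a Windows-style path; escaping individual characters is only needed when literals are mixed with real regex constructs.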

Continued in Part 2

 

answered Oct 16, 2018 by rich-c-2789 (16,180 points)
edited Oct 17, 2018 by rich-c-2789
0 votes

Part 2: continuation...

.*\\great life\\b.* Illegal/unsupported escape sequence near index 3
.*\great life\b.*
   ^
Found an invalid escape sequence "\g".

Replace "\g" with a valid escape sequence
or
Escape the \ if looking for a backslash
[&&] Bad class syntax near index 2
[&&]
  ^
Found an intersection operator in a character class without a left or right expression.

Add a left and right expression
or
Delete second ampersand if looking for an ampersand
or
Remove brackets if looking for double ampersands
\\p{notLower} Unknown character property name {notLower} near index 11
\p{notLower}
           ^
Found an invalid character property name "notLower".

Switch to another valid character property name like "Upper" or "Lower"
or
Use the character class [^\\p{Lower}]
 
\\p{sc} Unknown character property name {sc} near index 5
\p{sc}
     ^
Found an invalid character property name "sc"

Replace with "Sc" to find a Unicode currency symbol
\\p{IsBasic_Latin} Unknown character script name {Basic_Latin} near index 16
\p{IsBasic_Latin}
                ^
Character scripts are identified with the prefix "Is" but the script name "Basic_Latin" was invalid.

Use a valid script name with "Is" like "Latin"
or
Use a valid block name with "In" like "Basic_Latin"
\\p{InLatin} Unknown character block name {Latin} near index 10
\p{InLatin}
          ^
Character blocks are identified with the prefix "In" but the block name "Latin" was invalid.

Use a valid script name with "Is" like "Latin"
or
Use a valid block name with "In" like "Basic_Latin"
(?<!\\d.+) Look-behind group does not have an obvious maximum length near index 7
(?<!\d.+)
       ^
Found a look-behind group with an unlimited quantifier.

Replace unlimited quantifiers like "+" and "*" with limited quantifiers like "?" or "{2,10}" etc.
or
Escape + if looking for a plus sign
(?@) Unknown group type near index 2
(?@)
  ^
Special constructs (named-capturing and non-capturing) start with "(?" and cannot be followed by characters such as "@" or "$"

If the intent was to use a character class to find "?" or "@", replace the parentheses with brackets
or
Replace with a valid special construct. See the list in the Pattern JavaDoc
(?a) Unknown inline modifier near index 2
(?a)
  ^
Found an invalid inline modifier (or embedded flag)
Possible solutions:
Use one of the valid flags/modifiers: i, d, m, s, u, x, U
a{1, 1000} Unclosed counted closure near index 4
a{1, 1000}
    ^
Found a space in a repetition quantifier.

Remove space
{0,999999999999} Illegal repetition range near index 15
{0,999999999999}
               ^
Found a repetition range with a value that is too small or too large.

Use a smaller number
or
Use positive numbers
or
Use something like {5,} for unbounded upper limit
{1,0} Illegal repetition range near index 4
{1,0}
    ^
Found an invalid range where the max was less than the min.

Use a max value greater than or equal to the min value, like {1,2}
\\0 Illegal octal escape sequence near index 2
\0
  ^
Found an octal escape sequence without a valid value.

Specify a valid octal value (digits 0-7) after the escape sequence \0, like \007
\\x{110000} Hexadecimal codepoint is too big near index 8
\x{110000}
        ^
The hexadecimal value in the escape sequence is too big.

Use a value in the hexadecimal range 0 to 10FFFF.
\\x{0P} Unclosed hexadecimal escape sequence near index 4
\x{0P}
    ^
Found an invalid hexadecimal digit in a hexadecimal escape sequence.

Use valid hexadecimal digits of 0-9 and A-F to define a hex value
\\x0P Illegal hexadecimal escape sequence near index 3
\x0P
   ^
Found an invalid hexadecimal digit in a hexadecimal escape sequence.

Use valid hexadecimal digits of 0-9 and A-F to define a hex value
\\u110P Illegal Unicode escape sequence near index 5
\u110P
     ^
Found an invalid hexadecimal digit in a Unicode escape sequence.

Use valid hexadecimal digits of 0-9 and A-F to define a hex value
\\p{} Empty character family near index 3
\p{}
   ^
Found a character class where the character property name was blank.

Specify the name for a class, script, block, category, or binary property
\\p{ Unclosed character family near index 3
\p{
   ^
Found a character class where the character property name was blank and not closed.

Specify the name for a class, script, block, category, or binary property
\\c Illegal control escape sequence near index 1
\c
 ^
Found a control escape sequence without an associated control character.

Specify the control character following "\c", e.g. Ctrl-V = \cV (use the uppercase letter)
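
As with Part 1, here is a small sketch of patterns that compile once the hints above are applied. The class and field names (Part2Fixes and its constants) are invented for this example, and the bound of 10 in the look-behind is an arbitrary choice to show a limited quantifier.

```java
import java.util.regex.Pattern;

// Hypothetical examples of Part 2 patterns after applying the hints.
final class Part2Fixes {

    // \P{Lower} (capital P) negates the property, same idea as [^\p{Lower}].
    static final Pattern NOT_LOWER = Pattern.compile("\\P{Lower}");

    // "Sc" is the Unicode currency symbol category (note the capital S).
    static final Pattern CURRENCY = Pattern.compile("\\p{Sc}");

    // Scripts use the "Is" prefix, blocks use the "In" prefix.
    static final Pattern LATIN_SCRIPT = Pattern.compile("\\p{IsLatin}");
    static final Pattern BASIC_LATIN_BLOCK = Pattern.compile("\\p{InBasic_Latin}");

    // Look-behind with a bounded quantifier instead of "+".
    static final Pattern X_NOT_AFTER_DIGIT = Pattern.compile("(?<!\\d.{1,10})x");

    // No space inside the braces.
    static final Pattern BOUNDED_REPEAT = Pattern.compile("a{1,1000}");

    // Largest valid code point, a valid octal escape, and Ctrl-V.
    static final Pattern MAX_CODE_POINT = Pattern.compile("\\x{10FFFF}");
    static final Pattern BELL = Pattern.compile("\\007");
    static final Pattern CTRL_V = Pattern.compile("\\cV");

    public static void main(String[] args) {
        System.out.println(NOT_LOWER.matcher("A").matches());
        System.out.println(CURRENCY.matcher("$").matches());
        System.out.println(X_NOT_AFTER_DIGIT.matcher("1abx").find());
        System.out.println(CTRL_V.matcher(String.valueOf((char) 22)).matches());
    }
}
```

Keeping each fixed pattern as a static final Pattern means any remaining syntax error shows up as soon as the class is loaded, rather than on the first search.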
answered Oct 16, 2018 by rich-c-2789 (16,180 points)
edited Oct 16, 2018 by rich-c-2789
...