Using capture groups in regular expressions

Regular expressions make it easy to extract the parts of a string that match a particular pattern. Suppose the input string contains some text, followed by a number, possibly followed by trailing characters that are of no further interest:

"Hello World12345Garbage"

To extract the leading text part (“Hello World”) and the number (“12345”), the following regular expression can be used:

(\D*)(\d*)

\D is a predefined character class that matches any character that is not a digit. \d is the opposite and matches any digit. The * quantifier matches zero or more occurrences of the preceding element, so \D* matches any number (including none) of non-digits, while \d* matches any number of digits. Finally, the parentheses define so-called capture groups. They group parts of the pattern so that the text matched by each part can be accessed later. Since we want to match text (non-digits) followed by a number (digits), the capture groups are (\D*) for the text part and (\d*) for the numeric part. See https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html for a complete list of the supported regular expression syntax.

The following code shows a complete example, using the Pattern and Matcher classes from the java.util.regex package. Note that each backslash must be escaped with another backslash in the Java string literal, so that the Java compiler inserts an actual backslash character instead of treating it as the start of an escape sequence. We can then access the two groups using Matcher.group(int group):

String input = "Hello World12345Garbage";
String regexp = "(\\D*)(\\d*)";

Pattern pattern = Pattern.compile(regexp);
Matcher matcher = pattern.matcher(input);
matcher.find();
System.out.println("Text  : " + matcher.group(1));
System.out.println("Number: " + matcher.group(2));

Output:

Text  : Hello World
Number: 12345
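
Because * also matches zero characters, the pattern above succeeds even when the text part or the number is missing, and the corresponding group is then an empty string. The following sketch illustrates this and compares it with a stricter variant that uses + (one or more) instead. The class name QuantifierDemo and the helper extract are hypothetical names chosen for this illustration, and the + variant is an assumption about what a caller might want, not part of the original example:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {

    // Hypothetical helper: returns both groups in brackets,
    // or null if the pattern is not found anywhere in the input.
    public static String extract(String regexp, String input) {
        Matcher matcher = Pattern.compile(regexp).matcher(input);
        // group() may only be called after a successful find()
        if (!matcher.find()) {
            return null;
        }
        return "[" + matcher.group(1) + "][" + matcher.group(2) + "]";
    }

    public static void main(String[] args) {
        String lenient = "(\\D*)(\\d*)"; // pattern from the text: zero or more
        String strict = "(\\D+)(\\d+)";  // assumed variant: one or more

        // Both patterns agree on the original input.
        System.out.println(extract(lenient, "Hello World12345Garbage")); // [Hello World][12345]
        System.out.println(extract(strict, "Hello World12345Garbage"));  // [Hello World][12345]

        // Without leading text, the lenient pattern yields an empty first group,
        // while the strict pattern does not match at all.
        System.out.println(extract(lenient, "12345Garbage")); // [][12345]
        System.out.println(extract(strict, "12345Garbage"));  // null
    }
}
```

Checking the return value of find() before calling group() avoids an IllegalStateException when the pattern does not match.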

Accessing the capture groups by index can be error-prone and difficult to maintain, especially for more complex regular expressions. Starting with Java 7, the API also supports named capture groups. A capture group can be given a name by adding ?<name> directly after the opening parenthesis:

(?<text>\D*)(?<number>\d*)

Then, the capture groups can be accessed by their name instead of their index, using Matcher.group(String name):

String input = "Hello World12345Garbage";

String regexp = "(?<text>\\D*)(?<number>\\d*)";

Pattern pattern = Pattern.compile(regexp);
Matcher matcher = pattern.matcher(input);
matcher.find();
System.out.println("Text  : " + matcher.group("text"));
System.out.println("Number: " + matcher.group("number"));

Output:

Text  : Hello World
Number: 12345
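
Named groups can also be used when a pattern matches more than once, and they can be referenced in replacement strings as ${name}. The following is a minimal sketch under a few assumptions: the class name NamedGroupDemo, the helpers pairs and numberFirst, and the sample input "apples12pears7" are hypothetical, and the pattern uses + instead of * so that every match contains both a text part and a number:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupDemo {

    // Assumed variant of the pattern from the text, with + instead of *
    private static final Pattern PAIR =
            Pattern.compile("(?<text>\\D+)(?<number>\\d+)");

    // Hypothetical helper: collects every text/number pair as "text=number".
    public static List<String> pairs(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = PAIR.matcher(input);
        // Each successful find() moves on to the next match in the input.
        while (matcher.find()) {
            result.add(matcher.group("text") + "=" + matcher.group("number"));
        }
        return result;
    }

    // Hypothetical helper: moves the number in front of the text
    // for the first match, using ${name} references in the replacement.
    public static String numberFirst(String input) {
        return PAIR.matcher(input).replaceFirst("${number}:${text}");
    }

    public static void main(String[] args) {
        System.out.println(pairs("apples12pears7"));                // [apples=12, pears=7]
        System.out.println(numberFirst("Hello World12345Garbage")); // 12345:Hello WorldGarbage
    }
}
```

Compiling the pattern once and storing it in a constant avoids recompiling it on every call, since Pattern instances are immutable and can be shared.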