Month: May 2014

Including untracked files in new directories in git status output

git status shows the status of the workspace, such as which files have been modified or which files are new. For new directories with new files in them, the default behavior is to show only the directory name, not the contents of the directory:

$ mkdir -p newDir/subDir
$ touch newDir/newFile1 newDir/newFile2 newDir/subDir/newFile3
$ git status
# On branch master
#
# Initial commit
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       newDir/
nothing added to commit but untracked files present (use "git add" to track)

I usually prefer to also see the contents of the directories – this can be achieved by adding the parameter --untracked-files=all to git status:

$ git status --untracked-files=all
# On branch master
#
# Initial commit
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       newDir/newFile1
#       newDir/newFile2
#       newDir/subDir/newFile3
nothing added to commit but untracked files present (use "git add" to track)

It is a little bit cumbersome to type this parameter each time, but Git allows defining aliases for commands. The straightforward approach would be to add the following alias to ~/.gitconfig:

[alias]
        status = status --untracked-files=all

Unfortunately, this will not work: Git ignores aliases that would hide existing commands (see “alias.*” at https://git-scm.com/docs/git-config). Some commands allow default parameters to be set in ~/.gitconfig, so one might try a section similar to

[status]
        untracked-files=all

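but this does not work either, because status does not recognize an untracked-files key. (The git-config documentation does list a status.showUntrackedFiles setting that accepts the value all, but it changes the default for every invocation.) The simplest way that leaves the default untouched is to define a new command alias, like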

[alias]
        statusall = status --untracked-files=all

This will not change the default behaviour of the status command (which is probably good, since scripts might rely on the default output format of the command), but it simplifies the above command line:

$ git statusall
# On branch master
#
# Initial commit
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       newDir/newFile1
#       newDir/newFile2
#       newDir/subDir/newFile3
nothing added to commit but untracked files present (use "git add" to track)

Using capture groups in regular expressions

Using regular expressions, it is easy to extract parts which match a particular pattern from a string. Suppose the input string contains some text, followed by a number, possibly followed by some garbage which is of no further interest:

"Hello World12345Garbage"

To extract the leading text part (“Hello World”) and the number (“12345”), the following regular expression can be used:

(\D*)(\d*)

\D is a predefined character class that matches any non-digit character. \d is the opposite and matches any digit. The * specifies that we want to match any number of occurrences of the preceding element, so \D* matches any number (including none) of non-digits, while \d* matches any number of digits. Finally, the parentheses define so-called capture groups. They group the various parts of the pattern so that these groups can later be accessed. Since we want to match text (non-digits) followed by a number (digits), the capture groups are (\D*) for the text part and (\d*) for the numeric part. See https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html for a complete list of the supported regular expression syntax.

The following code shows a complete example, using the Pattern and the Matcher classes. Note that we need to escape the backslash with another backslash in the Java string literal, so that the Java compiler actually inserts a backslash character and does not treat it as an escape sequence. We can then access the two groups using Matcher.group(int group):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "Hello World12345Garbage";
String regexp = "(\\D*)(\\d*)";

Pattern pattern = Pattern.compile(regexp);
Matcher matcher = pattern.matcher(input);
// group() may only be called after find() has reported a match
if (matcher.find()) {
    System.out.println("Text  : " + matcher.group(1));
    System.out.println("Number: " + matcher.group(2));
}

Output:

Text  : Hello World
Number: 12345 
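
Because both groups use *, the pattern also matches when one or both parts are missing, so find() succeeds even for an input without digits. The following is a minimal sketch of that behaviour, together with a stricter variant that uses + instead. The class name CaptureGroupDemo, the helper extract, and the stricter pattern are hypothetical and chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroupDemo {

    // Hypothetical helper: returns {text, number}, or null if the pattern does not match.
    static String[] extract(String regexp, String input) {
        Matcher matcher = Pattern.compile(regexp).matcher(input);
        if (!matcher.find()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        String lenient = "(\\D*)(\\d*)"; // pattern from the text: both parts may be empty
        String strict = "(\\D+)(\\d+)";  // assumption: require at least one character per part

        String[] parts = extract(lenient, "Hello World12345Garbage");
        System.out.println("Text  : " + parts[0]);
        System.out.println("Number: " + parts[1]);

        // The lenient pattern matches an input without digits and leaves group 2 empty.
        String[] noDigits = extract(lenient, "Hello World");
        System.out.println("Empty number: " + noDigits[1].isEmpty());

        // The strict pattern reports no match for the same input.
        System.out.println("Strict match: " + (extract(strict, "Hello World") != null));
    }
}
```

Whether the lenient or the strict form is appropriate depends on whether inputs with a missing text or number part should count as valid.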

Accessing the capture groups by index can be error-prone and difficult to maintain, especially for more complex regular expressions. Starting with Java 7, the API also supports named capture groups. A capture group can be given a name by adding ?<name> directly after the opening parenthesis:

(?<text>\D*)(?<number>\d*)

Then, the capture groups can be accessed by their name instead of their index, using Matcher.group(String name):

String input = "Hello World12345Garbage";

String regexp = "(?<text>\\D*)(?<number>\\d*)";

Pattern pattern = Pattern.compile(regexp);
Matcher matcher = pattern.matcher(input);
// group() may only be called after find() has reported a match
if (matcher.find()) {
    System.out.println("Text  : " + matcher.group("text"));
    System.out.println("Number: " + matcher.group("number"));
}

Output:

Text  : Hello World
Number: 12345
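
Named groups remain reachable by index as well, and a replacement string can refer to them as ${name}. The following sketch illustrates both points. The class name NamedGroupDemo, the helper swap, and the use of + instead of * (to rule out empty matches) are assumptions made for illustration and not part of the example above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupDemo {

    // Assumption: + instead of * so that an empty match cannot occur.
    private static final Pattern PATTERN =
            Pattern.compile("(?<text>\\D+)(?<number>\\d+)");

    // Hypothetical helper: puts the number in front of the text for the first match.
    static String swap(String input) {
        // ${name} in the replacement string refers to the named capture group.
        return PATTERN.matcher(input).replaceFirst("${number}:${text}");
    }

    public static void main(String[] args) {
        String input = "Hello World12345Garbage";

        Matcher matcher = PATTERN.matcher(input);
        if (matcher.find()) {
            // A named group keeps its index, so both lookups return the same value.
            System.out.println(matcher.group("number").equals(matcher.group(2)));
        }

        System.out.println(swap(input));
    }
}
```

For the input used throughout this post, the program prints true followed by 12345:Hello WorldGarbage. An input that does not match is returned unchanged by swap.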