.NET - Regex pattern for lazy balanced group matching

A couple of days ago, I needed to find all calls to a specific method in a large number of JavaScript files. The method call looked something like the following:

Rahul.t('String to be translated.');

If you have been following my blog, or have read my post on Using PowerShell and Google Translate to provide automatic localization in ASP.NET, you will recognize Rahul.t as the JavaScript localization method from that post. It accepts a string and returns the translated version for the browser's preferred language if one is available, or the original string otherwise.

I wanted to find every invocation of this method across all JavaScript files in an ASP.NET application and extract the string from each call, so that its translated version could be made available on the page (please read the other blog post if you are interested in how exactly this translation works).

This was again a good candidate for PowerShell; the only real challenge was reliably locating every invocation of the method in a file. I was aware of the balancing group extension to regular expressions available in .NET, and I decided to try it instead of the traditional alternative of searching the string in a loop.

Fortunately, MSDN itself provides a pretty good example of a balancing group definition:

 

string pattern = "^[^<>]*" +
                     "(" + 
                       "((?'Open'<)[^<>]*)+" +
                       "((?'Close-Open'>)[^<>]*)+" +
                     ")*" +
                     "(?(Open)(?!))$";

 

According to MSDN itself, the above example:

demonstrates using a balancing group definition to match left and right angle brackets (<>) in an input string. The capture collections of the Open and Close groups in the example are used like a stack to track matching pairs of angle brackets: each captured left angle bracket is pushed into the capture collection of the Open group; each captured right angle bracket is pushed into the capture collection of the Close group; and the balancing group definition ensures there is a matching right angle bracket for each left angle bracket.
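To see the stack behaviour described above in action, here is a minimal sketch that runs the MSDN pattern against three sample inputs. The inputs and variable names are my own choices for illustration, and the snippet assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Text.RegularExpressions;

// The MSDN pattern quoted above, unchanged.
string pattern = "^[^<>]*" +
                 "(" +
                   "((?'Open'<)[^<>]*)+" +
                   "((?'Close-Open'>)[^<>]*)+" +
                 ")*" +
                 "(?(Open)(?!))$";

// Sample inputs chosen for illustration only.
bool balanced   = Regex.IsMatch("<abc><mno<xyz>>", pattern); // every '<' has a matching '>'
bool unclosed   = Regex.IsMatch("<abc><mno<xyz>", pattern);  // one '<' is never closed
bool extraClose = Regex.IsMatch("<abc>>", pattern);          // a '>' with nothing left to pop

Console.WriteLine($"{balanced} {unclosed} {extraClose}");    // True False False
```

The second input fails because a capture is still left on the Open stack when the end of the string is reached, so the conditional (?(Open)(?!)) forces a failure. The third fails because the Close-Open group cannot match once the Open stack is empty.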

This was pretty good, and I adapted it as follows to match balanced opening and closing parentheses instead:

 

string pattern = "(" + 
                 @"((?'Open'\()[^\(\)]*)+" +        // verbatim strings (@"...") keep the regex backslashes intact
                 @"((?'Close-Open'\))[^\(\)]*)+" +
                 ")*" +
                 "(?(Open)(?!))$";

However, it looked like a greedy match to me, and it turned out to be one when I tried it. A greedy match, as you might know, starts at the first matching character and extends as far as it can while the pattern still matches. This meant that the whole match started at the first Rahul.t method call and ended at the closing parenthesis of the last Rahul.t method call, which is clearly undesirable.
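The difference between greedy and lazy quantifiers is easiest to see on a deliberately simplified pattern. The sketch below does not use balancing groups at all; the sample strings and the simplified patterns are assumptions made purely for illustration (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input with two calls on one line.
string script = "Rahul.t('a'); Rahul.t('b');";

// Greedy: ".*" runs to the end and backtracks to the LAST ')'.
string greedy = Regex.Match(script, @"Rahul\.t\(.*\)").Value;

// Lazy: ".*?" stops at the FIRST ')' it can reach.
string lazy = Regex.Match(script, @"Rahul\.t\(.*?\)").Value;

// A plain lazy quantifier is not enough once parentheses are nested:
string nested = Regex.Match("Rahul.t(f('a'))", @"Rahul\.t\(.*?\)").Value;

Console.WriteLine(greedy); // Rahul.t('a'); Rahul.t('b')
Console.WriteLine(lazy);   // Rahul.t('a')
Console.WriteLine(nested); // Rahul.t(f('a')
```

The last line shows why laziness alone does not solve the problem: the match ends at the first closing parenthesis, leaving the call unbalanced. That is the gap the balancing group is meant to fill.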

It took me some time to figure out a lazy pattern that matches balanced parentheses and identifies each call to Rahul.t individually. The successful pattern turned out to be the following:

 

string pattern = @"Rahul\.t(" +                     // "\." matches a literal dot; the "(" opens the outer regex group
               @"((?<Open>\()[^\(\)]*)+" +
               @"((?<Close-Open>\))[^\(\)]*?)+" +
             ")+?" +
             "(?(Open)(?!))";

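Compared to the previous pattern, this one puts the method name in front of the outer group and drops the trailing "$", so a match no longer has to extend to the end of the string. It also adds a "?" on the third and fourth lines of the pattern ("[^\(\)]*" becomes "[^\(\)]*?" and ")*" becomes ")+?"), forcing a lazy match that consumes as few characters as possible while looking for the balancing closing parentheses.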
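As a rough sketch of how the lazy pattern can be used, the snippet below collects every Rahul.t call from a piece of JavaScript. The sample script is hypothetical, the dot in the method name is escaped so that it only matches a literal ".", and verbatim strings are used so the backslashes reach the regex engine. It assumes C# 9 or later.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The lazy balanced pattern from above.
string pattern = @"Rahul\.t(" +
                 @"((?<Open>\()[^\(\)]*)+" +
                 @"((?<Close-Open>\))[^\(\)]*?)+" +
                 @")+?" +
                 @"(?(Open)(?!))";

// Hypothetical JavaScript source; the second call contains nested parentheses.
string script = "var a = Rahul.t('Hello'); alert(Rahul.t('Save (draft)')); foo(1);";

// Each match is one complete, balanced call.
string[] calls = Regex.Matches(script, pattern)
                      .Cast<Match>()
                      .Select(m => m.Value)
                      .ToArray();

foreach (string call in calls)
    Console.WriteLine(call);
// Rahul.t('Hello')
// Rahul.t('Save (draft)')
```

Note that the pattern counts every parenthesis it sees, including those inside string literals, so an argument such as ':)' with an unbalanced parenthesis would not be matched correctly by this sketch.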

In the end, I thought this Regex pattern was worth the effort, as it opens the gate to easily locating and analyzing calls to specific methods in JavaScript (and probably other languages too), where code editors are not that good at making sense of the code (largely because of the dynamic nature of JavaScript-like languages).