UPDATE (June 11, 2011): A complete sample ASP.NET application demonstrating features in this blog post is available here: ASP.NET – A complete translation framework through Powershell and Google Translate

 

This was a rather interesting thing I did in the last week. One of our large ASP.NET applications has been in production successfully for some months now, and the client has felt comfortable enough to start leveraging and deploying the application over larger parts of their corporate intranet.

As part of this process, a request came in for multi-language support, so that their employees could use the application in their own native language. Sure, this should be easy, we said, especially with the extensive framework ASP.NET provides for localizing an application. As you may know, you need to place suitably named locale-specific Resource (.resx) files in an App_LocalResources folder inside each folder of your ASP.NET application that has .aspx/.ascx files, and then provide appropriate content in each locale-specific resource file.

For the text below, I assume you are aware of ASP.NET’s localization framework and how it uses Resource files together with meta:resourcekey attribute to provide automatic substitution of localized content in the UI.

For us, localization had 2 aspects:

  1. Server-side localization, which as I said is easy given the out-of-the-box ASP.NET localization framework.
  2. Client-side localization – The application is Ext.Net/ExtJs based and hence executes large parts of its functionality, especially UI-related functionality, client-side. So, strings on the client-side should be localizable too.

After some deliberation, I proposed to exploit Resource (.resx) files for providing client-side localization too. More on this later, but our immediate headache was to prepare the resource files themselves, and then provide localized content in the locale-specific resource files.

There were multiple time-consuming operations involved here:

  1. Adding meta:resourcekey attribute to all tags that needed localizable content.
  2. Extracting attributes to the primary resource file (i.e. Default.aspx.resx etc) for all tags.
  3. Creating locale specific resource files from primary resource files and providing localized/translated content in them.

The application was not initially built with localization in mind, so performing the above operations for >100 .aspx/.ascx files and thousands of tags in these files could have taken hundreds of man-hours.

I was really worried about the time such a mundane task would take, and decided to see if the process could be automated.

I first had a look at Google Translate’s developer docs to see if it provides an API for programmatic access. And it did. I saw a ray of hope there.

And I started writing Powershell scripts to see how much of this task can be automated. I took it one step at a time, to automate as much of the translation/localization efforts as possible. Following is a quick description of the scripts I produced and what each script does:

  1. GTranslate.ps1
    My first script attempted to translate the contents of an ASP.NET resource file automatically through Google Translate.
    Basically it searches for all locale-specific resource files (e.g. App_LocalResources\Default.aspx.fr.resx would be a French locale resource file), finds all resource keys in each file, sends the resource key values to Google Translate, receives the translated content and writes it back to the resource file.
    It took me over 1 day to get this script working and remove kinks here and there in it.
  2. AddMetaTag.ps1
    Having been able to translate any Resource file automatically through Google Translate, the next step was to add meta:resourcekey attribute to all desired tags in .aspx/.ascx files all over the application, so that ASP.NET picks them up for localization during a request.
    I chose 2 simple criteria for adding meta:resourcekey attribute:
    1) Any tag that has an ID attribute should get meta:resourcekey attribute with its value being the same as the value for the ID attribute.
    2) I added an array of additional tag names, so that these tags get an automatically generated meta:resourcekey attribute even if they did not have an ID attribute. This was important because not all server-side tags had ID attributes, and because there were non-control tags (e.g. <ext:GridCommand>) which have no ID attribute but do have other attributes that needed to be localizable.
    So this script goes all over the application’s .aspx/.ascx files adding meta:resourcekey attribute to the desired tags.

    It took me slightly less than half a day to get the Regular expressions and the logic right for adding the required meta:resourcekey attribute all over the application. 

  3. MetaTagExtractor.ps1
    I was really getting excited at what I was doing and by the fact that it was giving me results.

    This script was also interesting. After AddMetaTag.ps1 had added the required attribute all over the application, the next important step was to search for the tags having this attribute and extract their other attributes (e.g. Text, Title, Tooltip, Width, Height etc) to a resource file enabling these to be localized.

    2-3 hours and this script was ready too.

    Now I had everything ready for testing. I ran these scripts in this order for a particular folder of our application:
    AddMetaTag.ps1, MetaTagExtractor.ps1, GTranslate.ps1

    French was the locale I targeted. It took approximately 5 minutes for these scripts to process some 25-30 resource files in the folder, and when I tested the results by setting my browser language to French, I was exhilarated. The UI was showing beautifully in the French language.

    I got the client to check that, and he pointed out that some accented French characters were not displaying properly (I do not know French, you know). A quick check through the scripts revealed that the encoding was not being set to UTF8 while receiving translated results from Google Translate. After a quick fix to the code and another test run of the scripts, the client was as happy as I was.

    And I decided to put it all together and run it in an integrated manner. So, 2 more Powershell scripts followed.

  4. Controller.ps1
    This script controls all the other scripts and executes them in order. So, it searches for the files that need processing and invokes the above 3 scripts in order so that the process effectively reduces to just executing this script from a Powershell prompt.
  5. ResourceFileGenerator.ps1
    The next language we needed to support was Spanish. So, I thought, why not another script to generate empty Spanish resource files (e.g. Default.aspx.es.resx) and then have the above scripts process and fill these resource files?

    This script can therefore create empty resource files for any number of locales you want.

  6. JsStringExtractor.ps1
    This script was added much later to automate more tasks. The script extracts translatable strings from your javascript files (strings wrapped in Rahul.t()), e.g.
    alert(Rahul.t('Some message'));
    This script would automatically extract 'Some message' and add it to a resource file. The GTranslate.ps1 script would then take care of processing and translating your javascript strings too.
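
The extraction idea behind JsStringExtractor.ps1 can be sketched in a few lines. The real script is written in Powershell; the sketch below is in javascript only to match the Rahul.t examples. The function name extractTranslatableStrings and the regular expression are assumptions made for illustration, not the script's actual code, and only simple quoted literals are handled.

```javascript
// Hypothetical sketch: collect the first string argument of every Rahul.t(...) call.
// Handles single- or double-quoted literals, including escaped quotes inside them.
function extractTranslatableStrings(source) {
  var pattern = /Rahul\.t\(\s*(['"])((?:\\.|(?!\1)[^\\])*)\1/g;
  var found = [];
  var match;
  while ((match = pattern.exec(source)) !== null) {
    // Simplified unescaping: only \' \" and \\ are converted back.
    var text = match[2].replace(/\\(['"\\])/g, '$1');
    if (found.indexOf(text) === -1) {
      found.push(text); // each distinct string would become one resource name
    }
  }
  return found;
}

var sample = "alert(Rahul.t('Some message')); alert(Rahul.t(\"Your name is: {0}\", name));";
console.log(extractTranslatableStrings(sample));
```

Each string returned this way corresponds to one entry in the name column of a .js.resx file.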

Voila, I had my automatic translation/localization framework for ASP.NET ready. And this morning, I decided to release it to the community. You will find all the scripts attached below with this blog post.

To make sense of it all, here are the precise steps you need to perform to use this framework yourself:

  1. Ensure you have Powershell 2.0 installed.
  2. Extract the scripts to a folder on your disk.
  3. Open Controller.ps1 in any text editor, and change line 5 to provide the path to root of your ASP.NET application.
  4. Next open ResourceFileGenerator.ps1 and change line 3 to include all locales you need to support.
  5. Then open GTranslate.ps1, and substitute your Google Translate API key in line 3. You can get a key at: https://code.google.com/apis/console/
  6. That’s it, execute Controller.ps1 from Powershell console and see the messages that provide you the progress of the activity and what the scripts are doing.

    For advanced users

  7. Open AddMetaTag.ps1 and change lines 8-10 to include all tags which should always be localized, whether or not they have an ID attribute. Please specify the tags in lowercase only, together with the tag prefix if any.
  8. Open MetaTagExtractor.ps1 and change $localizableAttributes, $translatableAttributes, and $conditionallyTranslatable arrays/hashes on lines 9, 11 and 15 respectively.
    The comments in the file will help you understand what each of these arrays/hashes is for.
  9. Open Controller.ps1 and change the paths at the bottom of the file if you need automatic localization/translation for a specific folder in your application only and not the whole application. You can also choose whether to process the directories recursively or not. Again comments there will help you.
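
To give a feel for what the tag list in step 7 influences, here is a rough sketch of the two criteria AddMetaTag.ps1 applies (an ID attribute, or membership in the always-localize list). It is written in javascript rather than Powershell, and the function name, the regular expression and the generated key format (tag name plus a counter) are all assumptions for illustration; the actual script may differ in each of these.

```javascript
// Hypothetical sketch of the two criteria; not the code of AddMetaTag.ps1.
// alwaysLocalize: lowercase tag names (with prefix) that get a key even without an ID.
function addResourceKeys(markup, alwaysLocalize) {
  var generated = 0;
  // An opening tag: name, zero or more name="value" attributes, optional "/".
  var openingTag = /<([A-Za-z][\w:.]*)((?:\s+[\w:.-]+\s*=\s*"[^"]*")*)\s*(\/?)>/g;
  return markup.replace(openingTag, function (whole, tagName, attrs, slash) {
    if (/\smeta:resourcekey\s*=/i.test(attrs)) {
      return whole; // already has a key from an earlier run, leave untouched
    }
    var id = /\sid\s*=\s*"([^"]*)"/i.exec(attrs);
    var key;
    if (id) {
      key = id[1]; // criterion 1: reuse the value of the ID attribute
    } else if (alwaysLocalize.indexOf(tagName.toLowerCase()) !== -1) {
      generated += 1; // criterion 2: listed tag without an ID
      key = tagName.replace(/[:.]/g, '_') + generated; // made-up naming scheme
    } else {
      return whole; // neither criterion applies
    }
    return '<' + tagName + attrs + ' meta:resourcekey="' + key + '"' +
      (slash ? ' /' : '') + '>';
  });
}

console.log(addResourceKeys('<asp:Button ID="btnSave" runat="server" Text="Save" />', []));
console.log(addResourceKeys('<ext:GridCommand CommandName="Edit" Text="Edit" />', ['ext:gridcommand']));
```

Because tags that already carry a meta:resourcekey attribute are skipped, running the function twice over the same markup leaves it unchanged.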

Now something about javascript translation. This aspect of the automatic localization framework/attached scripts has been influenced by my work with Drupal.

You can place specially named files in any App_LocalResources folder. The files need to end in “.js.resx”, e.g. MyApp.js.resx.
You can put your javascript strings (i.e. strings used in your javascript files) in the name column of this resource file (a sample file is attached with this blog post).

These resource files are also processed automatically, and the scripts produce a javascript file named after each locale, i.e. MyApp.<locale>.js (e.g. MyApp.fr.js). This file contains the locale-specific translations of the strings. Now you need to do the following 4 steps to provide translation for javascript strings:

  1. Put all strings you use in javascript in the name column for this file and execute the Powershell script, Controller.ps1.
  2. Include the locale specific javascript file on your web page (e.g. MyApp.fr.js).
  3. Add the following javascript method to your web page:

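    {syntaxhighlighter brush: jscript;fontsize: 100; first-line: 1; }var Rahul = Rahul || {};  // keep an existing Rahul object (and its locale data) if already defined
    Rahul.t = function(format, args) {
        // Use the translated string when the loaded locale file provides one.
        if (Rahul.locale && Rahul.locale.strings) {
            var temp = Rahul.locale.strings[format];
            if (temp) format = temp;
        }

        if (args) {
            // Accept either an array or a plain list of arguments after the format.
            if (!Ext.isArray(args)) {
                args = Ext.toArray(arguments, 1);
            }
            // Replace {0}, {1}, ... with the matching argument.
            format = format.replace(/\{(\d+)\}/g, function(m, i) {
                return args[i];
            });
        }

        return format;
    };{/syntaxhighlighter}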

  4. Wrap all your javascript strings in Rahul.t().
    e.g. alert('Hello World') becomes alert(Rahul.t('Hello World')).

    For an advanced example, see this:

    alert(Rahul.t('Your name is: {0}', name));

    i.e. you can pass a format and arguments to Rahul.t as you would pass to String.Format in .NET.

And this gives you support for showing localized strings in javascript too.
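
To see how the pieces fit together, here is a self-contained sketch that runs outside the browser. The shape of the locale object (Rahul.locale.strings, keyed by the original string) is inferred from what Rahul.t reads; the exact contents of a generated file such as MyApp.fr.js may differ. The French strings are hand-written examples rather than output of the scripts, and Array.isArray / Array.prototype.slice stand in for the Ext helpers.

```javascript
var Rahul = {};

// What a generated locale file such as MyApp.fr.js might contain (assumed shape).
Rahul.locale = {
  strings: {
    'Hello World': 'Bonjour le monde',
    'Your name is: {0}': 'Votre nom est : {0}'
  }
};

// Same logic as Rahul.t above, using standard javascript instead of Ext helpers.
Rahul.t = function (format, args) {
  if (Rahul.locale && Rahul.locale.strings) {
    var translated = Rahul.locale.strings[format];
    if (translated) format = translated;
  }
  if (args) {
    if (!Array.isArray(args)) {
      args = Array.prototype.slice.call(arguments, 1);
    }
    format = format.replace(/\{(\d+)\}/g, function (m, i) {
      return args[i];
    });
  }
  return format;
};

console.log(Rahul.t('Hello World'));                // Bonjour le monde
console.log(Rahul.t('Your name is: {0}', 'Marie')); // Votre nom est : Marie
console.log(Rahul.t('Not in the file'));            // Not in the file
```

A string with no entry in the locale file falls through unchanged, so untranslated text shows up in the original language instead of breaking the page.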

These scripts have other advanced features that I am not detailing for now. For example, the scripts try to be intelligent: if you run them again over directories/files that have already been processed, no duplicate processing is done. This works at a fine-grained level. Suppose you add a new tag, or a new attribute on an existing tag, to Default.aspx. When you run the scripts again, only the new tags/attributes are processed, and the resource files are updated with the new additions only. Tags/attributes that were previously processed and translated through Google Translate are not translated again. This prevents redundant requests and saves time after additions/modifications.

Moreover, if you are not happy with a translation provided by Google Translate, feel free to update the Resource file directly (but leave the content in the Comment column of the resource file). Any manual overrides you make would NOT be overwritten the next time you execute these scripts over the same files again.
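
The decision about which entries get sent to Google Translate can be pictured as a small predicate over the Comment column. The marker values ("Translated", "Manual", "NT") and the $retranslate switch are described in the update notes at the end of this post. The function name, the entry shape, the exact-match comparison and the treatment of any other comment as "still pending" are assumptions; this is a sketch of the behaviour, not the scripts' code.

```javascript
// Hypothetical predicate, not the scripts' actual code.
// entry: { name, value, comment } mirrors one row of a .resx file.
function shouldTranslate(entry, retranslate) {
  var marker = (entry.comment || '').trim();
  if (marker === 'Manual' || marker === 'NT') {
    return false; // never processed by the scripts
  }
  if (marker === 'Translated') {
    return retranslate === true; // redone only when explicitly requested
  }
  return true; // assumed: anything else has not been translated yet
}

var rows = [
  { name: 'btnSave.Text', value: 'Save', comment: '' },
  { name: 'btnCancel.Text', value: 'Annuler', comment: 'Translated' },
  { name: 'lblBrand.Text', value: 'Ext.Net', comment: 'NT' }
];
var pending = rows.filter(function (row) { return shouldTranslate(row, false); });
console.log(pending.map(function (row) { return row.name; }));
```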

I really hope people will find these scripts useful. In case of any issues, feel free to use the comment form below to discuss them.

The first zip file attached below contains all the Powershell scripts. The second file is a sample of the .resx file you need to prepare for supporting javascript translation (i.e. translating strings in javascript).

UPDATE:

  • Jun 11, 2011
    • Better exception reporting – Exceptions generated during translation are now shown with their message in red, to make them stand out from the regular script output.
    • Added JsStringExtractor.ps1 file – This script automatically parses all your javascript files looking for references to Rahul.t (the javascript translation method), extracts the string out of it and automatically creates a resource (.resx) file out of it.
    • Added option to re-process already translated strings. e.g. if you have already run translation scripts once, strings translated automatically would have “Translated” in their comment in resource file which helps to prevent their processing again if you run scripts again. But if you set $retranslate to $true, such strings would be processed again.
      Please note that strings marked “Manual” or “NT” (meaning No Translation) are never processed by the scripts and they are left as is.
    • Bug-fix related to Powershell’s foreach loop.
    • Bug-fix related to attributes with empty values. Such attributes are correctly processed now.
    • App_LocalResources directory is created automatically by scripts now as required.
    • Added ResourceFileSync.ps1. This is just a utility script that synchronizes Resources files between 2 directories recursively. I often use this to sync resource files between development, staging and production copies of the same code.
  • Added a related blog post with a complete sample ASP.NET application demonstrating how to use these scripts:
    ASP.NET – A complete translation framework through Powershell and Google Translate