Latest Issue of the Just Great Software Newsletter: December 2008
Contents
1. Happy Holidays and Best Wishes for 2009
2. Updated: RegexBuddy 3.2.1
3. Updated: TPerlRegEx for Delphi 2009
Happy Holidays and Best Wishes for 2009
2008 has been a bit of a quiet year for Just Great Software. Though we released at least two free minor upgrades for each of our products, we didn't have any major upgrades or new product releases in 2008.
Behind the scenes, however, 2008 has been a busy year. As you may have read on my blog at http://www.regexguru.com/2008/05/writing-offline/ I've been busy co-writing a book on regular expressions. The writing is done, and O'Reilly will publish it in April 2009. I'm not too bashful to say that this will be the most practical book on regular expressions yet.
What I haven't blogged about yet is a brand new tool for working with regular expressions. I first started working on it in 2007. It's been a hard nut to crack, but I'm quite pleased with the result. You'll find out all about it and get the 1.0 release in 2009. Hopefully before April.
I hope that 2008 has been good to you. Don't let the economic recession bite you. We're not letting airport closures and other nonsense get to us ( http://www.phuket.me/2008/12/back-in-phuket/ ). Enjoy the holidays. My best wishes to you and your friends and family. May 2009 be a happy year for you.
Updated: RegexBuddy 3.2.1
RegexBuddy 3.2.1 is now available for download. This free update brings quite a number of fixes and improvements.
This release fixes a serious bug that only occurred on Windows Vista. RegexBuddy would lock up if you edited the regular expression while the last List All/Replace/Split command on the Test tab was still running.
RegexBuddy 3.2.0 improved copying free-spacing regular expressions to the clipboard when using string styles that don't support multi-line strings. Unfortunately, those improvements weren't handled well by the corresponding paste commands. RegexBuddy 3.2.1 fixes this. Basic-style strings that are split across lines now include the _ line continuation character.
Many people new to regular expressions tend to escape all punctuation characters that they want to include in their regular expressions, even though only those characters that are metacharacters need to be escaped. This sometimes leads to problems. The underscore is not a metacharacter. Escaping it doesn't hurt, unless you're using a POSIX or GNU flavor or the .NET flavor. POSIX and GNU don't allow any characters that aren't metacharacters to be escaped. The .NET flavor allows non-metacharacters to be escaped, except for the underscore. RegexBuddy already flagged escaped underscores as an error with these flavors. Version 3.2.1 provides a clearer explanation on the Create tab for the .NET flavor.
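As a small illustration of the general rule, here is a sketch in Python. Python's re module is not one of the flavors named above, so take it only as an example of a flavor that tolerates a needlessly escaped underscore. It assumes Python 3.7 or later, where re.escape escapes only characters that can have a special meaning.

```python
import re

# re.escape leaves the underscore alone but escapes the dot,
# because only the dot is a metacharacter.
escaped = re.escape("a_b.c")
print(escaped)

# Python accepts an escaped underscore as a literal underscore.
# According to the text above, POSIX, GNU and .NET flag it as an error.
tolerated = re.fullmatch(r"a\_b", "a_b") is not None
print(tolerated)
```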
Escaped angle brackets are a more complicated issue. Angle brackets aren't metacharacters in any regex flavor, and needn't be escaped. Escaped angle brackets are interpreted as word boundaries by the GNU flavors, as an error by the POSIX flavors, and as literal angle brackets by all other flavors. That is how RegexBuddy has handled escaped angle brackets since support for the GNU flavors was added. Test results in RegexBuddy match those of the actual regex engines.
But on the Create tab, RegexBuddy flags angle brackets as an error for flavors that treat \< and \> as literal characters rather than word boundaries. The idea is that since escaping angle brackets with these flavors is pointless, you might be trying to use word boundaries with flavors that don't support this syntax for word boundaries.
RegexBuddy 3.2.1 now shows two warnings for \< and \> on the Create tab. The first warning is the same warning about lack of support for these word boundaries. Double-clicking this warning replaces the escaped angle bracket with \m or \M, or with a lookaround if those word boundaries aren't supported. The second warning is new. It explains that escaping angle brackets is not necessary. Double-clicking this warning will remove the backslash.
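To make the difference concrete, here is a sketch in Python, one of the flavors that treats \< and \> as literal angle brackets. The lookaround spelling of the word boundaries is an assumed equivalent for illustration; the exact replacement RegexBuddy inserts may differ.

```python
import re

text = "cat concat cats <cat>"

# In Python, \< and \> are just literal angle brackets.
literal = re.findall(r"\<cat\>", text)

# Lookarounds that emulate start-of-word and end-of-word boundaries.
START_OF_WORD = r"(?<!\w)(?=\w)"
END_OF_WORD = r"(?<=\w)(?!\w)"
pattern = START_OF_WORD + "cat" + END_OF_WORD

# Offsets of "cat" where it stands as a whole word.
whole_word_offsets = [m.start() for m in re.finditer(pattern, text)]

print(literal)
print(whole_word_offsets)
```

The escaped brackets only find the literal text <cat>, while the lookaround version finds the standalone word at the start of the string and the one between the angle brackets, skipping "concat" and "cats".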
XML-style character class subtraction, which is supported by the JGsoft, .NET, XML Schema, and XPath flavors, was not always handled correctly. On the Use tab, the regex lost the -[ characters that begin the character class subtraction. Character class subtraction following an unescaped hyphen was not handled correctly on the Create tab. The JGsoft flavor treats the first hyphen in [a--[x]] as a literal. The .NET and XML flavors treat the first hyphen as an error, because the range is incomplete. A character class with two subtracted classes such as [x-[x]-[x]] caused RegexBuddy to get stuck on access violations if the current flavor supports character class subtraction. The second subtraction is now correctly marked as an error.
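For flavors without this syntax, the same effect can usually be had with a negative lookahead. The sketch below uses Python, whose re module does not support character class subtraction, to emulate what [a-z-[aeiou]] means in the flavors listed above. It is an emulation for illustration, not how any of those engines implement the feature.

```python
import re

# [a-z-[aeiou]] in a flavor with class subtraction means
# "a lowercase ASCII letter that is not a vowel".
# Python's re has no such syntax, so a negative lookahead
# removes the subtracted characters before the class is tried.
consonant = re.compile(r"(?![aeiou])[a-z]")

found = consonant.findall("regexbuddy")
print(found)
```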
In replacement text using the .NET and JavaScript replacement text flavors, RegexBuddy did not always treat backslashes as ordinary characters that don't escape anything, as these flavors do. For example, \$1 was treated as three literal characters instead of a literal backslash followed by backreference 1. In replacement text for Python, RegexBuddy interpreted \0 as a token for the overall regex match. Python does not support \0. \g<0> is the only way to include the overall regex match in the replacement text in Python.
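A short Python sketch of the \g<0> syntax, using the standard re module. The sample strings are made up for illustration.

```python
import re

# \g<0> inserts the overall regex match into the replacement text.
wrapped = re.sub(r"\d+", r"<\g<0>>", "a1b22")
print(wrapped)

# Numbered groups can be written as \g<2> or as \1.
swapped = re.sub(r"(\w+)=(\w+)", r"\g<2>=\1", "key=value")
print(swapped)
```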
When converting a regex to Perl, RegexBuddy now always escapes $ signs in character classes to prevent unintended variable interpolation in Perl. Perl tries to be clever about interpreting $ signs in regular expressions. Depending on where a $ occurs, it will be treated as the start of a variable name, or as the regex anchor $. An escaped $ is always treated as a literal $. Therefore, RegexBuddy can't just escape all $ signs when formatting a regex as a Perl m// operator. Escaping $ signs in character classes when converting the regex to the Perl flavor works around this limitation.
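The idea of escaping $ only inside character classes can be sketched as follows. This is not RegexBuddy's code: the function name escape_dollars_in_classes is hypothetical, and the scanner is deliberately simple. It assumes Perl-style syntax and ignores \Q...\E spans, free-spacing comments and other constructs a real converter would have to handle.

```python
def escape_dollars_in_classes(regex: str) -> str:
    """Backslash-escape unescaped $ signs that sit inside [...] classes."""
    out = []
    in_class = False
    i = 0
    n = len(regex)
    while i < n:
        ch = regex[i]
        if ch == "\\" and i + 1 < n:
            # Copy an existing escape untouched (covers \$ and \]).
            out.append(regex[i:i + 2])
            i += 2
            continue
        if not in_class:
            if ch == "[":
                in_class = True
                out.append(ch)
                i += 1
                # An optional ^ and a leading ] belong to the class.
                if i < n and regex[i] == "^":
                    out.append("^")
                    i += 1
                if i < n and regex[i] == "]":
                    out.append("]")
                    i += 1
                continue
        else:
            if regex.startswith("[:", i):
                # Copy a POSIX class such as [:alpha:] as one unit.
                end = regex.find(":]", i + 2)
                if end != -1:
                    out.append(regex[i:end + 2])
                    i = end + 2
                    continue
            if ch == "]":
                in_class = False
            elif ch == "$":
                out.append("\\$")
                i += 1
                continue
        out.append(ch)
        i += 1
    return "".join(out)


print(escape_dollars_in_classes(r"[$a]\d+$"))
```

The $ inside the class gains a backslash, while the anchor $ at the end of the regex is left alone so that it keeps its meaning.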
When using a portable installation of RegexBuddy on various computers, it is quite likely that your USB stick or other portable device gets a different drive letter on different PCs. Now, RegexBuddy automatically detects that the drive letter has changed. The file histories under the Open buttons on the Test and Library tabs, as well as the last opened test file and library file, now automatically adapt to the new drive letter.
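How RegexBuddy does this internally isn't described here, but the general idea can be sketched in a few lines of Python. The function name rebase_drive and its parameters are hypothetical. The sketch uses the standard ntpath module so that Windows-style paths are parsed the same way on any platform.

```python
import ntpath


def rebase_drive(stored_path: str, old_drive: str, new_drive: str) -> str:
    """Move a remembered path to new_drive if it lived on old_drive."""
    drive, rest = ntpath.splitdrive(stored_path)
    if drive.upper() == old_drive.upper():
        return new_drive + rest
    # Paths on other drives (e.g. the hard disk) are left alone.
    return stored_path


# The stick was E: on the previous PC and is G: on this one.
history = [r"E:\tests\sample.txt", r"C:\docs\notes.txt"]
updated = [rebase_drive(p, "E:", "G:") for p in history]
print(updated)
```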
A series of other minor improvements and fixes were made as well. Please see http://www.regexbuddy.com/history.html for a complete version history.
If you have already purchased or upgraded to RegexBuddy 3, you can download this update for free at http://www.regexbuddy.com/download.html
Version 3 is a major upgrade. If you own RegexBuddy 2, go to http://www.regexbuddy.com/upgradenow.html to purchase this new version at a significant discount (US$ 19.95 for a single user license). If you did not yet buy RegexBuddy, you can get your copy now at http://www.regexbuddy.com/buynow.html for US$ 39.95.
Updated: TPerlRegEx for Delphi 2009
TPerlRegEx is a Delphi VCL component wrapper around the open source PCRE library. The source code snippets RegexBuddy generates for Delphi (Win32) are based on TPerlRegEx.
Five months ago we released TPerlRegEx for Delphi 2009, which enables PCRE’s UTF-8 support when compiled with Delphi 2009, so you can use TPerlRegEx with Unicode strings. TPerlRegEx still supports Delphi 2007 and earlier using Ansi strings.
Unfortunately, until this month's new release, TPerlRegEx for Delphi 2009 had a rather embarrassing bug: it didn’t actually enable the UTF-8 support in PCRE unless you set the Options property to something different from the default. The new release of TPerlRegEx fixes this by adding these five lines to the constructor:
{$IFDEF UNICODE}
pcreOptions := PCRE_UTF8 or PCRE_NEWLINE_ANY;
{$ELSE}
pcreOptions := PCRE_NEWLINE_ANY;
{$ENDIF}
This fix only applies to Delphi 2009. Delphi 2009 has the UNICODE compiler define, which previous versions don't. When TPerlRegEx is compiled with Delphi 2009, it uses UTF8String. In Delphi 2009, assigning a string to UTF8String will actually cause the string to be stored as UTF-8. When TPerlRegEx passes this string to PCRE, it needs to tell PCRE it's using UTF-8 by setting the PCRE_UTF8 option.
When you compile TPerlRegEx with Delphi 2007 and earlier, it uses AnsiString. PCRE then operates in 8-bit mode, treating each byte as one character. PCRE does not support multi-byte Ansi character sets. UTF-8 and plain 8-bit are the only options. Note that in Delphi 2007, there's no difference between UTF8String and AnsiString. Manually defining the UNICODE directive in Delphi 2007 will not make TPerlRegEx support Unicode. You'd have to add explicit calls to UTF8Encode and UTF8Decode to do the conversions.
When using the buggy version of TPerlRegEx with Delphi 2009, this subject:
PerlRegEx1.Subject := '€';
will match ^.{3}$ but not ^.$
When using the corrected version of TPerlRegEx, ^.{3}$ fails while ^.$ matches.
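The same effect is easy to reproduce outside Delphi. The sketch below uses Python's re module rather than PCRE, so it is only an analogy: matching against the UTF-8 bytes stands in for PCRE's 8-bit mode, and matching against the Unicode string stands in for PCRE with PCRE_UTF8 set.

```python
import re

euro = "\u20ac"                    # the euro sign, one code point
euro_bytes = euro.encode("utf-8")  # three bytes: E2 82 AC

# Byte-wise matching: each of the three bytes counts as one character.
bytes_three = re.fullmatch(rb".{3}", euro_bytes) is not None
bytes_one = re.fullmatch(rb".", euro_bytes) is not None

# Character-wise matching: the euro sign is a single character.
text_three = re.fullmatch(r".{3}", euro) is not None
text_one = re.fullmatch(r".", euro) is not None

print(len(euro_bytes), bytes_three, bytes_one, text_three, text_one)
```

The byte-wise results mirror the buggy behavior described above, and the character-wise results mirror the corrected one.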
You can download the TPerlRegEx source code at http://www.regular-expressions.info/download/TPerlRegEx.zip under the MPL 1.1 license.
That's it for this month. Thank you for using our software, and see you next month!
Kind regards,
Jan Goyvaerts