I’m the first to admit—I love regular expressions. It’s kind of a hammer and nail situation. I see text, I immediately think:
using System.Text.RegularExpressions;
They’re just so useful. How can you not like them? Okay, they’re a bit obfuscated and horrid to debug (even for the person who wrote them). I’m always thinking of my coworkers, so here are a few regurgitated thoughts on how to improve the readability and maintainability of regular expressions.
Since Regular Expressions can be so difficult to understand, it helps to properly document the individual tokens in the pattern. There are a few reasons behind this:
- Check your work up front
- Clearly state what the expression intends to match
- Clearly state how the expression intends to match
I have this new friend, uh, buddy… his name is RegexBuddy. Damn, I love this tool. You can type in an expression and some input text, and see real-time results. But there’s something invaluable about its presentation: as you write an expression, it generates a great explanation of it, and the best part is that the explanation is in plain English. Looking at this got me thinking a little.

RegexBuddy even allows you to export the explanation to various places. Seems like the clipboard could come in handy. Wait, what if we could get this kind of information into comments? Maybe it could be pasted and massaged into comments to look something like this:
//---------------------------------------------------------------------
/// <summary>
/// This Regular Expression can be used to extract
/// a customer name from the salutation of a form letter.
/// </summary>
#region Regex Explanation
// Expression:
// Dear (?<name>[A-Za-z ]*),
//
// Explanation:
// Match the characters “Dear ” literally «Dear »
// Match the regular expression below and capture its match into backreference with name “name” «(?<name>[A-Za-z ]*)»
//    Match a single character present in the list below «[A-Za-z ]*»
//       Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
//       A character in the range between “A” and “Z” «A-Z»
//       A character in the range between “a” and “z” «a-z»
//       The character “ ” « »
// Match the character “,” literally «,»
//
// Sample Input:
// "Dear Loyal Reader,
//
// Thanks for reading John Coder!
// "
//
// Matches:
// "Loyal Reader"
//
// Created with RegexBuddy
#endregion
public const string GetCustomer = @"Dear (?<name>[A-Za-z ]*),";
Is it overkill? Maybe. Still, there are a couple of things to note. Jamming all of this “stuff” into the XML comment tag will inevitably break it. Plus, you probably don’t want that much information to pop up in a tooltip anyway. So, I wrapped it in a region to make it collapsible. The summary is really a summary, and the drawn-out explanation is deferred to a less obtrusive location.
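One way to check the documented “Matches” line up front is to run the pattern against the sample input. Here’s a minimal sketch of that. Only the GetCustomer constant comes from the code above; the Patterns and SalutationParser classes and the TryExtractName method are hypothetical names I made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical holder class; only the constant itself comes from the post.
public static class Patterns
{
    public const string GetCustomer = @"Dear (?<name>[A-Za-z ]*),";
}

// Hypothetical helper that reads the named group "name" from the match.
public static class SalutationParser
{
    // Returns true and sets 'name' when the salutation is found;
    // otherwise returns false and sets 'name' to an empty string.
    public static bool TryExtractName(string letter, out string name)
    {
        Match match = Regex.Match(letter, Patterns.GetCustomer);
        name = match.Success ? match.Groups["name"].Value : string.Empty;
        return match.Success;
    }
}
```

Calling SalutationParser.TryExtractName with the sample input from the comment block ("Dear Loyal Reader,\n\nThanks for reading John Coder!\n") should return true and set name to "Loyal Reader", which agrees with the documented match.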
Is this easy enough to understand? Leave a comment and let me know what you think.