Regular Expressions


A regular expression (regex for short) is a pattern used to identify certain lines or parts of character strings.

Regular expressions are a basic concept in information technology. They let you describe parts of strings, and these descriptions can contain variable parts. That is what makes them so useful.

Many Unix programs and programming languages use regex for searching (also known as pattern matching) and, where applicable, for replacing. The following list is far from complete: vi, grep, sed, Perl, PHP.
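As a rough illustration of searching and replacing, here is a minimal sketch in Python, whose re module uses a Perl-like syntax. The sample text and variable names are made up for this example; the comments only point at the approximate grep and sed equivalents.

```python
import re

# Hypothetical sample input, three lines of log-like text.
text = "error: disk full\nwarning: low memory\nerror: timeout"

# Searching: keep only the lines that start with "error"
# (roughly what grep '^error' does).
error_lines = [line for line in text.splitlines() if re.search(r"^error", line)]

# Replacing: substitute every match
# (roughly what sed 's/error/ERROR/g' does).
replaced = re.sub(r"error", "ERROR", text)

print(error_lines)
print(replaced)
```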

As you can see in the table below, the regular expression syntax is quite similar in grep, sed and Perl. Bash globbing has a much more limited syntax. It uses a different character for the same function (? in Bash vs. . in regex) and the same character for a slightly different function (* in Bash vs. .* in regex). This probably has to do with the usefulness of the dot in filenames, and not only for marking file extensions.
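The difference between glob and regex wildcards can be tried out with a small sketch. It assumes Python's fnmatch module as a stand-in for shell globbing (it is close to, but not identical with, what Bash does) and uses made-up filenames.

```python
import fnmatch
import re

# Hypothetical filenames for the comparison.
names = ["a.txt", "ab.txt", "abc.txt", "atxt"]

# Glob: ? is any one character, * is any run of characters (including none).
# The dot is an ordinary character here.
glob_one = [n for n in names if fnmatch.fnmatchcase(n, "a?.txt")]
glob_any = [n for n in names if fnmatch.fnmatchcase(n, "a*.txt")]

# Regex: . is any one character, .* is any run of characters.
# A literal dot has to be escaped as \.
re_one = [n for n in names if re.fullmatch(r"a.\.txt", n)]
re_any = [n for n in names if re.fullmatch(r"a.*\.txt", n)]

print(glob_one, re_one)
print(glob_any, re_any)
```

Both pairs select the same names; only the spelling of the pattern differs.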

Regular expressions are also used in text editors and other applications for searching and replacing. The syntax may vary from program to program, but the basic functions remain the same. (Feel free to add your favorite application :)

match…                                Bash-glob       grep      sed       Perl
…any one character                    ?               .         .         .
…any characters (or no character)     *               .*        .*        .*   (see repetitions)
…one character from a set             [ ]             [ ]       [ ]       [ ]
…anything but the given set           [^ ]            [^ ]      [^ ]      [^ ]
…ranges in the set (e.g. a to z)      [a-z]           [a-z]     [a-z]     [a-z]

repetitions                           Bash-glob       grep      sed       Perl
zero or more of the preceding match   n.a.            *         *         *
zero or one of the preceding match    n.a.            \?        \?        ?
one or more of the preceding match    n.a.            \+        \+        +
exactly x times                       n.a.            \{x\}     \{x\}     {x}
x times or more                       n.a.            \{x,\}    \{x,\}    {x,}
x through y times                     n.a.            \{x,y\}   \{x,y\}   {x,y}

positions                             Bash-glob       grep      sed       Perl
Beginning of line                     Beg. of expr.   ^         ^         ^
End of line                           End of expr.    $         $         $
Word boundary                         ‘ ’             \b        \b        \b
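To try the sets, repetitions and positions from the table, here is a minimal sketch in Python. It assumes Python's re module, which follows the Perl column (unescaped ?, + and {x}); the helper name matches and the test strings are made up for illustration.

```python
import re

def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None

results = {
    "set":           matches(r"gr[ae]y", "grey"),          # e is in the set
    "negated set":   matches(r"gr[^ae]y", "grey"),         # e is excluded
    "range":         matches(r"[a-z]", "ABC"),             # no lowercase letter
    "zero or more":  matches(r"^ab*$", "a"),               # zero b's is fine
    "zero or one":   matches(r"^colou?r$", "color"),       # the u is optional
    "one or more":   matches(r"^ab+$", "a"),               # needs at least one b
    "exactly x":     matches(r"^a{3}$", "aaa"),            # exactly three a's
    "x or more":     matches(r"^a{2,}$", "a"),             # needs two or more
    "x through y":   matches(r"^a{2,3}$", "aaaa"),         # four is too many
    "word boundary": matches(r"\bcat\b", "concatenate"),   # cat is inside a word
}

for name, found in results.items():
    print(f"{name}: {found}")
```

The ^ and $ anchors pin the pattern to the beginning and end of the line, so the repetition counts apply to the whole string rather than to any part of it.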


Links

http://www.weitz.de/regex-coach/ – An interactive program that lets you analyze regular expressions.

http://www.regular-expressions.info/ – Probably more than you want to know about regex…

http://www.dotnetcoders.com/web/Learning/Regex/RegexTester.aspx – An alternate tester, specifically for .NET, but it seems to work “normally” and also checks “groups” (parentheses).

Regular Expression Tester


Examples

  • In this UGU-admin-tip, there are regexes to match hidden files. Since ‘.*’ is not good (it also matches ‘..’ and thus goes up in the directory hierarchy when used in recursive commands), my suggestion is ‘.[^.]*’ (shell-glob) or ‘^\.[^\.]’ (sed, grep and the like). This matches everything that starts with a dot and is followed by a ‘non-dot’.
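The hidden-file patterns above can be checked with a short sketch. It assumes Python's fnmatch module as a stand-in for shell globbing; fnmatch only understands the [!…] spelling of a negated set, so the glob is written ‘.[!.]*’ here, which Bash treats the same as ‘.[^.]*’. The list of names is made up for illustration.

```python
import fnmatch
import re

# Hypothetical directory listing, including the special entries . and ..
names = [".", "..", ".bashrc", ".git", "notes.txt"]

# The naive glob also catches . and .. (the parent directory).
too_broad = [n for n in names if fnmatch.fnmatchcase(n, ".*")]

# Suggested glob: a dot, then one non-dot character, then anything.
# Written with [!.] because fnmatch does not treat [^.] as a negation.
glob_hidden = [n for n in names if fnmatch.fnmatchcase(n, ".[!.]*")]

# Suggested regex: a dot at the beginning, followed by a non-dot.
regex_hidden = [n for n in names if re.search(r"^\.[^\.]", n)]

print(too_broad)
print(glob_hidden)
print(regex_hidden)
```

Both suggested patterns skip ‘.’ and ‘..’ while still matching ordinary hidden files.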

