Regular expressions
A regular expression (regex for short) is a pattern used to identify certain lines or parts of character strings.
Regular expressions are a basic concept of information technology. They let you describe parts of strings, and these descriptions can contain variable parts, which is what makes them so interesting.
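To make the idea of a "variable part" concrete, here is a small sketch in Python (chosen only for illustration; the pattern `gr[ae]y` and the word list are made-up examples). The bracket expression `[ae]` is the variable part: one pattern accepts both spellings.

```python
import re

# '[ae]' is the variable part of this pattern: it accepts either letter,
# so a single pattern covers both spellings.
pattern = re.compile(r"gr[ae]y")

words = ["grey", "gray", "groy", "greyhound"]

# fullmatch succeeds only if the whole word fits the pattern.
results = {word: pattern.fullmatch(word) is not None for word in words}
print(results)  # {'grey': True, 'gray': True, 'groy': False, 'greyhound': False}
```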
Many Unix programs and programming languages use regexes for searching (also called pattern matching) and, where applicable, for replacing. The following list is far from complete: vi, grep, sed, Perl, PHP and many more.
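Searching and replacing can be sketched with Python's `re` module: `re.search` plays roughly the role of `grep`, and `re.sub` that of sed's `s/…/…/g` command. The log lines below are invented for the example.

```python
import re

lines = [
    "error: disk full",
    "info: backup done",
    "error: network down",
]

# Searching, similar to `grep '^error'`: keep lines where the pattern is found.
errors = [line for line in lines if re.search(r"^error", line)]

# Replacing, similar to sed 's/error/ERROR/g': re.sub replaces every match.
shouted = [re.sub(r"error", "ERROR", line) for line in lines]

print(errors)   # ['error: disk full', 'error: network down']
print(shouted)  # ['ERROR: disk full', 'info: backup done', 'ERROR: network down']
```

Without the `g` flag, sed replaces only the first match on each line; `re.sub(..., count=1)` would be the closer equivalent.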
As you can see in the table below, the regular expression syntax is quite similar in grep, sed and Perl. Bash globs have a much more limited syntax: they use a different character for the same function (‘?’ instead of ‘.’) and the same character for a slightly different function (‘*’ instead of ‘.*’). This is probably because the dot is so common in filenames, and not only to mark file extensions.
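The difference between the two notations can be tried out in Python, whose `fnmatch` module implements shell-style glob matching (close to, but not identical with, Bash). The file names are made up. Note that the regex must escape the literal dot, while in the glob a dot is just a dot.

```python
import fnmatch
import re

names = ["a.txt", "ab.txt", "b.csv"]

# Glob: '?' is exactly one character, '*' is any run of characters.
glob_one = [n for n in names if fnmatch.fnmatchcase(n, "?.txt")]
glob_any = [n for n in names if fnmatch.fnmatchcase(n, "*.txt")]

# Regex equivalents: '.' is one character, '.*' is any run; the literal dot
# has to be escaped, and fullmatch anchors the pattern the way a glob is.
re_one = [n for n in names if re.fullmatch(r".\.txt", n)]
re_any = [n for n in names if re.fullmatch(r".*\.txt", n)]

print(glob_one, re_one)  # ['a.txt'] ['a.txt']
print(glob_any, re_any)  # ['a.txt', 'ab.txt'] ['a.txt', 'ab.txt']
```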
Regular expressions are also used for searching and replacing in text editors and other applications. The syntax varies from program to program, but the basic functions remain the same. (Feel free to add your favorite application.)
| match… | Bash-glob | grep | sed | Perl |
|---|---|---|---|---|
| …any one character | `?` | `.` | `.` | `.` |
| …any characters (or no character) | `*` | `.*` (see repetitions) | `.*` | `.*` |
| …one character from a set | `[ ]` | `[ ]` | `[ ]` | `[ ]` |
| …anything but the given set | `[^ ]` (or `[! ]`) | `[^ ]` | `[^ ]` | `[^ ]` |
| …ranges in the set (e.g. a to z) | `[a-z]` | `[a-z]` | `[a-z]` | `[a-z]` |

| repetitions | Bash-glob | grep | sed | Perl |
|---|---|---|---|---|
| zero or more of the preceding match | n.a. | `*` | `*` | `*` |
| zero or one of the preceding match | n.a. | `\?` | `\?` | `?` |
| one or more of the preceding match | n.a. | `\+` | `\+` | `+` |
| exactly x times | n.a. | `\{x\}` | `\{x\}` | `{x}` |
| x or more times | n.a. | `\{x,\}` | `\{x,\}` | `{x,}` |
| x through y times | n.a. | `\{x,y\}` | `\{x,y\}` | `{x,y}` |

| positions | Bash-glob | grep | sed | Perl |
|---|---|---|---|---|
| beginning of line | implicit (a glob always matches from the start) | `^` | `^` | `^` |
| end of line | implicit (a glob always matches to the end) | `$` | `$` | `$` |
| word boundary | `' '` (a literal space) | `\b` | `\b` | `\b` |
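Python's `re` module uses the Perl-style syntax from the last column for all of the constructs above, so it is a handy way to try the table out. The patterns and test strings below are made-up examples. One caveat: in Python, `^` and `$` anchor to the start and end of the whole string unless `re.MULTILINE` is set; for the single-line strings used here that is the same as the line anchors in the table.

```python
import re

# Character sets and repetitions: re.fullmatch requires the WHOLE text to
# match, so these cases show exactly what each construct accepts.
full_cases = [
    (r"[abc]", "b", True),         # one character from a set
    (r"[^abc]", "b", False),       # anything but the given set
    (r"[a-z]", "q", True),         # a range in the set
    (r"ab*c", "ac", True),         # * : zero or more b
    (r"colou?r", "color", True),   # ? : zero or one u
    (r"go+gle", "ggle", False),    # + : one or more o
    (r"[0-9]{3}", "123", True),    # {x} : exactly 3 digits
    (r"[0-9]{2,}", "1", False),    # {x,} : 2 or more digits
    (r"x{1,3}", "xxxx", False),    # {x,y} : 1 through 3 x
]

# Positions: re.search looks anywhere in the text, so the anchors ^, $ and
# \b are what restrict where a match may occur.
search_cases = [
    (r"^foo", "foobar", True),           # beginning of line
    (r"^bar", "foobar", False),
    (r"bar$", "foobar", True),           # end of line
    (r"\bcat\b", "the cat sat", True),   # word boundary on both sides
    (r"\bcat\b", "concatenate", False),  # 'cat' sits inside a word here
]

for pattern, text, expected in full_cases:
    assert (re.fullmatch(pattern, text) is not None) == expected, pattern

for pattern, text, expected in search_cases:
    assert (re.search(pattern, text) is not None) == expected, pattern

print("all cases behave as the table describes")
```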
Links
http://www.weitz.de/regex-coach/ – An interactive program that lets you analyze regular expressions.
http://www.regular-expressions.info/ – Probably more than you want to know about regex…
http://www.dotnetcoders.com/web/Learning/Regex/RegexTester.aspx – An alternative tester, written specifically for .NET, but it seems to work “normally” and also *checks groups* (parenthesized subexpressions).
Examples
- In this UGU admin tip, there are regexes to match hidden files. Since ‘.*’ is not good (it also matches ‘..’ and thus climbs up the directory hierarchy when used in recursive commands), I suggest ‘.[^.]*’ (shell glob) or ‘^\.[^.]’ (sed, grep and the like). Both match everything that starts with a dot followed by a non-dot.
- …
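The hidden-file suggestion can be checked against a made-up directory listing in Python. Two assumptions to flag: Python's `fnmatch` only understands `!` for negating a set, so the glob is written `.[!.]*` here (Bash treats `[!.]` and `[^.]` the same); and the regex is applied with `re.search`, which, like grep, looks for the pattern anywhere unless it is anchored. The hypothetical entry `..weird` shows that both patterns also skip names starting with two dots.

```python
import fnmatch
import re

# Hypothetical directory listing, including the special entries '.' and '..'.
entries = [".", "..", ".bashrc", ".ssh", "notes.txt", "..weird"]

# The naive glob '.*' also matches '.' and '..'.
naive = [e for e in entries if fnmatch.fnmatchcase(e, ".*")]

# Suggested glob: a dot followed by a non-dot ('[!.]' is fnmatch's negation).
hidden_glob = [e for e in entries if fnmatch.fnmatchcase(e, ".[!.]*")]

# Suggested regex for grep/sed: anchored at the start, dot, then a non-dot.
hidden_re = [e for e in entries if re.search(r"^\.[^.]", e)]

print(naive)        # ['.', '..', '.bashrc', '.ssh', '..weird']
print(hidden_glob)  # ['.bashrc', '.ssh']
print(hidden_re)    # ['.bashrc', '.ssh']
```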