Regular Expressions

Overview

Regular expressions are a way to express complex string patterns to search for in a text. When we search a text for the word 'House', that word is a simple search pattern. But what if we want to search for the words 'House', 'Home', and 'Homeland' at the same time? This is a much more complex case. Regular expressions let us instruct a search routine to look for something much more complex than a simple word pattern. Special symbols are used for this purpose.

CalcIt supports regular expressions in the commands REFind and REReplace, which are the regular expression versions of the Find and Replace commands. When we need to search for a simple word pattern, Find and Replace are better suited because they are much faster. REFind and REReplace work in two phases: in the first phase they evaluate the regular expression, and in the second they use the result of this evaluation to carry out the search. When the same pattern is executed repeatedly in a loop, it is evaluated only once.
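The two-phase behavior described above resembles compiling a pattern once and reusing it. The following is a minimal sketch in Python's re module, used here only as an analogy; how CalcIt stores the evaluated expression internally is not stated in this text, and the sample lines are made up for illustration.

```python
import re

# Phase 1: evaluate (compile) the expression once, outside the loop.
amount = re.compile(r"[0-9]+\.[0-9]+")

# Hypothetical sample data.
lines = ["Total: 1234.34 USD", "no amount here", "Refund: 12.50 EUR"]

# Phase 2: reuse the compiled result for every search.
hits = []
for number, line in enumerate(lines, start=1):
    match = amount.search(line)
    if match:
        hits.append((number, match.group(0)))

print(hits)  # [(1, '1234.34'), (3, '12.50')]
```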

The regular expression for the above example is: 'House|Home|Homeland'. The symbol | plays the role of an OR operator.
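As a rough illustration of the OR operator, the same pattern can be tried in Python's re module. This is a sketch under the assumption that Python's behavior is a fair stand-in; in particular, Python tries alternatives from left to right, and whether CalcIt does the same or prefers the longest alternative is not stated here.

```python
import re

text = "House, Home and Homeland"

# Alternatives are tried left to right in Python, so 'Home' wins inside 'Homeland'.
first = re.findall(r"House|Home|Homeland", text)
print(first)   # ['House', 'Home', 'Home']

# Listing the longer alternative first lets the full word match.
second = re.findall(r"House|Homeland|Home", text)
print(second)  # ['House', 'Home', 'Homeland']
```

If the order of alternatives matters in CalcIt as it does in Python, placing longer words before their prefixes avoids partial matches.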

Regular Expressions Syntax

Character Description
\ Special characters defined in this table can be searched for as normal characters if preceded by this character. For example, \( searches for a literal (. To search for \, use \\.
* Matches the preceding character zero or more times. For example, 'zo*' matches either 'z' or 'zoo'.
+ Matches the preceding character one or more times. For example, 'zo+' matches 'zoo' but not 'z'.
! Matches the preceding character zero or one time. For example, 'a!ve!' matches the 've' in 'never'.
. Matches any single character.
x|y Matches either x or y. For example, 'z|food' matches 'z' or 'food'. '(z|f)ood' matches 'zood' or 'food'. We can chain as many | alternatives as needed.
{n} n is a nonnegative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in 'Bob', but matches the first two o's in 'foooood'.
{n,} n is a nonnegative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in 'Bob' and matches all the o's in 'foooood'. 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m} m and n are nonnegative integers. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in 'fooooood'. 'o{0,1}' is equivalent to 'o!'.
[xyz] A character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in 'plain'.
[^xyz] A negative character set. Matches any character not enclosed. For example, '[^abc]' matches the 'p' in 'plain'.
[a-z] A range of characters. Matches any character in the specified range. For example, '[a-z]' matches any lowercase alphabetic character in the range 'a' through 'z'. We can use more than one range, separating them with commas, e.g. [a-z,A-Z,0-9].
[^m-z] A negative range of characters. Matches any character not in the specified range. For example, '[^m-z]' matches any character not in the range 'm' through 'z'. We can use more than one range, separating them with commas, e.g. [^a-z,A-Z,0-9].
(pattern) Parentheses are used as a means of building a complex regular expression by combining a number of independent sub-expressions. Parentheses can be nested. To match the parenthesis characters ( ), use '\(' or '\)'. For example, '(aa|bb)cc' matches 'aacc' or 'bbcc'. If the parentheses were missing in this example, the regular expression would match 'aa' or 'bbcc'.
?u Denotes a case-insensitive search. It can be placed only at the start of the regular expression or at the start of a sub-expression enclosed in parentheses, e.g.

s:='1234.34 USD and 12456.77 EUR';
set d=REFind('?u[0-9]+\.[0-9]+ usd|[0-9]+\.[0-9]+ eur',s);
PrintArr(d);

prints:

1|1234.34 USD
17|12456.77 EUR

?c Denotes a case-sensitive search, which is the default. This command is used the same way as ?u. It is useful when an expression starts with ?u (case-insensitive) and a sub-expression needs to be case-sensitive, e.g.

s:='The text is ABCefg and abcEFG';
set d=REFind('?uabc(?cefg)',s);
PrintArr(d);

prints:

13|ABCefg
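Several of the examples in the table above can be cross-checked in Python's re module. This is only a sketch under stated mapping assumptions, not a description of CalcIt itself: CalcIt's '!' is written '?' in Python, '?u' is approximated by the inline flag '(?i)', and a '?c' sub-expression is approximated by the scoped group '(?-i:...)' (Python 3.6 or later). The helper find_all is hypothetical and only mimics the position|text output of PrintArr shown above. Note also that in Python a comma inside a character set is a literal comma, not a range separator.

```python
import re

def find_all(pattern, text):
    """Return (1-based position, matched text) pairs for every match."""
    return [(m.start() + 1, m.group(0)) for m in re.finditer(pattern, text)]

# Optional characters: CalcIt 'a!ve!' corresponds to Python 'a?ve?'.
print(find_all(r"a?ve?", "never"))       # [(3, 've')]

# Counted repetition: the first match covers the first three o's.
print(find_all(r"o{1,3}", "fooooood"))   # [(2, 'ooo'), (5, 'ooo')]

# Case-insensitive search over the whole expression (?u in the text).
s = "1234.34 USD and 12456.77 EUR"
print(find_all(r"(?i)[0-9]+\.[0-9]+ usd|[0-9]+\.[0-9]+ eur", s))
# [(1, '1234.34 USD'), (17, '12456.77 EUR')]

# Case-sensitive sub-expression inside a case-insensitive pattern (?c in the text).
s2 = "The text is ABCefg and abcEFG"
print(find_all(r"(?i)abc(?-i:efg)", s2))  # [(13, 'ABCefg')]
```

The last two results agree with the positions and matches printed by the REFind examples above.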