Regular expression: From Wikipedia, the free encyclopedia
In computing, regular expressions (abbreviated as regex or regexp, with plural forms regexes, regexps, or regexen) provide a concise and flexible means for identifying text of interest, such as particular characters, words, or patterns of characters. Regular expressions are written in a formal language that can be interpreted by a regular expression processor, a program that either serves as a parser generator or examines text and identifies parts that match the provided specification.
Regular expressions are used by many text editors, utilities, and programming languages to search and manipulate text based on patterns. For example, Perl and Tcl have a powerful regular expression engine built directly into their syntax. Several utilities provided by Unix distributions—including the editor ed and the filter grep—were the first to popularize the concept of regular expressions.
7.1 Exercise – Fill in the following chart:
Google Search      Number of Hits
Regex              1,39,00,000
Regex Linux        1,15,00,000
Regex Windows      1,24,00,000
Regex Perl         57,10,000
Regex Java         2,31,00,000
Regex Vi           13,80,000
Regex egrep        1,28,000
Regex POSIX        4,62,000
In these notes we concentrate on POSIX regular expressions using egrep.
Wildcards:
Assume we have a directory with the following contents:
Using wildcards:
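A minimal sketch of shell wildcards, assuming a scratch directory with hypothetical file names (a.txt, b.txt, c.log, file1, file2, fileA); these names are illustrative only and are not the original directory listing. Note that wildcards are expanded by the shell before the command runs, and they are not regular expressions: in a wildcard, * stands for any run of characters, whereas in a regex * repeats the preceding character.

```shell
# Scratch directory with hypothetical file names (assumptions for illustration).
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch a.txt b.txt c.log file1 file2 fileA

echo *.txt        # * matches any run of characters: a.txt b.txt
echo file?        # ? matches exactly one character: file1 file2 fileA
echo file[0-9]    # [...] matches one character from the set: file1 file2
```

Using echo rather than ls makes it easy to see exactly what the shell expanded each pattern to.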
7.2 The Grep Family
The UNIX grep utility marked the birth of global regular expression print (GREP) tools. Searching for patterns in text is an important operation in a number of domains, including program comprehension and software maintenance, structured text databases, indexing file systems, and searching natural language texts. Such a wide range of uses inspired the development of variations of the original UNIX grep. These variations range from adding new features, to employing faster algorithms, to changing the behaviour of pattern matching and printing. This survey presents all the major developments in global regular expression print tools, namely the UNIX grep family, the GNU grep family, agrep, cgrep, sgrep, nrgrep, and Perl regular expressions.
Taken from man grep:
7.3 Regexes: Some Examples
Some Examples: We start with several simple examples. Assume we have a file fruits:
Matching characters “strings” by example:
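As a small illustration of matching plain strings, the sketch below assumes a hypothetical fruits file containing apple, banana, grape, orange, peach and pear (the actual file contents are not reproduced in these notes). It uses grep -E, which is the standard spelling of egrep.

```shell
# Hypothetical contents for the fruits file (an assumption, not the original listing).
fruits=$(mktemp)
printf '%s\n' apple banana grape orange peach pear > "$fruits"

# A regex without metacharacters matches itself anywhere in the line.
grep -E 'ap' "$fruits"     # lines containing "ap": apple, grape
grep -E 'an' "$fruits"     # lines containing "an": banana, orange
```

grep prints every line that contains a match, so the position of the string within the line does not matter here.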
Metacharacters: Metacharacters are characters that have ‘special’ meaning. Here are the metacharacters that are defined.
. Matches any single character.
* “character*” specifies that the character can be matched zero or more times.
+ “character+” Matches that character one or more times. Pay careful attention to the difference between * and +; * matches zero or more times, so whatever’s being repeated may not be present at all, while + requires at least one occurrence. For example, ca+t will match “cat” (1 “a”) and “caaat” (3 “a”s), but won’t match “ct”.
? “character?” Matches that character either once or zero times; you can think of it as marking something as being optional. For example, home-?brew matches either “homebrew” or “home-brew”.
Examples:
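The four metacharacters can be tried out with a sketch like the following. The word list is illustrative input chosen for this example, not a file from the notes, and grep -E is used as the standard spelling of egrep.

```shell
# Candidate strings, one per line (illustrative input, an assumption).
words=$(mktemp)
printf '%s\n' ct cat caaat cot homebrew home-brew home--brew > "$words"

grep -E 'c.t' "$words"          # . needs exactly one character: cat, cot
grep -E 'ca*t' "$words"         # zero or more a: ct, cat, caaat
grep -E 'ca+t' "$words"         # one or more a: cat, caaat
grep -E 'home-?brew' "$words"   # optional hyphen: homebrew, home-brew
```

Comparing the ca*t and ca+t results shows the difference directly: only the * version accepts ct.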
If you wish to search for a metacharacter, the metacharacter must be escaped by preceding it with the backslash “\”. As an example, let’s assume we have a file such as:
And we wish to find “209.204.146.22”. egrep ‘209.204.146.22’ ip will NOT work as intended, because each unescaped “.” matches any character, so other lines can match too. We must escape the “.” character.
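The effect of escaping can be sketched as follows, assuming a hypothetical ip file whose second line differs from the target address only in the positions where the dots should be. The file contents are invented for illustration.

```shell
# Hypothetical ip file (contents are an assumption for this example).
ip=$(mktemp)
printf '%s\n' '209.204.146.22' '209x204y146z22' '10.0.0.1' > "$ip"

grep -E '209.204.146.22' "$ip"       # unescaped: the first two lines both match
grep -E '209\.204\.146\.22' "$ip"    # escaped: only 209.204.146.22 matches
grep -F '209.204.146.22' "$ip"       # -F treats the pattern as a fixed string
```

When the whole pattern is meant literally, grep -F is an alternative to escaping every metacharacter by hand.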
Anchors:
Using ^ and $ you can force a regex to match only at the start or end of a line, respectively.
^ Match at the start of a line
$ Match at the end of a line
For example, the regex ^p fails to match both apple and grape in the fruits file, since neither starts with a ‘p’. The fact that they contain a ‘p’ elsewhere is irrelevant. Similarly, the regex e$ only matches apple, orange and grape. So ^cat matches only those lines that start with cat, and cat$ only matches lines ending with cat.
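The anchors can be checked with a sketch like the one below. It assumes a hypothetical fruits file containing apple, banana, grape, orange, peach and pear; the actual file contents are not reproduced in these notes.

```shell
# Hypothetical contents for the fruits file (an assumption).
fruits=$(mktemp)
printf '%s\n' apple banana grape orange peach pear > "$fruits"

grep -E '^p' "$fruits"    # starts with p: peach, pear
grep -E 'e$' "$fruits"    # ends with e: apple, grape, orange
grep -E 'p'  "$fruits"    # unanchored, for contrast: apple, grape, peach, pear
```

The unanchored search shows what the anchors rule out: apple and grape contain a p but do not start with one.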
Mind the quotes though! In most shells, the dollar-sign has a special meaning. By putting the regex in single-quotes (not double-quotes or back-quotes), the regex will be protected from the shell, so to speak. It’s generally a good idea to single-quote your regexes.
Moving on, ^cat$ only matches lines that contain exactly cat. You can find empty lines in a similar way with ^$. If you’re having trouble understanding that last one, just apply the definitions. The regex basically says: “Match a start-of-line, followed by an end-of-line”.
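Combining both anchors can be sketched as follows, using a small invented input that includes one empty line. The -c option makes grep print the number of matching lines, and -v inverts the match.

```shell
# Illustrative input with one empty line (an assumption for this example).
lines=$(mktemp)
printf '%s\n' 'cat' 'cat food' 'tomcat' '' 'concatenate' > "$lines"

grep -E '^cat' "$lines"       # starts with cat: cat, cat food
grep -E 'cat$' "$lines"       # ends with cat: cat, tomcat
grep -E -c '^cat$' "$lines"   # exactly cat: prints 1
grep -E -c '^$' "$lines"      # empty lines: prints 1
grep -E -v '^$' "$lines"      # -v inverts the match, dropping the empty line
```

All patterns are single-quoted so that the shell passes ^ and $ to grep unchanged.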