BEGINNING PERL PART 3 - REPETITION



We've now moved from matching a specific character to a more general type of character - when we don't know (or don't care) exactly what the character will be. Now we're going to see what happens when we want to talk about a more general quantity of characters: more than three digits in a row, two to four capital letters, and so on.


This free tutorial is a sample from the book Beginning Perl.


The metacharacters that we use to deal with a number of characters in a row are called quantifiers.

Indefinite Repetition

The easiest of these is the question mark. It should suggest uncertainty: something may be there, or it may not. That's exactly what it does. It states that the single character or metacharacter immediately before it may appear once, or not at all. It's a good way of saying that a particular character is optional. To match either 'he' or 'she', you can use:

> perl matchtest.plx
Enter some text to find: \bs?he\b
The text matches the pattern '\bs?he\b'.
>
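Outside the interactive matchtest.plx session, the same check can be written as a short stand-alone script. This is a minimal sketch. The sample strings are our own illustrations rather than the book's test text, and the %found hash is just a convenient place to record the results.

```perl
use strict;
use warnings;

# Sample strings chosen for illustration (not the book's test string).
my @samples = ('he thought', 'she said', 'the end', 'shell');

my %found;
for my $text (@samples) {
    # \bs?he\b : an optional 's', then 'he', as a whole word.
    $found{$text} = $text =~ /\bs?he\b/ ? 1 : 0;
    printf "%-12s %s\n", $text, $found{$text} ? 'matches' : 'no match';
}
```

'he' and 'she' match. 'the' and 'shell' do not, because the \b anchors require the match to be a whole word.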

To make a series of characters (or metacharacters) optional, group them in parentheses as before. Did he say 'what the Entish is' or 'what the Entish word is'? Either will do:

> perl matchtest.plx
Enter some text to find: what the Entish (word )?is
The text matches the pattern 'what the Entish (word )?is'.
>

Notice that we had to put the space inside the group. Otherwise, when the group is skipped, the pattern demands two spaces between 'Entish' and 'is', and our text has only one:

> perl matchtest.plx
Enter some text to find: what the Entish (word)? is
'what the Entish (word)? is' was not found.
>
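Here is a sketch that runs both versions of the pattern against both phrasings. It assumes each phrasing appears on its own; the two strings are hypothetical. Precompiling the patterns with qr// simply keeps the loop tidy.

```perl
use strict;
use warnings;

# Hypothetical strings for the two phrasings.
my @phrases = ('what the Entish is', 'what the Entish word is');

my $inside  = qr/what the Entish (word )?is/;   # space inside the group
my $outside = qr/what the Entish (word)? is/;   # space outside the group

my (%inside_ok, %outside_ok);
for my $p (@phrases) {
    $inside_ok{$p}  = $p =~ $inside  ? 1 : 0;
    $outside_ok{$p} = $p =~ $outside ? 1 : 0;
    print "$p\n  inside: $inside_ok{$p}  outside: $outside_ok{$p}\n";
}
```

The first pattern accepts both phrasings. The second rejects the short one, because when the group is skipped it needs 'Entish  is' with two spaces.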

As well as matching something one or zero times, you can match something one or more times. We do this with the plus sign. To match an entire word without specifying how long it should be, you can say:

> perl matchtest.plx
Enter some text to find: \b\w+\b
The text matches the pattern '\b\w+\b'.
>

In this case, the pattern matches the first available word, 'I'.
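The following small sketch shows the first word and also every word that a pattern like this can find. The sentence is a made-up example. The /g modifier, which asks for every match rather than just the first, goes slightly beyond what this section covers.

```perl
use strict;
use warnings;

# A made-up sentence for illustration.
my $text = 'I wonder what the word is.';

# Without /g, the match stops at the first word; the list assignment
# pulls out what the parentheses captured.
my ($first) = $text =~ /\b(\w+)\b/;
print "First word: $first\n";

# With /g in list context, every non-overlapping match is returned,
# showing that \w+ stretches to fit words of any length.
my @words = $text =~ /\b(\w+)\b/g;
print "All words: @words\n";
```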

If, on the other hand, something may appear any number of times but might not be there at all (zero, one, or many times), you need what's called the 'Kleene star': the * quantifier. So how would you find a capital letter at the start of the string, possibly preceded by some spaces? Anchor to the start of the string, then allow any number of whitespace characters, then require a capital:

> perl matchtest.plx
Enter some text to find: ^\s*[A-Z]
'^\s*[A-Z]' was not found.
>
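A stand-alone version makes it easy to try the pattern against strings with and without leading spaces. The four strings below are our own examples. The third starts with a quotation mark, as the tutorial's test string does, so ^\s*[A-Z] cannot match it.

```perl
use strict;
use warnings;

# Illustrative strings: several leading spaces, none, a leading quote, lower case.
my @samples = ('   Hello', 'Hello', '"Hello', 'hello');

my %result;
for my $s (@samples) {
    # ^ anchors at the start; \s* allows zero or more whitespace characters.
    $result{$s} = $s =~ /^\s*[A-Z]/ ? 1 : 0;
    printf "%-10s %s\n", $s, $result{$s} ? 'matches' : 'not found';
}
```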

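Of course, our test string begins with a quote, so the above pattern won't match. Sure enough, though, if you take away that first quote, the pattern matches fine. Let's review the three quantifiers:

?    The preceding item may appear once, or not at all.
+    The preceding item must appear one or more times.
*    The preceding item may appear zero or more times.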

Novice Perl programmers tend to go to town on combinations of dot and star, and the results often surprise them, particularly when it comes to search and replace. We'll explain the rules of the regular expression matcher shortly, but bear the following in mind:

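A regular expression should hardly ever start or finish with a starred character. Because the star can match zero times, a starred item at either end of an unanchored pattern cannot change whether the match succeeds, only how much text it covers.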

You should also consider the fact that .* and .+ in the middle of a regular expression will match as much of your string as they possibly can. We'll look more at this 'greedy' behavior later on.
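The sketch below illustrates both warnings with made-up strings. It shows three things:

* A pattern that is nothing but a starred item succeeds even on text that lacks that item.
* In an unanchored pattern, a leading or trailing \s* never changes whether the match succeeds.
* A .* in the middle of a pattern grabs as much as it can.

```perl
use strict;
use warnings;

# 1. A starred item may match zero times, so a pattern made of nothing
#    else succeeds on any string, even one with no digits at all.
my $zero_digits = 'no digits here' =~ /\d*/ ? 1 : 0;
print "/\\d*/ on 'no digits here': ", ($zero_digits ? "match" : "no match"), "\n";

# 2. In an unanchored pattern, a leading or trailing \s* never decides
#    success: /\s*cat\s*/ accepts exactly the strings /cat/ accepts.
my %agree;
for my $t ('a cat', 'cat', 'dog') {
    my $with    = $t =~ /\s*cat\s*/ ? 1 : 0;
    my $without = $t =~ /cat/       ? 1 : 0;
    $agree{$t}  = $with == $without;
    printf "%-6s with \\s*: %d  without: %d\n", $t, $with, $without;
}

# 3. In the middle of a pattern, .* is greedy: it runs to the end of the
#    string, then gives back only enough for the final quote to match.
my $quoted = q('yes' and 'no');
my ($inner) = $quoted =~ /'(.*)'/;
print "Captured: $inner\n";
```

The capture is yes' and 'no rather than just yes, which is exactly the kind of surprise the warning above is about.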

Well-Defined Repetition

If you want to be more precise about how many times a character or group of characters may be repeated, you can specify the minimum and maximum number of repeats in curly brackets. '2 or 3 spaces' can be written as follows:

> perl matchtest.plx
Enter some text to find: \s{2,3}
'\s{2,3}' was not found.
>

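So we have no doubled or trebled spaces in our string. Notice the construction: the minimum, a comma, and the maximum, all inside braces. Omitting the maximum means 'or more', so {2,} denotes '2 or more'. For '3 or fewer', give a minimum of zero, as in {0,3}. Older versions of Perl do not treat {,3} as a quantifier at all, and instead look for the literal text '{,3}'. Perl 5.34 and later accept {,3} as shorthand for {0,3}. In both cases, the same warnings apply as for the star operator: {2,} has no upper limit, and {0,3} can match nothing at all.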
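Here is a quick sketch of these forms, using hypothetical strings. One subtlety is worth seeing. On its own, \s{2,3} also matches inside a run of four spaces, because that run contains two or three spaces. Putting non-space characters on either side pins the run down to exactly two or three.

```perl
use strict;
use warnings;

# Hypothetical strings with runs of one to four spaces between 'a' and 'b'.
my @samples = ('a b', 'a  b', 'a   b', 'a    b');

my %bounded;
for my $s (@samples) {
    # The literal 'a' and 'b' force the whole run of spaces to be counted.
    $bounded{$s} = $s =~ /a\s{2,3}b/ ? 1 : 0;
    printf "%-8s %s\n", "[$s]", $bounded{$s} ? 'two or three spaces' : 'no';
}

# Without the surrounding letters, \s{2,3} matches part of a longer run.
my $loose = 'a    b' =~ /\s{2,3}/ ? 1 : 0;

# {2,} sets only a minimum and is greedy, like the star.
my ($run) = 'id 12345' =~ /(\d{2,})/;

# {0,3} means "three or fewer" in every version of Perl.
my ($upto) = 'xxxxx' =~ /^(x{0,3})/;

print "loose: $loose  run: $run  upto: $upto\n";
```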

Finally, you can specify exactly how many things are to be in a row by simply putting that number inside the curly brackets. Here's the five-letter-word example tidied up a little:

> perl matchtest.plx
Enter some text to find: \b\w{5}\b
'\b\w{5}\b' was not found.
>
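The transcript's test string evidently contains no five-letter words, so here is a sketch against a hypothetical sentence that does. It also shows why the \b anchors matter. Without them, \w{5} happily takes the first five letters of a longer word.

```perl
use strict;
use warnings;

# A made-up sentence with several five-letter words.
my $text = 'Seven quick foxes jumped over three lazy dogs';

# \b\w{5}\b : exactly five word characters with a boundary on each side.
my @five = $text =~ /\b(\w{5})\b/g;
print "With \\b:    @five\n";

# Without the boundaries, any five consecutive word characters will do,
# so 'jumped' contributes 'jumpe'.
my @no_bounds = $text =~ /(\w{5})/g;
print "Without \\b: @no_bounds\n";
```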

Summary Table

To refresh your memory, here are the various metacharacters we've seen so far:

Continued...


Monday 6th October 2008  © COPYRIGHT 2008 - VISUALSOFT