Wednesday, October 3, 2007

Perl regular expressions

"Hello world" =~ /world/ -> evaluates to true

"Hello world" !~ /world/ -> evaluates to false

$word = "world";
$_ = "Hello world";
print "It matches" if /$word/;

"Hello world" =~ m!world!
"Hello world" =~ m{world}
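A small sketch of the match operators in one place. The variable names are made up for illustration; each result is stored as 1 or 0 so it is easy to print.

```perl
use strict;
use warnings;

my $greeting = "Hello world";   # sample string, chosen for illustration
my $word     = "world";         # pattern text held in a variable

# =~ is true when the pattern is found, !~ is true when it is not.
my $found     = $greeting =~ /world/ ? 1 : 0;
my $not_found = $greeting !~ /world/ ? 1 : 0;

# A variable is interpolated into the pattern before matching.
my $via_variable = $greeting =~ /$word/ ? 1 : 0;

# With no =~, the match is applied to $_.
my $via_default;
for ($greeting) {               # aliases $_ to $greeting
    $via_default = /$word/ ? 1 : 0;
}

# m lets you pick other delimiters, handy when the pattern contains slashes.
my $via_braces = $greeting =~ m{world} ? 1 : 0;

print "found=$found not_found=$not_found variable=$via_variable ",
      "default=$via_default braces=$via_braces\n";
```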

Meta characters

{} [] () ^ $ . | * + ? \

Escape sequences

\t \n \033 \x1B

Anchor meta characters

$ - matches at the end of the string

^ - matches at the beginning of the string
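A quick sketch of the two anchors. The helper names are hypothetical, picked only for this example.

```perl
use strict;
use warnings;

# Hypothetical helpers, named here only for illustration.
sub starts_with_house { my ($text) = @_; return $text =~ /^house/ ? 1 : 0 }
sub ends_with_house   { my ($text) = @_; return $text =~ /house$/ ? 1 : 0 }

for my $text ("housekeeper", "lighthouse", "house") {
    printf "%-12s starts=%d ends=%d\n",
        $text, starts_with_house($text), ends_with_house($text);
}
```

Without anchors, /house/ would match all three strings.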

Character classes

/[bcr]at/ - matches bat, cat and rat (no commas inside the class; a comma would be matched literally)

/yes/i - the i modifier makes the match case-insensitive

Special characters inside a character class are - ] \ ^ $

'-' is the range operator
/item[0-9]/

If - is the first or last character in the class, it is treated as a literal character
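A short sketch of character classes, ranges and the /i modifier. The word lists are sample data invented for the example.

```perl
use strict;
use warnings;

my @candidates = qw(bat cat rat mat);                 # sample words, chosen for illustration
my @matched    = grep { /^[bcr]at$/ } @candidates;    # class lists b, c, r with no separators

my $item_ok = "item7" =~ /^item[0-9]$/ ? 1 : 0;       # range 0 through 9
my $yes_ok  = "YES"   =~ /^yes$/i      ? 1 : 0;       # /i ignores case

# A - placed last in the class is a literal hyphen, not a range.
my @digit_or_hyphen = grep { /^[0-9-]$/ } ("5", "-", "x");

print "matched: @matched\n";
print "item7: $item_ok, YES: $yes_ok\n";
print "digit or hyphen: @digit_or_hyphen\n";
```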

Perl abbreviations for common character classes

\d – digit [0-9]

\s – whitespace [\ \t\r\n\f]

\w – wordchar [0-9a-zA-Z_]

\D – negate \d

\S – negate \s

\W – negate \w

. – matches any character except "\n"
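The abbreviations can be tried out like this. A minimal sketch; the test strings are arbitrary.

```perl
use strict;
use warnings;

my $time_ok  = "12:30:05" =~ /^\d\d:\d\d:\d\d$/ ? 1 : 0;   # six digits, two colons
my @spaces   = "a b\tc"      =~ /\s/g;     # every whitespace character
my @words    = "foo_1 bar-2" =~ /\w+/g;    # runs of word characters
my @non_word = "foo_1 bar-2" =~ /\W/g;     # everything \w rejects

# . matches any character except a newline.
my $dot_dash    = "a-b"  =~ /a.b/ ? 1 : 0;
my $dot_newline = "a\nb" =~ /a.b/ ? 1 : 0;

print "time_ok=$time_ok spaces=", scalar(@spaces), "\n";
print "words: @words\n";
print "non-word count: ", scalar(@non_word), "\n";
print "dot_dash=$dot_dash dot_newline=$dot_newline\n";
```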

Word anchor - \b

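$x = "Housecat catenates house & cat";

$x =~ /\bcat/;    # matches cat in 'catenates'
$x =~ /cat\b/;    # matches cat in 'Housecat'
$x =~ /\bcat\b/;  # matches 'cat' at the end of the string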
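To see which "cat" each pattern picks, here is a sketch that reports the offset where the match starts. match_start is a hypothetical helper; it reads the built-in @- array, whose first element holds the start offset of the last successful match.

```perl
use strict;
use warnings;

# Hypothetical helper: offset where the pattern first matches, or -1.
sub match_start {
    my ($text, $re) = @_;
    return $text =~ $re ? $-[0] : -1;
}

my $x = "Housecat catenates house & cat";

my $plain = match_start($x, qr/cat/);       # inside "Housecat"
my $left  = match_start($x, qr/\bcat/);     # start of "catenates"
my $right = match_start($x, qr/cat\b/);     # end of "Housecat"
my $both  = match_start($x, qr/\bcat\b/);   # the standalone "cat"

print "plain=$plain left=$left right=$right both=$both\n";
```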

Alternation meta character |

"cats and dogs" =~ /cats|dogs/
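A small sketch of how alternation chooses. Parentheses are added here only to capture what matched; the variable names are illustrative.

```perl
use strict;
use warnings;

my $text = "cats and dogs";

# The match that starts earliest in the string wins, whatever the order
# of the alternatives.
my ($first)   = $text =~ /(cats|dogs)/;
my ($swapped) = $text =~ /(dogs|cats)/;

# At a given position, alternatives are tried left to right and the first
# one that works is taken, even if a later one would match more.
my ($short) = "cats" =~ /(c|ca|cat|cats)/;

print "first=$first swapped=$swapped short=$short\n";
```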

Grouping hierarchical matching…

Grouping meta characters '()'

/house(cat(s|))/ - matches housecat or housecats
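A sketch of the nested grouping at work. Anchors are added so the whole word has to match; the %groups hash is just a place to keep the results.

```perl
use strict;
use warnings;

my %groups;   # word => "outer group/inner group"
for my $word (qw(house housecat housecats)) {
    if ($word =~ /^house(cat(s|))$/) {
        $groups{$word} = "$1/$2";   # $2 is empty when there is no trailing s
    }
}

for my $word (sort keys %groups) {
    print "$word => $groups{$word}\n";
}
```

"house" alone does not match, because the cat part is not optional.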

Extracting matches

$time =~ /(\d\d):(\d\d):(\d\d)/;

$hour = $1;
$min = $2;
$sec = $3;

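/(ab(cd|ef))((gi)|j)/

Groups are numbered by their opening parenthesis, left to right: $1 = (ab(cd|ef)), $2 = (cd|ef), $3 = ((gi)|j), $4 = (gi)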
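In list context a match returns the captured groups directly, which saves copying $1, $2, $3 by hand. A minimal sketch with made-up sample strings:

```perl
use strict;
use warnings;

my $time = "08:45:30";   # sample value, chosen for illustration
my ($hour, $min, $sec) = $time =~ /(\d\d):(\d\d):(\d\d)/;

# Nested groups come back in the order of their opening parentheses.
my @with_gi = "abcdgi" =~ /(ab(cd|ef))((gi)|j)/;
my @with_j  = "abefj"  =~ /(ab(cd|ef))((gi)|j)/;   # the (gi) group takes no part here

print "$hour h $min m $sec s\n";
print join(", ", map { defined $_ ? $_ : "undef" } @with_gi), "\n";
print join(", ", map { defined $_ ? $_ : "undef" } @with_j), "\n";
```

A group that did not take part in the match comes back as undef.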

Backreferences (\1, \2, …)

/(\w\w\w)\s\1/ - matches a repeated three-character sequence such as "the the"
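A sketch of the backreference pattern, plus a stricter variant with \b added. The variant is my own addition, not part of the notes; it avoids matching across word fragments. Both helper names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration.
sub has_repeat      { my ($t) = @_; return $t =~ /(\w\w\w)\s\1/     ? 1 : 0 }
sub has_repeat_word { my ($t) = @_; return $t =~ /\b(\w\w\w)\s\1\b/ ? 1 : 0 }

for my $text ("Paris in the the spring", "Paris in the spring", "bathe theory") {
    printf "%-25s plain=%d with_boundaries=%d\n",
        $text, has_repeat($text), has_repeat_word($text);
}
```

The plain pattern also fires on "bathe theory", since "the" ends one word and starts the next.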

Matching repetition

a? – 0 or 1 a’s

a* - 0 or more a’s

a+ - 1 or more a’s

a{n,m} – at least n and not more than m

a{n,} – at least n

a{n} – exactly n
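The quantifiers can be compared side by side. In this sketch n = 2 and m = 3 are arbitrary choices, and each pattern is anchored at both ends so the whole string has to match.

```perl
use strict;
use warnings;

my @inputs      = ("", "a", "aa", "aaa", "aaaa");   # runs of 0 to 4 a's
my @quantifiers = ('a?', 'a*', 'a+', 'a{2,3}', 'a{2,}', 'a{2}');

my %accepted;   # quantifier => lengths of the inputs it accepts in full
for my $q (@quantifiers) {
    my $re = qr/\A(?:$q)\z/;    # anchor both ends so the whole string must match
    $accepted{$q} = [ map { length($_) } grep { $_ =~ $re } @inputs ];
    printf "%-7s accepts lengths %s\n", $q, join(",", @{ $accepted{$q} });
}
```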

s/xx/xx/o – compile the pattern only once

s/xx/xx/g – global: replace every match, not just the first

Search & replace

s/regex/replacement/modifiers

$y = "'quoted modifiers'";
$y =~ s/^'(.*)'$/$1/;    # strips the surrounding single quotes
$x = "I batted 4 for 4";
$x =~ s/4/four/g;

# $x is now "I batted four for four"

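The delimiters can be changed: s///, s!!!, s{}{}

Even s{}// works. With single quotes, s''', the regex and replacement are not interpolated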
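A sketch of search and replace in action. The (my $copy = $original) =~ s/// idiom edits a copy and leaves the original alone; the variable names and the sample path are invented for the example.

```perl
use strict;
use warnings;

my $y = "'quoted modifiers'";
(my $unquoted = $y) =~ s/^'(.*)'$/$1/;    # strip the surrounding single quotes

my $x = "I batted 4 for 4";
(my $first_only = $x) =~ s/4/four/;       # without /g only the first 4 changes
(my $every      = $x) =~ s/4/four/g;      # with /g every 4 changes

# s/// returns the number of substitutions made and edits $x in place.
my $replaced = ($x =~ s/4/four/g);

# Braces as delimiters avoid escaping the slashes in a path.
(my $path = "/usr/bin") =~ s{/usr}{/opt};

print "$unquoted\n$first_only\n$every\n$replaced replacements\n$path\n";
```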

The split operator

$x = "Calvin and Hobbes";
@word = split /\s+/, $x;
# $word[0] = 'Calvin'
# $word[1] = 'and'
# $word[2] = 'Hobbes'
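A few more split cases, as a rough sketch. The comma-separated numbers and the padded string are made-up sample data.

```perl
use strict;
use warnings;

my @word = split /\s+/, "Calvin and Hobbes";

# The separator can be any regex, here a comma plus optional spaces.
my @fields = split /,\s*/, "1.618, 2.718,3.142";

# With leading whitespace, /\s+/ produces an empty first field;
# the special pattern ' ' skips leading whitespace instead.
my @with_empty    = split /\s+/, "  a b";
my @without_empty = split ' ',   "  a b";

print scalar(@word), " words: @word\n";
print scalar(@fields), " fields: @fields\n";
print scalar(@with_empty), " vs ", scalar(@without_empty), " fields\n";
```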
