Metacharacters and Grouping

Objectives:

By the end of this chapter, you should be able to:

  • Use common metacharacters to write more succinct regular expressions
  • Use word boundaries and groups to write more efficient regular expressions
  • Write regular expressions for more complex types of validation and pattern matching

Metacharacters

So far we have seen how to match certain kinds of characters a specific number of times. While this is a good start, we can write more succinct regular expressions by using metacharacters: a backslash (\) followed by a letter, where the pair stands for a particular character or a whole class of characters. Let's take a look:

\d

This matches a digit character. Instead of using [0-9], we can use \d:

var simplePhoneRegex = /[0-9]{3}-[0-9]{3}-[0-9]{4}/
var betterPhoneRegex = /\d{3}-\d{3}-\d{4}/
var str = "My number is 201-867-5309"
str.match(betterPhoneRegex) // ["201-867-5309"]
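Without the g flag, match returns only the first phone number it finds. Here is a minimal sketch of collecting every number in a string, assuming the same ddd-ddd-dddd format; findPhoneNumbers is a hypothetical helper name. Note that match returns null rather than an empty array when nothing matches, so the sketch falls back to [].

```javascript
// Hypothetical helper: collect every ddd-ddd-dddd phone number in a string.
function findPhoneNumbers(text) {
  // The g flag makes match return all matches; match returns null when
  // there are none, so fall back to an empty array.
  return text.match(/\d{3}-\d{3}-\d{4}/g) || [];
}

console.log(findPhoneNumbers("Call 201-867-5309 or 555-123-4567")); // ["201-867-5309", "555-123-4567"]
console.log(findPhoneNumbers("No numbers here")); // []
```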

\D

This matches a non-digit character. In general, the uppercase version of a metacharacter matches exactly the characters that its lowercase counterpart does not, so \D matches any character that \d does not.

var noNumbers = /\D+/g
var str = "H3ll0"
str.match(noNumbers) // ["H", "ll"]
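A common use of \D is cleaning up input. The sketch below assumes we only care about the digits of a phone number, however the user formatted it; digitsOnly is a hypothetical helper name.

```javascript
// Hypothetical helper: strip every non-digit character from the input.
function digitsOnly(input) {
  // \D matches any character that is not 0-9; the g flag replaces all of them.
  return input.replace(/\D/g, "");
}

console.log(digitsOnly("(201) 867-5309")); // "2018675309"
console.log(digitsOnly("+1 201.867.5309")); // "12018675309"
```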

\n, \r and \t

These match newline, carriage return, and tab characters, respectively:

"this is \n a string \n on many \n lines".match(/\n/g) // returns an array of three newlines

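Text saved on Windows usually ends each line with \r\n, while other systems use \n alone. A minimal sketch of splitting text into lines that handles both by making the \r optional; splitLines is a hypothetical helper name.

```javascript
// Hypothetical helper: split text into lines, accepting "\r\n" or "\n".
function splitLines(text) {
  // \r? makes the carriage return optional before each newline.
  return text.split(/\r?\n/);
}

console.log(splitLines("first\r\nsecond\nthird")); // ["first", "second", "third"]

// \t matches a tab, e.g. when splitting one row of tab-separated values.
console.log("name\tage\tcity".split(/\t/)); // ["name", "age", "city"]
```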
\s

This character matches any whitespace character:

"please remove all the white space now".replace(/\s/g,'') // "pleaseremoveallthewhitespacenow"

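Removing every whitespace character is rarely what we want for ordinary text. A sketch of a gentler variant that collapses each run of whitespace (spaces, tabs, newlines) into a single space; normalizeSpaces is a hypothetical helper name.

```javascript
// Hypothetical helper: trim the ends, then collapse inner whitespace runs.
function normalizeSpaces(str) {
  // \s+ matches one or more consecutive whitespace characters of any kind.
  return str.trim().replace(/\s+/g, " ");
}

console.log(normalizeSpaces("  too   many\t\tspaces\nhere  ")); // "too many spaces here"
```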
\S

This character matches any non-whitespace character:

"please remove everything but the white space now".replace(/\S/g,'') // "       "

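Paired with +, \S is a quick way to break text into whitespace-separated tokens, whatever mix of spaces, tabs, or newlines separates them. A short sketch:

```javascript
// \S+ matches each run of non-whitespace characters, i.e. each token.
const tokens = "  split\tthese   words\n".match(/\S+/g);

console.log(tokens); // ["split", "these", "words"]
console.log(tokens.length); // 3
```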
\w

This character matches any word character: a letter, a digit, or the underscore. Notice what counts as a word character below. In particular, numbers and underscores count as word characters!

"pl3ease r3mov3 ALL 12the 44word characters__. So what is left? Maybe [] or {} or () or !@#$%^&*".replace(/\w/g,'') // "     .    ?  []  {}  ()  !@#$%^&*"

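Without any flags, \w is shorthand for the character set [A-Za-z0-9_]. The sketch below checks that the two remove the same characters, and shows one consequence worth knowing: accented and other non-ASCII letters are not word characters, so \w+ stops partway through a word like "café".

```javascript
// \w is shorthand for [A-Za-z0-9_]; both patterns remove the same characters.
const sample = "snake_case_42, kebab-case & more!";
const viaShorthand = sample.replace(/\w/g, "");
const viaExplicitSet = sample.replace(/[A-Za-z0-9_]/g, "");
console.log(viaShorthand === viaExplicitSet); // true

// Non-ASCII letters such as "é" (written here as \u00e9) are not word characters.
const word = "caf\u00e9"; // "café"
console.log(word.match(/\w+/g)); // ["caf"]
```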
\W

This matches any non-word character, which includes whitespace characters (such as spaces and tabs) as well as punctuation and special characters like !@#$%*():

"j ".replace(/\W/g,'wow') // "jwow"

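Because \W covers spaces and punctuation alike, splitting on \W+ pulls the words out of a sentence. A sketch; note that split leaves an empty string at the end when the text ends in a non-word character, so it is filtered out here.

```javascript
const sentence = "Hello, world! Regex-based splitting.";

// Split on every run of one or more non-word characters.
const parts = sentence.split(/\W+/);
console.log(parts); // ["Hello", "world", "Regex", "based", "splitting", ""]

// Drop the empty string produced by the trailing ".".
const words = parts.filter(part => part !== "");
console.log(words); // ["Hello", "world", "Regex", "based", "splitting"]
```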
Special characters

starting - ^

If we want to match starting from the beginning of a string, we can use the ^ character:

"this is great".match(/^t.*/) // ["this is great"]
"now this is not great".match(/^t.*/) // null

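By default ^ anchors only to the very start of the string. With the m (multiline) flag it also matches right after each newline, so it can anchor to the start of every line. A short sketch:

```javascript
const lines = "tea\ncoffee\ntoast";

// Without m: ^ only matches at position 0 of the whole string.
console.log(lines.match(/^t\w*/g)); // ["tea"]

// With m: ^ also matches at the start of each line.
console.log(lines.match(/^t\w*/gm)); // ["tea", "toast"]
```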
ending - $

If we want to match only at the end of a string, we use $. Note that a literal dot must be escaped as \. because an unescaped . matches any character:

"first.test.js".match(/.*\.test\.js$/) // ["first.test.js"]
"first.js".match(/.*\.test\.js$/) // null

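Using ^ and $ together forces the entire string to match, which is usually what validation needs. A sketch reusing the phone pattern from earlier; isPhoneNumber is a hypothetical helper name.

```javascript
// Hypothetical helper: true only if the whole string is a ddd-ddd-dddd number.
function isPhoneNumber(str) {
  // ^ and $ anchor the pattern to both ends, so no extra characters are allowed.
  return /^\d{3}-\d{3}-\d{4}$/.test(str);
}

console.log(isPhoneNumber("201-867-5309"));          // true
console.log(isPhoneNumber("call 201-867-5309 now")); // false

// Without the anchors, a number anywhere in the string is enough.
console.log(/\d{3}-\d{3}-\d{4}/.test("call 201-867-5309 now")); // true
```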
excluding ^ (inside [])

If we want to exclude characters, we put ^ as the very first character inside []. The set then matches any character that is not listed:

"let's get rid of everything that is not a vowel".replace(/[^aeiou]/gi,'') // "eeioeeiaioaoe"

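A negated set is handy for stripping everything outside an allowed alphabet. The sketch below keeps only ASCII letters and digits, and also shows that outside of [] the same ^ is an anchor again.

```javascript
const messy = "Hello, World! It's 2024.";

// [^a-z0-9] matches any character that is NOT a letter or digit;
// the i flag makes the set cover uppercase letters too.
console.log(messy.replace(/[^a-z0-9]/gi, "")); // "HelloWorldIts2024"

// Outside [], ^ anchors to the start: only the leading "H" is removed.
console.log(messy.replace(/^H/, "")); // "ello, World! It's 2024."
```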
or - |

If we want to match one of several alternatives, we can use the or operator, |. If you find yourself chaining many | operators, there is often a simpler regular expression for the job, such as a character set.

"banana bread".match(/bread|pancakes$/) // ["bread"]
"banana pancakes".match(/bread|pancakes$/) // ["pancakes"]

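The | operator has the lowest precedence of any regex operator, so in bread|pancakes$ the $ applies only to pancakes. A sketch of the difference a group makes when we want the anchor to apply to both alternatives:

```javascript
const loose = /bread|pancakes$/;      // "bread" anywhere, OR "pancakes" at the end
const anchored = /(bread|pancakes)$/; // either word, but only at the end

console.log(loose.test("bread pudding"));    // true
console.log(anchored.test("bread pudding")); // false
console.log(anchored.test("banana bread"));  // true
```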
word boundaries - \b

The metacharacter \b matches a word boundary: the position between a word character and a non-word character, or between a word character and the start or end of the string. It matches a position, not a character, so it consumes nothing. It is commonly used to match entire words; the pattern for that is /\w+\b/ (often written /\b\w+\b/ to mark both edges explicitly).

"my email is. . . . . . elie@infschool.com".match(/\b/g).length // 12. Why 12? There is one boundary at the start and one at the end of each word:

// my
// email
// is
// elie
// infschool
// com

// => 6 * 2 = 12

// Now let's use word boundaries a bit better!

"my email is. . . . . . . elie@infschool.com".match(/\w+\b/g) // ["my", "email", "is", "elie", "infschool", "com"]

"http://www.google.com".match(/\w+\b/g) // ["http", "www", "google", "com"]

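A frequent use of \b is replacing a whole word without touching longer words that happen to contain it. A short sketch:

```javascript
const line = "The cat sat on the concatenated category list.";

// Without \b, every "cat" substring is replaced, even inside other words.
console.log(line.replace(/cat/g, "dog"));
// "The dog sat on the condogenated dogegory list."

// With \b on both sides, only the standalone word "cat" is replaced.
console.log(line.replace(/\bcat\b/g, "dog"));
// "The dog sat on the concatenated category list."
```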
Groupings - ()

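A more advanced concept in regular expressions is the idea of creating groups, which capture part of a match so that you can access it later. To create a group, we wrap part of the pattern in (). In the replacement string passed to replace, we can then refer to the captured groups as $1, $2, and so on, numbered by the order of their opening parentheses.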

var tweet = "This is the best tweet #amazing #perfect #sogood";
var regex = /#(\S+)/g;

var matches = tweet.match(regex);

matches.map(v => v.replace(regex, 'hashtag: $1')) // ["hashtag: amazing", "hashtag: perfect", "hashtag: sogood"]
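Mapping replace over each match works, but modern JavaScript (ES2020 and later) also offers String.prototype.matchAll, which yields every match together with its groups. A sketch using the same tweet text; note that matchAll requires a regex with the g flag and throws a TypeError otherwise.

```javascript
const post = "This is the best tweet #amazing #perfect #sogood";

// Each result from matchAll is an array whose index 1 holds group 1.
const hashtags = [...post.matchAll(/#(\S+)/g)].map(match => match[1]);
console.log(hashtags); // ["amazing", "perfect", "sogood"]

// $1 can also be used directly when replacing in the whole string.
console.log(post.replace(/#(\S+)/g, "[$1]"));
// "This is the best tweet [amazing] [perfect] [sogood]"
```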

If we want to iterate over every match along with its groups, we can call the exec method in a loop until it returns null. You can read more about exec in the MDN documentation for RegExp.prototype.exec.
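A minimal sketch of such a loop. With the g flag, each exec call resumes from the regex's lastIndex and returns the next match with its groups, or null once no matches remain. The email pattern and addresses below are simplified, made-up examples, not a real email validator.

```javascript
const contacts = "alice@example.com, bob@test.org";

// Two groups: group 1 is the user name, group 2 is the domain.
// Simplified pattern for illustration only.
const emailRegex = /(\w+)@([\w.]+)/g;

const results = [];
let match;
// exec returns the next match each call, and null when there are no more.
while ((match = emailRegex.exec(contacts)) !== null) {
  results.push({ user: match[1], domain: match[2] });
}

console.log(results);
// [{ user: "alice", domain: "example.com" }, { user: "bob", domain: "test.org" }]
```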

External Resources

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions

https://regexone.com/lesson/introduction_abcs


Creative Commons License