By the end of this chapter, you should be able to:
So far we have seen how to match certain kinds of characters a specific number of times. While this is a good start, we can get more out of regular expressions by understanding metacharacters: characters that have a special meaning inside a pattern. Many of them are written as a letter prefixed with a \, while others, such as ^, $, and |, are single symbols. Let's take a look:
\d
This matches a digit character. Instead of using [0-9], we can use \d:
var simplePhoneRegex = /[0-9]{3}-[0-9]{3}-[0-9]{4}/
var betterPhoneRegex = /\d{3}-\d{3}-\d{4}/
var str = "My number is 201-867-5309"
str.match(betterPhoneRegex) // ["201-867-5309"]
\D
This matches any non-digit character. As we will see, the capitalized version of a metacharacter matches the opposite of its lowercase version.
var noNumbers = /\D+/g
var str = "H3ll0"
str.match(noNumbers) // ["H", "ll"]
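A common use of \D is stripping formatting out of user input. The following is a minimal sketch; the helper name digitsOnly and the sample phone number are made up for illustration:

```javascript
// digitsOnly is a hypothetical helper name used only for this illustration.
function digitsOnly(input) {
  // \D matches every non-digit; the g flag replaces all of them, not just the first.
  return input.replace(/\D/g, '');
}

var cleaned = digitsOnly("(201) 867-5309");
console.log(cleaned); // "2018675309"
```

Because the pattern removes everything that is not a digit, the same helper works no matter which punctuation the user typed.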
\n, \r and \t
These match newline, carriage return, and tab characters, respectively:
"this is \n a string \n on many \n lines".match(/\n/g) // returns an array of three newlines
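These three often show up together when parsing text files. As a rough sketch, assuming some made-up tab-separated data with mixed line endings and the ? quantifier (zero or one) from the earlier material:

```javascript
// Sample data invented for this illustration: tab-separated fields,
// one Windows line ending (\r\n) and one Unix line ending (\n).
var text = "name\tage\r\nElie\t30\nTim\t31";

// \r?\n matches a newline that may or may not be preceded by a carriage return.
var lines = text.split(/\r?\n/);
console.log(lines.length); // 3

// Split each line on tab characters to get the individual fields.
var rows = lines.map(function (line) {
  return line.split(/\t/);
});
console.log(rows[1]); // ["Elie", "30"]
```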
\s
This character matches any whitespace character:
"please remove all the white space now".replace(/\s/g,'') // "pleaseremoveallthewhitespacenow"
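Since \s covers spaces, tabs, and newlines alike, it is handy for tidying up inconsistent spacing. A small sketch with an invented input string:

```javascript
// Input invented for this illustration: runs of spaces, a tab, and a newline.
var messy = "too   many \t spaces\nhere";

// \s+ matches one or more whitespace characters in a row,
// so each run collapses into a single space.
var tidy = messy.replace(/\s+/g, " ");
console.log(tidy); // "too many spaces here"

// The same pattern splits the string into words.
var words = messy.split(/\s+/);
console.log(words); // ["too", "many", "spaces", "here"]
```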
\S
This character matches any non-whitespace character:
"please remove everything but the white space now".replace(/\S/g,'') // "       " (only the seven spaces remain)
\w
This character matches any word character: a letter, a digit, or an underscore. Notice what counts as a word character below. In particular, digits count as word characters!
"pl3ease r3mov3 ALL 12the 44word characters__. So what is left? Maybe [] or {} or () or !@#$%^&*".replace(/\w/g,'') // "     .    ?  []  {}  ()  !@#$%^&*"
\W
This character matches any non-word character, which includes spaces and other whitespace characters as well as special characters such as !@#$%*():
"j ".replace(/\W/g,'wow') // "jwow"
^
If we want to match starting from the beginning of a string, we can use the ^ character:
"this is great".match(/^t.*/) // ["this is great"]
"now this is not great".match(/^t.*/) // null
$
If we want to match something that ends with a specific pattern, we use $, which anchors the match to the end of the string:
// Both dots are escaped so that they match a literal "." rather than any character
"first.test.js".match(/.*\.test\.js$/) // ["first.test.js"]
"first.js".match(/.*\.test\.js$/) // null
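Using ^ and $ together forces the pattern to describe the entire string, which is the usual approach for validation. The sketch below assumes a five-digit ZIP code format purely as an example and uses the test method, which returns true or false instead of an array of matches:

```javascript
// zipRegex is an illustrative name; the five-digit format is an assumption.
var zipRegex = /^\d{5}$/;

var exact = zipRegex.test("94107");           // true
var withPrefix = zipRegex.test("zip: 94107"); // false: extra text before the digits
var tooLong = zipRegex.test("941078");        // false: six digits, not five

// Without the anchors, the pattern only needs to match somewhere in the string.
var unanchored = /\d{5}/.test("zip: 94107");  // true
```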
^ (inside [])
If we want to exclude characters from a character set, we use ^ as the first character inside []:
"let's get rid of everything that is not a vowel".replace(/[^aeiou]/gi,'') // "eeioeeiaioaoe"
|
If we want to match any one of several alternatives, we can use the or operator, |. If you find yourself using multiple | operators, there is usually a better regular expression for the job.
"banana bread".match(/bread|pancakes$/) // ["bread"]
"banana pancakes".match(/bread|pancakes$/) // ["pancakes"]
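One detail worth noticing in the pattern above is that | splits the whole pattern in two, so the $ applies only to pancakes. A short sketch of the difference, using the parentheses covered in the () section of this chapter to limit the alternatives (the variable names are illustrative):

```javascript
// Reads as: "bread" anywhere, OR "pancakes" at the end of the string.
var loose = /bread|pancakes$/;
// Reads as: either word, but only at the end of the string.
var strict = /(bread|pancakes)$/;

var looseResult = loose.test("bread and butter");   // true
var strictResult = strict.test("bread and butter"); // false
var strictBread = strict.test("banana bread");      // true

// When the alternatives differ by a single character,
// a character set is usually the simpler pattern.
var withOr = /gr(a|e)y/.test("grey");  // true
var withSet = /gr[ae]y/.test("grey");  // true
```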
\b
The metacharacter \b matches the boundary between a word character and a non-word character, including the start or end of the string when a word character sits there. It is commonly used to capture entire words that are separated by non-word characters. The pattern for that is /\w+\b/.
"my email is. . . . . . elie@infschool.com".match(/\b/g).length // 12
// Why does this return 12? Count the start and end of each word
// (each run of word characters between non-word characters):
// my
// email
// is
// elie
// infschool
// com
// => 6 * 2 = 12

// Now let's use word boundaries a bit better!
"my email is. . . . . . . elie@infschool.com".match(/\w+\b/g) // ["my", "email", "is", "elie", "infschool", "com"]
"http://www.google.com".match(/\w+\b/g) // ["http", "www", "google", "com"]
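Another frequent use of \b is matching a whole word without also matching it inside longer words. A quick sketch with an invented sentence:

```javascript
// Sentence invented for this illustration: "cat" appears inside two longer words.
var sentence = "concatenate the cat category";

// Without boundaries, "cat" is found inside the longer words too.
var anywhere = sentence.match(/cat/g);
console.log(anywhere.length); // 3

// With \b on both sides, only the standalone word matches.
var wholeWord = sentence.match(/\bcat\b/g);
console.log(wholeWord.length); // 1

var replaced = sentence.replace(/\bcat\b/g, "dog");
console.log(replaced); // "concatenate the dog category"
```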
()
A more advanced concept in regular expressions is the idea of creating groups which you can later access. To create a group, we use the () characters. We can then refer to these groups in a replacement string as $1, $2, and so on.
var tweet = "This is the best tweet #amazing #perfect #sogood";
var regex = /#([\S]+)/ig;
var matches = tweet.match(regex); // ["#amazing", "#perfect", "#sogood"]
matches.map(v => v.replace(regex, 'hashtag: $1')) // ["hashtag: amazing", "hashtag: perfect", "hashtag: sogood"]
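To see $2 and $3 in action, here is a small sketch that rearranges a date. The date string and variable names are invented for illustration:

```javascript
// Three groups: year, month, and day.
var isoDate = "2016-09-21";
var dateRegex = /(\d{4})-(\d{2})-(\d{2})/;

// $1 is the year, $2 the month, $3 the day; they can be used in any order.
var usDate = isoDate.replace(dateRegex, "$2/$3/$1");
console.log(usDate); // "09/21/2016"

// Without the g flag, match also exposes the groups:
// index 0 is the full match, and indexes 1 to 3 are the groups.
var parts = isoDate.match(dateRegex);
console.log(parts[1], parts[2], parts[3]); // "2016" "09" "21"
```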
If we want to iterate over multiple groups, we can loop and continue to use the exec function. You can read more about that here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
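As a rough sketch of what such a loop can look like, here is one way to collect just the group from every hashtag in the tweet used earlier. The variable names are illustrative:

```javascript
var tweet = "This is the best tweet #amazing #perfect #sogood";
var hashtagRegex = /#(\S+)/g;
var hashtags = [];
var result;

// With the g flag, each exec call resumes where the previous match ended
// (tracked in hashtagRegex.lastIndex) and returns null when nothing is left.
while ((result = hashtagRegex.exec(tweet)) !== null) {
  // result[0] is the full match ("#amazing"); result[1] is the first group.
  hashtags.push(result[1]);
}

console.log(hashtags); // ["amazing", "perfect", "sogood"]
```

Note that the g flag matters here: without it, exec would return the first match every time and the loop would never end.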
When you're ready, move on to Regular Expressions Exercises