By the end of this chapter, you should be able to:
So far we have seen how to match certain kinds of characters a specific number of times. While this is a good start, we can write more expressive regular expressions with metacharacters: characters and escape sequences, many of them written with a leading \, that have a special meaning inside a pattern. Let's take a look:
\d
This matches a digit character. Instead of using [0-9], we can use \d:
var simplePhoneRegex = /[0-9]{3}-[0-9]{3}-[0-9]{4}/;
var betterPhoneRegex = /\d{3}-\d{3}-\d{4}/;
var str = "My number is 201-867-5309";
str.match(betterPhoneRegex); // ["201-867-5309"]
\D
This matches a non-digit character. As we will see, the capitalized version of a metacharacter matches the opposite of its lowercase version.
var noNumbers = /\D+/g;
var str = "H3ll0";
str.match(noNumbers); // ["H", "ll"]
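To see \d and \D working together, here is a minimal sketch that normalizes a phone number by stripping everything that is not a digit. The helper names digitsOnly and hasTenDigits are hypothetical, chosen only for this illustration.

```javascript
// Hypothetical helper: strip every non-digit character from the input.
function digitsOnly(input) {
  return input.replace(/\D/g, '');
}

// Hypothetical helper: true when the input contains exactly ten digits.
function hasTenDigits(input) {
  return digitsOnly(input).length === 10;
}

console.log(digitsOnly("(201) 867-5309")); // "2018675309"
console.log(hasTenDigits("201.867.5309")); // true
console.log(hasTenDigits("867-5309"));     // false
```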
\n, \r and \t
These match newline, carriage return, and tab characters, respectively:
"this is \n a string \n on many \n lines".match(/\n/g) // returns an array of three newlines
\s
This matches any whitespace character:
"please remove all the white space now".replace(/\s/g,'') // "pleaseremoveallthewhitespacenow"
\S
This matches any non-whitespace character:
"please remove everything but the white space now".replace(/\S/g,''); // "       " (the seven spaces that separated the words)
\w
This matches any word character: a letter, a digit, or an underscore. In particular, numbers count as word characters!
"pl3ease r3mov3 ALL 12the 44word characters__. So what is left? Maybe [] or {} or () or !@#$%^&*".replace(/\w/g,''); // "     .    ?  []  {}  ()  !@#$%^&*"
\W
This matches any non-word character, which includes special characters (such as !@#$%*()) and whitespace characters:
"j ".replace(/\W/g,'wow') // "jwow"
^
If we want to match starting from the beginning of a string, we can use the ^ character:
"this is great".match(/^t.*/); // ["this is great"]
"now this is not great".match(/^t.*/); // null
$
If we want to match only at the end of a string, we use the $ character. Note that a literal . must be escaped as \., because an unescaped . matches any character:
"first.test.js".match(/.*\.test\.js$/); // ["first.test.js"]
"first.js".match(/.*\.test\.js$/); // null
^ (inside [])
If we want to match any character except those in a character set, we put ^ at the start of the set, inside the []:
"let's get rid of everything that is not a vowel".replace(/[^aeiou]/gi,'') // "eeioeeiaioaoe"
|
If we want to handle multiple conditions we can use the or operator, written as |. If you find yourself using multiple | operators, there is usually a better regular expression for the job.
"banana bread".match(/bread|pancakes$/); // ["bread"]
"banana pancakes".match(/bread|pancakes$/); // ["pancakes"]
\b
The metacharacter \b matches the boundary between a word character and a non-word character (or the start or end of the string). It is commonly used to capture entire words separated by non-word characters. The pattern for that is /\w+\b/.
"my email is. . . . . . elie@infschool.com".match(/\b/g).length; // 12
// Why does this return 12? Count the start and end of each word
// (a run of word characters surrounded by non-word characters):
// my, email, is, elie, infschool, com => 6 * 2 = 12

// Now let's use word boundaries a bit better!
"my email is. . . . . . . elie@infschool.com".match(/\w+\b/g); // ["my", "email", "is", "elie", "infschool", "com"]
"http://www.google.com".match(/\w+\b/g); // ["http", "www", "google", "com"]
()
A more advanced concept in regular expressions is the idea of creating groups which you can later access. To create a group, we use the () characters. In a replacement string, we can then refer to these groups as $1, $2, and so on.
var tweet = "This is the best tweet #amazing #perfect #sogood";
var regex = /#([\S]+)/ig;
var matches = tweet.match(regex);
matches.map(v => v.replace(regex, 'hashtag: $1')); // ["hashtag: amazing", "hashtag: perfect", "hashtag: sogood"]
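Groups become more useful when a pattern has several of them. The sketch below, which assumes a date written as YYYY-MM-DD, reorders the pieces with $1, $2, and $3, and also reads the groups directly from the array that match returns when the g flag is absent. The variable names are hypothetical.

```javascript
// Three groups: year, month, and day.
const isoDate = /(\d{4})-(\d{2})-(\d{2})/;

// Rearrange the groups in the replacement string.
const usDate = "2017-03-25".replace(isoDate, '$2/$3/$1');
console.log(usDate); // "03/25/2017"

// Without the g flag, match returns the full match followed by each group.
const [full, year, month, day] = "2017-03-25".match(isoDate);
console.log(full);             // "2017-03-25"
console.log(year, month, day); // "2017" "03" "25"
```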
If we want to iterate over multiple groups, we can loop and continue to use the exec function. You can read more about that here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
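As a rough sketch of that loop, the example below reuses the tweet from the groups example. With the g flag, each call to exec resumes where the previous match ended and returns null once there are no more matches. The names hashtagRegex and tags are hypothetical.

```javascript
const tweet = "This is the best tweet #amazing #perfect #sogood";
// The g flag is required; without it, exec would return the first match forever.
const hashtagRegex = /#(\S+)/g;

const tags = [];
let match;
while ((match = hashtagRegex.exec(tweet)) !== null) {
  // match[0] is the full match ("#amazing"), match[1] is the first group.
  tags.push(match[1]);
}

console.log(tags); // ["amazing", "perfect", "sogood"]
```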
When you're ready, move on to Regular Expressions Exercises