Introduction to Regular Expressions

Objectives:

By the end of this chapter, you should be able to:

  • Define what a regular expression is and explain why regular expressions are useful
  • Create regular expressions using the RegExp constructor
  • Use regular expressions with JavaScript string methods
  • Create basic patterns using common metacharacters and flags

What is a regular expression?

A regular expression is a sequence of characters that defines a pattern. We can use regular expressions to search strings for text that fits the pattern, which lets us perform complex pattern matching on strings.

Let's imagine we are given the following string: "The quick brown fox jumps over the lazy dog." If you were tasked with finding the word "jumps," that would not be too difficult. But what happens when you are tasked with finding three or four word characters that start with a vowel and contain at least one other vowel? Finding matching patterns like this requires something a bit more powerful: that's where regular expressions, or 'regex,' come in.

Regular expressions are commonly used to validate emails, phone numbers, zip codes, passwords, and much more. They are also used to find or replace characters in text files, which makes knowing them very helpful. So let's get started with creating a regular expression and seeing which JavaScript methods can help us with finding patterns.

Regular Expression syntax

One way to create a regular expression is to place the pattern between two forward slashes (//). Do not worry too much about what patterns look like yet; we will start with a very simple one that matches an exact string of characters. Here is what that looks like:

var pattern = /Elie/;

Now that we have created our first pattern, let's look at our first string method, match. The match method returns an array of matches, or null if no match is found.

Let's start with the following example:

var str = "My name is Elie, is your name Elie?";
var matches = str.match(/Elie/);
matches; // ["Elie"]

Notice that matches is an array containing the text that matched our pattern. But it only returns the first occurrence of Elie to us! If we would like to match all occurrences across a string, we need to add what is called a flag. We place flags after the closing / in a regular expression, and each flag is a single character. The flag we will be using is g, the global flag, which finds all matches in the entire string.

var str = "My name is Elie, is your name Elie?";
var matches = str.match(/Elie/g);
matches; // ["Elie", "Elie"]

Much better! Often we do not care whether the string is uppercase or lowercase, but regular expressions do! If we would like to make our search case insensitive, we can also add the i flag. To use multiple flags, just put them next to each other in the regular expression (after the closing forward slash). Let's see what that looks like.

var str = "My name is Elie, is your name Elie?";
var matches = str.match(/elie/gi);
matches // ["Elie", "Elie"]

Nice! Hopefully this gives you a good start with how the match function works. There are a couple of other string methods we can use regular expressions with, but let's learn some more about regular expressions first.
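
To recap how match behaves with and without flags, here is a small sketch. The variable names and the name "Matt" are made up for illustration, and the `|| []` guard at the end is just one common way to handle null, not the only one.

```javascript
var str = "My name is Elie, is your name Elie?";

// Without g: only the first match, plus extra details such as its index.
var first = str.match(/Elie/);
first[0];    // "Elie"
first.index; // 11

// With g and i: every match, regardless of case.
var all = str.match(/elie/gi);
all.length;  // 2

// No match at all: match returns null, not an empty array.
var none = str.match(/Matt/);
none;        // null

// Falling back to an empty array makes it safe to read .length.
var safeCount = (str.match(/Matt/g) || []).length; // 0
```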

The wildcard character

Sometimes, we have an idea of what characters we want to match in a string or word, but don't care about other characters in that string or word. If we want to match any character, we can use the special character . (a period). This is known as the "wildcard" character, and it matches any single character except a newline. Let's take a look at an example:

var str = "The cat in the hat deserves a pat";
var matches = str.match(/.at/g);
matches; // ["cat", "hat", "pat"]

We can also use the wildcard character multiple times in different places, although we will soon see that there are better ways to do this. In the following example, notice that the wildcard also matches whitespace, as in " tape".

var str = "shape tape grape";
var matches = str.match(/..a.e/gi);
matches; // ["shape", " tape", "grape"]

We can match numbers as well:

var numbers = '123 321 121 111 428 888';
var matches = numbers.match(/.2./g);
matches // ["123", "321", "121", "428"]
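
One detail worth checking for yourself is that the wildcard matches spaces, digits and punctuation, but not a newline. A short sketch with made-up strings:

```javascript
// The wildcard matches a space, a digit, or punctuation...
var sameLine = "a b a1b a-b".match(/a.b/g);
sameLine; // ["a b", "a1b", "a-b"]

// ...but it does not match a newline character.
var twoLines = "a\nb".match(/a.b/);
twoLines; // null
```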

Character sets

Let's now imagine that we want to match a string that has four characters, but the first character has to be "a", "b", "c" or "d". We need some way of specifying all of those characters. This is where character sets come in. They are denoted by placing characters inside of []. You can specify a range within a character set using the - character, so [a-d] means "a", "b", "c" or "d".

var str = 'amen bean cups deer pear';
var matches = str.match(/[a-d].../g); 
matches // ["amen", "bean", "cups", "deer"]
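
A character set does not have to be a range. The sketch below, using made-up strings, shows a set that lists individual characters and a set that mixes a range with a single character:

```javascript
// A set can list individual characters instead of a range.
var spellings = "gray grey griy".match(/gr[ae]y/g);
spellings; // ["gray", "grey"]

// A set matches exactly one character from the list.
var vowels = "regular".match(/[aeiou]/g);
vowels; // ["e", "u", "a"]

// Ranges and single characters can be mixed in one set.
var mixed = "a1 b2 x3 z4".match(/[a-bz][1-4]/g);
mixed; // ["a1", "b2", "z4"]
```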

Greedy matching

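Here are some special characters, called quantifiers, that control how many times the preceding character may appear. They are greedy, meaning they match as many characters as they can.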

? - This matches zero or one of the preceding character. In other words, it marks the preceding character as optional.

var match1 = "cookies".match(/cookies?/) // ['cookies']
var match2 = "cookie".match(/cookies?/) // ['cookie']
var match3 = "cookies".match(/cookiess?/) // ['cookies']
var match4 = "cookies".match(/cookiesss?/) // null

+ - This matches one or more of the preceding character.

var match1 = "cookiessssssssss".match(/cookies+/) // ["cookiessssssssss"]
var match2 = "cookies".match(/cookies+/) // ['cookies']
var match3 = "cookie".match(/cookies+/) // null

* - This matches zero or more of the preceding character.

var match1 = "cookiessssssssss".match(/cookies*/) // ["cookiessssssssss"]
var match2 = "cookies".match(/cookies*/) // ['cookies']
var match3 = "cookie".match(/cookies*/) // ['cookie']

We can also use the wildcard character . and the * character to match zero or more of anything.

// match anything that starts with, ends with or has the letter e inside of it

"elie".match(/.*e.*/gi); // ['elie']
"elephants are everywhere".match(/.*e.*/gi); // ['elephants are everywhere']
"can you think of a string containing almost all non-consonants?".match(/.*e.*/g); // null
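
The heading of this section refers to the fact that ?, + and * are greedy: each one takes as many characters as it can while still allowing the overall pattern to match. A small sketch with made-up strings shows the effect:

```javascript
// + grabs every trailing "s", not just the first one.
var plus = "cookiesssss!".match(/cookies+/);
plus[0]; // "cookiesssss"

// .* runs to the LAST quote it can reach, not the nearest one.
var quoted = 'say "hi" and "bye" now'.match(/".*"/);
quoted[0]; // "hi" and "bye" (including both outer quotes)
```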

Character ranges

When we want a specific quantity of a character, we can use the character range, which is denoted by {}. e{2} will match the letter 'e' exactly two times. You can also specify a minimum and a maximum: e{1,3} will match the letter 'e' one to three times. If you omit the second number but keep the comma, there is no upper limit. For example, e{2,} will match the character "e" two or more times.

We can combine this with other matching patterns to form powerful regular expressions, but let's start with a simple one first.

// only match when there is more than 1 'l'
var str = "helo hello hellllo hellllllllllo"
str.match(/hel{2,}o/g) // ["hello", "hellllo", "hellllllllllo"]

// count the runs of two or more characters where each character is an 'o' or a 'd'
var str = "noodle caboodle testing fiddle person diddle muddle booooombox"
str.match(/[od]{2,}/g).length // 6
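
To see the three forms of {} side by side, here is a short sketch. The strings are made up, and the results follow from the greedy behavior described earlier:

```javascript
// {2} means exactly two in a row; the first "ee" found is returned.
var exact = "feeeed".match(/e{2}/);
exact[0]; // "ee"

// {1,3} takes up to three at a time, so a run of four "e"s
// is split into "eee" followed by "e".
var upToThree = "e ee eee eeee".match(/e{1,3}/g);
upToThree; // ["e", "ee", "eee", "eee", "e"]

// {2,} has no upper limit.
var twoOrMore = "e ee eee eeee".match(/e{2,}/g);
twoOrMore; // ["ee", "eee", "eeee"]
```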

Escaping characters

So far we have seen special characters like {}, [], +, *, ?, and .. But what happens if we want to search for those actual characters in a string? For instance, take a look at the following example:

// let's try to find the number of periods in a sentence.
var str = "Hello. I'm Elie."
str.match(/./g) // ["H", "e", "l", "l", "o", ".", " ", "I", "'", "m", " ", "E", "l", "i", "e", "."] - think about why this might happen

Rather than matching the two periods, in the above example our regex matched every character!

When we are trying to find special characters in a string, we need to escape them with a backslash (\) character. Here is what that would look like:

// let's try to find the number of periods in a sentence.
var str = "Hello. I'm Elie."
str.match(/\./g) // [".", "."] - much better!
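
The same backslash works for the other special characters. The sketch below uses made-up strings to escape ?, +, [ and the backslash itself:

```javascript
var text = "What? Really? 1+1=2.";

var questionMarks = text.match(/\?/g); // ["?", "?"]
var plusSigns = text.match(/\+/g);     // ["+"]
var periods = text.match(/\./g);       // ["."]

// Square brackets need the same treatment.
var brackets = "list[0] and list[1]".match(/\[/g); // ["[", "["]

// The backslash is itself special, so it is escaped with another backslash.
// (In the string below, "\\" is how JavaScript writes a single backslash.)
var backslashes = "C:\\temp\\notes".match(/\\/g);
backslashes.length; // 2
```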

replace, search, and split

The replace function in JavaScript can accept as its first parameter either a string or a regular expression. The second parameter (a string or callback function) will be what the text is replaced with.

var str = "awesome"
str.replace('e','z') // "awzsome" - it does not get the last e!

var str = "awesome"
str.replace(/e/g,'z') // "awzsomz" - much better!

// using a callback
var str = "awesome"
str.replace(/[aeiou]/g, function(match) {
    return match.toUpperCase();
}); // "AwEsOmE";
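
Flags matter for replace in the same way they do for match. The following sketch, with made-up strings, shows a regular expression without g, one with both g and i, and a callback that computes each replacement:

```javascript
var str = "Awesome awesome";

// A regular expression without g still replaces only the first match,
// and it is case sensitive, so the capital "A" is skipped.
var firstOnly = str.replace(/a/, "z");  // "Awesome zwesome"

// g replaces every match and i ignores case.
var everyA = str.replace(/a/gi, "z");   // "zwesome zwesome"

// The callback receives each matched piece of text and returns its replacement.
var doubled = "a1b22".replace(/[0-9]+/g, function(match) {
    return String(Number(match) * 2);
}); // "a2b44"
```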

The search function in JavaScript can accept as its first parameter either a string or a regular expression. Similar to indexOf, the search function returns the index where the first match starts, or -1 if a match is not found.

var str = "awesome"
str.search('awe') // 0
str.search('z') // -1

// using a regular expression
var str = "awesome"
str.search(/..e/) // 0
str.search(/p/) // -1
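
Where search differs from indexOf is that it can look for a pattern rather than exact text. A brief sketch with a made-up string:

```javascript
var str = "The year 1999";

// indexOf needs the exact text it is looking for...
var byText = str.indexOf("1999");        // 9

// ...while search can look for a pattern, here "any digit".
var byPattern = str.search(/[0-9]/);     // 9

// No run of five digits exists, so the result is -1.
var notFound = str.search(/[0-9]{5}/);   // -1

// search always reports the first match, even with the g flag.
var withGlobal = "awesome".search(/e/g); // 2
```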

The split function in JavaScript can accept as its first parameter either a string or a regular expression, which will be used as the string to split on (this is also known as the delimiter).

var str = "My name is elie"
str.split(/e/g); // ["My nam", " is ", "li", ""]
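
Splitting on a pattern is most useful when the delimiter is not always the same. The sketch below uses a made-up list whose items are separated inconsistently:

```javascript
var list = "apples, pears;plums ,figs";

// A string delimiter must match exactly, so stray spaces and the
// semicolon are left in the pieces.
var byComma = list.split(",");
byComma; // ["apples", " pears;plums ", "figs"]

// A pattern can describe "a comma or semicolon with an optional
// space on either side".
var byPattern = list.split(/ ?[,;] ?/);
byPattern; // ["apples", "pears", "plums", "figs"]
```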

Using the RegExp constructor

So far we have seen how to create a regular expression using the // notation. This notation is quite easy to use, but it becomes a problem when we need to create the pattern dynamically and do not know what it will be beforehand. Let's examine the following function countLetters, which accepts a word and letter and returns the number of times the letter appears.

function countLetters(word, letter){
    var matches = word.match(letter)
    if(matches){
        return matches.length
    }
    return 0;
}

countLetters('awesome', 'e') // 1

So what's wrong with our function? When match is given a string, it converts that string into a regular expression, and the regular expression that gets created does not have the g flag, so only the first match is returned! If we want to dynamically create the regular expression, we can use the RegExp constructor function. This function accepts as its first parameter the pattern (what goes inside the //) and as a second parameter, a string of all the flags we want to pass in. Let's see how that works.

function countLetters(word, letter){
    var regex = new RegExp(letter, 'gi')
    var matches = word.match(regex)
    if(matches){
        return matches.length
    }
    return 0;
}

countLetters('awesome', 'e') // 2

Nice! We can now use our regular expression to check for all occurrences, with case insensitivity added for good measure.
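
One caveat: because the letter becomes a pattern, a special character such as . would be treated as the wildcard. The sketch below shows one possible way to handle this by escaping the input first. Both escapeRegExp and countLettersSafely are hypothetical helpers written for this illustration; they are not built into JavaScript.

```javascript
// Hypothetical helper: put a backslash in front of every character
// that has a special meaning in a pattern.
function escapeRegExp(text){
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical variant of countLetters that escapes its input first.
function countLettersSafely(word, letter){
    var regex = new RegExp(escapeRegExp(letter), 'gi');
    var matches = word.match(regex);
    return matches ? matches.length : 0;
}

var vowelCount = countLettersSafely('awesome', 'E'); // 2
var periodCount = countLettersSafely('a.b.c', '.');  // 2

// Without escaping, '.' is treated as the wildcard and matches everything.
var unescaped = 'a.b.c'.match(new RegExp('.', 'g')).length; // 5
```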

Now it's time for some practice.

Exercises

Part I

Answer the following questions:

  • What is a regular expression? What are some use cases of regular expressions?
  • What are the two ways to create a regular expression in JavaScript?
  • What is a flag?
  • What is the difference between ?, +, and *?
  • What is the difference between [] and {}?
  • What does the search function do?
  • What do the exec and test functions do (these functions exist on the RegExp prototype)?

Part II

countNumbers

Write a function called countNumbers which accepts a string and returns the number of digits (0 through 9) it contains.

countNumbers("321321dsadsa930-29d132b13a") // 16
countNumbers("this is so wonderful") // 0
countNumbers("this is so 1234") // 4

capitalSentence

Write a function called capitalSentence which accepts a string and returns another string with all the capital letters joined together.

capitalSentence("The Cat In The Hat") // "TCITH"
capitalSentence("And I Think to Myself What a Wonderful World") // "AITMWWW"

isValidPassword

Write a function called isValidPassword, which accepts a string. If the string is longer than 7 characters and includes at least one special character (!, @, #, or $), the function should return true. Otherwise, it should return false.

isValidPassword('TacoCat') // false
isValidPassword('foo') // false
isValidPassword('awesome!') // true
isValidPassword('[email protected]') // false

When you're ready, move on to Metacharacters and Grouping

Creative Commons License