Have you ever wondered how search engines look up data in such a short amount of time? Or how text editors instantly find words when their search option is used? Or ever thought about how they deal with such a large amount of data and find only the relevant information?
Search engines or text editors ease this process by using regular expressions.
In this lesson, we will learn about regular expressions, with a focus on the following key points:
- What are regular expressions and where are they used?
- Special characters and common patterns that are used in regular expressions.
- Implementing simple regular expressions in Python.
What are Regular Expressions?
Regular expressions, also known by the shorthand regex, are a special sequence of characters that define a certain pattern. This sequence is then used to match text according to the defined pattern.
You might be wondering: why do we need another method to perform text matching, when we can already do it using:
- the equal operator (==),
- indexing, or
- string methods in any programming language?
It's because regular expressions are a more powerful (and often much shorter!) way of performing string matching. They let you match strings against custom-defined patterns, which can be difficult or impossible with the other methods.
Regular expressions can ease our day-to-day programming as well. Suppose you receive the personal information of people in your university, and you want to extract all the emails from it. Doing this manually would surely take quite some time. But don't worry! Regular expressions make it easy, often solving the problem in a single line.
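As a rough illustration of the email scenario, here is a minimal Python sketch. The sample records are made up, the email pattern is deliberately simplified (it is not a complete address validator), and re.findall is a function from the re module that this lesson does not otherwise cover.

```python
import re

# Hypothetical sample data; the names and addresses are invented for illustration.
records = """
Alice Khan, alice.khan@example.edu, Physics
Bilal Ahmed, bilal_ahmed@example.edu, Mathematics
Sara Lee, no email on file, Chemistry
"""

# Simplified email pattern (an assumption, not a full validator):
# some name characters, an @ sign, then a domain with at least one dot.
EMAIL_PATTERN = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"

# re.findall returns every non-overlapping match as a list of strings.
emails = re.findall(EMAIL_PATTERN, records)
print(emails)
```

The extraction itself is the single re.findall line; everything else is setup.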

Python provides support for regular expressions in the re module. To use regex in a program, we need to import this module at the start of our program. This module provides helpful functions that make the matching process easier.
Build your intuition with a short review of basic Python. True or false: import re is a correct way to import the re module in Python.
We start our introduction to regular expressions in Python with a basic search method provided by the re module.
re.search(regex, string) takes a regular expression and a string as input, scans through the string, and looks for the first location where the regex pattern produces a match. It returns a match object if a match is found, and None otherwise.
Since this is an introduction to regular expressions, we will only use this method from the re module. Other methods in the module provide more functionality. However, hold that thought for now.
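Let's look at a simple example of this kind of search. Since the string s contains '123', the pattern '123' matches it. The snippet below is written in JavaScript, where the string method s.search(pattern) returns the index of the first match, or -1 if there is none.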
```javascript
// Wrap the check in a function
function findMatch() {
  let s = 'hello123';
  // search() returns the index of the first match, or -1 if there is none
  if (s.search('123') !== -1) {
    console.log("Found a match!");
  } else {
    console.log("Did not find a match.");
  }
}

// Driver code to execute the function
findMatch();
```
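For comparison, the same check in Python with re.search might look like the sketch below. The helper name find_match is made up for this illustration; re.search, group, and span are part of the standard re module.

```python
import re

def find_match(pattern, text):
    """Report whether pattern occurs anywhere in text (illustrative helper)."""
    # re.search returns a match object on success and None otherwise.
    if re.search(pattern, text):
        return "Found a match!"
    return "Did not find a match."

print(find_match("123", "hello123"))
print(find_match("123", "hello"))

# The match object also records what matched and where.
match = re.search("123", "hello123")
print(match.group(), match.span())
```

Because None is falsy and a match object is truthy, the result of re.search can be used directly in an if statement.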
Sure, the previous example could just as easily have been written with a plain string method. The difference between ordinary search functions and regular expressions becomes much more evident when special characters (or metacharacters) are used. Instead of matching themselves literally, metacharacters carry a special meaning inside the expression. In this lesson, we will discuss some of the commonly used ones.
The most basic among these is the use of square brackets ([ ]) to define a character class. A character class matches any single character that is listed inside the brackets.
```javascript
let s = 'john479';
if (/[0-9]/.test(s)) {
  console.log("Found a match!");
} else {
  console.log("Did not find a match.");
}
```
This code snippet prints "Found a match!" because a single digit (in the range of 0 to 9) is present in the string. We can combine these square brackets to obtain more interesting results, such as matching consecutive characters.

[a-z] matches any character between 'a' and 'z', and [0-9] matches any character between '0' and '9'. Since they are placed right next to each other, the two characters must appear consecutively for a match to be found.

Here the third part of the expression, [a-z], did not match the third consecutive character of the string. As a result, a matching string was not found.
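The behavior of consecutive character classes can be sketched in Python as follows. The patterns and input strings here are chosen purely for illustration and are assumptions, not necessarily the ones used in the lesson's own examples.

```python
import re

# A single class matches one character: here, the first digit in the string.
first_digit = re.search("[0-9]", "john479")
print(first_digit.group())

# Two classes side by side: a lowercase letter immediately followed by a digit.
letter_then_digit = re.search("[a-z][0-9]", "a5")
print(letter_then_digit is not None)

# Three classes: letter, digit, letter. In "a56" the third character is a
# digit, so the third class fails and no match is found.
three_parts_fail = re.search("[a-z][0-9][a-z]", "a56")
three_parts_ok = re.search("[a-z][0-9][a-z]", "a5b")
print(three_parts_fail, three_parts_ok.group())
```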
Characters and digits can also be combined for matching, such as in the following example.
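One possible reading of combining characters and digits is a single class that lists both ranges, so that each position may hold either a lowercase letter or a digit. The sketch below assumes that reading; the three-character inputs are hypothetical.

```python
import re

# Assumption: "combined" is read as one class holding both ranges.
ALNUM = "[a-z0-9]"

# Three consecutive classes, each accepting a lowercase letter or a digit.
pattern = ALNUM + ALNUM + ALNUM

mixed = re.search(pattern, "a5b")
digit_first = re.search(pattern, "7up")
# Uppercase letters and the hyphen are in neither range, so this fails.
no_match = re.search(pattern, "A-Z")

print(mixed.group(), digit_first.group(), no_match)
```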

Let's see another metacharacter, a period (.). A period matches any single character occurring at that specific place in a string (except a newline character).
The regex 'chips.dip' matches any string that contains 'chips' and 'dip' separated by exactly one character, such as the inputs 'chipsndip' or 'chipsmdip'.
```javascript
// Wrap the check in a function
function findMatch() {
  let s = 'chipsndip';
  if (/chips.dip/.test(s)) {
    console.log("Found a match!");
  } else {
    console.log("Did not find a match.");
  }
}

// Driver code to execute the function
findMatch();
```
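A Python version of the period example might look like the following sketch. The extra inputs ('chipsdip' and the string containing a newline) are added here for illustration; re.DOTALL is the standard flag that lets a period match a newline as well.

```python
import re

pattern = "chips.dip"

# Exactly one character sits between 'chips' and 'dip', so these match.
one_char_n = re.search(pattern, "chipsndip")
one_char_m = re.search(pattern, "chipsmdip")

# Nothing between 'chips' and 'dip': the period has no character to consume.
zero_chars = re.search(pattern, "chipsdip")

# By default a period does not match a newline...
newline_default = re.search(pattern, "chips\ndip")
# ...unless the re.DOTALL flag is passed.
newline_dotall = re.search(pattern, "chips\ndip", re.DOTALL)

print(one_char_n is not None, one_char_m is not None)
print(zero_chars, newline_default, newline_dotall is not None)
```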
A caret (^) is another metacharacter. It anchors the match to the start of the string, which is helpful when we need to match multiple strings that start with the same characters.
All the given strings begin with 'It', and hence are successfully matched by the regular expression '^It'.
```javascript
// Wrap the check in a function
function findMatch() {
  const s1 = 'It is rainy.';
  const s2 = 'It is cloudy.';
  const s3 = 'It is sunny.';
  if (/^It/.test(s1) && /^It/.test(s2) && /^It/.test(s3)) {
    console.log("Found a match!");
  } else {
    console.log("Did not find a match.");
  }
}

// Driver code to execute the function
findMatch();
```
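In Python, the caret example could be sketched as below. The fourth sentence, which does not start with 'It', is an addition made here to show a failing case.

```python
import re

sentences = ["It is rainy.", "It is cloudy.", "It is sunny."]

# '^It' only matches when 'It' appears at the very start of the string.
all_start_with_it = all(re.search("^It", s) for s in sentences)
print(all_start_with_it)

# Hypothetical counterexample: 'it' appears, but not at the start.
not_at_start = re.search("^It", "Is it sunny?")
print(not_at_start)
```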
Let's look at a repetition-based metacharacter, plus (+). It matches one or more consecutive repetitions of the character that comes immediately before it in the pattern.
```javascript
// Wrap the check in a function
function findMatch() {
  let s = "Zoooooootopia";
  let pattern = /Zo+topia/;
  if (pattern.test(s)) {
    console.log("Found a match!");
  } else {
    console.log("Did not find a match.");
  }
}

// Driver code to execute the function
findMatch();
```
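The same pattern in Python, as a small sketch. The inputs 'Zotopia' and 'Ztopia' are extra cases added here to show where the one-or-more requirement starts to fail.

```python
import re

pattern = "Zo+topia"

# Many o's and a single o both satisfy "one or more".
many_os = re.search(pattern, "Zoooooootopia")
one_o = re.search(pattern, "Zotopia")

# No 'o' after 'Z': '+' needs at least one, so there is no match.
zero_os = re.search(pattern, "Ztopia")

print(many_os is not None, one_o is not None, zero_os)
```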
In this example, 'o' must be written before '+', because a repetition metacharacter always applies to the character immediately before it, and '+' requires at least one occurrence of that character. The asterisk ('*') metacharacter is used the same way but matches zero or more repetitions, so the pattern 'Zo*topia' matches 'Zoooooootopia' and would also match 'Ztopia', which contains no 'o' at all.
Note that 'Z*topia' would also report a match for this string, but only because the search is not anchored: 'Z*' matches zero 'Z' characters, and 'topia' is then found later in the string.
```javascript
// Wrap the check in a function
function findMatch() {
  let s = 'Zoooooootopia';
  // 'o*' allows zero or more o's between 'Z' and 'topia'
  if (/Zo*topia/.test(s)) {
    console.log("Found a match!");
  } else {
    console.log("Did not find a match.");
  }
}

// Driver code to execute the function
findMatch();
```
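To compare the two repetition metacharacters side by side in Python, consider the sketch below. Both '+' and '*' apply to the character written just before them; the only difference is whether zero occurrences are allowed. The input 'Ztopia' is a made-up case for illustration.

```python
import re

# '*' accepts zero or more o's, '+' insists on at least one.
star_zero = re.search("Zo*topia", "Ztopia")
plus_zero = re.search("Zo+topia", "Ztopia")
print(star_zero is not None, plus_zero)

# Both accept a long run of o's.
star_many = re.search("Zo*topia", "Zoooooootopia")
plus_many = re.search("Zo+topia", "Zoooooootopia")
print(star_many.group() == plus_many.group())

# Without the 'o', 'Z*topia' still reports a match, but only for the
# substring 'topia': 'Z*' matched zero Z's further along in the string.
partial = re.search("Z*topia", "Zoooooootopia")
print(partial.group())
```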
This lesson only covers the basics of regular expressions. There are many other metacharacters available, and their combinations allow us to match complex patterns in text.
If you are interested in learning more, below is a list of some more metacharacters (and the ones we studied in this lesson) used in regular expressions, along with their usage. Try experimenting with these metacharacters to create unique regular expressions.
| Metacharacter | Character Name | Usage |
|---|---|---|
| [ ] | Square brackets | Matches any one character from the set specified within them |
| . | Period | Matches any single character except newline |
| ^ | Caret | Matches the start of the string |
| $ | Dollar | Matches the end of the string |
| * | Asterisk | Matches zero or more repetitions of the preceding character |
| + | Plus | Matches one or more repetitions of the preceding character |
| \w | Lowercase w | Matches a single letter, digit, or underscore |
| \W | Uppercase W | Matches any single character that is not matched by \w |
| \s | Lowercase s | Matches a single whitespace character |
| \S | Uppercase S | Matches any single character that is not matched by \s |
| \d | Lowercase d | Matches a decimal digit in the range 0-9 |
| \D | Uppercase D | Matches any single character that is not matched by \d |
| \t | Lowercase t | Matches a tab character |
| \n | Lowercase n | Matches a newline character |
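As a starting point for experimenting, here is a short Python sketch that tries a few of the metacharacters from the table. The sample text is invented for illustration. The patterns are written as raw strings (with an r prefix) so that Python passes the backslashes through to the re module unchanged.

```python
import re

# Hypothetical sample text containing spaces, a tab, and digits.
text = "Order 66 shipped\ton 2024-05-01"

first_digits = re.search(r"\d+", text)    # first run of digits
last_digits = re.search(r"\d+$", text)    # digits right before the end
first_word = re.search(r"\w+", text)      # first run of word characters
first_space = re.search(r"\s", text)      # first whitespace character
first_non_word = re.search(r"\W", text)   # first character not matched by \w
has_tab = re.search(r"\t", text)          # a literal tab

print(first_digits.group(), last_digits.group(), first_word.group())
print(repr(first_space.group()), repr(first_non_word.group()))
print(has_tab is not None)
```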
One Pager Cheat Sheet
- Search engines and text editors use **regular expressions** to quickly locate data and filter out the relevant information.
- Regular expressions, or regex, are a special sequence of characters that define a pattern to match text, which can be more powerful and shorter than the equal operator (==), indexing, or other string methods.
- The import keyword is used to load the re module in Python, and therefore import re is a valid statement.
- The re module provides a basic search method, .search(regex, string), which scans through a string and looks for a match to the provided regular expression, as demonstrated in a simple example.
- Using special characters (or metacharacters) in regular expressions adds special meaning to the expression, typically demonstrated by defining a character class with square brackets ([ ]).
- Using square brackets, [a-z], [0-9], and the period (.), we can create regex patterns to match character ranges, digits, and consecutive characters to achieve desired results.
- The ^ metacharacter allows us to match multiple strings that start with the same characters.
- The metacharacter + checks if the previous character appears one or more times.
- The metacharacter * checks if the previous character appears zero or more times, so the repeated character may be absent altogether.
- The basics of regular expressions were just covered, but with metacharacters such as Square Brackets, Period, Caret, Dollar, Asterisk, Plus, \w, \W, \s, \S, \d, \D, \t and \n, one can match complex patterns in text.