Regex
Introduction
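1. What is the purpose of regex?
A regular expression (regex) is a sequence of characters that defines a search pattern. It lets you find, validate, extract, or replace any text that matches the pattern, rather than searching for one fixed string.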
2. List the common use cases and examples for regex.
- Text Search and Manipulation: Search for specific patterns or strings in text documents.
- Data Validation: Validate user input in forms or applications (e.g. checking that a new password meets the required rules).
- Data Extraction: Extract specific information from a larger dataset (e.g. extracting email addresses).
- Data Cleaning: Clean and preprocess data by removing or replacing unwanted patterns (e.g. fixing formatting issues in datasets).
3. What are the key steps to solve a regex problem?
- Understand the requirements: what needs to be included or excluded?
- Identify the patterns in the inclusion or exclusion list.
- Represent those patterns as a regular expression.
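As a rough illustration of the three steps, here is a Python sketch that extracts email addresses (the data-extraction use case above). The pattern `EMAIL_PATTERN`, the helper `extract_emails`, and the sample text are illustrative assumptions; the pattern is a deliberately simplified approximation, not a complete email validator.

```python
import re

# Step 1 (requirements): collect every email address that appears in free text.
# Step 2 (patterns): a local part of letters, digits and . _ % + -, then "@",
#   then a domain of letters, digits, dots or hyphens, ending in a dot and a
#   top-level label of at least two letters.
# Step 3 (regex): encode those pieces. Simplified sketch; it does NOT cover
#   the full RFC 5322 address grammar.
EMAIL_PATTERN = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"


def extract_emails(text):
    """Return every substring of text that looks like an email address."""
    return re.findall(EMAIL_PATTERN, text)


sample = "Contact alice@example.com or bob.smith@mail.example.org today."
print(extract_emails(sample))  # ['alice@example.com', 'bob.smith@mail.example.org']
```

Writing the requirement and the pattern pieces down first (steps 1 and 2) makes it easier to see what the final regex deliberately leaves out.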
Meta Characters
Meta-characters are special characters with a predefined meaning in regular expressions.
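1. Syntax: .
Matches any single character except a newline.
2. Syntax: *
Matches the preceding character or group zero or more times.
3. Syntax: +
Matches the preceding character or group one or more times.
4. Syntax: ?
Matches the preceding character or group zero or one time, making it optional.
5. Syntax: .*
.* means “match any sequence of characters, including an empty sequence.” It is greedy: it matches as many characters as possible (newlines excluded by default).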
6. Syntax: .*?
Matches any character (.) any number of times (*), as few times as possible to make the regex match (?). This is the lazy (non-greedy) form of .*.
7. Syntax: \
\ is the escape character. Place it before a special character such as ^ $ * + ? . [ ] { } ( ) | \ so that the regex treats it as a literal character instead of a metacharacter.
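8. Syntax: ^pattern
Matches pattern only at the beginning of the string.
9. Syntax: pattern$
Matches pattern only at the end of the string.
10. Syntax: ^pattern$
When ^ and $ are used together, they ensure that the entire string conforms to the specified pattern, not just a part of it.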
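11. Syntax: |
Matches either the expression on its left or the expression on its right (alternation).
12. Syntax: ()
Groups part of a pattern so that a quantifier or | applies to the whole group, and captures the matched text.
13. Syntax: {}
Specifies how many times the preceding element repeats: {n} exactly n times, {n,} at least n times, {n,m} between n and m times.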
14. Syntax: [ab]
Matches if any one of the characters inside the brackets (a, b) is present.
15. Syntax: [^ab]
Matches any character except a and b.
16. Syntax: [a-c]
Matches any character in the range between a and c.
17. Syntax: [a-cm]
Matches any character in the range between a and c, or m.
18. Syntax: [a-cA-Cx]
Matches any character between a and c (lowercase), between A and C (uppercase), or x.
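A short Python sketch using the standard `re` module to try several of the metacharacters above on made-up sample strings. The variable names and strings are illustrative only; the comments show what `re.findall` and `re.search` return for these inputs.

```python
import re

# Quantifiers *, + and ? applied to "b".
star = re.findall(r"ab*", "a ab abbb")      # ['a', 'ab', 'abbb']
plus = re.findall(r"ab+", "a ab abbb")      # ['ab', 'abbb']
optional = re.findall(r"ab?", "a ab abbb")  # ['a', 'ab', 'ab']

# Greedy .* runs to the LAST ">"; lazy .*? stops at the FIRST ">" that works.
html = "<b>bold</b> and <i>italic</i>"
greedy = re.findall(r"<.*>", html)   # ['<b>bold</b> and <i>italic</i>']
lazy = re.findall(r"<.*?>", html)    # ['<b>', '</b>', '<i>', '</i>']

# \ escapes a metacharacter: \. is a literal dot, while . is any character.
escaped = re.findall(r"a\.c", "abc a.c")    # ['a.c']
unescaped = re.findall(r"a.c", "abc a.c")   # ['abc', 'a.c']

# ^ and $ together force the WHOLE string to fit the pattern.
whole = re.search(r"^[0-9]{3}$", "123") is not None     # True
partial = re.search(r"^[0-9]{3}$", "1234") is not None  # False

# Character classes.
in_set = re.findall(r"[a-cm]", "abcdm")     # ['a', 'b', 'c', 'm']
not_in_set = re.findall(r"[^ab]", "abcab")  # ['c']
mixed = re.findall(r"[a-cA-Cx]", "aBxyZ")   # ['a', 'B', 'x']
```

Patterns are written as raw strings (`r"..."`) so Python leaves the backslashes for the regex engine.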
Exercises
1. Match exactly 3 digits.
^[0-9]{3}$
2. Match exactly 3 characters of any kind.
^.{3}$
3. Match 4 to 6 letters.
^[a-zA-Z]{4,6}$
4. Match ha repeated at least 4 times.
(ha){4,}
5. Match ha repeated at most 3 times.
(ha){0,3}
(Python also accepts the shorthand (ha){,3}, but not every regex engine does.)
6. Match at least one a.
a+
7. Match zero or one a.
a?
8. Match either logwood or plywood.
(log|ply)wood
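The answers can be checked quickly with Python's `re` module. The sample strings below are made up for illustration; `re.search` is used so the anchors ^ and $ behave as written, and `re.finditer` is used for exercise 8 because `re.findall` returns only the captured group when a pattern contains parentheses. Exercise 7 uses a hypothetical sample pattern `ba?t` to show ? on a concrete word.

```python
import re

# Exercise 1: exactly 3 digits; the anchors reject "1234".
ex1 = [s for s in ["123", "12", "1234", "abc"] if re.search(r"^[0-9]{3}$", s)]
# ['123']

# Exercise 2: exactly 3 characters of any kind. Inside [ ] a dot is literal,
# so ^[.]{3}$ would accept only "...".
ex2 = [s for s in ["a1!", "ab", "abcd", "..."] if re.search(r"^.{3}$", s)]
# ['a1!', '...']

# Exercise 3: 4 to 6 letters. The repetition needs a comma: {4,6}, not {4-6}.
ex3 = [s for s in ["abc", "abcd", "Regex", "abcdefg"] if re.search(r"^[a-zA-Z]{4,6}$", s)]
# ['abcd', 'Regex']

# Exercise 4: "ha" at least 4 times in a row (greedy, so it takes all 5 here).
ex4 = re.search(r"(ha){4,}", "hahahahaha").group()   # 'hahahahaha'

# Exercise 5: "ha" at most 3 times; greedy, so it stops after 3.
ex5 = re.search(r"(ha){0,3}", "hahahaha").group()    # 'hahaha'

# Exercise 6: one or more "a".
ex6 = re.findall(r"a+", "caaat bat")                 # ['aaa', 'a']

# Exercise 7: a? makes the "a" optional (ba?t is a hypothetical sample pattern).
ex7 = [s for s in ["bt", "bat", "baat"] if re.fullmatch(r"ba?t", s)]  # ['bt', 'bat']

# Exercise 8: alternation in a group. finditer + group() returns the whole match;
# findall would return only the group text ('log', 'ply').
ex8 = [m.group() for m in re.finditer(r"(log|ply)wood", "logwood, plywood, redwood")]
# ['logwood', 'plywood']
```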
Special Sequences
Regex special sequences are sequences of characters with a special meaning when used in a regular expression. They are written as a backslash (\) followed by a specific character.
1. Syntax: \A
Matches only at the beginning of the string.
2. Syntax: \Z
Matches only at the end of the string.
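3. Syntax: \w
Matches any word character: letters, digits and underscore (equivalent to [a-zA-Z0-9_] for ASCII text).
4. Syntax: \W
Matches any non-word character, equivalent to [^a-zA-Z0-9_] for ASCII text.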
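5. Syntax: \b
Matches the empty string at a word boundary, i.e. at the beginning or end of a word.
6. Syntax: \B
Matches the empty string wherever \b would not match, for example in the middle of a word.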
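7. Syntax: \d
Matches any digit, equivalent to [0-9] for ASCII text.
8. Syntax: \D
Matches any non-digit character, equivalent to [^0-9] for ASCII text.
9. Syntax: \s
Matches any white-space character (space, tab, newline, and so on).
10. Syntax: \S
Matches any non-white-space character.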
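A minimal Python sketch exercising each special sequence on invented sample text. Patterns are raw strings (`r"..."`) so Python passes the backslash through to the regex engine unchanged; the \b and \B lines print match positions so the difference between the two is visible.

```python
import re

s = "Order 66 shipped on 2024-01-05"

starts = re.search(r"\AOrder", s) is not None   # True: \A is the start of the string
ends = re.search(r"05\Z", s) is not None        # True: \Z is the end of the string
words = re.findall(r"\w+", s)        # ['Order', '66', 'shipped', 'on', '2024', '01', '05']
non_words = re.findall(r"\W", "a-b c")        # ['-', ' ']
digits = re.findall(r"\d+", s)       # ['66', '2024', '01', '05']
non_digits = re.findall(r"\D+", "a1bc22")     # ['a', 'bc']
space_count = len(re.findall(r"\s", s))       # 4
tokens = re.findall(r"\S+", s)       # ['Order', '66', 'shipped', 'on', '2024-01-05']

# \b vs \B: start positions of "cat" as a whole word vs. "cat" not at a word start.
t = "cat concat cats"
whole_word = [m.start() for m in re.finditer(r"\bcat\b", t)]  # [0]
mid_word = [m.start() for m in re.finditer(r"\Bcat", t)]      # [7]
```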
Exercises
1. Given the string “This is a ball.”, use \b to match the word ball.
\bball\b
2. Provide the regex to find all words starting with 'b' or 'e' in a given string.
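\b[be]\w*
(The leading \b stops the pattern from matching in the middle of a word, and \w* also accepts one-letter words.)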
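3. Given the string “This is a baseball.”, will the regex \bball\b match the ball in baseball?
No. In baseball, ball is preceded by e, which is a word character, so there is no word boundary before b and \bball\b finds no match.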
4. Provide the regex to check if a string starts with hi.
\Ahi
5. Provide the regex to find the whole word red.
\bred\b
6. Provide the regex to check if a string ends with bye.
bye\Z
7. Provide the regex to check if a string starts with exactly 2 digits.
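\A\d{2}
(This also matches strings that start with more digits, e.g. "123". To reject a third digit, use \A\d{2}(?!\d).)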
8. Provide the regex to check if a string ends with exactly 2 non-digit characters.
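\D{2}\Z
(This also matches strings that end with more non-digits, e.g. "abc". To reject a third non-digit, use (?<!\D)\D{2}\Z.)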
9. Provide the regex to match with the pattern: [1270X160, 800X600, 1024X768].
\d{3,4}X\d{3}
10. Provide the regex to match with the pattern: [John Wallace, Steve King, Adam Smith].
([a-zA-Z]+)\s([a-zA-Z]+)
11. Provide the regex to match with the pattern: [7:32, 6.12, 12:23, 1.23].
(\d{1,2})[:.](\d{2})
12. Provide the regex to match with the pattern: [745.246.4369, 234.325.6543].
(\d{3})\.(\d{3})\.(\d{4})
13. Provide the regex to match with the pattern: [Jan 5th 1987, Aug 3rd 2009].
([a-zA-Z]{3})\s(\w{3,4})\s(\d{4})
14. Provide the regex to match with the pattern: [(745).246.4369, (234).325.6543].
\(\d{3}\)\.\d{3}\.\d{4}
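The answers can be sanity-checked in Python. The helper `matches_all` is a hypothetical convenience function, not part of the `re` module; it uses `re.fullmatch` so a pattern passes only if it covers each sample completely. The sentence for exercise 2 is made up to show why a leading \b matters.

```python
import re


def matches_all(pattern, samples):
    """Return True if pattern matches each sample string in full."""
    return all(re.fullmatch(pattern, s) is not None for s in samples)


# Exercises 1 and 3: \bball\b finds "ball" as a word, but not inside "baseball".
ball = re.findall(r"\bball\b", "This is a ball.")          # ['ball']
baseball = re.findall(r"\bball\b", "This is a baseball.")  # []

# Exercise 2: without \b, [be]\w+ also picks up word fragments such as "bbits".
loose = re.findall(r"[be]\w+", "big rabbits eat")     # ['big', 'bbits', 'eat']
strict = re.findall(r"\b[be]\w*", "big rabbits eat")  # ['big', 'eat']

# Exercises 9 to 14: each answer should match every sample in its list.
checks = {
    r"\d{3,4}X\d{3}": ["1270X160", "800X600", "1024X768"],
    r"([a-zA-Z]+)\s([a-zA-Z]+)": ["John Wallace", "Steve King", "Adam Smith"],
    r"(\d{1,2})[:.](\d{2})": ["7:32", "6.12", "12:23", "1.23"],
    r"(\d{3})\.(\d{3})\.(\d{4})": ["745.246.4369", "234.325.6543"],
    r"([a-zA-Z]{3})\s(\w{3,4})\s(\d{4})": ["Jan 5th 1987", "Aug 3rd 2009"],
    r"\(\d{3}\)\.\d{3}\.\d{4}": ["(745).246.4369", "(234).325.6543"],
}
for pattern, samples in checks.items():
    print(pattern, matches_all(pattern, samples))  # each line ends in True

# The ( ) groups capture the pieces for later use.
parts = re.fullmatch(r"(\d{3})\.(\d{3})\.(\d{4})", "745.246.4369").groups()
# ('745', '246', '4369')
```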
Python Regex Functions
In Python, the re module provides support for regular expressions.
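1. findall()
Returns a list of all non-overlapping matches in the string (if the pattern contains groups, the group text is returned instead).
2. search()
Returns a Match object for the first match anywhere in the string, or None if there is no match.
3. split()
Returns a list of the pieces of the string, split at each match.
4. sub()
Replaces each match with a given string and returns the result.
5. match_object.string
The string that was passed to the function.
6. match_object.group()
Returns the part of the string that matched.
7. match_object.span()
Returns a tuple (start, end) with the positions of the match.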
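A small sketch showing each function and Match attribute from the list on one sample sentence (the sentence and variable names are illustrative). Note that `group()` and `span()` are methods, while `string` is a plain attribute.

```python
import re

txt = "The rain in Spain"

found = re.findall(r"ai", txt)               # ['ai', 'ai']
groups_only = re.findall(r"(\w)ai", txt)     # ['r', 'p']: with a group, findall returns the group text
pieces = re.split(r"\s", txt)                # ['The', 'rain', 'in', 'Spain']
replaced = re.sub(r"\s", "9", txt)           # 'The9rain9in9Spain'
missing = re.search(r"Portugal", txt)        # None: search() returns None when nothing matches

m = re.search(r"\bS\w+", txt)                # first word starting with "S"
print(m.string)   # The rain in Spain  (the string that was searched)
print(m.group())  # Spain              (the matched text)
print(m.span())   # (12, 17)           (start and end positions of the match)
```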
Exercises
1. Search for the first white-space character in the string.
re.search(r"\s", txt)
2. Split at each white-space character from a string.
re.split(r"\s", txt)
3. Split at the first white-space character from a string.
re.split(r"\s", txt, maxsplit=1)
4. Replace every white-space character with :.
re.sub(r"\s", ":", txt)
5. Replace the first two white-space characters with :.
re.sub(r"\s", ":", txt, count=2)
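The exercise answers can be run end to end on a sample sentence as below (the sentence is made up). This sketch writes patterns as raw strings (`r"\s"`) so Python does not interpret the backslash itself; a plain `"\s"` triggers an invalid-escape-sequence warning in recent Python versions. It also passes `maxsplit` and `count` as keyword arguments, since passing them positionally is deprecated as of Python 3.13.

```python
import re

txt = "Regex is fun to learn"

# 1. First white-space character: search() returns a Match object.
first_ws = re.search(r"\s", txt)
print(first_ws.start())                     # 5

# 2. and 3. Split at every white-space character, or only at the first one.
all_parts = re.split(r"\s", txt)            # ['Regex', 'is', 'fun', 'to', 'learn']
first_split = re.split(r"\s", txt, maxsplit=1)  # ['Regex', 'is fun to learn']

# 4. and 5. Replace every white-space character, or only the first two.
every = re.sub(r"\s", ":", txt)             # 'Regex:is:fun:to:learn'
first_two = re.sub(r"\s", ":", txt, count=2)    # 'Regex:is:fun to learn'

print(all_parts, first_split, every, first_two, sep="\n")
```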