A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.
It is mainly useful for two tasks:
- Verifying that strings match a pattern (for instance, that a string has the format of an email address).
- Performing substitutions in a string (such as changing all American spellings to British ones).
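As a rough illustration of both tasks, the sketch below uses a deliberately simplified email pattern and a tiny spelling table. Both are assumptions made for this example: real email address syntax is far more complex, and the helper names `EMAIL_RE`, `looks_like_email`, `US_TO_UK`, `SPELLING_RE` and `britishize` are hypothetical.

```python
import re

# Deliberately simplified email pattern (an assumption for illustration only):
# a local part, "@", then a domain that contains at least one dot
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def looks_like_email(s):
    # fullmatch requires the whole string to fit the pattern, not just a prefix
    return EMAIL_RE.fullmatch(s) is not None

# Hypothetical, tiny American -> British spelling table
US_TO_UK = {"color": "colour", "center": "centre", "analyze": "analyse"}

# One alternation built from the table; \b stops matches inside longer words
SPELLING_RE = re.compile(r"\b(" + "|".join(map(re.escape, US_TO_UK)) + r")\b")

def britishize(text):
    # re.sub calls the lambda with each match object and inserts its return value
    return SPELLING_RE.sub(lambda m: US_TO_UK[m.group(1)], text)

print(looks_like_email("alice@example.com"))  # True
print(looks_like_email("not an email"))       # False
print(britishize("The color of the center"))  # The colour of the centre
```

The spelling table only handles lowercase words; capitalized forms such as "Color" would need extra handling.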
Some basics
The example below checks whether the pattern "spam" matches at the beginning of the string and prints "Match" if it does.
```python
import re

# Define a regular expression
pattern = r"spam"

# re.match checks whether the pattern matches at the beginning of the string
if re.match(pattern, "spamspamspam"):
    print("Match")
else:
    print("No Match")

# Output:
# Match
```
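Note that re.match only anchors the pattern at the start of the string; it does not require the pattern to cover the whole string. A minimal sketch contrasting it with re.fullmatch (available since Python 3.4); the helper names starts_with and matches_entirely are hypothetical:

```python
import re

def starts_with(pattern, s):
    # re.match only tries to match at index 0; it returns a Match object or None
    return re.match(pattern, s) is not None

def matches_entirely(pattern, s):
    # re.fullmatch also requires the match to run to the end of the string
    return re.fullmatch(pattern, s) is not None

print(starts_with(r"spam", "spamspamspam"))          # True
print(starts_with(r"spam", "eggspam"))               # False
print(matches_entirely(r"spam", "spamspamspam"))     # False: "spamspam" is left over
print(matches_entirely(r"(spam)+", "spamspamspam"))  # True
```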
Other functions that can be used to match patterns are:
- re.search: finds a match of a pattern anywhere in the string.
- re.findall: returns a list of all non-overlapping substrings that match a pattern.
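```python
import re

pattern = r"spam"

# re.match only looks at the start of the string, so this fails
if re.match(pattern, "eggspamsausagespam"):
    print("Match")
else:
    print("No match")

# re.search scans the whole string, so this succeeds
if re.search(pattern, "eggspamsausagespam"):
    print("Match")
else:
    print("No Match")

# re.findall collects every match into a list
print(re.findall(pattern, "eggspamsausagespam"))

# Output:
# No match
# Match
# ['spam', 'spam']
```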
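To make the difference between re.search and re.findall more concrete, this small sketch shows where the first match starts and that findall does not return overlapping matches:

```python
import re

text = "eggspamsausagespam"

# re.search stops at the first match
first = re.search(r"spam", text)
print(first.start())  # 3

# re.findall keeps scanning and collects every match
print(len(re.findall(r"spam", text)))  # 2

# Matches never overlap: after "aa" at index 0, scanning resumes at index 2
print(re.findall(r"aa", "aaaa"))  # ['aa', 'aa']
```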
re.search returns a match object (or None if nothing matches), which has several methods that give details about the first match:
- group: returns the matched string.
- start and end: return the start and end positions of the match, respectively (end is the index just past the last matched character).
- span: returns the start and end positions of the match as a tuple.
```python
import re

pattern = r"pam"
match = re.search(pattern, "eggspamsausage")

# Only call the match methods if a match was found
if match:
    print(match.group())
    print(match.start())
    print(match.end())
    print(match.span())

# Output:
# pam
# 4
# 7
# (4, 7)
```
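Because re.search returns None when there is no match, it is safest to check the result before calling these methods. A minimal sketch, with a hypothetical helper describe; it also shows that start and end work as slice indices:

```python
import re

def describe(pattern, text):
    """Return (matched text, span) for the first match, or None if there is none."""
    match = re.search(pattern, text)
    if match is None:
        # No match: calling match.group() on None would raise AttributeError
        return None
    return match.group(), match.span()

text = "eggspamsausagespam"
print(describe(r"pam", text))              # ('pam', (4, 7)), only the first match
print(describe(r"ham", "eggspamsausage"))  # None

# end is exclusive, so slicing with start:end recovers the matched text
m = re.search(r"pam", text)
print(text[m.start():m.end()] == m.group())  # True
```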
Search and Replace
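One of the most important functions in the re module is re.sub, which replaces matches of a pattern in a string.
re.sub(pattern, repl, string, count=0, flags=0)
- pattern: the pattern to match.
- repl: what each match is replaced with (a string, or a function that receives the match object).
- string: the string to perform the substitution on.
- count: the maximum number of replacements; the default of 0 replaces every match.
- flags: optional flags such as re.IGNORECASE.
For example: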
```python
import re

my_string = "My name is Renee. Hi Renee."
pattern = r"Renee"

# Replace every occurrence of the pattern with "Daniel"
new_string = re.sub(pattern, "Daniel", my_string)
print(new_string)

# Output:
# My name is Daniel. Hi Daniel.
```
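The count parameter limits how many replacements are made. A short sketch on the same string; count is passed by keyword here, since recent Python versions deprecate passing it positionally. re.subn is a related function that also returns the number of substitutions made:

```python
import re

my_string = "My name is Renee. Hi Renee."
pattern = r"Renee"

# count=1 stops after the first replacement; the default count=0 replaces all
print(re.sub(pattern, "Daniel", my_string, count=1))
# My name is Daniel. Hi Renee.

# re.subn returns a tuple: (new string, number of replacements)
print(re.subn(pattern, "Daniel", my_string))
# ('My name is Daniel. Hi Daniel.', 2)
```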