SoFunction
Updated on 2024-11-19

Python Regular Expressions Tutorial One: The Basics

Preface

Some time ago I was given a requirement for which regular expressions were clearly the best fit. Every time I had used regular expressions before, I had looked things up on the fly, so this time I studied them systematically while completing the task. My main reference was a PyCon 2016 talk on regular expressions.

I will summarize regular expressions in several posts.

Here is the first part, the foundation:

Basics

Here is a summary of the most basic regular expression syntax. Most of it is familiar to me (and to most programmers), so I'll go through it quickly and illustrate only a few items with examples.

. Any character except a newline

^ Start of line

$ End of line

[abcd] Any one of the characters a, b, c, d

[^abcd] Any character except a, b, c, d

[a-d] Equivalent to [abcd]

[a-dz] Equivalent to [abcdz]

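\b Word boundary

\w Letter, digit, or underscore; equivalent to [a-zA-Z0-9_] for ASCII text (in Python 3, str patterns also match Unicode word characters unless re.ASCII is set)

\W Opposite of \w

\d Digit; equivalent to [0-9] for ASCII text

\D Opposite of \d

\s Whitespace character; equivalent to [ \t\n\r\f\v] for ASCII text

\S Opposite of \s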

{5} The preceding element occurs exactly 5 times (below, ~ stands for "the preceding element occurs")

{2,5} ~ 2 to 5 times

{2,} ~ 2 or more times

{,5} ~ 0 to 5 times

* ~ 0 or more times

? ~ 0 or 1 time

+ ~ 1 or more times

ABC|DEF Matches ABC or DEF.

\ Escape character, e.g. \* matches a literal * and \$ matches a literal $.
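A few of the entries above can be tried out directly. The following is a minimal sketch using re.findall; the sample strings and variable names are made up for illustration and are not from the original article.

```python
import re

text = 'cat bat rat hat'

# [cb]at: 'c' or 'b' followed by 'at'
class_hits = re.findall(r'[cb]at', text)
print(class_hits)        # ['cat', 'bat']

# [^cb]at: any character except 'c' or 'b', followed by 'at'
negated_hits = re.findall(r'[^cb]at', text)
print(negated_hits)      # ['rat', 'hat']

# \d{2,4}: runs of two to four digits (quantifiers are greedy)
digit_runs = re.findall(r'\d{2,4}', 'a1 b22 c333 d55555')
print(digit_runs)        # ['22', '333', '5555']

# colou?r: the 'u' may occur 0 or 1 time
spellings = re.findall(r'colou?r', 'color colour')
print(spellings)         # ['color', 'colour']

# ABC|DEF: either alternative
alternatives = re.findall(r'ABC|DEF', 'xxABCyyDEFzz')
print(alternatives)      # ['ABC', 'DEF']
```

Note that in the \d{2,4} case the lone digit in 'a1' is skipped, and the five-digit run yields only its first four digits, because the single digit left over is too short to match.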

\b and \ are briefly illustrated with the following examples:

     \b:

>>> import re
>>> re.search(r'\bhello\b', 'hello')
<_sre.SRE_Match object; span=(0, 5), match='hello'>
>>> re.search(r'\bhello\b', 'hello world')
<_sre.SRE_Match object; span=(0, 5), match='hello'>
>>> re.search(r'\bhello\b', 'hello,world')
<_sre.SRE_Match object; span=(0, 5), match='hello'>
>>> re.search(r'\bhello\b', 'hello_world')  # '_' is a word character: no boundary, no match
>>> 

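Here \b behaves somewhat like \W, with two differences. First, \b is zero-width: it marks a position and consumes no characters, whereas \W must match an actual non-word character. Second, \b also matches at the start and end of the string, where \W has no character to match. The underscore counts as a word character, which is why 'hello_world' does not match.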
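The difference between \b and \W can be checked with a short sketch. The test strings and variable names below are illustrative, not from the original article.

```python
import re

# \b is zero-width: the match contains only 'hello', not the comma.
m_boundary = re.search(r'\bhello\b', 'hello,world')
print(m_boundary.group())    # hello

# \W has to consume a real non-word character, so it cannot match
# at the very start or end of the string.
m_nonword = re.search(r'\Whello\W', 'hello')
print(m_nonword)             # None

# With punctuation on both sides \W does match,
# but the punctuation becomes part of the matched text.
m_consumed = re.search(r'\Whello\W', ',hello,')
print(m_consumed.group())    # ,hello,
```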

     \:

>>> re.search(r'\$100', '$100')
<_sre.SRE_Match object; span=(0, 4), match='$100'>
>>> re.search(r'$100', '$100')  # unescaped $ means end of line, so nothing matches
>>> 

To match characters that have special meanings in regular expressions, such as $, ^, *, etc., you need to escape them with \.
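When the text to match comes from a variable, escaping each special character by hand is error-prone. The standard library's re.escape, which the original article does not mention, does the escaping automatically. A minimal sketch, with made-up sample strings:

```python
import re

# Unescaped, '+' is a quantifier, so the pattern '1+1' means
# "one or more '1', then '1'" and does not match the text '1+1'.
m_unescaped = re.search(r'1+1', '1+1')
print(m_unescaped)           # None

# re.escape backslash-escapes the special characters for us.
m_escaped = re.search(re.escape('1+1'), 'is 1+1 equal to 2?')
print(m_escaped.group())     # 1+1

# The same works for a literal containing '$', '(', ')' and '.'.
literal = '$100 (approx.)'
m_price = re.search(re.escape(literal), 'price: $100 (approx.) today')
print(m_price.group())       # $100 (approx.)
```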

raw string:

In addition, in the examples above the pattern string is prefixed with r, which marks it as a raw string: the Python interpreter does not treat backslashes in it as escape characters. The backslash has a special meaning both in Python string literals and in regular expressions, so without a raw string, matching one literal \ takes four backslashes in the source. The Python interpreter first turns each \\ pair into a single \, leaving two, and the regular expression engine then reads those two as one literal \. Example:

>>> re.search(r'\bhello\b', 'hello')
<_sre.SRE_Match object; span=(0, 5), match='hello'>
>>> re.search('\bhello\b', 'hello')  # without r, '\b' is a backspace character: no match
>>> re.search('\\bhello\\b', 'hello')
<_sre.SRE_Match object; span=(0, 5), match='hello'>

>>> re.search('\\\\hello\\\\', '\\hello\\')  # four backslashes per literal \
<_sre.SRE_Match object; span=(0, 7), match='\\hello\\'>
>>> re.search(r'\\hello\\', '\\hello\\')  # raw string: two are enough
<_sre.SRE_Match object; span=(0, 7), match='\\hello\\'>
>>> print('\\hello\\')
\hello\
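To see what the interpreter actually hands to the regular expression engine, it helps to compare the strings themselves. A small sketch; the variable names are only for illustration:

```python
import re

plain = '\bhello\b'       # '\b' is the backspace character (0x08) here
raw = r'\bhello\b'        # backslashes are kept as written
doubled = '\\bhello\\b'   # each '\\' becomes one backslash

print(len(plain))         # 7: backspace + 'hello' + backspace
print(len(raw))           # 9: '\', 'b', 'hello', '\', 'b'
print(raw == doubled)     # True: two spellings of the same string

# Four backslashes in a normal literal equal two in a raw literal;
# the regex engine then reads those two as one literal backslash.
four = '\\\\'
print(four == r'\\')      # True
m_backslash = re.search(four, 'a\\b')
print(m_backslash.group() == '\\')   # True: a single backslash matched
```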

Summary

That covers the basics of Python regular expressions. With this knowledge, everyday use of regular expressions should pose no problem. Some special cases require more advanced features, which later articles in this series will cover. I hope this article is of some help in your study or work; if you have questions, feel free to leave a comment.