The re module is mainly used for string and text processing: finding, searching, replacing, and so on.
Let's review the basic regular expression syntax first:
. : matches any single character except a newline character
*: matches zero or more of the character before the *. It matches as much as possible, which is commonly known as greedy mode.
+: matches one or more of the character before the +.
|: matches the expression before or after the |.
^: match at beginning of line
$: match at end of line
?: matches zero or one of the character before the ?.
\: indicates that the character after \ is escaped.
[]: matches any single character listed in [], e.g. [0-9] matches any digit from 0 to 9.
(): treats the contents within () as a single group.
{}: repeats the preceding item the number of times given in {}, e.g. 100[0-9]{3} matches 100 followed by any three digits (100000 to 100999).
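The basic metacharacters listed above can be tried out directly with re.findall() and re.search(). This is only a small sketch; the sample strings are invented for illustration and are not from the original post.

```python
import re

# '.' matches any one character except a newline.
dot = re.findall(r"b.n", "bin ban b\nn")

# '*' allows zero or more of the preceding character, '+' needs at least one.
star = re.findall(r"ab*", "a ab abbb")
plus = re.findall(r"ab+", "a ab abbb")

# '?' makes the preceding character optional.
optional = re.findall(r"colou?r", "color colour")

# '^' and '$' anchor the match to the start and the end of the line.
starts = re.search(r"^man", "man:x:6:12") is not None
ends = re.search(r"sh$", "/bin/sh") is not None

# '[]' is a character set and '{}' a repeat count:
# 100 followed by exactly three digits.
three_digits = re.findall(r"100[0-9]{3}", "100000 100999 10012")

# '()' groups and '|' chooses between alternatives.
# With one group in the pattern, findall() returns the group contents.
shells = re.findall(r"/bin/(sh|nologin)", "/bin/sh /bin/nologin /bin/zsh")

print(dot, star, plus, optional, starts, ends, three_digits, shells)
```

Note that "10012" is skipped by the {3} pattern because only two digits follow the leading 100.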
Special sequences in Python that begin with \:
Special sequence | Meaning |
\A | Matches only at the beginning of the string |
\Z | Matches only at the end of the string |
\b | Matches the empty string at the beginning or end of a word |
\B | Matches the empty string that is not at the beginning or end of a word |
\d | Equivalent to [0-9] |
\D | Equivalent to [^0-9] |
\s | Matches any whitespace character: [ \t\n\r\f\v] |
\S | Matches any non-whitespace character: [^ \t\n\r\f\v] |
\w | Matches any letter, digit, or underscore: [a-zA-Z0-9_] |
\W | Matches any character that is not a letter, digit, or underscore: [^a-zA-Z0-9_] |
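To see the special sequences in action, here is a short sketch run against a passwd-style line. The extra sample strings ("man woman man_page manual" and so on) are made up for the example.

```python
import re

line = "man:x:6:12:man:/var/cache/man:/bin/nologin"

# \d+ picks out runs of digits, \w+ runs of word characters.
digits = re.findall(r"\d+", line)
words = re.findall(r"\w+", line)

# \b only matches at a word boundary. The underscore counts as a word
# character, so "man_page" does not contain the whole word "man".
whole_word = re.findall(r"\bman\b", "man woman man_page manual")

# \B is the opposite: here "man" must NOT start at a word boundary.
inside_word = re.findall(r"\Bman\b", "man woman")

# \A and \Z anchor to the very start and very end of the string.
starts_with_man = re.search(r"\Aman", line) is not None
ends_with_nologin = re.search(r"nologin\Z", line) is not None

# \s+ matches any run of whitespace, which is handy for splitting.
parts = re.split(r"\s+", "a  b\tc\nd")

print(digits, words, whole_word, inside_word, parts)
```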
Regular expression syntax table
Syntax | Meaning | Example |
"." | Any single character except a newline | |
"^" | Start of the string | '^hello' matches 'helloworld' but not 'aaaahellobbbb' |
"$" | End of the string | Same idea as "^": 'world$' matches 'helloworld' but not 'worldaaaa' |
"*" | 0 or more of the preceding character (greedy matching) | '<.*>' matches all of '<title>chinaunix</title>' |
"+" | 1 or more of the preceding character (greedy matching) | '<.+>' also matches all of '<title>chinaunix</title>' |
"?" | 0 or 1 of the preceding character (greedy matching) | 'https?' matches both 'http' and 'https' |
*?, +?, ?? | Non-greedy versions of the three above: stop at the first possible match | '<.*?>' matches only '<title>' in '<title>chinaunix</title>' |
{m,n} | Repeats the preceding character m to n times; {m} is also available | a{6} matches six a's, a{2,4} matches 2 to 4 a's |
{m,n}? | Repeats the preceding character m to n times, taking as few as possible | In 'aaaaaa', a{2,4}? matches only 2 a's |
"\" | Escapes a special character or introduces a special sequence | |
[] | Character set | [0-9], [a-z], [A-Z], [^0] |
"|" | Or | A|B matches A or B |
(...) | Groups the expression inside the parentheses | |
(?#...) | Comment; ignored by the matcher | |
(?=...) | Matches if ... matches next, but doesn't consume any of the string | 'hello(?=test)' matches hello in hellotest |
(?!...) | Matches if ... doesn't match next | 'hello(?!test)' matches hello only if it is not followed by test |
(?<=...) | Matches if preceded by ... (must be fixed length) | '(?<=hello)test' matches test in hellotest |
(?<!...) | Matches if not preceded by ... (must be fixed length) | '(?<!hello)test' does not match test in hellotest |
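The greedy and non-greedy quantifiers and the four lookaround assertions from the table are easiest to understand by running them. The sketch below reuses the table's own strings ('<title>chinaunix</title>', 'aaaaaa', 'hellotest'); the variable names are just for illustration.

```python
import re

html = "<title>chinaunix</title>"

# Greedy: .* grabs as much as it can, so the match runs to the last '>'.
greedy = re.search(r"<.*>", html).group()
# Non-greedy: .*? stops at the first '>' it can reach.
lazy = re.search(r"<.*?>", html).group()

# The same applies to bounded repeats.
bounded_greedy = re.search(r"a{2,4}", "aaaaaa").group()
bounded_lazy = re.search(r"a{2,4}?", "aaaaaa").group()

# Lookahead: "test" must follow, but is not part of the match.
ahead = re.search(r"hello(?=test)", "hellotest").group()
# Negative lookahead: fails here because "test" does follow.
not_ahead = re.search(r"hello(?!test)", "hellotest")

# Lookbehind: "hello" must come first, but is not part of the match.
behind = re.search(r"(?<=hello)test", "hellotest").group()
# Negative lookbehind: fails here because "hello" does come first.
not_behind = re.search(r"(?<!hello)test", "hellotest")

print(greedy, lazy, bounded_greedy, bounded_lazy)
print(ahead, not_ahead, behind, not_behind)
```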
Match flags and their meanings
Flag | Meaning |
re.I (re.IGNORECASE) | Ignore case |
re.L (re.LOCALE) | Make the matching of \w, \W, \b, \B, \s, \S depend on the current locale |
re.M (re.MULTILINE) | Multi-line matching mode |
re.S (re.DOTALL) | Make the "." metacharacter also match a newline character |
re.U (re.UNICODE) | Match Unicode characters |
re.X (re.VERBOSE) | Ignore whitespace in the pattern, and allow comments introduced with a "#" sign |
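The flags are passed as the optional last argument of the re functions and can be combined with |. A small sketch using the standard constants re.I, re.M, re.S and re.X follows; the two-line sample text is invented for the example.

```python
import re

text = "Man:x:6\nman:x:7"

# re.I ignores case.
ignore_case = re.findall(r"man", text, re.I)

# Without re.M, '^' only matches at the very start of the string.
first_only = re.findall(r"^man", text, re.I)
# With re.M, '^' also matches right after each newline.
each_line = re.findall(r"^man", text, re.I | re.M)

# '.' normally refuses to match a newline; re.S lifts that restriction.
no_dotall = re.search(r"6.man", text)
dotall = re.search(r"6.man", text, re.S).group()

# re.X lets the pattern be laid out over several lines with comments.
verbose = re.compile(r"""
    (\w+)   # user name
    :x:
    (\d+)   # user id
""", re.X)
fields = verbose.search(text).groups()

print(ignore_case, first_only, each_line, no_dotall, repr(dotall), fields)
```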
Text content (a line taken from the password file on Linux):
man:x:6:12:man:/var/cache/man:/bin/nologin
The re module has three search functions. Each accepts three parameters: the pattern, the string to match against, and optional flags. search() and match() return a match object if the pattern matches and None if it does not.
findall(): finds all substrings that match the regular expression and returns them as a list.
search(): scans the whole string and returns a match object for the first match.
match(): matches only from the first character of the string and does not look any further; returns a match object.
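The difference between the three functions shows up clearly when they are run on the same line. This is a sketch with the sample line held in a string instead of being read from a file, which is an assumption made to keep the example self-contained.

```python
import re

line = "man:x:6:12:man:/var/cache/man:/bin/sh"

# findall() returns every non-overlapping match as a list of strings.
all_hits = re.findall(r"man", line)

# search() scans the whole string and returns the first match object.
found = re.search(r"bin", line)

# match() only tries at the very beginning of the string,
# so "bin" is not found there, but "man" is.
bin_at_start = re.match(r"bin", line)
man_at_start = re.match(r"man", line)

# Flags go in the third position.
upper_hits = re.findall(r"MAN", line, re.I)

print(all_hits, found.group(), bin_at_start, man_at_start.group(), upper_hits)
```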
The test file:

    lovelinux@LoveLinux:~/py/boke$ cat text
    man:x:6:12:man:/var/cache/man:/bin/sh

The script:

    #!/usr/bin/env python
    #coding:utf-8
    import re
    with open('text','r') as txt:
        f = txt.read()
        print(re.match('bin',f))
        print(re.search('bin',f).end())

Running it prints:

    None
    34

match() returns None because the line does not begin with bin, while search() finds bin further along. Printing the result of search() directly, without calling end(), shows the match object itself:

    None
    <_sre.SRE_Match object at 0x7f12fc9f9ed0>
The returned match object has two methods for locating the match:
start(): returns the index at which the match begins
end(): returns the index just past the last matched character
The script, extended to print both positions:

    #!/usr/bin/env python
    #coding:utf-8
    import re
    with open('text','r') as txt:
        f = txt.read()
        print(re.match('bin',f))
        print(re.search('bin',f).start())
        print(re.search('bin',f).end())

Running it prints:

    None
    31
    34
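Because search() and match() return None when nothing matches, calling start() or end() directly on the result raises an AttributeError in that case. One way to guard against it is sketched below; find_span is a hypothetical helper name, not part of the re module, and the sample line is held in a string rather than read from the file.

```python
import re

line = "man:x:6:12:man:/var/cache/man:/bin/sh"

def find_span(pattern, text):
    """Return (start, end) of the first match, or None if there is no match."""
    m = re.search(pattern, text)
    if m is None:
        return None
    return (m.start(), m.end())

span = find_span("bin", line)
missing = find_span("zsh", line)

# Slicing with start and end gives back exactly the matched text.
matched_text = line[span[0]:span[1]]

print(span, missing, matched_text)
```

The slice works because end() points one position past the last matched character, which is exactly what Python slicing expects.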