SoFunction
Updated on 2024-11-07

Example analysis of Python metacharacter usage

Role of the backslash: (\)

To treat a metacharacter such as ^ as an ordinary character, put a backslash in front of it.

Example:

>>> import re
>>> r=r'\^abc'
>>> re.findall(r,'^abc ^abc ^abc')
['^abc', '^abc', '^abc']
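The same escaping rule applies to every metacharacter, not only ^. The following is a minimal sketch of that idea; the sample strings are made up for illustration, and re.escape is the standard-library helper that adds the backslashes automatically.

```python
import re

# Unescaped, "." is a metacharacter and matches any character except a newline.
dot_any = re.findall(r"3.50", "3.50 3x50")
print(dot_any)        # ['3.50', '3x50']

# Escaped, "\." matches only a literal dot.
dot_literal = re.findall(r"3\.50", "3.50 3x50")
print(dot_literal)    # ['3.50']

# re.escape() inserts the backslashes for an arbitrary literal string.
literal = "(approx.)"            # illustrative sample text
escaped = re.escape(literal)
found = re.findall(escaped, "price: 3.50 (approx.)")
print(found)          # ['(approx.)']
```

re.escape is mainly useful when the literal text comes from user input and may contain metacharacters you did not anticipate.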

\d matches any decimal digit; it is equivalent to the class [0-9].
\D matches any non-digit character; it is equivalent to the class [^0-9].
\s matches any whitespace character; it is equivalent to the class [ \t\n\r\f\v].
\S matches any non-whitespace character; it is equivalent to the class [^ \t\n\r\f\v].
\w matches any alphanumeric character or underscore; it is equivalent to the class [a-zA-Z0-9_].
\W matches any character that is not alphanumeric or an underscore; it is equivalent to the class [^a-zA-Z0-9_].
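As a rough illustration of the shorthand classes listed above, the sketch below runs each one over a made-up sample string. It also shows one caveat: for str patterns in Python 3 the classes are Unicode-aware by default, and the ASCII equivalents given above apply exactly only with the re.ASCII flag (or with bytes patterns).

```python
import re

sample = "Tel 010-8765 4321\tok_1"   # illustrative text, not from the article

digits = re.findall(r"\d", sample)        # every single digit
whitespace = re.findall(r"\s", sample)    # spaces and the tab
words = re.findall(r"\w+", sample)        # runs of letters, digits, underscores
non_word = re.findall(r"\W", sample)      # everything \w does not match

print("".join(digits))
print(whitespace)
print(words)
print(non_word)

# "\u0663" is ARABIC-INDIC DIGIT THREE: a digit in Unicode, but not in [0-9].
unicode_digit = re.findall(r"\d", "\u0663")
ascii_digit = re.findall(r"\d", "\u0663", flags=re.ASCII)
print(len(unicode_digit), len(ascii_digit))
```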

>>> r=r'[0-9]'
>>> re.findall(r,'1234567890')
['1', '2', '3', '4', '5', '6', '7', '8', '9', '0']

>>> r=r'\d'
>>> re.findall(r,'1234567890')
['1', '2', '3', '4', '5', '6', '7', '8', '9', '0']
>>> r=r'^010-\d\d\d\d\d\d\d\d'
>>> re.findall(r,'010-87654321')
['010-87654321']
>>> re.findall(r,'010-8765432')
[]

>>> r=r'^010-\d{8}'  # {8} repeats \d eight times
>>> re.findall(r,'010-12345678')
['010-12345678']

Role of the asterisk: (*)

Matches the preceding character zero or more times.

>>> r=r'ab*'
>>> re.findall(r,'a')
['a']
>>> re.findall(r,'ab')
['ab']
>>> re.findall(r,'abbbbbb')
['abbbbbb']

Role of the plus sign: (+)

Matches the preceding character one or more times.

>>> r=r'ab+'
>>> re.findall(r,'a')
[]
>>> re.findall(r,'ab')
['ab']
>>> re.findall(r,'abbbb')
['abbbb']
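The difference between * and + only shows up when the repeated character is absent. A small side-by-side sketch, using the same ab patterns as above on a few sample inputs:

```python
import re

samples = ["a", "ab", "abbb", "b"]

# Map each sample to (matches for ab*, matches for ab+).
results = {}
for s in samples:
    star = re.findall(r"ab*", s)   # "b" may appear zero or more times
    plus = re.findall(r"ab+", s)   # "b" must appear at least once
    results[s] = (star, plus)
    print(f"{s!r:8} ab* -> {star}   ab+ -> {plus}")
```

Only the input 'a' separates the two: ab* matches it, while ab+ does not.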

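Making the "-" in the middle of a telephone number optional:

With -*, the hyphen may be absent, but any number of hyphens is accepted as well:

>>> r=r'^010-*\d{8}'
>>> re.findall(r,'010-12345678')
['010-12345678']
>>> re.findall(r,'01012345678')
['01012345678']
>>> re.findall(r,'010---12345678')
['010---12345678']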

Role of the question mark: (?)

Matches the preceding character zero or one time, so at most one hyphen is accepted:

>>> r=r'^010-?\d{8}$'
>>> re.findall(r,'010--12345678')
[]
>>> re.findall(r,'010-12345678')
['010-12345678']
>>> re.findall(r,'01012345678')
['01012345678']
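The telephone pattern can be wrapped in a small validation helper. This is only a sketch: the name is_valid_phone is hypothetical, and it swaps the ^...$ anchors for re.fullmatch, which requires the whole string to match.

```python
import re

# Area code 010, an optional single hyphen, then exactly eight digits.
PHONE = re.compile(r"010-?\d{8}")

def is_valid_phone(number: str) -> bool:
    """Return True only if the entire string is a well-formed number."""
    return PHONE.fullmatch(number) is not None

for candidate in ["010-12345678", "01012345678", "010--12345678", "010-1234567"]:
    print(candidate, is_valid_phone(candidate))
```

One reason to prefer fullmatch here: $ also matches just before a trailing newline, so the anchored pattern accepts '010-12345678\n', whereas fullmatch rejects it.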

Minimal (non-greedy) matching:

By default, quantifiers are greedy and match as much as possible:

>>> r=r'ab+'
>>> re.findall(r,'abbbbbbbbbbb')
['abbbbbbbbbbb']

Adding a question mark after a quantifier makes it non-greedy, so it matches as little as possible:

>>> r=r'ab+?'
>>> re.findall(r,'abbbbbbbbbbb')
['ab']

>>> r=r'ab*?'
>>> re.findall(r,'abbbbbbbbbbbb')
['a']
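Greedy versus non-greedy matching matters most when a delimiter occurs more than once in the text. The sketch below uses a made-up HTML-like string to show the difference; it is an illustration, not a recommendation to parse HTML with regular expressions.

```python
import re

html = "<b>bold</b> and <i>italic</i>"   # illustrative sample text

# Greedy: .+ runs to the last ">" in the string.
greedy = re.findall(r"<.+>", html)
print(greedy)   # ['<b>bold</b> and <i>italic</i>']

# Non-greedy: .+? stops at the first ">" it can.
lazy = re.findall(r"<.+?>", html)
print(lazy)     # ['<b>', '</b>', '<i>', '</i>']
```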

Usage of braces: ({m,n})

Here m and n are decimal integers. The quantifier means that the preceding character is repeated at least m times and at most n times.

>>> r=r'a{1,3}'  # a is repeated one to three times
>>> re.findall(r,'a')
['a']
>>> re.findall(r,'aa')
['aa']
>>> re.findall(r,'aaa')
['aaa']
>>> re.findall(r,'aaaa')
['aaa', 'a']
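Besides {m,n}, Python's re module also accepts {m} (exactly m times) and {m,} (at least m times), and a trailing ? makes any of them non-greedy. A short sketch on a made-up string, using \b word boundaries so that each run of a is tested as a whole:

```python
import re

text = "a aa aaa aaaa"   # illustrative sample text

exactly_two = re.findall(r"\ba{2}\b", text)     # runs of exactly two
at_least_two = re.findall(r"\ba{2,}\b", text)   # runs of two or more
one_or_two = re.findall(r"\ba{1,2}\b", text)    # runs of one or two
print(exactly_two)    # ['aa']
print(at_least_two)   # ['aa', 'aaa', 'aaaa']
print(one_or_two)     # ['a', 'aa']

# Non-greedy form: take as few repetitions as the range allows.
lazy_range = re.findall(r"a{1,3}?", "aaaa")
print(lazy_range)     # ['a', 'a', 'a', 'a']
```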

Grouping: "(" and ")"

>>> import re
>>> email=r'\w{3}@\w+(\.com|\.cn)'  # (\.com|\.cn) is a group; the | inside it means "or", so it matches either .com or .cn
>>> re.match(email,'www@')  # Try a match
<_sre.SRE_Match object; span=(0, 13), match='www@'>
>>> re.match(email,'www@')
<_sre.SRE_Match object; span=(0, 12), match='www@'>
>>> re.match(email,'www@')
>>>  # No match: re.match returns None, so nothing is printed
>>> re.findall(email,'www@')
['.com']      # When the pattern contains a group, findall returns the group's contents rather than the whole match
>>> re.findall(email,'www@')
['.cn']
>>> 
>>> s='''
ajhfa kasjf owolf english=chinese yes  no print
lafl int=456 yes float
int=789 yes
owolf english=france yes  aklfl
'''  # Define the string
>>> r=r'owolf english=.+ yes'  # Define the regular expression
>>> re.findall(r,s)  # Find all matches
['owolf english=chinese yes', 'owolf english=france yes']
>>> r=r'owolf english=(.+) yes'
>>> re.findall(r,s)
['chinese', 'france']  # With a group, findall returns only the group's contents; this is often used in crawlers
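How findall reports groups depends on how many capturing groups the pattern has. The sketch below summarizes the cases; the addresses amy@example.com and bob@example.cn are hypothetical placeholders, not taken from the article.

```python
import re

text = "owolf english=chinese yes\nowolf english=france yes"

# No group: findall returns the whole match.
whole = re.findall(r"english=\w+ yes", text)
# One capturing group: findall returns only that group's text.
one_group = re.findall(r"english=(\w+) yes", text)
# Several capturing groups: findall returns a tuple per match.
pairs = re.findall(r"(\w+)=(\w+)", text)
print(whole, one_group, pairs)

addresses = "amy@example.com bob@example.cn"   # hypothetical addresses

# A capturing group used only for alternation hides the rest of the match...
suffixes = re.findall(r"\w+@\w+(\.com|\.cn)", addresses)
# ...so use a non-capturing group (?:...) to get the full address back.
full = re.findall(r"\w+@\w+(?:\.com|\.cn)", addresses)
print(suffixes, full)

# With re.match, group(0) is the whole match and group(n) is the n-th group.
m = re.match(r"(\w+)@(\w+)(\.com|\.cn)", "amy@example.com")
if m:
    print(m.group(0), m.group(1), m.group(3))
```

The non-capturing form is usually the better choice when the parentheses exist only to scope the | operator.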

Summary

That is all for this example analysis of Python metacharacter usage; I hope it helps. If you find anything inadequate, feel free to leave a comment and point it out.