Filtering information in Python with the re module

This article shows, by example, how to use Python's re module to filter information.

Background

In day-to-day work we often deal with large amounts of raw data. A general-purpose text editor can usually search for only one keyword at a time, which makes such data hard to analyze. Take a product log file as an example: it may contain a great deal of Info-level output that we rarely care about, while what we mainly want is certain Debug-level information. We therefore need to filter the log file by several keywords at once and keep only the entries that are relevant to us, so that the filtered log stays continuous and is easy to read.

Solution

re is the regular expression library that ships with Python, and it makes matching and filtering strings very convenient. This article uses re to filter the contents of a log file. First, a brief look at the main functions in re:

1. compile(pattern, flags): Compiles the regular expression and checks that its syntax is correct. flags sets compilation options; only DOTALL is introduced here, which makes . match every character, including newlines.

>>> import re
>>> re.compile('[abc]+')
re.compile('[abc]+')
>>> re.compile(test)
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
NameError: name 'test' is not defined
>>>
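
As a rough illustration of the two points above (the DOTALL flag and the syntax check done by compile), here is a minimal sketch. The sample text and the broken pattern are made up for the example.

```python
import re

# Without DOTALL, '.' stops at a newline; with DOTALL it also matches '\n'.
text = "first line\nsecond line"  # hypothetical sample text
plain = re.compile(r"first.*second")
dotall = re.compile(r"first.*second", re.DOTALL)

print(plain.search(text))           # None: '.*' cannot cross the line break
print(dotall.search(text).group())  # the match spans both lines

# A pattern with invalid syntax is rejected at compile time with re.error.
compile_error = None
try:
    re.compile(r"[abc")             # unterminated character set
except re.error as exc:
    compile_error = str(exc)
    print("invalid pattern:", compile_error)
```

Compiling once and reusing the compiled object is mainly useful when the same pattern is applied many times, as in the log filter later in this article.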

2. match(): Tests whether the regular expression matches at the beginning of the target string. It returns None if there is no match; otherwise it returns a match object, which holds the start position, the end position, and the matched text.

>>> import re
>>> test = re.compile('[abc]+')
>>> test.match('dabc')
>>> test.match('babc')
<_sre.SRE_Match object; span=(0, 4), match='babc'>

test is a compiled regular expression that matches one or more of the characters a, b, and c. match always starts at the beginning of the target string, so the first target string, "dabc", does not match and the call returns None. The second target string matches, and the output shows the match object with its span (start and end positions) and the matched text. Because match always begins at the start of the target string, the start position of any match is always 0.

3. search(): Similar to match, but search scans the whole target string and returns the first place where the regular expression matches.

>>> import re
>>> test = re.compile('[abc]+')
>>> test.search('dabc')
<_sre.SRE_Match object; span=(1, 4), match='abc'>
>>>

Here search looks for a run of a, b, or c anywhere in the string, so it matches "abc" at position 1 even though the string starts with d.
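
The difference between match and search can also be checked in a script. The sketch below reuses the pattern and strings from the examples above and reads the positions from the match object with start(), end(), and group().

```python
import re

test = re.compile(r"[abc]+")

no_match = test.match("dabc")   # anchored at position 0, where 'd' is not in [abc]
m = test.match("babc")          # matches from position 0
s = test.search("dabc")         # scans forward until something matches

print(no_match)                        # None
print(m.start(), m.end(), m.group())   # 0 4 babc
print(s.start(), s.end(), s.group())   # 1 4 abc
```

Since both calls return None on failure, the result is usually tested with a plain if before any of its methods are called.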

4. findall: Finds all matches in the target string and returns them as a list

>>> test = re.compile(r'\w+@')
>>> test.findall(r"[email protected]@@")
['alvin@', 'test1234@']
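
A self-contained variant of the findall example is sketched below. The input string is hypothetical (the addresses use placeholder domains), chosen only so that the same pattern yields the same two results. The second call shows that when the pattern contains one capturing group, findall returns the group contents rather than the whole match.

```python
import re

# Hypothetical input string, not the one used in the original session.
text = "alvin@example.com, test1234@example.org"

whole = re.findall(r"\w+@", text)     # whole matches
names = re.findall(r"(\w+)@", text)   # contents of the single capturing group

print(whole)   # ['alvin@', 'test1234@']
print(names)   # ['alvin', 'test1234']
```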

Of course, re provides many other functions; see the official Python documentation for details.

Next, a few common symbols for regular expressions are introduced:

1. *: Matches the preceding character zero or more times
2. .: Matches any character except a newline
3. |: Alternation (the "or" operation)
4. +: Matches the preceding character one or more times
5. ?: Matches the preceding character zero or one time
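
Each of the symbols listed above can be tried out with findall. The patterns and test strings in this sketch are made up purely for illustration.

```python
import re

star = re.findall(r"ab*", "a ab abbb")         # 'a' followed by zero or more 'b'
plus = re.findall(r"ab+", "a ab abbb")         # 'a' followed by one or more 'b'
optional = re.findall(r"colou?r", "color colour")  # the 'u' may appear 0 or 1 time
either = re.findall(r"cat|dog", "cat, dog, bird")  # one alternative or the other
dot = re.findall(r"a.c", "abc a\nc a-c")       # '.' does not match the newline

print(star)      # ['a', 'ab', 'abbb']
print(plus)      # ['ab', 'abbb']
print(optional)  # ['color', 'colour']
print(either)    # ['cat', 'dog']
print(dot)       # ['abc', 'a-c']
```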

Other regular expression syntax can also be found in the official documentation.

Finally, here is the simple filtering program:

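import re

source = ''  # path of the log file to read
target = ''  # path of the filtered output file

# Level 1 filter: extract each complete g2sMessage element.
# DOTALL lets '.' cross line breaks, since a message can span several lines.
raw_compile = re.compile(r"<g2s:g2sMessage.*?</g2s:g2sMessage>", re.DOTALL)
# Level 2 filter: keep only the message types we care about
messagelevel_compile = re.compile(r"<igtLicensing.*|<g2s:idReader.*", re.DOTALL)
# Level 2 filter: keep only messages from the device we care about
egmlevel_compile = re.compile(r"IGT_00012E2335AA.*", re.DOTALL)


def FilterG2SMessage():
    # Read the whole log into memory
    with open(source) as fr:
        content = fr.read()
    with open(target, 'w') as f:
        for g2s in raw_compile.findall(content):
            isCaredG2S = messagelevel_compile.search(g2s)
            isCaredEGM = egmlevel_compile.search(g2s)
            # Write a message only if it passes both level 2 filters
            if isCaredG2S and isCaredEGM:
                f.write(g2s + '\n')


FilterG2SMessage()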

The program is very simple. Before filtering, first work out which levels of filtering you need, and then filter step by step.
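
To try the step-by-step idea without any files, the same kind of filtering can be run on an in-memory string. This is only a sketch: the element names and the first device ID follow the program above, while the sample log, the egmId attribute, the second device ID, and the function name filter_messages are assumptions made for the example.

```python
import re

# Hypothetical sample data; attribute names and values are made up.
SAMPLE_LOG = """\
<g2s:g2sMessage egmId="IGT_00012E2335AA">
  <g2s:idReader state="ready"/>
</g2s:g2sMessage>
<g2s:g2sMessage egmId="IGT_00012E2335AA">
  <g2s:meters total="10"/>
</g2s:g2sMessage>
<g2s:g2sMessage egmId="IGT_FFFFFFFFFFFF">
  <g2s:idReader state="ready"/>
</g2s:g2sMessage>
"""

# Level 1: split the log into complete messages (DOTALL so '.' crosses lines).
message_re = re.compile(r"<g2s:g2sMessage.*?</g2s:g2sMessage>", re.DOTALL)
# Level 2: message type and device ID.
type_re = re.compile(r"<igtLicensing|<g2s:idReader")
device_re = re.compile(r"IGT_00012E2335AA")


def filter_messages(text):
    """Return the messages that pass both level 2 filters."""
    return [
        msg for msg in message_re.findall(text)
        if type_re.search(msg) and device_re.search(msg)
    ]


kept = filter_messages(SAMPLE_LOG)
print(len(kept))   # 1: only the first message has both the type and the device
```

Returning a list instead of writing to a file keeps the filtering logic separate from the I/O, which makes it easier to check on small samples before running it on a real log.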

Summary:

re provides not only regular expression matching but also functions for batch processing, such as split, sub, and subn. All of these help us process file contents quickly and save time.
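
As a brief sketch of the three functions just mentioned, the example below applies split, sub, and subn to a single log line. The line itself is hypothetical.

```python
import re

line = "2024-01-01  12:00:00   DEBUG  reader ready"  # hypothetical log line

# split: cut on runs of whitespace, at most 3 times
fields = re.split(r"\s+", line, maxsplit=3)

# sub: replace every digit with '#'
masked = re.sub(r"\d", "#", line)

# subn: like sub, but also returns the number of replacements made
collapsed, count = re.subn(r"\s{2,}", " ", line)

print(fields)            # ['2024-01-01', '12:00:00', 'DEBUG', 'reader ready']
print(masked)            # '####-##-##  ##:##:##   DEBUG  reader ready'
print(collapsed, count)  # 2024-01-01 12:00:00 DEBUG reader ready 3
```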

I hope this article helps you with your Python programming.