Finding nestable groups of strings in Python using regular expressions

I came across a small task online that had to be solved with regular expressions. The original requirement is as follows:

Find every "because ... so" sentence in a text and print it aligned on those two words: the three words before "because", everything between "because" and "so" in full, and the three words after "so". If another "because ... so" pair is nested between the outer "because" and "so", it must be found as well and printed on a separate line. The output format is:

Line number, 3 words before, *because*, everything in between, &so&, 3 words after (each "word" is one character of the original Chinese text, and a punctuation mark counts as one word)

2 Not yet *because* it's nice here, &so& nobody

Here is the implementation:

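#encoding:utf-8
import re

def getPairStriList(filename):
  pairStrList = []
  # \u56e0\u4e3a (因为) is "because" and \u6240\u4ee5 (所以) is "so"; these are Unicode escapes, not UTF-8 bytes.
  # Match 3 characters, "because", anything (greedy), "so", 3 characters.
  pattern = re.compile(u'.{3}\u56e0\u4e3a.*\u6240\u4ee5.{3}')
  # Python 3: the file is decoded as UTF-8 while reading (the Python 2 original decoded each line with 'utf8').
  with open(filename, 'r', encoding='utf-8') as textFile:
    for line in textFile:
      result = pattern.search(line)
      while result:
        resultStr = result.group()
        pairStrList.append(resultStr)
        # Search again inside the match (pos=2, endpos=len-2) so that the outer
        # "because" and "so" cannot match again and only a nested pair is found.
        result = pattern.search(resultStr, 2, len(resultStr) - 2)
  # Formatting and splicing each string
  for i in range(len(pairStrList)):
    pairStrList[i] = pairStrList[i][:3] + pairStrList[i][3:5].replace(u'\u56e0\u4e3a', u' *\u56e0\u4e3a* ', 1) + pairStrList[i][5:]
    pairStrList[i] = pairStrList[i][:len(pairStrList[i])-5] + pairStrList[i][len(pairStrList[i])-5:].replace(u'\u6240\u4ee5', u' &\u6240\u4ee5& ', 1)
    # Prefix a running number (the match count, not the file's line number)
    pairStrList[i] = str(i+1) + ' ' + pairStrList[i]
  return pairStrList

if __name__ == '__main__':
  # The file name is missing from the original listing; pass the path to your UTF-8 text file.
  pairStrList = getPairStriList('')
  for s in pairStrList:
    print(s)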

PS: Here is a look at nesting groups in Python regular expressions

Because a group is itself a complete regular expression, groups can be nested inside other groups to build more complex expressions. Here is an example of group nesting:

#python 3.6 
#Cai Junsheng
#/caimouse/article/details/51749579 
# 
import re 
def test_patterns(text, patterns): 
  """Given source text and a list of patterns, look for 
  matches for each pattern within the text and print 
  them to stdout. 
  """ 
  # Look for each pattern in the text and print the results 
  for pattern, desc in patterns: 
    print('{!r} ({})\n'.format(pattern, desc)) 
    print(' {!r}'.format(text)) 
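    for match in re.finditer(pattern, text):
      s = match.start()
      e = match.end()
      # Indent the match so it lines up under its position in the text
      prefix = ' ' * (s)
      print(
        ' {}{!r}{} '.format(prefix,
                   text[s:e],
                   ' ' * (len(text) - e)),
        end=' ',
      )
      # All captured groups, nested ones included, in numbered order
      print(match.groups())
      # Named groups, if the pattern defines any, are also printed by name
      if match.groupdict():
        print('{}{}'.format(
          ' ' * (len(text) - s),
          match.groupdict()),
        )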
    print() 
  return
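Nested groups can also be named with the (?P<name>...) syntax. Naming does not change the numbering, so groups() returns the same tuple, while Match.groupdict() additionally returns the captures keyed by name; for a pattern without named groups it is an empty dict. A minimal sketch, where the names rest, a_run and b_run are made up for illustration:

```python
import re

# Named version of a((a*)(b*)); the group names are illustrative only.
NAMED = re.compile(r'a(?P<rest>(?P<a_run>a*)(?P<b_run>b*))')


def named_captures(text):
    """Return (groups(), groupdict()) for every non-overlapping match in text."""
    return [(m.groups(), m.groupdict()) for m in NAMED.finditer(text)]


for numbered, named in named_captures('abbaabbba'):
    print(numbered, named)
```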

Example (the import assumes the function above is saved as re_test_patterns_groups.py):

#python 3.6 
#Cai Junsheng
#/caimouse/article/details/51749579 
# 
from re_test_patterns_groups import test_patterns 
test_patterns( 
  'abbaabbba', 
  [(r'a((a*)(b*))', 'a followed by 0-n a and 0-n b')], 
)

The output is:

'a((a*)(b*))' (a followed by 0-n a and 0-n b)

 'abbaabbba'
 'abb'        ('bb', '', 'bb')
    'aabbb'   ('abbb', 'a', 'bbb')
         'a'  ('', '', '')
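In each tuple, the first element is the outer group and the next two are the groups nested inside it, because groups are numbered by the position of their opening parenthesis, counting from the left. A small sketch that lists each group's number, text and span makes the nesting visible; describe_groups is a hypothetical helper:

```python
import re


def describe_groups(pattern, text):
    """Return (number, captured text, span) for each group of the match at the start of text."""
    m = re.match(pattern, text)
    if m is None:
        return []
    # m.re.groups is the number of capturing groups in the compiled pattern
    return [(i, m.group(i), m.span(i)) for i in range(1, m.re.groups + 1)]


for row in describe_groups(r'a((a*)(b*))', 'aabbb'):
    print(row)
# prints: (1, 'abbb', (1, 5))
# prints: (2, 'a', (1, 2))
# prints: (3, 'bbb', (2, 5))
```

The span of group 1, (1, 5), contains the spans of groups 2 and 3, (1, 2) and (2, 5), which is the nesting expressed as match positions.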

Summary

That covers using regular expressions in Python to find nestable groups of strings.