I came across a small requirement online that needed to be handled with regular expressions. The original requirement is as follows:
Find every sentence in the text of the form "because ... so" (因为 ... 所以). Align the output on these two words: print the three characters before "because" and the three characters after "so", and print everything between them in full. If another "because ... so" pair is nested between an outer "because" and "so", find it as well and print it on a separate line. The output format is:
Line number First 3 characters *because* everything in between & so & Next 3 characters (a punctuation mark counts as one character)
2 Not yet *because* it's nice here, & so & nobody
The implementation is as follows:
#encoding:utf-8
import re

def getPairStriList(filename):
    pairStrList = []
    # \u56e0\u4e3a and \u6240\u4ee5 are the Unicode escapes for
    # "because" (因为) and "so" (所以) respectively.
    pattern = re.compile(u'.{3}\u56e0\u4e3a.*\u6240\u4ee5.{3}')
    with open(filename, 'r', encoding='utf8') as textFile:
        for line in textFile:
            result = pattern.search(line)
            while result:
                resultStr = result.group()
                pairStrList.append(resultStr)
                # Search inside the current match for a nested pair
                result = pattern.search(resultStr, 2, len(resultStr)-2)
    # Format and splice each string
    for i in range(len(pairStrList)):
        pairStrList[i] = pairStrList[i][:3] + pairStrList[i][3:5].replace(u'\u56e0\u4e3a', u' *\u56e0\u4e3a* ', 1) + pairStrList[i][5:]
        pairStrList[i] = pairStrList[i][:len(pairStrList[i])-5] + pairStrList[i][len(pairStrList[i])-5:].replace(u'\u6240\u4ee5', u' &\u6240\u4ee5& ', 1)
        pairStrList[i] = str(i+1) + ' ' + pairStrList[i]
    return pairStrList

if __name__ == '__main__':
    # The file path is left blank in the original listing
    pairStrList = getPairStriList('')
    for pairStr in pairStrList:
        print(pairStr)
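To see how the nested search behaves without needing an input file, here is a minimal Python 3 sketch of the same idea applied to an in-memory string. The helper names `find_pairs` and `mark` and the sample sentence are hypothetical and are not part of the original listing; the pattern and the `pos`/`endpos` arguments to `search` follow the code above.

```python
import re

# Same pattern as the listing: 3 characters, 因为, anything, 所以, 3 characters.
PATTERN = re.compile('.{3}\u56e0\u4e3a.*\u6240\u4ee5.{3}')


def find_pairs(text):
    """Return the outermost match first, then each nested match (hypothetical helper)."""
    found = []
    result = PATTERN.search(text)
    while result:
        matched = result.group()
        found.append(matched)
        # Search again inside the current match. Starting at position 2 and
        # stopping 2 characters early means the same match cannot be found again.
        result = PATTERN.search(matched, 2, len(matched) - 2)
    return found


def mark(matched):
    """Wrap the leading 因为 in *...* and the trailing 所以 in &...& (hypothetical helper)."""
    # matched[3:5] is 因为 and matched[-5:-3] is 所以, by construction of the pattern.
    head, body, tail = matched[:3], matched[5:-5], matched[-3:]
    return head + ' *\u56e0\u4e3a* ' + body + ' &\u6240\u4ee5& ' + tail


# Made-up sample with one pair nested inside another.
sample = '天气很好，因为今天放假，因为没有课，所以很开心，所以我们出去玩。'
for number, matched in enumerate(find_pairs(sample), 1):
    print(number, mark(matched))
# 1 很好， *因为* 今天放假，因为没有课，所以很开心， &所以& 我们出
# 2 放假， *因为* 没有课， &所以& 很开心
```

Because `.*` is greedy, the first search pairs the first 因为 with the last 所以 that still has three characters after it. Each later search is restricted to the inside of the previous match, which is what surfaces the nested pair.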
PS: Here is a look at nesting groups in Python regular expressions
Since a group is a complete regular expression in itself, groups can be nested within other groups to build more complex expressions. The following is an example of doing group nesting:
#python 3.6
#Cai Junsheng
#/caimouse/article/details/51749579
#
import re

def test_patterns(text, patterns):
    """Given source text and a list of patterns, look for
    matches for each pattern within the text and print
    them to stdout.
    """
    # Look for each pattern in the text and print the results
    for pattern, desc in patterns:
        print('{!r} ({})\n'.format(pattern, desc))
        print('  {!r}'.format(text))
        for match in re.finditer(pattern, text):
            s = match.start()
            e = match.end()
            prefix = ' ' * (s)
            print(
                '  {}{!r}{} '.format(prefix, text[s:e], ' ' * (len(text) - e)),
                end=' ',
            )
            # Print the contents of every group, nested ones included
            print(match.groups())
            if match.groupdict():
                print('{}{}'.format(
                    ' ' * (len(text) - s),
                    match.groupdict()),
                )
        print()
    return
Example:
#python 3.6
#Cai Junsheng
#/caimouse/article/details/51749579
#
from re_test_patterns_groups import test_patterns

test_patterns(
    'abbaabbba',
    [(r'a((a*)(b*))', 'a followed by 0-n a and 0-n b')],
)
The resultant output is as follows:
'a((a*)(b*))' (a followed by 0-n a and 0-n b)

  'abbaabbba'
  'abb'        ('bb', '', 'bb')
     'aabbb'   ('abbb', 'a', 'bbb')
          'a'  ('', '', '')
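Nested groups are numbered by the position of their opening parenthesis, counting from the left, so the outer group comes before the groups it contains. The short sketch below reuses the pattern from the example on a single substring to make the numbering explicit; the variable names are only illustrative.

```python
import re

match = re.search(r'a((a*)(b*))', 'aabbb')

print(match.group(0))  # aabbb  (the whole match)
print(match.group(1))  # abbb   (outer group: everything after the first a)
print(match.group(2))  # a      (first inner group: the run of a)
print(match.group(3))  # bbb    (second inner group: the run of b)
print(match.span(1))   # (1, 5) (where the outer group sits in the text)

# With more than one group, findall returns one tuple of groups per match.
all_groups = re.findall(r'a((a*)(b*))', 'abbaabbba')
print(all_groups)
# [('bb', '', 'bb'), ('abbb', 'a', 'bbb'), ('', '', '')]
```

The tuples returned by `findall` are the same ones that `match.groups()` prints in the output above.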
Summary
That concludes this short introduction to using regular expressions in Python to find nested "because ... so" pairs in strings, along with a look at nested groups. I hope it helps. If you have any questions, please leave me a message and I will reply promptly. Thank you very much for supporting my website!