Updated on 2024-11-14

Python string and text pattern methods in detail

I. You want to search for and replace a specified text pattern in a string

A point I had overlooked: the re module is an important tool for string processing. I used to reach for the built-in string methods first, but for complex text and data structures the re module can handle far more.

For simple literal patterns, use the str.replace() method directly. For example:

>>> text = 'yeah, but no, but yeah, but no, but yeah'
>>> text.replace('yeah', 'yep')
'yep, but no, but yep, but no, but yep'
>>>

For complex patterns, use the sub() function in the re module. To illustrate this, suppose you want to change a date string of the form 11/27/2012 to 2012-11-27. Here's an example:

>>> text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'
>>> import re
>>> re.sub(r'(\d+)/(\d+)/(\d+)', r'\3-\1-\2', text)
'Today is 2012-11-27. PyCon starts 2013-3-13.'
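
If the same substitution is applied repeatedly, it may be worth compiling the pattern first. The sketch below reuses the date pattern from the example above; the names date_pattern, new_text and count are chosen here for illustration only. It also shows subn(), which returns the number of replacements along with the new string.

```python
import re

# Compile once so the same pattern can be reused across many strings.
date_pattern = re.compile(r'(\d+)/(\d+)/(\d+)')

text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'

# sub() returns only the new string.
new_text = date_pattern.sub(r'\3-\1-\2', text)

# subn() returns a (new_string, number_of_replacements) tuple.
new_text_n, count = date_pattern.subn(r'\3-\1-\2', text)

print(new_text)
print(count)
```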

II. You need to search for and replace text in a case-insensitive manner

To ignore case during text operations, use the re module and pass the re.IGNORECASE flag to the operation. For example:

>>> text = 'UPPER PYTHON, lower python, Mixed Python'
>>> re.findall('python', text, flags=re.IGNORECASE)
['PYTHON', 'python', 'Python']
>>> re.sub('python', 'snake', text, flags=re.IGNORECASE)
'UPPER snake, lower snake, Mixed snake'

That last example reveals a small flaw where the replacement string doesn't automatically match the case of the matched string. To fix this, you may need a helper function like the one below:

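def matchcase(word):
  def replace(m):
    # m is the match object; group() returns the matched text
    text = m.group()
    if text.isupper():
      return word.upper()
    elif text.islower():
      return word.lower()
    elif text[0].isupper():
      return word.capitalize()
    else:
      return word
  return replace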

>>> re.sub('python', matchcase('snake'), text, flags=re.IGNORECASE)
'UPPER SNAKE, lower snake, Mixed Snake'

matchcase('snake') returns a callback function that takes a match object as its only argument. Besides a replacement string, re.sub() also accepts such a callback: it is called once per match, and its return value is used as the replacement.
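
The callback mechanism is not limited to fixing case. As a minimal sketch, the hypothetical helper iso_date below (not part of the original article) reuses the date pattern from section I and zero-pads the month and day, which a plain replacement string such as r'\3-\1-\2' cannot do.

```python
import re

def iso_date(m):
    # m is a match object: group 1 is the month, group 2 the day,
    # group 3 the year.
    month, day, year = (int(g) for g in m.groups())
    return '{:04d}-{:02d}-{:02d}'.format(year, month, day)

text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'

# re.sub() calls iso_date once for every match and inserts its return value.
result = re.sub(r'(\d+)/(\d+)/(\d+)', iso_date, text)
print(result)
```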

III. You want to use a regular expression to match a block of text that spans multiple lines

>>> comment = re.compile(r'/\*(.*?)\*/')
>>> text1 = '/* this is a comment */'
>>> text2 = '''/* this is a
... multiline comment */
... '''
>>>
>>> comment.findall(text1)
[' this is a comment ']
>>> comment.findall(text2)
[]

By default the dot (.) does not match a newline, so the pattern finds nothing in text2. The re.compile() function accepts a flag called re.DOTALL, which is very useful here. It makes the dot (.) in a regular expression match any character, including newlines. For example:

>>> comment = re.compile(r'/\*(.*?)\*/', re.DOTALL)
>>> comment.findall(text2)
[' this is a\nmultiline comment ']
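
As an alternative to passing re.DOTALL, newline support can be written into the pattern itself. The sketch below shows two commonly used forms, using illustrative variable names: a non-capturing group (?:.|\n), and the inline flag (?s), which switches on DOTALL behavior for the whole pattern.

```python
import re

text2 = '/* this is a\nmultiline comment */\n'

# (?:.|\n) is a non-capturing group matching any character or a newline,
# so the pattern spans lines without any flag.
comment_group = re.compile(r'/\*((?:.|\n)*?)\*/')

# (?s) at the start of the pattern is the inline form of re.DOTALL.
comment_inline = re.compile(r'(?s)/\*(.*?)\*/')

matches_group = comment_group.findall(text2)
matches_inline = comment_inline.findall(text2)
print(matches_group)
print(matches_inline)
```

The flag version is usually easier to read; the explicit group can help when a pattern is combined with others that expect the default meaning of the dot.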

IV. You want to format a string with some kind of alignment

For basic string alignment, you can use the ljust(), rjust() and center() methods. For example:

>>> text = 'Hello World'
>>> text.ljust(20)
'Hello World         '
>>> text.rjust(20)
'         Hello World'
>>> text.center(20)
'    Hello World     '
>>> text.rjust(20,'=')
'=========Hello World'
>>> text.center(20,'*')
'****Hello World*****'
>>>
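
A typical use of these methods is lining up columns of plain text. The following is a small sketch with made-up sample data (rows) and arbitrary column widths: names are left-justified with a dot as the fill character, and quantities are right-justified.

```python
# Hypothetical sample data: (name, quantity) pairs.
rows = [('apple', 3), ('banana', 12), ('kiwi', 150)]

lines = []
for name, qty in rows:
    # Pad the name to 10 characters with dots, the number to 5 with spaces.
    lines.append(name.ljust(10, '.') + str(qty).rjust(5))

print('\n'.join(lines))
```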

The format() function can also be used to align strings easily. All you have to do is use the <, > or ^ character followed by the desired width. For example:

>>> format(text, '>20')
'         Hello World'
>>> format(text, '<20')
'Hello World         '
>>> format(text, '^20')
'    Hello World     '
>>>

If you want to specify a non-space fill character, just write it in front of the alignment character:

>>> format(text, '=>20s')
'=========Hello World'
>>> format(text, '*^20s')
'****Hello World*****'
>>>

These format codes can also be used in the str.format() method when formatting multiple values. They are not specific to strings either: format() works with any value, such as numbers. For example:

>>> x = 1.2345
>>> format(x, '>10')
'    1.2345'
>>> format(x, '^10.2f')
'   1.23   '
>>>
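
To illustrate formatting multiple values at once, here is a minimal sketch using str.format(), where each replacement field carries its own format code after the colon. The f-string lines assume Python 3.6 or later; the variable names are illustrative.

```python
# Each {} field takes its own alignment and width after the colon.
line = '{:>10s} {:>10s}'.format('Hello', 'World')

# f-strings (Python 3.6+) accept the same format codes.
x = 1.2345
name = 'Hello'
number_field = f'{x:>10.2f}'
name_field = f'{name:*<8}'

print(repr(line))
print(repr(number_field))
print(repr(name_field))
```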
