Summary of String Methods in Python 3.2

Sequence Types

There are six sequence types: strings, byte sequences (bytes objects), byte arrays (bytearray objects), lists, tuples, and range objects.

Generic operations supported by the sequence types (range objects are the exception for + and *, which they do not support):
Membership test: in, not in
Concatenation: +
Repetition: *
Indexing: s[i]
Slicing: s[i:j]
Length: len(s)
Minimum: min(s)
Maximum: max(s)
Index of first occurrence: s.index(i)
Number of occurrences: s.count(i)
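
As a quick illustration of the generic operations listed above, here is a minimal sketch that applies each of them to a list. The variable names are only illustrative; strings and tuples respond to the same operations.

```python
# Generic sequence operations, shown on a list
s = [3, 1, 2, 1]

has_two = 2 in s              # membership test
joined = s + [9]              # concatenation
repeated = s * 2              # repetition
first = s[0]                  # indexing
middle = s[1:3]               # slicing
length = len(s)               # length
smallest, largest = min(s), max(s)
position = s.index(1)         # index of the first occurrence of 1
occurrences = s.count(1)      # number of times 1 occurs

print(has_two, joined, repeated, first, middle)
print(length, smallest, largest, position, occurrences)
```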

String Methods

Test methods, which usually return a boolean value:

str.endswith(suffix[, start[, end]]):

Tests whether the string ends with the specified suffix and returns True or False. start and end restrict the test to a range of the string; the default is the whole string. For example:

'abcde'.endswith('de')   --> True

'abcde'.endswith('de', 0, 3)   --> False

str.startswith(prefix[, start[, end]]):

The counterpart of str.endswith(): tests whether the string begins with the specified prefix.
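
The effect of start and end on str.endswith() and str.startswith() is easiest to see by thinking of the test as running on the slice s[start:end]. A small sketch, with illustrative variable names:

```python
s = 'abcde'

# start and end restrict the test to the slice s[start:end]
ends_full = s.endswith('de')          # tests 'abcde'
ends_slice = s.endswith('de', 0, 3)   # tests 'abc'
ends_bc = s.endswith('bc', 0, 3)      # 'abc' does end with 'bc'
starts_cd = s.startswith('cd', 2)     # tests 'cde'

print(ends_full, ends_slice, ends_bc, starts_cd)  # True False True True
```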

()：

Determines whether all of the alphabetic characters in a string are lowercase, this method only determines the alphabetic characters in the string and disregards the other characters. The string must contain at least one alphabetic character, otherwise it returns False. such as:

Copy Code The code is as follows.

'China'.islower() -->False

'ab China'.islower() -->True

str.isupper():

The counterpart of str.islower(): tests whether all cased characters are uppercase.

()：

Determines whether the first letter of each word of a string is capitalized. The string must contain at least one alphabetic character, otherwise it returns False. even if the first alphabetic character is preceded by a non-alphabetic character, such as a Chinese character, a number, an underscore, etc., it does not affect the judgment of the first alphabetic character.

Copy Code The code is as follows.

'China'.istitle() -->False //String does not contain letters, return False

'ChinaAbc'.istitle() -->True // Returns True even though the initial character A is preceded by a non-alphanumeric character

'-Abc xyz'.istitle() -->False // the first letter of the latter word is not capitalized, return False

str.isalnum():

Tests whether the string contains only alphanumeric characters; a string containing only Chinese characters also qualifies. If the string contains spaces, underscores, ~, or other characters that are neither letters nor digits, False is returned. For example:

'3'.isalnum()   --> True

'中国'.isalnum() --> True

'-'.isalnum()   --> False

Note: alphanumeric means that the string consists of letters, digits, or both. For example, '3' contains a digit, 'a' contains a letter, and '3a' contains both a digit and a letter.

str.isalpha():
Tests whether the string contains only letters; a string containing only Chinese characters also qualifies. For example:

'中国'.isalpha() --> True

'3'.isalpha()   --> False

str.isidentifier():

Tests whether the string is a valid identifier, that is, whether it has the form of a legal variable name; a string containing only Chinese characters also qualifies. For example:

'_a'.isidentifier()   --> True

'3a'.isidentifier()   --> False

'中国'.isidentifier() --> True

str.isprintable():

Tests whether all characters in the string are printable. A string containing non-printable characters, such as the newline produced by the escape sequence '\n', returns False.

str.isspace():

Tests whether the string contains only whitespace characters, such as spaces, tabs, and newlines. Note: an empty string is not the same as whitespace, as in:

''.isspace()   --> False

' '.isspace()   --> True

str.isdecimal():

Tests whether the string contains only decimal digit characters, including the decimal digits of other scripts. For example:

'3'.isdecimal()   --> True

'\u0660'.isdecimal()   --> True

For the decimal digit characters of other scripts (Unicode category Nd), see: /info/unicode/category/Nd/

str.isdigit():

Tests whether the string contains only digits, where digits include decimal digits and other special digits (such as superscript digits). Formally, a digit is a character with the property value Numeric_Type=Digit or Numeric_Type=Decimal.

str.isnumeric():

Tests whether the string contains only numeric characters, which cover a wide range. Formally, a numeric character is one with the property value Numeric_Type=Digit, Numeric_Type=Decimal, or Numeric_Type=Numeric.
Comparing isdecimal(), isdigit(), and isnumeric(): each method accepts a wider range of characters than the one before it.
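
The widening range of isdecimal(), isdigit(), and isnumeric() can be checked on a few sample characters. This is a minimal sketch; the sample characters and the dictionary names are chosen here for illustration and are not from the original article.

```python
# Each sample is tested with isdecimal(), isdigit(), isnumeric() in that order
samples = {
    'ASCII digit 3': '3',
    'Arabic-Indic zero': '\u0660',
    'superscript two': '\u00b2',
    'vulgar fraction one half': '\u00bd',
}

results = {
    name: (ch.isdecimal(), ch.isdigit(), ch.isnumeric())
    for name, ch in samples.items()
}

for name, flags in results.items():
    print(name, flags)
```

The superscript passes isdigit() but not isdecimal(), and the fraction passes only isnumeric().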

Formatting methods, which return a new formatted string:

str.encode(encoding="utf-8", errors="strict"):

Encodes the string with the given encoding (UTF-8 by default) and returns the result as a bytes object.

str.lower():

Converts all cased characters to lowercase and leaves other characters unchanged. A string with no cased characters is also valid input; it is returned unchanged. For example:

'中国 123ABC'.lower() --> '中国 123abc'

'中国123'.lower() --> '中国123'   # no error is raised; the string is returned unchanged

str.upper():

The opposite of str.lower(): converts all cased characters to uppercase. For example:

'中国 123abc'.upper() --> '中国 123ABC'

'中国123'.upper() --> '中国123'

str.swapcase():

Swaps the case of the letters in the string, converting uppercase to lowercase and lowercase to uppercase. Uncased characters are left alone. For example:

'中国 123Ab'.swapcase() --> '中国 123aB'

'中国123'.swapcase() --> '中国123'   # no error is raised; the string is returned unchanged

str.capitalize():

Converts the first character of the string to uppercase and the rest to lowercase. If the first character is not a cased letter, it is left as is, but the rest of the string is still lowercased. A string with no cased characters is valid input and is returned unchanged. For example:

'ab cd'.capitalize() --> 'Ab cd'   # only the first letter of the string is capitalized

'中国 ab 123cd'.capitalize() --> '中国 ab 123cd'   # the first character is uncased and the rest is already lowercase, so nothing changes

'中国 123'.capitalize() --> '中国 123'   # no error is raised; the string is returned unchanged

str.title():

Converts the first letter of each word to uppercase and the remaining letters to lowercase. Uncased characters at the start of a word do not prevent the conversion. A string with no cased characters is valid input and is returned unchanged. For example:

'ab cd'.title() --> 'Ab Cd'   # the first letter of each word is capitalized

'中国 ab 123cd'.title() --> '中国 Ab 123Cd'   # converted even though the word starts with uncased characters

'中国 123'.title() --> '中国 123'
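
To compare lower(), upper(), swapcase(), capitalize(), and title() side by side, the sketch below runs all five on one input. The sample strings are made up for illustration.

```python
text = 'hello wORLD 123abc'

lowered = text.lower()            # every cased character lowercased
uppered = text.upper()            # every cased character uppercased
swapped = text.swapcase()         # case of each letter inverted
capitalized = text.capitalize()   # first character upper, the rest lower
titled = text.title()             # first letter of each word upper, the rest lower

# capitalize() still lowercases the rest when the first character is uncased
digit_first = '1AB'.capitalize()

for value in (lowered, uppered, swapped, capitalized, titled, digit_first):
    print(repr(value))
```

Note that title() treats the c in 123abc as the start of a word because it follows uncased characters.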

str.center(width[, fillchar]):

Returns a new string of length width with the original string centered in it, padded on both sides with fillchar (a space by default). If width is not greater than len(str), the original string is returned.
Note: when the padding cannot be split evenly, CPython puts the extra fill character at the front if width is odd and at the end if width is even. For example:

'abcd'.center(3)   --> 'abcd'

'abcd'.center(8)   --> '  abcd  '

'abcd'.center(8, '*')   --> '**abcd**'

'abcd'.center(7, '*')   --> '**abcd*'

str.ljust(width[, fillchar]):

Returns a left-justified string of length width, padded on the right with fillchar (a space by default). If width is not greater than len(str), the original string is returned. For example:

'abcd'.ljust(10) --> 'abcd      '

str.rjust(width[, fillchar]):

Similar to str.ljust(), but it returns a right-justified string, padded on the left with fillchar.
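
The three padding methods center(), ljust(), and rjust() can be compared with a visible fill character. A minimal sketch with illustrative names; the uneven-padding result reflects CPython behavior.

```python
word = 'abcd'

centered_even = word.center(8, '*')   # padding splits evenly
centered_odd = word.center(7, '*')    # odd width: extra fill goes in front
left = word.ljust(7, '.')             # padded on the right
right = word.rjust(7, '.')            # padded on the left
too_narrow = word.center(3)           # width <= len(word): returned unchanged

for value in (centered_even, centered_odd, left, right, too_narrow):
    print(repr(value))
```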

str.lstrip([chars]):

Returns a new string with leading characters removed. chars is a string containing the set of characters to remove; the default is whitespace.
Note: many articles cover lstrip (as well as rstrip and strip), but few explain it clearly. What it actually does is scan the original string from the left and remove every character that appears in chars, stopping at the first character that is not in chars.

'www.example.com'.lstrip('cmowz.') --> 'example.com'

Matching starts at the leftmost character and continues until the character e, which is not in chars, is encountered. Three w characters and one . character are removed, and matching stops at e.

'xyxxyy testyx yx yyx'.lstrip('xy ') --> 'testyx yx yyx'

Matching starts at the left of the string and continues until the character t, which is not in chars, is encountered. Three x characters, three y characters, and one space are removed, and matching stops at t.

str.rstrip([chars]):
The counterpart of str.lstrip(): matching starts from the far right.

'xyxxyy testyx yx yyx'.rstrip('xy ') --> 'xyxxyy test'

str.strip([chars]):
Matches from both ends of the string.

'xyxxyy testyx yx yyx'.strip('xy ') --> 'test'
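
Because chars is treated as a set of characters rather than as a prefix or suffix, the behavior of lstrip(), rstrip(), and strip() is easy to misread. The sketch below runs all three on the same input; the host name and the padded string are illustrative values.

```python
s = 'xyxxyy testyx yx yyx'

left = s.lstrip('xy ')     # stops at the first character not in 'xy '
right = s.rstrip('xy ')    # same scan, starting from the right
both = s.strip('xy ')      # scans from both ends

# chars is a set: every leading 'w' and '.' is removed, in any order
host = 'www.example.com'.lstrip('cmowz.')

# With no argument, whitespace (including newlines) is stripped
default = '  padded \n'.strip()

for value in (left, right, both, host, default):
    print(repr(value))
```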

str.expandtabs([tabsize]):
Replaces every tab in the string with one or more spaces. How many spaces replace a given tab is determined jointly by the tab's position in the line and by tabsize, which sets the distance between tab stops (8 by default). For example:

'\t\t this\tis test.'.expandtabs(8)   --> '                 this   is test.'

In the example above, each of the first two \t characters is replaced by 8 spaces, while the third \t is replaced by only 3. This is because tab stops are counted from the beginning of the line: the space and the word this occupy columns 16 to 20, so the third tab only has to advance from column 21 to the next tab stop at column 24, just before the i in is, rather than 8 columns past the end of this. That is what "determined jointly" means.
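
The column arithmetic behind expandtabs() can be checked directly by looking at where each word lands after expansion. A small sketch, with variable names chosen for illustration:

```python
line = '\t\t this\tis test.'
expanded = line.expandtabs(8)

# Columns are counted from 0 at the start of the line
col_this = expanded.index('this')      # two tabs (16 columns) plus one space
col_is = expanded.index('is test')     # next tab stop after 'this'

# A smaller tabsize moves the tab stops to columns 4, 8, 12, ...
narrow = 'a\tbc\td'.expandtabs(4)

print(repr(expanded), col_this, col_is)
print(repr(narrow))
```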

str.zfill(width):
Returns a string of length width, padded on the left with 0. If width is less than or equal to the length of the original string, the original string is returned. Mainly used for formatting numeric strings. For example:

'abc'.zfill(5) --> '00abc'   # this works, but padding non-numeric text is rarely useful

'123'.zfill(5)   --> '00123'

Find and replace methods:

str.count(sub[, start[, end]]):

Counts the non-overlapping occurrences of the substring sub in the string. start and end specify the range to count in; if omitted, the whole string is counted. For example:

'abcdabac'.count('ab')   --> 2

'abcdabac'.count('ab', 2)   --> 1

str.find(sub[, start[, end]]):
Finds the first occurrence of the substring sub in the string. start and end specify the search range. Returns -1 if sub is not found.

'0123234'.find('23')   --> 2

'0123234'.find('23', 1)   --> 2

Note: 1. find returns the position of the first match and stops searching there, regardless of whether later matches exist.
2. The index returned is a position in the full string, not a position relative to the specified slice.
3. If you only want to know whether a substring occurs in a string, use the in operator rather than find.

(sub[, start[, end]])：
with the same method find, return to the index position of the specified substring, but rfind from the right side of the string to start looking for, can not find when the return -1. Note: from the right side to start looking for, but the index position is from the left side of the original string from the beginning of the calculation. For example:

Copy Code The code is as follows.

'ABCDEEF'.find('E') -->4 // start searching from the leftmost, from A to the end of the E after the first D, return index value 4

'ABCDEEF'.rfind('E') -->5 // start searching from the far right, from A to the end of the E before the first F, return index value 5

str.format(*args, **kwargs):
The string on which format is called can contain plain text as well as replacement fields delimited by {}. A replacement field can hold the numeric index of a positional argument or the name of a keyword argument, optionally followed by a dictionary key or an attribute name. In the returned string, every replacement field is replaced by the value of the corresponding argument. For example:

'User ID: {0}'.format('root')   --> 'User ID: root'

'User ID: {UID}  Last login: {last_login}'.format(UID='root', last_login='5 Mar 2012')   --> 'User ID: root  Last login: 5 Mar 2012'
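
The kinds of replacement field that str.format() accepts (positional index, keyword name, dictionary key, attribute) can be sketched as follows. The record dictionary and the complex number are illustrative values, not part of the original examples.

```python
# Positional index and keyword name
by_position = 'User ID: {0}'.format('root')
by_keyword = 'User ID: {UID}  Last login: {last_login}'.format(
    UID='root', last_login='5 Mar 2012')

# Dictionary key lookup inside a field (the key is written without quotes)
record = {'UID': 'root'}
by_key = 'User ID: {0[UID]}'.format(record)

# Attribute lookup inside a field
by_attr = 'Real part: {0.real}'.format(3 + 4j)

for value in (by_position, by_keyword, by_key, by_attr):
    print(value)
```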

str.index(sub[, start[, end]]):

Similar to str.find(), but raises ValueError if the substring is not found.

str.rindex(sub[, start[, end]]):

Similar to str.rfind(), but raises ValueError if the substring is not found.
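
The practical difference between the find()/rfind() pair and the index()/rindex() pair is how a missing substring is reported. A minimal sketch; safe_index is a hypothetical helper written for this illustration, not a built-in.

```python
s = '0123234'

first = s.find('23')           # leftmost match
last = s.rfind('23')           # rightmost match, index still counted from the left
from_three = s.find('23', 3)   # search starts at index 3
missing = s.find('9')          # find() signals "not found" with -1


def safe_index(text, sub):
    """Hypothetical helper: like str.index(), but returns None instead of raising."""
    try:
        return text.index(sub)
    except ValueError:
        return None


print(first, last, from_three, missing)
print(safe_index(s, '23'), safe_index(s, '9'))
```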

str.replace(old, new[, count]):
Returns a new string in which occurrences of old in the original string are replaced with new. count specifies the maximum number of replacements; if it is omitted, all occurrences are replaced. For example:

'AAABBBCCC'.replace('A', 'D')   --> 'DDDBBBCCC'

'AAABBBCCC'.replace('A', 'D', 2)   --> 'DDABBBCCC'

static str.maketrans(x[, y[, z]]):
I did not fully understand this method at first, especially its static modifier; static simply means that it is called on the type itself, as str.maketrans(...), rather than on a string instance.
Roughly, its purpose is to return a translation table for use by the str.translate() method, and the two methods are usually used together.
For example:

table = str.maketrans('cs', 'kz')
"please don't knock at my door!".translate(table) --> "pleaze don't knokk at my door!"
# 'c' is replaced with 'k' and 's' with 'z'. The arguments can contain more than one character, but the first and second arguments must contain the same number of characters.

table = str.maketrans('cs', 'kz', 'o')
"please don't knock at my door!".translate(table) --> "pleaze dn't knkk at my dr!"
# With three arguments, the characters in the third argument are deleted from the original string.

str.translate(map):

Works together with str.maketrans(): returns a copy of the string in which each character has been replaced according to the translation table map.
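
How str.maketrans() and str.translate() fit together is clearer once you see that the table is just a dictionary keyed by code point. The sketch below is a runnable version of that pairing; the variable names are illustrative.

```python
sentence = "please don't knock at my door!"

# Two arguments: map each character of the first to the same position in the second
swap = str.maketrans('cs', 'kz')
swapped = sentence.translate(swap)

# Three arguments: characters in the third are deleted
swap_and_drop = str.maketrans('cs', 'kz', 'o')
dropped = sentence.translate(swap_and_drop)

print(swap)      # the table maps code points: {99: 107, 115: 122}
print(swapped)
print(dropped)
```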

Split and join methods:

str.partition(sep):

Splits the string and returns a tuple of three elements. If sep is not found in the original string, the three elements are the original string, an empty string, and an empty string. Otherwise the string is split at the first occurrence of sep, and the three elements are the part before sep, sep itself, and the part after sep. For example:

'abcdee'.partition('f')   --> ('abcdee', '', '')

'abcdee'.partition('e')   --> ('abcd', 'e', 'e')

str.rpartition(sep):

The counterpart of str.partition(): it splits at the last occurrence of sep, searching from the right end of the original string, and also returns a tuple of three elements: the part before the last sep, sep itself, and the part after it.
Note that "the part before the last sep" still runs from the left end of the original string up to that sep; it is not counted from the right. For example:

'abcdee'.rpartition('e') --> ('abcde', 'e', '')   # the three elements are the part before the last e, the e itself, and the part after it, which is an empty string

'abcdee'.rpartition('f') --> ('', '', 'abcdee')   # sep is not found, so the three elements are an empty string, an empty string, and the original string
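
A common use of partition() and rpartition() is splitting a line once around a separator. The sketch below uses a made-up key=value string to show where each method cuts and what each returns when the separator is absent.

```python
line = 'key=value=more'

# partition() cuts at the first '=', rpartition() at the last
head, sep, tail = line.partition('=')
rhead, rsep, rtail = line.rpartition('=')

# When the separator is absent, the original string ends up at opposite ends
missing = line.partition(':')
rmissing = line.rpartition(':')

print((head, sep, tail), (rhead, rsep, rtail))
print(missing, rmissing)
```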

str.split([sep[, maxsplit]]):

Returns a list of the substrings separated by sep. maxsplit specifies the maximum number of splits (so the list has at most maxsplit + 1 elements). By default sep is whitespace and the number of splits is unlimited.
Note: 1) If sep is not specified or is None, whitespace at both ends of str is discarded; if sep is specified (whether or not it occurs in the original string), whitespace at both ends of str is retained.
2) If sep is not found in the original string, a list containing a single element, the original string, is returned.
For example:

' abcbdbee '.split() --> ['abcbdbee']   # sep not specified: returns a one-element list and discards the spaces at both ends of str

' abcbdbee '.split('f') --> [' abcbdbee ']   # f is specified as sep (although it does not occur): returns a one-element list and keeps the spaces at both ends

' abcbdbee '.split('b') --> [' a', 'c', 'd', 'ee ']   # b is specified as sep with no limit on splits; the spaces at both ends of str are kept

' abcbdbee '.split('b', 2) --> [' a', 'c', 'dbee ']   # splits twice with b as the separator

Note: split() is a bit like partition(), but partition() returns a tuple that includes the separator sep as an element, whereas split() returns a list that does not include sep.

str.rsplit([sep[, maxsplit]]):

Similar to str.split(), except that it splits from the far right. The difference is only visible when maxsplit is specified. For example:

'abcbdbee'.rsplit('b') --> ['a', 'c', 'd', 'ee']   # without maxsplit, the result is the same as split('b')

'abcbdbee'.rsplit('b', 2) --> ['abc', 'd', 'ee']   # compare with split('b', 2), which gives ['a', 'c', 'dbee']
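
The differences between split() with and without sep, and between split() and rsplit() with maxsplit, can be sketched as follows. The file name is an illustrative value.

```python
# No sep: runs of whitespace act as one separator and the ends are trimmed
no_sep = ' a  b \t c '.split()

# Explicit sep: every occurrence splits, so empty strings can appear
explicit = 'a  b'.split(' ')

# maxsplit counts from the left for split() and from the right for rsplit()
from_left = 'abcbdbee'.split('b', 2)
from_right = 'abcbdbee'.rsplit('b', 2)

# rsplit() with maxsplit=1 is handy for peeling off a final component
name, extension = 'archive.tar.gz'.rsplit('.', 1)

print(no_sep, explicit)
print(from_left, from_right)
print(name, extension)
```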

(iterable)：

Use the concatenator str to concatenate elements of an iterable object, returning a string consisting of the elements of the iterable object concatenated by str. If a non-iterable object is passed in, such as an integer, boolean, etc., a Type Error is returned. e.g.:

Copy Code The code is as follows.

'A B'. join(['1', '2', 'China']) -->1A B2A B China

'A B'.join('12 China') -->1A B2A B China

‘A B'.join(123)   -->Type Error

Note: iterable object or iterator type's most important feature is that it supports two functions: __iter__() and __next__(), although it is not very accurate, but you can simply think that the data types that support the use of for statements to take values one by one are iterator objects.
sequence type（six：strings、byte objects、byte arrays、lists、tuples、range objects）cap (a poem)dictionaryall belong toiterableboyfriend。
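
Since join() accepts any iterable of strings, the sketch below tries it with a list, a string, a generator, and a dictionary, and then shows the two ways it can fail. try_join is a hypothetical helper written for this illustration.

```python
dashed = '-'.join(['1', '2', '3'])           # list of strings
spelled = '-'.join('abc')                    # a string iterates character by character
digits = ''.join(str(n) for n in range(3))   # generator; numbers converted first
keys = ','.join(sorted({'x': 1, 'y': 2}))    # a dictionary iterates over its keys


def try_join(sep, value):
    """Hypothetical helper: return the joined string, or the name of the exception raised."""
    try:
        return sep.join(value)
    except TypeError as exc:
        return type(exc).__name__


not_iterable = try_join('-', 123)      # an integer is not iterable
not_strings = try_join('-', [1, 2])    # iterable, but the elements are not strings

print(dashed, spelled, digits, keys)
print(not_iterable, not_strings)
```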

str.splitlines([keepends]):

Splits a string that contains multiple lines and returns a list with one element per line. If the string has only one line, a list containing just the original string is returned. keepends is True or a non-zero integer to indicate that the line endings should be kept. This method is mostly used when processing files. For example:

Line = '''AB
CD
EF'''
Line.splitlines() --> ['AB', 'CD', 'EF']

Line = 'AB\nCD\nEF'
Line.splitlines() --> ['AB', 'CD', 'EF']

Line = 'AB\nCD\nEF'
Line.splitlines(True) --> ['AB\n', 'CD\n', 'EF']
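
To see why splitlines() is usually preferred over split('\n') for file content, the sketch below feeds both the same text with mixed line endings. The sample text is made up for illustration.

```python
text = 'AB\nCD\r\nEF\n'

lines = text.splitlines()            # line endings removed, no trailing empty item
kept = text.splitlines(True)         # keepends=True preserves each line ending
naive = text.split('\n')             # leaves '\r' behind and adds a trailing ''
single = 'single'.splitlines()       # one line still comes back as a list

print(lines)
print(kept)
print(naive)
print(single)
```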