In-depth summary of the use of strings in Python

Today we are going to learn about the string data type. We will discuss how to declare strings, the relationship between strings and the ASCII table, the properties of strings, and some important string methods and operations.

What is a Python string?

A string is an object that contains a sequence of characters. A character is a string of length 1, so in Python a single character is also a string. Interestingly, Python has no separate character data type, unlike other programming languages such as C, Kotlin, and Java.

We can declare Python strings using single quotes, double quotes, triple quotes, or the str() function. The following code snippet shows how to declare a string in Python:

# A single quote string
single_quote = 'a'  # This is an example of a character in other programming languages. It is a string in Python
# Another single quote string
another_single_quote = 'Programming teaches you patience.'
# A double quote string
double_quote = "aa"
# Another double-quote string
another_double_quote = "It is impossible until it is done!"
# A triple quote string
triple_quote = '''aaa'''
# Also a triple quote string
another_triple_quote = """Welcome to the Python programming language. Ready, 1, 2, 3, Go!"""
# Using the str() function
string_function = str(123.45)  # str() converts float data type to string data type
# Another str() function
another_string_function = str(True)  # str() converts a boolean data type to string data type
# An empty string
empty_string = ''
# Also an empty string
second_empty_string = ""
# We are not done yet
third_empty_string = """"""  # This is also an empty string: ''''''

Another way to get strings in Python is to use the input() function. input() allows us to use the keyboard to insert input values into the program. The inserted values are read as strings, but we can convert them to other data types:

# Inputs into a Python program
input_float = input()  # Type in: 3.142
input_boolean = input()  # Type in: True
# Convert inputs into other data types
convert_float = float(input_float)  # converts the string data type to a float
# Note: bool() returns True for any non-empty string, so typing "False" also gives True
convert_boolean = bool(input_boolean)  # converts the string data type to a bool
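
Because bool() only checks whether a string is empty, converting typed text such as "False" with bool() can give a surprising result. The sketch below shows one possible way to interpret such text. The helper name parse_bool and the list of accepted spellings are assumptions made for illustration; they are not part of Python itself.

```python
def parse_bool(text):
    """Hypothetical helper: map common spellings of true/false to a bool."""
    cleaned = text.strip().lower()
    if cleaned in ("true", "yes", "1"):
        return True
    if cleaned in ("false", "no", "0"):
        return False
    raise ValueError(f"Cannot interpret {text!r} as a boolean")

# bool() only checks whether the string is empty
print(bool("False"))         # True, because the string is not empty
print(bool(""))              # False, because the string is empty
# The helper looks at the text itself
print(parse_bool("False"))   # False
print(parse_bool(" True "))  # True
```

Raising a ValueError for unknown text mirrors what float() does when it receives a string it cannot convert.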

We determine the data type of an object in Python using the type() function, which returns the object's class. When the object is a string, it returns the class str. Similarly, it returns the dict, int, float, tuple, and bool classes when the object is a dictionary, integer, float, tuple, or boolean, respectively. Now let's use the type() function to determine the data types of the variables declared in the previous code snippets:

# Data types/ classes with type()
print(type(single_quote))
print(type(another_triple_quote))
print(type(empty_string))
print(type(input_float))
print(type(input_boolean))
print(type(convert_float))
print(type(convert_boolean))

ASCII Tables and Python String Characters

The American Standard Code for Information Interchange (ASCII) was designed to map characters or text to numbers, because numbers are easier to store in a computer's memory than text. ASCII encodes 128 characters, primarily from English, and is used to process information in computers and programming. The English characters encoded in ASCII include lowercase letters (a-z), uppercase letters (A-Z), digits (0-9), and punctuation marks.

The ord() function converts a Python string of length 1 (one character) to its decimal representation on the ASCII table (more generally, its Unicode code point, which matches ASCII for these characters), and the chr() function converts the decimal representation back to a string. Example:

import string
# Convert uppercase characters to their ASCII decimal numbers
ascii_upper_case = string.ascii_uppercase  # Output: ABCDEFGHIJKLMNOPQRSTUVWXYZ
for one_letter in ascii_upper_case[:5]:  # Loop through ABCDE
    print(ord(one_letter))

Output:

65
66
67
68
69

# Convert digit characters to their ASCII decimal numbers
ascii_digits = string.digits  # Output: 0123456789
for one_digit in ascii_digits[:5]:  # Loop through 01234
    print(ord(one_digit))

Output:

48
49
50
51
52

In the code snippets above, we iterated through the strings ABCDE and 01234 and converted each character to its decimal representation in the ASCII table. We can also use the chr() function to perform the reverse operation, which converts the decimal numbers in the ASCII table to their Python string characters. For example:

decimal_rep_ascii = [37, 44, 63, 82, 100]
for one_decimal in decimal_rep_ascii:
    print(chr(one_decimal))

Output:

%
,
?
R
d

In the ASCII table, the string characters in the output of the above program are mapped to their respective decimal numbers.
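
Since ord() and chr() are inverses of each other, they can be combined to do simple arithmetic on letters. The following is a minimal sketch, assuming we only deal with uppercase ASCII letters; the helper name shift_letter is hypothetical and introduced only for this illustration.

```python
def shift_letter(letter, offset):
    """Hypothetical helper: shift an uppercase ASCII letter, wrapping from Z back to A."""
    position = ord(letter) - ord('A')   # 0 for 'A', 25 for 'Z'
    shifted = (position + offset) % 26  # stay inside the 26 letters
    return chr(ord('A') + shifted)

print(chr(ord('R')))          # R (a round trip returns the same character)
print(shift_letter('A', 3))   # D
print(shift_letter('Z', 1))   # A
shifted_word = "".join(shift_letter(letter, 1) for letter in "HAL")
print(shifted_word)           # IBM
```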

String Properties

Zero indexing: The first element in a string has an index of zero, and the last element has an index of len(string) - 1. For example:

immutable_string = "Accountability"
print(len(immutable_string))
print(immutable_string.index('A'))
print(immutable_string.index('y'))

Output:

14
0
13

Immutability: This means that we can't update characters in a string. For example, we can't delete an element from the string or assign a new element at any of its index positions. If we try to update the string, Python raises a TypeError:

immutable_string = "Accountability"
# Assign a new element at index 0
immutable_string[0] = 'B'

Output:

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_11336/ in
2
3 # Assign a new element at index 0
----> 4 immutable_string[0] = 'B'
TypeError: 'str' object does not support item assignment

But we can assign a new string to the immutable_string variable, though we should note that it is not the same string, because the two don't point to the same object in memory. Python doesn't update the old string object; it creates a new one, as the ids below show:

immutable_string = "Accountability"
print(id(immutable_string))
immutable_string = "Bccountability"
print(id(immutable_string))
test_immutable = immutable_string
print(id(test_immutable))

Output:

2693751670576
2693751671024
2693751671024

The first two ids above are different, which means that the two strings assigned to immutable_string live at different addresses in memory (the exact values will differ from run to run). We then assigned the last immutable_string variable to the test_immutable variable, and you can see that both point to the same address.
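
When we do need a changed version of a string, the usual approach is to build a new string from pieces of the old one. The sketch below is one possible illustration; the variable names are made up for the example.

```python
original = "Accountability"
# Build a new string from a new first character plus the rest of the old string
changed = "B" + original[1:]
print(changed)    # Bccountability
print(original)   # Accountability (the original object is unchanged)
# str.replace also returns a new string instead of modifying the old one
replaced = original.replace("A", "B", 1)
print(replaced == changed)   # True
print(changed is original)   # False
```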

Concatenation: We can join two or more strings together with the + symbol to get a new string. Example:

first_string = "Zhou"
second_string = "luobo"
third_string = "Learn Python"
fourth_string = first_string + second_string
print(fourth_string)
fifth_string = fourth_string + " " + third_string
print(fifth_string)

Output:

Zhouluobo
Zhouluobo Learn Python

Repetition: Strings can be repeated with the * symbol. Example:

print("Ha" * 3)

Output:

HaHaHa
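
Concatenation and repetition are often used together. As a small sketch (the title text here is just an example value), we can underline a heading with exactly as many characters as the heading has:

```python
title = "String Properties"
underline = "=" * len(title)   # one '=' per character in the title
banner = title + "\n" + underline
print(banner)
empty = "ab" * 0               # repeating zero times gives an empty string
print(empty == "")             # True
```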

Indexing and slicing: We've already established that strings are indexed from zero, and we can access any element of a string using its index value. We can also access a subset of the string by slicing between two index values. Example:

main_string = "I learned English and Python with Zhouluobo. You can do it too!"
# Index 0
print(main_string[0])
# Index 1 (a space, so this prints what looks like a blank line)
print(main_string[1])
# Check if Index 1 is whitespace
print(main_string[1].isspace())
# Slicing 1: the first 17 characters
print(main_string[0:17])
# Slicing 2: the last 18 characters
print(main_string[-18:])
# Slicing and concatenation
print(main_string[0:17] + ". " + main_string[-18:])

Output:

I
 
True
I learned English
You can do it too!
I learned English. You can do it too!
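
Slices also accept negative indexes, omitted bounds, and an optional step. The short sketch below uses a made-up example word to show the most common forms:

```python
word = "Python"
last_char = word[-1]       # negative indexes count from the end
first_two = word[:2]       # start defaults to 0
from_two = word[2:]        # end defaults to len(word)
every_second = word[::2]   # a step of 2 skips every other character
reversed_word = word[::-1] # a step of -1 walks backwards
print(last_char)       # n
print(first_two)       # Py
print(from_two)        # thon
print(every_second)    # Pto
print(reversed_word)   # nohtyP
```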

String Methods

str.split(sep=None, maxsplit=-1): The string split method takes two parameters, sep and maxsplit. When it is called with their default values, it splits the string wherever there is whitespace. This method returns a list of strings:

string = "Apple, Banana, Orange, Blueberry"
print(string.split())

Output:

['Apple,', 'Banana,', 'Orange,', 'Blueberry']

We can see that the string is not split cleanly because the resulting pieces still contain the commas. We can use sep=',' to split at each comma instead:

print(string.split(sep=','))

Output:

['Apple', ' Banana', ' Orange', ' Blueberry']

This is better than the previous split, but we can see spaces before some of the split strings. They can be removed by using sep=', ':

# Notice the whitespace after the comma
print(string.split(sep=', '))

Output:

['Apple', 'Banana', 'Orange', 'Blueberry']

Now the string is nicely split. Sometimes we don't want to split the string the maximum number of times. In that case, we can use the maxsplit parameter to specify how many times we want to split it:

print(string.split(sep=', ', maxsplit=1))
print(string.split(sep=', ', maxsplit=2))

Output:

['Apple', 'Banana, Orange, Blueberry']
['Apple', 'Banana', 'Orange, Blueberry']
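
A common use of maxsplit is separating a key from a value when the value itself may contain the separator. The following is a sketch with a made-up setting string, not taken from any real configuration format:

```python
setting = "path=/usr/local/bin=old"  # hypothetical key=value text
# maxsplit=1 splits only at the first '=', so later '=' characters stay in the value
key, value = setting.split("=", maxsplit=1)
print(key)     # path
print(value)   # /usr/local/bin=old
# rsplit does the same thing starting from the right-hand side
right_split = "a,b,c".rsplit(",", maxsplit=1)
print(right_split)   # ['a,b', 'c']
```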

str.splitlines(keepends=False): Sometimes we want to process a corpus with different line breaks ('\n', '\n\n', '\r', '\r\n') at the boundaries, and we want to split it into lines rather than individual words. This can be done using the splitlines method. When keepends=True, the line breaks are included in the text; otherwise, they are excluded:

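import nltk  # You may have to `pip install nltk` to use this library.
# Assumption: the text comes from NLTK's Gutenberg corpus, which matches the output below.
# You may need to run nltk.download('gutenberg') once beforehand.
macbeth = nltk.corpus.gutenberg.raw('shakespeare-macbeth.txt')
print(macbeth.splitlines(keepends=True)[:5])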

Output:

['[The Tragedie of Macbeth by William Shakespeare 1603]\n', '\n', '\n', 'Actus Primus. Scoena Prima.\n', '\n']
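
To see the effect of splitlines without downloading a corpus, here is a small self-contained sketch. The sample text is invented for the example and mixes three kinds of line breaks:

```python
text = "first line\nsecond line\r\nthird line\rfourth line"
without_ends = text.splitlines()
with_ends = text.splitlines(keepends=True)
naive_split = text.split("\n")  # only knows about '\n'
print(without_ends)
# ['first line', 'second line', 'third line', 'fourth line']
print(with_ends)
# ['first line\n', 'second line\r\n', 'third line\r', 'fourth line']
print(naive_split)
# ['first line', 'second line\r', 'third line\rfourth line']
```

The last line shows why splitlines is usually the safer choice for text with mixed line endings: split('\n') leaves the '\r' characters behind.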

str.strip([chars]): We use the strip method to remove leading and trailing whitespace or characters from both sides of a string. For example:

string = "    Apple Apple Apple no apple in the box apple apple             "
stripped_string = string.strip()
print(stripped_string)
left_stripped_string = (
    stripped_string
    .lstrip('Apple')
    .lstrip()
    .lstrip('Apple')
    .lstrip()
    .lstrip('Apple')
    .lstrip()
)
print(left_stripped_string)
capitalized_string = left_stripped_string.capitalize()
print(capitalized_string)
right_stripped_string = (
    capitalized_string
    .rstrip('apple')
    .rstrip()
    .rstrip('apple')
    .rstrip()
)
print(right_stripped_string)

Output:

Apple Apple Apple no apple in the box apple apple
no apple in the box apple apple
No apple in the box apple apple
No apple in the box
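
One detail worth noting about the calls above: the argument to strip, lstrip, and rstrip is treated as a set of characters to remove, not as an exact prefix or suffix. The sketch below illustrates this with made-up strings:

```python
left_one = "Apple pie".lstrip("Apple")
left_two = "pleA pie".lstrip("Apple")   # same characters in a different order
both_sides = "xxhixx".strip("x")
print(repr(left_one))     # ' pie'
print(repr(left_two))     # ' pie' (the order of the characters does not matter)
print(repr(both_sides))   # 'hi'
```

Stripping stops at the first character that is not in the set, which is why the space before "pie" survives.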

In the above code snippet, we used the lstrip and rstrip methods, which remove leading and trailing whitespace or characters from the left and right sides of the string, respectively. We also used the capitalize method, which converts the string to sentence case.

str.zfill(width): The zfill method pads the string with a 0 prefix to reach the specified width. Example:

example = "0.8"  # len(example) is 3
example_zfill = example.zfill(5)  # len(example_zfill) is 5
print(example_zfill)

Output:

000.8
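
A few more cases show how zfill behaves with signs and with strings that are already wide enough. This is a small sketch; the file name pattern is a hypothetical example:

```python
padded = "42".zfill(5)
negative = "-42".zfill(5)     # the sign stays in front of the zeros
unchanged = "12345".zfill(3)  # already wider than 3, so nothing is added
file_names = ["frame_" + str(n).zfill(3) + ".png" for n in (1, 10, 100)]
print(padded)      # 00042
print(negative)    # -0042
print(unchanged)   # 12345
print(file_names)  # ['frame_001.png', 'frame_010.png', 'frame_100.png']
```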

str.isalpha(): This method returns True if all characters in the string are letters; otherwise, it returns False:

# Alphabet string
alphabet_one = "Learning"
print(alphabet_one.isalpha())
# Contains whitespace
alphabet_two = "Learning Python"
print(alphabet_two.isalpha())
# Contains comma symbols
alphabet_three = "Learning,"
print(alphabet_three.isalpha())

Output:

True
False
False

str.isalnum() returns True if the string characters are alphanumeric, str.isdecimal() returns True if the string characters are decimal, str.isdigit() returns True if the string characters are digits, and str.isnumeric() returns True if the string characters are numeric.
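
The decimal, digit, and numeric checks agree for ordinary digits such as "5" but differ for some Unicode characters. The sketch below uses a superscript two and a one-half fraction as sample characters to show the difference:

```python
samples = [("digit five", "5"), ("superscript two", "\u00b2"), ("fraction one half", "\u00bd")]
for label, text in samples:
    print(label, text.isdecimal(), text.isdigit(), text.isnumeric())
# digit five True True True
# superscript two False True True
# fraction one half False False True
print("abc123".isalnum())    # True
print("abc 123".isalnum())   # False (the space is not alphanumeric)
```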

str.islower() returns True if all characters in the string are lowercase; str.isupper() returns True if all characters in the string are uppercase; and str.istitle() returns True if the first letter of every word is capitalized:

# islower() example
string_one = "Artificial Neural Network"
print(string_one.islower())
string_two = string_one.lower()  # converts string to lowercase
print(string_two.islower())
# isupper() example
string_three = string_one.upper() # converts string to uppercase
print(string_three.isupper())
# istitle() example
print(string_one.istitle())

Output:

False
True
True
True

str.endswith(suffix) returns True if the string ends with the specified suffix, and str.startswith(prefix) returns True if the string starts with the specified prefix. Both also accept a tuple of candidates:

sentences = ['Time to master data science', 'I love statistical computing', 'Eat, sleep, code']
# endswith() example
for one_sentence in sentences:
    print(one_sentence.endswith(('science', 'computing', 'Code')))

Output:

True
True
False

# startswith() example
for one_sentence in sentences:
    print(one_sentence.startswith(('Time', 'I ', 'Ea')))

Output:

True
True
True
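
Passing a tuple is handy for filtering, for example by file extension. This is a minimal sketch; the file names are hypothetical, and lower() is used so that the check ignores case:

```python
file_names = ["report.csv", "notes.txt", "data.CSV", "image.png"]  # hypothetical names
data_files = [name for name in file_names if name.lower().endswith((".csv", ".txt"))]
print(data_files)   # ['report.csv', 'notes.txt', 'data.CSV']
```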

str.find(substring) returns the lowest index at which the substring occurs in the string; otherwise, it returns -1. str.rfind(substring) returns the highest index. str.index(substring) and str.rindex(substring) also return the lowest and highest indexes of the substring, respectively, if it is found. If the substring does not exist in the string, they raise a ValueError:

string = "programming"
# find() and rfind() examples
print(string.find('m'))
print(string.find('pro'))
print(string.rfind('m'))
print(string.rfind('game'))
# index() and rindex() examples
print(string.index('m'))
print(string.index('pro'))
print(string.rindex('m'))
print(string.rindex('game'))

Output:

6
0
7
-1
6
0
7
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_11336/ in
11 print(string.index('pro')) # Output: 0
12 print(string.rindex('m')) # Output: 7
---> 13 print(string.rindex('game')) # Output: ValueError: substring not found
ValueError: substring not found
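
The choice between find and index mostly comes down to how we want a missing substring to be reported. The sketch below shows one way to handle both styles; the helper name position_or_none is hypothetical.

```python
def position_or_none(text, substring):
    """Hypothetical helper: return the lowest index of substring, or None if absent."""
    try:
        return text.index(substring)
    except ValueError:
        return None

word = "programming"
print(word.find("game"))                # -1
print(position_or_none(word, "game"))   # None
print(position_or_none(word, "gram"))   # 3
# Pitfall: -1 is also a valid index, so using find's result directly can mislead
print(word[word.find("game")])          # g (the last character, not an error)
```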

str.maketrans(dict_map) creates a translation table from a dictionary map, and str.translate(maketrans) replaces the elements of the string with their new values from that table. For example:

example = "abcde"
mapped = {'a':'1', 'b':'2', 'c':'3', 'd':'4', 'e':'5'}
print(example.translate(str.maketrans(mapped)))

Output:

12345
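
Besides a dictionary, str.maketrans also accepts two strings of equal length, plus an optional third string of characters to delete. A short sketch of both forms:

```python
# Two strings of equal length: each character in the first maps to the one in the second
digits_table = str.maketrans("abcde", "12345")
translated = "abcde".translate(digits_table)
print(translated)   # 12345
# A third argument lists characters to delete
no_vowels_table = str.maketrans("", "", "aeiou")
without_vowels = "programming".translate(no_vowels_table)
print(without_vowels)   # prgrmmng
```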

String Operations

Loop over a string

Strings are iterable, so they support looping with for loops and with enumerate():

# For-loop example
word = "bank"
for letter in word:
    print(letter)

Output:

b
a
n
k

# Enumerate example
for idx, value in enumerate(word):
    print(idx, value)

Output:

0 b
1 a
2 n
3 k
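
Because enumerate gives us both the index and the character, it is a convenient way to record where something occurs. A small sketch with an example word:

```python
word = "banana"
# Collect every index at which the letter 'a' appears
a_positions = [idx for idx, letter in enumerate(word) if letter == "a"]
print(a_positions)   # [1, 3, 5]
# enumerate can also start counting from a number other than zero
numbered = [(number, letter) for number, letter in enumerate("abc", start=1)]
print(numbered)      # [(1, 'a'), (2, 'b'), (3, 'c')]
```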

Strings and Relational Operators

When comparing two strings using the relational operators (>, <, ==, etc.), the elements of the two strings are compared one by one, index by index, using their ASCII decimal numbers. Example:

print('a' > 'b')
print('abc' > 'b')

Output:

False
False

In both cases, the output is False. The relational operator first compares the ASCII decimal numbers of the elements of the two strings at index 0. Since b is greater than a, it returns False; in this case, the ASCII decimal numbers of the remaining elements and the lengths of the strings are irrelevant.

When the elements at index 0 are equal, it compares the ASCII decimal number of each following element in turn until it finds a position where the two strings differ. For example:

print('abd' > 'abc')

Output:

True
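
Two consequences of this rule are easy to overlook: uppercase letters sort before lowercase ones, and a string that is a prefix of another compares as smaller. The sketch below illustrates both with made-up words:

```python
upper_before_lower = 'Z' < 'a'   # ord('Z') is 90 and ord('a') is 97
prefix_is_smaller = 'ab' < 'abc'
print(upper_before_lower)   # True
print(prefix_is_smaller)    # True
words = ["banana", "apple", "Cherry"]
plain_sort = sorted(words)
ignore_case_sort = sorted(words, key=str.lower)
print(plain_sort)         # ['Cherry', 'apple', 'banana']
print(ignore_case_sort)   # ['apple', 'banana', 'Cherry']
```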

Checking String Membership

The in operator is used to check whether a substring is a member of a string:

print('data' in 'dataquest')
print('gram' in 'programming')

Output:

True
True

Another way to check string membership, replace substrings, or match patterns is to use regular expressions with the re module:

import re
substring = 'gram'
string = 'programming'
replacement = '1234'
# Check membership
print(re.search(substring, string))
# Replace string
print(re.sub(substring, replacement, string))

Output:

<re.Match object; span=(3, 7), match='gram'>
pro1234ming
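
Regular expressions become more useful than the in operator once we search for a pattern rather than a fixed substring. The following sketch uses the pattern \d+ (one or more digits) on an invented sentence:

```python
import re

text = "Order 66 shipped 3 items on day 12"  # hypothetical text
numbers = re.findall(r"\d+", text)   # every run of digits
masked = re.sub(r"\d+", "#", text)   # replace each run of digits
first = re.search(r"\d+", text)      # first match, or None if there is none
print(numbers)                       # ['66', '3', '12']
print(masked)                        # Order # shipped # items on day #
print(first.group(), first.start())  # 66 6
print(re.search(r"xyz", text) is None)  # True
```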

String Formatting

f-strings and the str.format() method are used to format strings. Both use curly brace {} placeholders. For example:

monday, tuesday, wednesday = "Monday", "Tuesday", "Wednesday"
format_string_one = "{} {} {}".format(monday, tuesday, wednesday)
print(format_string_one)
format_string_two = "{2} {1} {0}".format(monday, tuesday, wednesday)
print(format_string_two)
format_string_three = "{one} {two} {three}".format(one=tuesday, two=wednesday, three=monday)
print(format_string_three)
format_string_four = f"{monday} {tuesday} {wednesday}"
print(format_string_four)

Output:

Monday Tuesday Wednesday
Wednesday Tuesday Monday
Tuesday Wednesday Monday
Monday Tuesday Wednesday

f-strings are more readable, and they generally run faster than the str.format() method. Therefore, f-strings are the preferred method of string formatting.
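
Both f-strings and str.format() accept a format specification after a colon inside the braces, which controls things such as decimal places, zero padding, and alignment. A brief sketch with made-up values:

```python
price, count, name = 3.14159, 7, "tea"
two_decimals = f"{price:.2f}"           # fixed-point with 2 decimal places
zero_padded = f"{count:03d}"            # integer padded with zeros to width 3
right_aligned = f"[{name:>6}]"          # right-aligned in a field of width 6
left_aligned = f"[{name:<6}]"           # left-aligned in a field of width 6
percentage = "{:.1%}".format(0.256)     # the same mini-language works with format()
print(two_decimals)    # 3.14
print(zero_padded)     # 007
print(right_aligned)   # [   tea]
print(left_aligned)    # [tea   ]
print(percentage)      # 25.6%
```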

Handling quotes and apostrophes

An apostrophe (') delimits a string in Python. For Python to know that an apostrophe inside a string is not the end of that string, we have to use the Python escape character (\). So an apostrophe inside a string is written as \' in Python. Unlike apostrophes, there are a number of ways to deal with quotes in Python. They include the following:

# 1. Represent string with single quotes (`''`) and quoted statement with double quotes (`""`)
quotes_one = '"Friends don\'t let friends use minibatches larger than 32" - Yann LeCun'
print(quotes_one)
# 2. Represent string with double quotes (`""`) and quoted statement with escaped double quotes (`\"statement\"`)
quotes_two = "\"Friends don\'t let friends use minibatches larger than 32\" - Yann LeCun"
print(quotes_two)
# 3. Represent string with triple quotes (`""""""`) and quoted statement with double quotes (`""`)
quote_three = """"Friends don\'t let friends use minibatches larger than 32" - Yann LeCun"""
print(quote_three)

Output:

"Friends don't let friends use minibatches larger than 32" - Yann LeCun
"Friends don't let friends use minibatches larger than 32" - Yann LeCun
"Friends don't let friends use minibatches larger than 32" - Yann LeCun

That concludes this in-depth summary of using strings in Python. For more information about Python strings, please see my other related articles!