SoFunction
Updated on 2024-11-21

Combining and concatenating strings in Python

Problem

You want to combine several small strings into one larger string.

Solution

If the strings you want to merge are in a sequence or iterable, the fastest way to do so is to use the join() method. For example:

>>> parts = ['Is', 'Chicago', 'Not', 'Chicago?']
>>> ' '.join(parts)
'Is Chicago Not Chicago?'
>>> ','.join(parts)
'Is,Chicago,Not,Chicago?'
>>> ''.join(parts)
'IsChicagoNotChicago?'
>>>

At first glance this syntax can look odd, but join() is deliberately defined as a method on strings. Part of the reason is that the objects you want to join may come from many different kinds of data sequences (such as lists, tuples, dictionaries, files, sets, or generators), and implementing a join() method separately on each of them would be redundant. So you simply specify the separator string you want and call its join() method to glue the text fragments together.
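As a small sketch of that point, the same separator can join fragments coming from several kinds of iterables. The variable names and the sample dictionary below are illustrative assumptions, not part of the original recipe:

```python
# The same join() call accepts any iterable that yields strings.
words_list = ['Is', 'Chicago', 'Not', 'Chicago?']
words_tuple = tuple(words_list)
words_gen = (w for w in words_list)        # generator expression
prices = {'ACME': 45.23, 'AAPL': 612.78}   # iterating a dict yields its keys

from_list = ' '.join(words_list)
from_tuple = ' '.join(words_tuple)
from_gen = ' '.join(words_gen)
from_dict = ','.join(prices)

print(from_list)
print(from_dict)
```

Because join() lives on the separator, none of these container types needs to know anything about string joining.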

If you're only merging a few strings, using the plus sign (+) is usually enough:

>>> a = 'Is Chicago'
>>> b = 'Not Chicago?'
>>> a + ' ' + b
'Is Chicago Not Chicago?'
>>>

The plus (+) operator also works well as an alternative to more complicated string formatting operations, for example:

>>> print('{} {}'.format(a,b))
Is Chicago Not Chicago?
>>> print(a + ' ' + b)
Is Chicago Not Chicago?
>>>
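For comparison, here is a minimal sketch that puts the two spellings above next to an f-string, which is a third option if you can assume Python 3.6 or later. The variable names with_plus, with_format, and with_fstring are made up for this illustration:

```python
a = 'Is Chicago'
b = 'Not Chicago?'

with_plus = a + ' ' + b
with_format = '{} {}'.format(a, b)
with_fstring = f'{a} {b}'   # assumes Python 3.6+

# All three spellings build the same string.
print(with_plus == with_format == with_fstring)
```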

If you want to combine two string literals in source code, you can simply place them next to each other without the plus sign (+). For example:

>>> a = 'Hello' 'World'
>>> a
'HelloWorld'
>>>
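A common use of adjacent literals is splitting a long message across several source lines. The sketch below assumes nothing beyond the standard language rules; note that the technique applies only to literals, not to variables:

```python
# Adjacent string literals are merged when the source is compiled,
# so a long message can be wrapped inside parentheses.
message = ('Is Chicago '
           'Not Chicago?')
print(message)

# Variables are not literals, so they still need + or join().
a = 'Is Chicago'
b = 'Not Chicago?'
combined = a + ' ' + b
print(combined == message)
```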

Discussion

String concatenation may not seem to deserve an entire section, but it shouldn't be underestimated: poor choices when formatting and joining strings often cost applications a significant amount of performance.

The most important point is that using the plus (+) operator to concatenate a large number of strings is very inefficient, because each concatenation copies memory and leaves garbage behind for the collector. In particular, you should never write string concatenation code like the following:

s = ''
for p in parts:
  s += p

This code runs slower than the join() method because a new string object is created each time the += operation is performed. You're better off collecting all the string fragments first and joining them at the end.
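One way to check this on your own machine is a small timeit comparison. This is only a rough sketch: the fragment list, its size, and the repeat count are arbitrary assumptions, and the numbers vary by interpreter and version (CPython can sometimes extend a string in place for +=, which narrows the gap), so treat the output as a measurement to interpret rather than a guaranteed result.

```python
import timeit

parts = ['fragment'] * 10_000   # hypothetical workload

def concat_with_plus(parts):
    # Builds the result one += at a time.
    s = ''
    for p in parts:
        s += p
    return s

def concat_with_join(parts):
    # Collects everything in a single join() call.
    return ''.join(parts)

# Both approaches must produce the same string.
assert concat_with_plus(parts) == concat_with_join(parts)

plus_time = timeit.timeit(lambda: concat_with_plus(parts), number=100)
join_time = timeit.timeit(lambda: concat_with_join(parts), number=100)
print(f'+= loop: {plus_time:.4f}s   join(): {join_time:.4f}s')
```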

A fairly neat trick is to convert data to strings and join them in a single step using a generator expression (refer to subsection 1.19), for example:

>>> data = ['ACME', 50, 91.1]
>>> ','.join(str(d) for d in data)
'ACME,50,91.1'
>>>
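The conversion matters because join() accepts only strings. A minimal sketch of what happens without it, reusing the same data list (the names error and line are made up for this illustration):

```python
data = ['ACME', 50, 91.1]

# join() raises TypeError as soon as it meets a non-string item.
error = None
try:
    ','.join(data)
except TypeError as exc:
    error = exc
    print('join failed:', exc)

# Converting on the fly avoids building a temporary list of strings.
line = ','.join(str(d) for d in data)
print(line)
```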

Also watch out for unnecessary string concatenation. Programmers sometimes concatenate strings when there is no need to do so. For example, when printing:

print(a + ':' + b + ':' + c) # Ugly
print(':'.join([a, b, c])) # Still ugly
print(a, b, c, sep=':') # Better
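One further difference is that print() converts each argument with str() itself, so the sep form also copes with values that are not strings. In this sketch the mixed-type values are an assumption chosen for illustration, and an in-memory io.StringIO buffer stands in for standard output so the result can be inspected:

```python
import io

a, b, c = 'ACME', 50, 91.1   # hypothetical mixed-type values

# Capture what print() writes instead of sending it to the terminal.
buf = io.StringIO()
print(a, b, c, sep=':', file=buf)
output = buf.getvalue()

print(repr(output))
```

With these values, a + ':' + b would raise TypeError, and join() would need an explicit str() conversion.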

When you mix I/O operations with string concatenation, it is sometimes worth taking a closer look at your program. For example, consider the following two code fragments:

# Version 1 (string concatenation)
f.write(chunk1 + chunk2)

# Version 2 (separate I/O operations)
f.write(chunk1)
f.write(chunk2)

If the two strings are very small, the first version will perform better because I/O system calls are inherently slow. On the other hand, if the two strings are very large, the second version may be more efficient because it avoids creating a very large temporary result and copying a large block of memory. As always, the right choice depends on the characteristics of your application.

As a final note, if you're going to write output code that builds a lot of small strings, you might want to consider using generator functions that utilize yield statements to produce output fragments. For example:

def sample():
  yield 'Is'
  yield 'Chicago'
  yield 'Not'
  yield 'Chicago?'

An interesting aspect of this approach is that it makes no assumptions about how the output fragments are actually going to be organized. For example, you could simply use the join() method to combine the fragments:

text = ''.join(sample())

Or you can redirect string fragments to I/O:

for part in sample():
  f.write(part)

Or you could write a hybrid scheme that is smarter about combining I/O operations:

def combine(source, maxsize):
  parts = []
  size = 0
  for part in source:
    parts.append(part)
    size += len(part)
    if size > maxsize:
      yield ''.join(parts)
      parts = []
      size = 0
  # Flush whatever is left once the source is exhausted
  yield ''.join(parts)

# Combined file operations
with open('filename', 'w') as f:
  for part in combine(sample(), 32768):
    f.write(part)

The key point here is that the original generator function doesn't need to know the details of its use; it's just responsible for generating the string fragment.
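To see the batching behavior in isolation, here is a self-contained sketch of the same idea. It makes two assumptions for demonstration purposes: an in-memory io.StringIO buffer stands in for the file, and maxsize is set to a tiny value (8) so that the grouping is visible with only four fragments.

```python
import io

def sample():
    yield 'Is'
    yield 'Chicago'
    yield 'Not'
    yield 'Chicago?'

def combine(source, maxsize):
    parts = []
    size = 0
    for part in source:
        parts.append(part)
        size += len(part)
        if size > maxsize:
            # Enough text has accumulated: emit it as one chunk.
            yield ''.join(parts)
            parts = []
            size = 0
    # Flush whatever is left once the source is exhausted
    # (this may be an empty string).
    yield ''.join(parts)

buf = io.StringIO()                    # stand-in for a real file
chunks = list(combine(sample(), 8))    # tiny maxsize, for illustration only
for chunk in chunks:
    buf.write(chunk)

print(chunks)
print(buf.getvalue())
```

The four fragments reach the buffer in fewer, larger writes, while sample() itself is unchanged.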
