
Summary of code samples implementing various sorting algorithms in Python

In Python practice we often run into sorting problems, for example when ranking scored search results (without sorting there would be no search engines such as Google); examples like this are countless. Data structures courses also spend a great deal of time on sorting. Some time ago, out of need, I reviewed the sorting algorithms and implemented each of them in Python; I am putting them here for reference.

The three simplest sorts are insertion sort, selection sort, and bubble sort. All three are straightforward, with an average time complexity of O(n^2), so I will not repeat their principles here and will just post the source code.

Insertion Sort:

def insertion_sort(sort_list):
  iter_len = len(sort_list)
  if iter_len < 2:
    return sort_list
  for i in range(1, iter_len):
    key = sort_list[i]
    j = i - 1
    while j >= 0 and sort_list[j] > key:
      sort_list[j+1] = sort_list[j]
      j -= 1
    sort_list[j+1] = key
  return sort_list

Bubble Sort:

def bubble_sort(sort_list):
  iter_len = len(sort_list)
  if iter_len < 2:
    return sort_list
  for i in range(iter_len-1):
    for j in range(iter_len-i-1):
      if sort_list[j] > sort_list[j+1]:
        sort_list[j], sort_list[j+1] = sort_list[j+1], sort_list[j]
  return sort_list

Selection Sort:

def selection_sort(sort_list):
  iter_len = len(sort_list)
  if iter_len < 2:
    return sort_list
  for i in range(iter_len-1):
    smallest = sort_list[i]
    location = i
    for j in range(i, iter_len):
      if sort_list[j] < smallest:
        smallest = sort_list[j]
        location = j
    if i != location:
      sort_list[i], sort_list[location] = sort_list[location], sort_list[i]
  return sort_list
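
A quick sanity check of the three functions (my own snippet, not from the original article; note the slice copies, since each function sorts its argument in place):

data = [5, 2, 9, 1, 7]
print(insertion_sort(data[:]))   # [1, 2, 5, 7, 9]
print(bubble_sort(data[:]))      # [1, 2, 5, 7, 9]
print(selection_sort(data[:]))   # [1, 2, 5, 7, 9]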

In the selection sort above, you may notice a statement like this:

sort_list[i], sort_list[location] = sort_list[location], sort_list[i]
Readers who don't know Python may find this strange, but yes, this is how two values are swapped. In most other languages, swapping a and b requires an intermediate variable temp: assign a to temp, then b to a, and finally temp to b. In Python you can simply write a, b = b, a, which works because both sides of the assignment are tuples (it should be emphasized that in Python, a tuple is actually defined by the comma ",", not by the parentheses).
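
A minimal demonstration (my own example):

a, b = 1, 2
a, b = b, a      # the right-hand side builds the tuple (2, 1), then unpacks it
print((a, b))    # (2, 1)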

The algorithms with an average time complexity of O(n log n) are merge sort, heap sort, and quick sort.

Merge sort. Given two sorted subsequences, compare their first elements, pop the smaller one onto the output, and repeat until both are exhausted. To sort a sequence, split it at the midpoint into left and right subsequences, recursively sort each, and then merge them. The source code follows; it is written as a callable class because of the helper function:

class merge_sort(object):
  def _merge(self, alist, p, q, r):
    # merge the sorted runs alist[p:q+1] and alist[q+1:r+1] back into alist
    left = alist[p:q+1]
    right = alist[q+1:r+1]
    for i in range(p, r+1):
      if len(left) > 0 and len(right) > 0:
        if left[0] <= right[0]:
          alist[i] = left.pop(0)
        else:
          alist[i] = right.pop(0)
      elif len(right) == 0:
        alist[i] = left.pop(0)
      elif len(left) == 0:
        alist[i] = right.pop(0)

  def _merge_sort(self, alist, p, r):
    if p<r:
      q = int((p+r)/2)
      self._merge_sort(alist, p, q)
      self._merge_sort(alist, q+1, r)
      self._merge(alist, p, q, r)

  def __call__(self, sort_list):
    self._merge_sort(sort_list, 0, len(sort_list)-1)
    return sort_list
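
Because merge_sort is a callable class, you instantiate it and then call the instance. A small usage sketch of my own:

msort = merge_sort()
print(msort([5, 2, 9, 1, 7]))   # [1, 2, 5, 7, 9]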

Heap sort is built on a data structure: the heap. The basic concepts of the heap and how it is stored are not described here. A list is used to store the heap (similar to using an array): for the element at position i, position 2i+1 holds its left child and position 2i+2 its right child; the index of an element's parent can be derived similarly.

First we write a function that, for some subtree, starts at its root and swaps the root's value with a child's if the root is smaller, then recurses into the affected subtree. Next we call this function on every non-leaf node of the heap, from the bottom up, which yields a tree in which every non-leaf node is greater than its children (this is the process of building a max-heap). After that, we swap the first element of the list with the last, so that the largest number sits at the end of the list, then rebuild the max-heap on positions 0 to n-1, and repeat. In the end we obtain the heap-sorted list. The source code follows:

class heap_sort(object):
  def _left(self, i):
    return 2*i+1
  def _right(self, i):
    return 2*i+2
  def _parent(self, i):
    if i%2==1:
      return int(i/2)
    else:
      return int(i/2)-1

  def _max_heapify(self, alist, i, heap_size=None):
    # sift alist[i] down until the subtree rooted at i is a max-heap
    length = len(alist)

    if heap_size is None:
      heap_size = length

    l = self._left(i)
    r = self._right(i)

    if l < heap_size and alist[l] > alist[i]:
      largest = l
    else:
      largest = i
    if r < heap_size and alist[r] > alist[largest]:
      largest = r

    if largest!=i:
      alist[i], alist[largest] = alist[largest], alist[i]
      self._max_heapify(alist, largest, heap_size)

  def _build_max_heap(self, alist):
    # heapify every non-leaf node, from the last one up to the root
    loop_end = int(len(alist)/2)
    for i in range(0, loop_end)[::-1]:
      self._max_heapify(alist, i)

  def __call__(self, sort_list):
    self._build_max_heap(sort_list)
    heap_size = len(sort_list)
    for i in range(1, len(sort_list))[::-1]:
      sort_list[0], sort_list[i] = sort_list[i], sort_list[0]
      heap_size -= 1
      self._max_heapify(sort_list, 0, heap_size)

    return sort_list

The last comparison sort to illustrate (all of the algorithms above are comparison sorts, in that they all order elements through pairwise comparison and exchange) is naturally the classic quick sort.

First, the principle. The core is the partition helper function: for a given list (array), pick a pivot element (here I choose the last element); by comparing and swapping in place, rearrange the list so that every element to the left of the pivot's final position is smaller than the pivot and every element to its right is greater than or equal to it. The partition function returns the pivot's position, which splits the list into left and right sublists (ideally of similar size); then call the procedure recursively on each sublist until the length of a subsequence is at most 1.

Here is the source code for quick sort:

class quick_sort(object):
  def _partition(self, alist, p, r):
    i = p-1
    x = alist[r]    # the pivot: here, the last element
    for j in range(p, r):
      if alist[j] <= x:
        i += 1
        alist[i], alist[j] = alist[j], alist[i]
    alist[i+1], alist[r] = alist[r], alist[i+1]
    return i+1

  def _quicksort(self, alist, p, r):
    if p < r:
      q = self._partition(alist, p, r)
      self._quicksort(alist, p, q-1)
      self._quicksort(alist, q+1, r)

  def __call__(self, sort_list):
    self._quicksort(sort_list, 0, len(sort_list)-1)
    return sort_list

Careful readers may notice a problem here: if the sequence to be sorted is already in order, the recursion reaches its maximum depth (the length of the sequence). In practice, once the list is longer than about 1000 elements (the theoretical value), the program aborts with a "maximum recursion depth exceeded" error. Python's default recursion limit is 1000, a theoretical value; in reality only about 995 frames are usable, and the exact number varies by system. There are two solutions to this problem. 1) Raise the maximum recursion depth, like this:

import sys
sys.setrecursionlimit(99999)

2) The second approach is to use another version of the partition function: a randomized partition. Since we previously always chose the last element of the subsequence as the pivot, the algorithm is not robust against such special cases. By choosing the pivot at random from the subsequence, we make these worst cases far less likely. The new randomized partition function is as follows (it assumes the random module has been imported):

def _randomized_partition(self, alist, p, r):
  i = random.randint(p, r)              # pick a random pivot index in [p, r]
  alist[i], alist[r] = alist[r], alist[i]
  return self._partition(alist, p, r)

The complete randomized_quick_sort code is as follows (here I inherit directly from the previous quick_sort class):

import random
class randomized_quick_sort(quick_sort):
  def _randomized_partition(self, alist, p, r):
    i = random.randint(p, r)
    alist[i], alist[r] = alist[r], alist[i]
    return self._partition(alist, p, r)

  def _quicksort(self, alist, p, r):
    if p<r:
      q = self._randomized_partition(alist, p, r)
      self._quicksort(alist, p, q-1)
      self._quicksort(alist, q+1, r)

The discussion of quick sort is not over yet. We all know that Python is a very elegant language, and Python code can be quite concise yet extremely readable. Here is another way to write quick sort that takes just three lines without losing readability (though reading it does require some Python grounding). The code is as follows:

def quick_sort_2(sort_list):
  if len(sort_list)<=1:
    return sort_list
  return quick_sort_2([lt for lt in sort_list[1:] if lt<sort_list[0]]) + \
      sort_list[0:1] + \
      quick_sort_2([ge for ge in sort_list[1:] if ge>=sort_list[0]])

This code comes from the Python Cookbook, second edition; how is that for readability? It demonstrates the expressive power of list comprehensions.

For comparison-based sorting algorithms, all possible executions can be drawn as a binary tree (the decision-tree model). For a list of length n, let the height of the decision tree be h; the leaves are all possible orderings of the list, of which there are n!. Since a binary tree of height h has at most 2^h leaves, we have 2^h >= n!, and taking logarithms gives h >= log2(n!), which is approximately n log n. This means the best worst-case performance a comparison sort can achieve is O(n log n).
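
To see the growth concretely, here is a small numerical check (my own snippet, not part of the original article's tests) showing that log2(n!) grows on the order of n*log2(n):

import math

for n in (10, 100, 1000):
  height = math.log(math.factorial(n), 2)   # minimum decision-tree height
  print('n=%4d  log2(n!)=%8.0f  n*log2(n)=%8.0f' % (n, height, n * math.log(n, 2)))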

So are there algorithms with linear time, i.e. time complexity O(n)? The answer is yes. However, since sorting in practice is a complex business, such algorithms only apply in special cases. The main linear-time sorting algorithms for special cases are counting sort, bucket sort, and radix sort. Here we will only briefly cover counting sort.

Counting sort rests on the assumption that the sequence to be sorted consists of non-negative integers. First, declare a counter list whose length covers the largest number in the input; traverse the input and, for each number of value i, increment counter[i], which counts how many times the value i appears. Then turn the counts into running totals, so that counter[i] holds the number of elements less than or equal to i. Next, allocate an output list of the same length as the input (from this we can see that counting sort is not an in-place algorithm) and traverse the input in reverse order (the reverse order preserves stability: two elements of equal value keep their relative positions after sorting). For each number of value i, place it at output[counter[i]-1] and decrement counter[i] by 1 (because one number of that value has now been written out). The source code for counting sort follows.

class counting_sort(object):
  def _counting_sort(self, alist, k):
    alist3 = [0 for i in range(k)]           # counters for the values 0..k-1
    alist2 = [0 for i in range(len(alist))]  # the output list
    for j in alist:
      alist3[j] += 1
    for i in range(1, k):
      alist3[i] = alist3[i-1] + alist3[i]    # running totals: how many values <= i
    for l in alist[::-1]:                    # reverse pass keeps the sort stable
      alist2[alist3[l]-1] = l
      alist3[l] -= 1
    return alist2

  def __call__(self, sort_list, k=None):
    if k is None:
      import heapq
      k = heapq.nlargest(1, sort_list)[0] + 1
    return self._counting_sort(sort_list, k)
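
A quick usage sketch (my own example); when k is omitted, __call__ derives it from the largest value in the list:

csort = counting_sort()
print(csort([3, 6, 4, 1, 3, 4, 1, 4]))   # [1, 1, 3, 3, 4, 4, 4, 6]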

Having introduced all these sorting algorithms (all of the code above passed the unit tests I wrote), let us return to the topic of Python. Python has actually changed its built-in sorting algorithm several times since its earliest versions: from the qsort routine supplied by the C library (which had quite a few problems), to implementing its own algorithms, including the hybrid of samplesort and binary insertion sort used before version 2.3, up to the current adaptive algorithm, Timsort; the C implementation has grown from about 800 lines to more than 1200. All of this tells us that using the classical sorting algorithms in a real production environment is impractical; they are for learning and research only. In practice, the recommended approach comes down to the following two points, illustrated by the short sketch after them:

When sorting is required, use the sort method of Python's built-in list whenever possible.
When searching is required, use the built-in dictionary whenever possible.
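
For example (a small illustrative sketch of my own; the data is made up):

scores = [('alice', 88), ('bob', 95), ('carol', 79)]
scores.sort(key=lambda item: item[1], reverse=True)   # in-place built-in sort (Timsort)
print(scores[0])                   # ('bob', 95)

index = dict(scores)               # hash-based lookup instead of a linear scan
print(index['carol'])              # 79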
I wrote test functions to compare the built-in sort method against the implementations above. The test sequences have length 5000, and each function is run 3 times and the results averaged, giving the following test results:

[Screenshot of the timing results omitted]

As you can see, Python's built-in sort has an overwhelming advantage. In practice, then, we should use the built-in sort method whenever possible.

This leads us to another problem: how to determine whether a sequence contains duplicate elements, returning True if it does and False otherwise. Someone will say that is simple: just write two nested loops and traverse. The code would look like this:

def normal_find_same(alist):
  length = len(alist)
  for i in range(length):
    for j in range(i+1, length):
      if alist[i] == alist[j]:
        return True
  return False

This approach is very costly (average time complexity O(n^2), with the worst case reached when the list contains no duplicates). From the earlier results, we can instead exploit the speed of the built-in sort method: first sort the list, then traverse it once, checking whether any adjacent elements are equal. The complete test code is as follows:

import time
import random

def record_time(func, alist):
  start = time.time()
  func(alist)
  end = time.time()

  return end - start

def quick_find_same(alist):
  alist.sort()    # the built-in sort puts equal elements next to each other
  length = len(alist)
  for i in range(length-1):
    if alist[i] == alist[i+1]:
      return True
  return False

if __name__ == "__main__":
  methods = (normal_find_same, quick_find_same)
  alist = list(range(5000))
  random.shuffle(alist)

  for m in methods:
    print('The method %s spends %s' % (m.__name__, record_time(m, alist)))

In my run, for a list of length 5000 with no duplicate elements, the plain method took about 1.205 seconds, while the fast lookup method took only 0.003 seconds. This is an example of putting sorting to practical use.