SoFunction
Updated on 2024-11-17

Python Advanced Tutorial: Threads, Processes, and Coroutines with Code Analysis

Processes

A process is an application program that is running in the system; it is the basic unit to which the operating system allocates resources.

The 5 basic states of a process

A process has at least five basic states: initial state, ready state, waiting (blocking) state, execution state, and termination state.

  • Initial state: the process has just been created; because other processes are occupying CPU resources, it cannot yet execute.
  • Ready state: only processes in the ready state can be scheduled into the execution state.
  • Waiting (blocking) state: the process is waiting for some event to complete.
  • Execution state: at any given time there can be only one process in the execution state (on a single-core CPU).
  • Termination state: the process has terminated.

Characteristics of the process

  • Dynamic: a process is a single execution of a program; it arises and dies dynamically.
  • Independence: a process is a basic unit that can run independently; it is the basic unit of resource allocation and scheduling in the system.
  • Concurrency: any process can execute concurrently with other processes.
  • Structure: a process consists of three parts: program, data, and process control block.

The multiprocessing module is higher-level than fork, and using multiprocessing makes it easier to implement multiprocess programs.

#!/usr/bin/env python
# -*- coding:utf-8 -*-
from multiprocessing import Process

def foo(i):
    print('say hi', i)

if __name__ == '__main__':
    for i in range(10):
        p = Process(target=foo, args=(i,))
        p.start()

Note: creating a process carries a large overhead, since each process holds its own copy of the data. Also, Windows has no fork, so process-creating code must be placed under an if __name__ == '__main__': guard, or spawning new processes will fail.

When using multiprocessing, it is best to create a number of processes equal to the number of CPU cores.
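As a minimal sketch of this advice (the worker function is illustrative), the pool can be sized with multiprocessing.cpu_count():

```python
import multiprocessing

def square(n):
    # CPU-bound toy task
    return n * n

if __name__ == "__main__":
    # Size the pool to the number of CPU cores
    n_cores = multiprocessing.cpu_count()
    with multiprocessing.Pool(processes=n_cores) as pool:
        results = pool.map(square, range(10))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```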

Inter-process data sharing

Processes share the CPU and main memory with the other processes in the system. To manage main memory better, the operating system provides an abstraction over it called virtual memory (VM), which gives each process the illusion that it is using main memory exclusively.

Virtual memory provides three main capabilities:

  • It treats main memory as a cache for data stored on disk, keeps only the active regions in main memory, and transfers data between disk and main memory as needed; in this way main memory is used more efficiently.
  • It simplifies memory management by providing each process with a consistent address space.
  • It protects the address space of each process from being corrupted by other processes.

Since processes have their own exclusive virtual address space and the CPU translates virtual addresses into real physical addresses through address translation, each process can only access its own address space. Therefore, data cannot be shared between processes without the aid of other mechanisms (inter-process communication).

Each process holds its own copy of the data, so by default data cannot be shared and processes are independent of each other. To share data between processes you need a special data structure, which you can think of as being able to "pass through walls": once it can pass through the wall, both sides can use it.

#!/usr/bin/env python
#coding:utf-8
from multiprocessing import Process

li = []

def foo(i):
    li.append(i)
    print('say hi', li)

if __name__ == '__main__':
    processes = []
    for i in range(10):
        p = Process(target=foo, args=(i,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
    print('ending', li)  # still [] in the parent: each child modified its own copy

Use special data types to share data across processes ("pass through the wall"):

# Sharing through a special data structure: Array
from multiprocessing import Process, Array

# An Array holds only a single numeric type and has a fixed length, like a C array.
# A C array is contiguous in memory and cannot grow: growing would require
# reserving extra space after the existing memory addresses.
# A Python list, by contrast, can grow and need not be contiguous.
# The object below is an "array", not a list; its length is fixed.
temp = Array('i', [11, 22, 33, 44])  # 'i' is the C type code that defines the shared element type

def Foo(i):
    temp[i] = 100 + i
    for item in temp:
        print(i, '----->', item)

if __name__ == '__main__':
    for i in range(2):
        p = Process(target=Foo, args=(i,))
        p.start()
The second method:
# Method II: sharing data through Manager().dict()
from multiprocessing import Process, Manager  # Manager provides shared data types

def Foo(dic, i):
    dic[i] = 100 + i
    print(dic)

if __name__ == '__main__':
    manage = Manager()
    dic = manage.dict()  # used the same way as an ordinary Python dict
    for i in range(2):
        p = Process(target=Foo, args=(dic, i))
        p.start()
        p.join()

Since data can be shared between processes, wouldn't multiple processes modifying the same data at the same time produce dirty data? That is where locks come in.

Process locks and thread locks are used in much the same way; the only difference is which class they come from (multiprocessing.Lock versus threading.Lock).
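As a minimal sketch (the counter dict and function names are illustrative), a multiprocessing.Lock can guard Manager-shared data, used just like threading.Lock:

```python
from multiprocessing import Process, Manager, Lock

def add_one(counter, lock):
    # Acquire the lock before touching the shared dict
    with lock:
        counter["n"] += 1

if __name__ == "__main__":
    lock = Lock()
    with Manager() as manager:
        counter = manager.dict(n=0)
        procs = [Process(target=add_one, args=(counter, lock)) for _ in range(5)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(counter["n"])  # 5
```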

process pool

The process pool maintains a sequence of processes, and when used, a process is fetched from the pool. If there is no process in the pool sequence that can be used, the program waits until a process is available in the pool.

There are two methods in the process pool:

  • apply
  • apply_async
#!/usr/bin/env python
# -*- coding:utf-8 -*-
from multiprocessing import Pool
import time

def Foo(i):
    time.sleep(2)
    return i + 100

def Bar(arg):
    print(arg)

if __name__ == '__main__':
    pool = Pool(5)  # Create a process pool with 5 workers
    #print(pool.apply(Foo, (1,)))  # Request a process from the pool to execute the Foo method
    #print(pool.apply_async(func=Foo, args=(1,)).get())
    for i in range(10):
        pool.apply_async(func=Foo, args=(i,), callback=Bar)
    print('end')
    pool.close()
    pool.join()  # Wait for the pool's processes to finish; without this the program exits immediately
'''
apply executes synchronously: it blocks until the result is ready.
pool.apply_async(func=Foo, args=(i,), callback=Bar) is asynchronous: it submits Foo
without waiting for it to finish, and when Foo completes, the callback is invoked
to report that execution is done.
The callback function receives the return value of the Foo function as its argument!
'''
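To make the difference between the two methods concrete, here is a minimal sketch (the worker function and timings are illustrative): apply blocks the caller until the result is ready, while apply_async returns an AsyncResult immediately.

```python
from multiprocessing import Pool
import time

def slow_square(n):
    # Stand-in for a slow task
    time.sleep(0.5)
    return n * n

if __name__ == '__main__':
    with Pool(2) as pool:
        # apply is synchronous: it blocks until slow_square returns
        print(pool.apply(slow_square, (3,)))              # 9
        # apply_async is asynchronous: it returns an AsyncResult immediately
        res = pool.apply_async(func=slow_square, args=(4,))
        print("submitted; main process keeps running")
        print(res.get())                                  # 16 (get() blocks for the result)
```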

Disadvantages of processes

For tasks that cannot run to completion in one go, processes incur significant context-switching overhead and time costs.

Process context: When a process is executing, the values in all the CPU registers, the state of the process, and the contents of the stack are referred to as the context of that process.

Context switching: When the kernel needs to switch to another process, it needs to save all the states of the current process, i.e., save the context of the current process, so that when the process is executed again, it can get the state at the time of switching and execute it.

threading

Definition of a thread

In computing, a process is an instance of an executing computer program. Any process has 3 basic components:

  • An executable program.
  • Relevant data required by the program (variables, workspace, buffers, etc.)
  • Execution context of the program (process state)

A thread is an entity in a process that can be scheduled for execution. In addition, it is the smallest processing unit that can be executed in the OS (operating system).

In short, a thread is a sequence of instructions within a program that can be executed independently of other code. For simplicity, you can assume that a thread is simply a subset of a process!

The Thread Control Block (TCB) contains all of this information:

  • Thread Identifier: Assigns a unique id (TID) to each new thread.
  • Stack pointer: points to the stack of a thread in a process. The stack contains thread-scoped local variables.
  • Program Counter: a register that holds the address of the instruction currently being executed by the thread.
  • Thread state: can be running, ready, waiting, starting or done.
  • Thread's register set: registers allocated to the thread to perform calculations.
  • Parent Process Pointer: A pointer to the process control block (PCB) of the process in which the thread resides.

Multithreading is defined as the ability of a processor to execute multiple threads simultaneously.

In a simple single-core CPU, it is accomplished by frequent switching between threads. This is called a context switch. In a context switch, whenever any interrupt occurs (due to I/O
or set manually), it saves the state of one thread and loads the state of the other. Context switches occur so frequently that all threads seem to be running in parallel (this is called multitasking).

In Python, the threading module provides a very simple and intuitive API for spawning multiple threads in a program.

Simple Example of Using the Threading Module

Let's consider a simple example using the threading module:

# Python program illustrating the concept of threading
# importing the threading module
import threading

def print_cube(num):
    """
    Function to print the cube of a given number
    """
    print("cube: {}".format(num * num * num))

def print_square(num):
    """
    Function to print the square of a given number
    """
    print("square: {}".format(num * num))

if __name__ == "__main__":
    # creating threads
    t1 = threading.Thread(target=print_square, args=(10,))
    t2 = threading.Thread(target=print_cube, args=(10,))
    # starting thread 1
    t1.start()
    # starting thread 2
    t2.start()
    # wait until thread 1 is fully executed
    t1.join()
    # wait until thread 2 is fully executed
    t2.join()
    # both threads fully executed
    print("Done!")

square: 100
cube: 1000
Done!

Code analysis

Let's try to understand the above code:

  • To import the threading module, we do this:
import threading
  • To create a new thread, we create an object of the Thread class. It takes the following parameters:
  • target: the function to be executed by the thread
  • args: the arguments to be passed to the target function

In the above example, we created 2 threads with different target functions:

t1 = threading.Thread(target=print_square, args=(10,))
t2 = threading.Thread(target=print_cube, args=(10,))

To start a thread, we use the start method of the Thread class.

t1.start()
t2.start()

Once the threads are started, the current program (which you can think of as the main thread) also continues to execute. To make the current program wait until a thread finishes, we use the join method.

t1.join()
t2.join()

As a result, the current program will first wait for t1 to complete, and then for t2. Once they have completed, the remaining statements of the current program are executed.

Coroutines

Coroutines (also known as microthreads or fibers) are more lightweight than threads; instead of being managed by the operating system kernel, a coroutine is completely controlled by the program.

We are all familiar with functions, also known as subroutines, procedures, sub-processes, etc. A function is a sequence of instructions packaged into a unit to perform a specific task. When the logic of a complex function is divided into separate steps that are functions in their own right, these are called auxiliary functions or subroutines.

Subroutines in Python are called by a main function that coordinates their use. Subroutines have a single entry point. Coroutines are a generalization of subroutines. They are used in cooperative multitasking, where a process voluntarily yields control periodically, or when idle, to allow multiple applications to run simultaneously. The differences between a coroutine and a subroutine are:

  • Unlike subroutines, coroutines have many entry points for suspending and resuming execution. A coroutine can suspend its execution and transfer control to another coroutine, then resume execution from the point where it left off.
  • Unlike subroutines, there is no main function to call coroutines in a particular order and coordinate the results. Coroutines are cooperative: they link together to form a pipeline. One coroutine may consume input data and send it to another that processes it; finally, there may be a coroutine that displays the result.

Coroutines and Threads

Now you may wonder how a coroutine differs from a thread; both seem to do the same job.
In the case of threads, the operating system (or runtime environment) switches between threads according to its scheduler. In the case of coroutines, the programmer and the programming language decide when to switch. Coroutines achieve cooperative multitasking by having the programmer pause and resume them at set points.
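To make the contrast concrete, here is a minimal sketch using plain generators (the round-robin scheduler is illustrative): the caller, not the OS, decides when each "coroutine" runs.

```python
def task(name, steps):
    # A generator-based "coroutine": it pauses at each yield
    for i in range(steps):
        print("{} step {}".format(name, i))
        yield  # hand control back to the caller

# The caller decides when each task runs (cooperative scheduling)
tasks = [task("A", 2), task("B", 2)]
while tasks:
    current = tasks.pop(0)
    try:
        next(current)          # run until the next yield
        tasks.append(current)  # requeue: round-robin
    except StopIteration:
        pass                   # task finished; drop it
```

This prints "A step 0", "B step 0", "A step 1", "B step 1": the two tasks interleave exactly where the programmer placed the yields.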

Coroutines in Python

In Python, a coroutine is similar to a generator, but with a few additional methods and a subtle change in how we use the yield statement. Generators produce data for iteration, whereas coroutines can also consume data.
Python 2.5 introduced a slight modification to the yield statement; now yield can also be used as an expression, for example on the right-hand side of an assignment:

line = (yield)

Any value we send to the coroutine is captured and returned by the (yield) expression.

Values can be sent to a coroutine via the send() method. For example, consider this coroutine, which prints names carrying the prefix "Dear". We will use the send() method to send names to the coroutine.

# Python3 program to demonstrate coroutines
def print_name(prefix):
    print("Searching prefix:{}".format(prefix))
    while True:
        name = (yield)
        if prefix in name:
            print(name)

# Calling the coroutine: nothing happens yet
corou = print_name("Dear")
# This starts the coroutine, prints the first line "Searching prefix..."
# and advances execution to the first yield expression
corou.__next__()
# Sending inputs
corou.send("Haiyong")
corou.send("Dear Haiyong")

Output:

Searching prefix:Dear
Dear Haiyong

Execution of coroutines

The execution of a coroutine is similar to that of a generator. When we call a coroutine, nothing happens; it runs only in response to the __next__() and send() methods. This can be seen clearly in the example above: the coroutine starts executing only after we call the __next__() method. After this call, execution advances to the first yield expression, then pauses and waits for a value to be sent to the corou object. When the first value is sent, the coroutine checks the prefix and prints the name if the prefix is present. After printing the name, it goes through the loop until it reaches the name = (yield) expression again.

Closing a coroutine

Coroutines may run indefinitely; to shut one down we use the close() method. When a coroutine is closed, a GeneratorExit exception is raised inside it, which can be caught in the usual way. After closing the coroutine, if we try to send it a value, a StopIteration exception is raised. Here is a simple example:

# Python3 program demonstrating
# closing a coroutine
def print_name(prefix):
    print("Searching prefix:{}".format(prefix))
    try:
        while True:
            name = (yield)
            if prefix in name:
                print(name)
    except GeneratorExit:
        print("Shut down the coroutine!!!")

corou = print_name("Dear")
corou.__next__()
corou.send("Haiyong")
corou.send("Dear Haiyong")
corou.close()

Output:

Searching prefix:Dear
Dear Haiyong
Shut down the coroutine!!!

Linking Coroutines to Create Pipelines

Coroutines can be used to set up pipelines. We can link coroutines together with the send() method and push data through the pipeline. A pipeline requires:

  • An initial source (producer) that drives the whole pipeline. The producer is usually not a coroutine; it is just a simple method.
  • A sink, which is the endpoint of the pipeline. The sink may collect all the data and display it.

Here is an example of a simple pipeline:

# Python program to demonstrate coroutine pipelines

def producer(sentence, next_coroutine):
    '''
    The producer just splits the string and
    feeds the tokens to the pattern_filter coroutine.
    '''
    tokens = sentence.split(" ")
    for token in tokens:
        next_coroutine.send(token)
    next_coroutine.close()

def pattern_filter(pattern="ing", next_coroutine=None):
    '''
    Searches for the pattern in the received tokens; if the pattern
    matches, sends the token to the print_token() coroutine for printing.
    '''
    print("Searching for {}".format(pattern))
    try:
        while True:
            token = (yield)
            if pattern in token:
                next_coroutine.send(token)
    except GeneratorExit:
        print("Filtering complete!!!")

def print_token():
    '''
    Acts as the sink; simply prints the received tokens.
    '''
    print("I'm the sink, I'll print the tokens.")
    try:
        while True:
            token = (yield)
            print(token)
    except GeneratorExit:
        print("Printing complete!")

pt = print_token()
pt.__next__()
pf = pattern_filter(next_coroutine=pt)
pf.__next__()

sentence = "Haiyong is running behind a fast moving car"
producer(sentence, pf)

Output:

I'm the sink, I'll print the tokens.
Searching for ing
running
moving
Filtering complete!!!
Printing complete!

Summary

1. Threads and coroutines are recommended for IO-intensive tasks (such as network calls); they perform poorly on CPU-intensive tasks.
2. For CPU-intensive tasks, multiple processes are needed to bypass the GIL and utilize all available CPU cores.
3. The best practice under high concurrency is multiprocessing + coroutines: this takes full advantage of multiple cores while also exploiting the high efficiency of coroutines, yielding very high performance.

  • CPU-intensive: multiprocessing
  • IO-intensive: multithreading (coroutines are costly to maintain and do not provide significant efficiency gains for reading and writing files)
  • CPU-intensive and IO-intensive: multiprocessing + coroutines
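As an illustrative sketch of the IO-intensive case (time.sleep stands in for network latency; the numbers are arbitrary), threads let the waiting overlap:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(i):
    # Stand-in for a network call: it waits rather than computing
    time.sleep(0.2)
    return i

start = time.time()
with ThreadPoolExecutor(max_workers=5) as ex:
    results = list(ex.map(fake_io, range(5)))
elapsed = time.time() - start
print(results)  # [0, 1, 2, 3, 4]
# Serially this would take about 1 second; with 5 threads the waits overlap
print(elapsed < 0.9)
```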

This concludes this article's analysis of threads, processes, and coroutines in Python. For more on Python threads, processes, and coroutines, please search my previous articles or continue browsing the related articles below, and I hope you will support me in the future!