SoFunction
Updated on 2024-11-19

Python's Processes and Process Pools in Detail

Processes

A process is the basic unit of resource allocation in the operating system, and it serves as the boundary for program isolation.

Processes and programs

A program is just a collection of instructions; it cannot run by itself and is static.

A process is a running instance of a program: it is dynamic, has its own life cycle, and is created and destroyed, so its existence is temporary.

Processes and programs are not one-to-one; a program can correspond to multiple processes, and a process can execute one or more programs.

We can understand it this way: code that has been written but is not running is called a program; once the code runs, it starts one (or more) processes.

The state of the process

When an operating system is working, the number of tasks is usually greater than the number of CPU cores, i.e. some tasks are executing while others wait for a CPU, so processes end up in different states.

  • Ready state: all conditions are met and the process is waiting for a CPU.
  • Running state: the CPU is executing the process.
  • Waiting state: the process is waiting for some condition to be fulfilled; for example, a sleeping program is in the waiting state.

Processes in Python

In Python, processes are created using the multiprocessing module, which provides a Process class to create process objects.

Create sub-process

Process syntax structure:

Process(group=None, target=None, name=None, args=(), kwargs={})

  • group: the process group; it should almost always be left as None (reserved for future use).
  • target: the callable object, i.e. the task to be performed by the child process.
  • name: the name of the child process, which can be left unset.
  • args: positional arguments passed to the function specified by target, as a tuple.
  • kwargs: keyword arguments passed to the function specified by target, as a dict.
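As a quick illustration of args and kwargs together (the greet function and its parameters are invented for this sketch), a sketch might look like this:

```python
import multiprocessing

# Hypothetical task function, used only for this illustration
def greet(name, punctuation="!"):
    print("Hello, " + name + punctuation)

if __name__ == '__main__':
    # Positional argument via args, keyword argument via kwargs
    p = multiprocessing.Process(target=greet,
                                args=("tigeriaf",),
                                kwargs={"punctuation": "?"})
    p.start()
    p.join()
```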

Process Common Methods

  • start(): start the process, which calls the run() method in the child process.
  • join(timeout): make the main process wait for the child process to finish executing before continuing; timeout is an optional timeout in seconds.
  • is_alive(): check whether the child process is still alive.
  • run(): the method executed when the process starts; by default it simply calls the function specified by target.
  • terminate(): terminate the child process immediately.
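A small sketch of is_alive() and terminate() in action (the slow_task function is invented for this illustration):

```python
import multiprocessing
import time

# Hypothetical long-running task, used only for this illustration
def slow_task():
    time.sleep(5)

if __name__ == '__main__':
    p = multiprocessing.Process(target=slow_task)
    p.start()
    print(p.is_alive())   # True: the child is still sleeping
    p.terminate()         # Kill the child immediately
    p.join()              # Reap it so its state is updated
    print(p.is_alive())   # False: the child has been terminated
```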

Frequently Used Attributes of Instance Objects Created by Process

name: the alias of the current process, default is Process-N, N is an integer incremented from 1

pid: pid (process number) of the current process

import multiprocessing
import os
import time

def work(name):
    print("Subprocess work is running ......")
    time.sleep(0.5)
    print(name)
    # Get the name of the process
    print("Child process name", multiprocessing.current_process())
    # Get the pid of the process
    print("Child process pid", multiprocessing.current_process().pid, os.getpid())
    # Get the pid of the parent process
    print("Parent process pid", os.getppid())
    print("End of sub-process run ......")

if __name__ == '__main__':
    print("Master process initiated")
    # Get the name of the process
    print("Master process name", multiprocessing.current_process())
    # Get the pid of the process
    print("Master process pid", multiprocessing.current_process().pid, os.getpid())
    # Create the process
    p = multiprocessing.Process(group=None, target=work, args=("tigeriaf", ))
    # Start the process
    p.start()
    print("Main process terminated.")

Running the code above, we see that multiprocessing.Process helped us create a child process that runs successfully, but the main process exits before the child process finishes executing, which leaves the child as an orphan process once the main process is gone. Can we make the main process wait for the child process to finish before exiting? The answer is yes: calling p.join() makes the main process wait for the child process to finish executing before it exits.

import multiprocessing
import os
import time

def work(name):
    print("Subprocess work is running ......")
    time.sleep(0.5)
    print(name)
    # Get the name of the process
    print("Child process name", multiprocessing.current_process())
    # Get the pid of the process
    print("Child process pid", multiprocessing.current_process().pid, os.getpid())
    # Get the pid of the parent process
    print("Parent process pid", os.getppid())
    print("End of sub-process run ......")

if __name__ == '__main__':
    print("Master process initiated")
    # Get the name of the process
    print("Master process name", multiprocessing.current_process())
    # Get the pid of the process
    print("Master process pid", multiprocessing.current_process().pid, os.getpid())
    # Create the process
    p = multiprocessing.Process(group=None, target=work, args=("tigeriaf", ))
    # Start the process
    p.start()
    # Wait for the child process to finish
    p.join()
    print("Main process terminated.")

Run results:

As you can see, the master process is terminated after the child process is terminated.
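join() also accepts an optional timeout in seconds; if the child has not finished by then, the main process simply continues past the call. A minimal sketch (the napper function is invented for this illustration):

```python
import multiprocessing
import time

# Hypothetical task that sleeps longer than the join timeout
def napper():
    time.sleep(3)

if __name__ == '__main__':
    p = multiprocessing.Process(target=napper)
    p.start()
    p.join(timeout=1)      # Wait at most 1 second
    print(p.is_alive())    # True: the child is still running after the timeout
    p.join()               # Now wait for it to really finish
    print(p.is_alive())    # False
```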

Global Variable Problems

Global variables are not shared among multiple processes; data between processes are independent and do not affect each other by default.

import multiprocessing

# Define a global variable
num = 99

def work1():
    print("work1 is running ......")
    global num     # Declare that the function uses the global variable num
    num = num + 1  # Increment num by 1
    print("work1 num = {}".format(num))

def work2():
    print("work2 is running ......")
    print("work2 num = {}".format(num))

if __name__ == '__main__':
    # Create process p1
    p1 = multiprocessing.Process(group=None, target=work1)
    # Start process p1
    p1.start()
    # Create process p2
    p2 = multiprocessing.Process(group=None, target=work2)
    # Start process p2
    p2.start()

Run results:

As the results show, the change work1() makes to the global variable num is not visible in work2(), which still sees 99: each process gets its own copy of the variable, so global variables are not shared between processes.

Daemon processes

As mentioned above, p.join() makes the main process wait for the child process to end before exiting. Is the reverse possible, i.e. making the child process end when the main process ends? The answer is yes. We can set p2.daemon = True before starting the child, or call p2.terminate() to end it explicitly:

import multiprocessing
import time

def work1():
    print("work1 is running ......")
    time.sleep(4)
    print("Work1 completed.")

def work2():
    print("work2 is running ......")
    time.sleep(10)
    print("Work2 is complete.")

if __name__ == '__main__':
    # Create process p1
    p1 = multiprocessing.Process(group=None, target=work1)
    # Start process p1
    p1.start()
    # Create process p2
    p2 = multiprocessing.Process(group=None, target=work2)
    # First method: set p2 as a daemon process
    p2.daemon = True  # Must be set before start() or an exception will be thrown.
    # Start process p2
    p2.start()
    time.sleep(2)
    print("The main process is running!")
    # Second method: terminate p2 explicitly
    p2.terminate()

The results of the implementation are as follows:

Since p2 is set as a daemon of the master process, when the master process finishes running, the p2 child process ends too and the work2 task stops, while work1 continues to run until it finishes.

process pool

When only a few child processes are needed, you can create them dynamically with multiprocessing.Process as above, but if many processes must be created, doing it by hand becomes a lot of work. In that case you can use the Pool class provided by the multiprocessing module to create a process pool.

Frequently Used Functions:

  • apply_async(func, args, kwds): call func in a non-blocking (asynchronous) way; args is a tuple of positional arguments and kwds a dict of keyword arguments for func.
  • apply(func, args, kwds): call func in a blocking way; each task must wait for the previous one to finish before the next starts, so it is rarely needed.
  • close(): close the Pool so that it accepts no new tasks.
  • terminate(): end the pool's worker processes immediately, whether or not their tasks are complete.
  • join(): the main process blocks and waits for the pool's child processes to exit; it must be called after close() or terminate().

When the pool is initialized, a maximum number of processes can be specified. When a new task is submitted to the pool, if the pool is not yet full, a new process is created to execute the task; but if the pool is full (the number of processes has reached the specified maximum), the task waits until a process in the pool becomes free, and that process is then reused to execute it.

from multiprocessing import Pool
import multiprocessing
import time

def work(i):
    print("work'{}' under implementation......".format(i), multiprocessing.current_process().name, multiprocessing.current_process().pid)
    time.sleep(2)
    print("work'{}' executed......".format(i))

if __name__ == '__main__':
    # Create the process pool
    # Pool(3) creates a process pool with a capacity of 3 processes.
    pool = Pool(3)
    for i in range(10):
        # Synchronous execution: each process in the pool waits for the previous
        # task to finish before the next one runs.
        # pool.apply(work, (i, ))
        # Asynchronous execution of the work tasks:
        pool.apply_async(work, (i, ))
    # Accept no new requests after the pool is closed
    pool.close()
    # Wait for all child processes in the pool to finish; must come after close().
    # If tasks are submitted asynchronously, without join() the main process would
    # exit before the child processes finish executing!
    pool.join()

The implementation results are:

From the results we can see that only 3 child processes execute tasks at any one time. Here the tasks are submitted asynchronously with pool.apply_async(work, (i, )); if they were executed synchronously with pool.apply(work, (i, )), each process in the pool would wait for the previous task to finish before executing the next.
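Note also that apply_async() returns an AsyncResult object; if the task returns a value, it can be collected with .get(), which blocks until that task finishes. A short sketch (the square function is invented for this illustration):

```python
from multiprocessing import Pool

# Hypothetical task that returns a value
def square(x):
    return x * x

if __name__ == '__main__':
    with Pool(3) as pool:
        # Submit tasks asynchronously and keep the AsyncResult handles
        results = [pool.apply_async(square, (i,)) for i in range(5)]
        # .get() blocks until each task's result is ready
        values = [r.get() for r in results]
    print(values)   # [0, 1, 4, 9, 16]
```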

summarize

This article only covered what a process is, the relationship between processes and programs, creating and using processes, and process pools; process synchronization and inter-process communication were not covered and will be introduced in the next article.