Module import (the import statement) is the operation of using code from one module inside another, which makes code reuse possible. In Python the import keyword is the usual way to do this, but it is not the only one: there are also importlib.import_module(), __import__(), and so on.
You may have seen the title and wondered why I would post something so basic. On the contrary, I think this article qualifies as an advanced Python topic: it digs into the Python import hook mechanism and explains it with a real example.
Of course, to keep the article systematic and complete, some space at the beginning is spent on the basics, but please be patient and read on, because the later parts are the essence of this article and I hope you don't miss them.
1. Import system basics
1.1 What can be imported
The unit of import can vary: it may be a module, a package, or even a variable defined inside a module.
For newcomers it is still worth spelling out the differences between these basic concepts.
- module: a file such as *.py, *.pyc, *.pyd, *.so or *.dll; it is the smallest unit that carries Python code.
- package: a directory that groups modules together. Packages can be further divided into two kinds: regular packages, which contain an __init__.py file, and namespace packages, which do not.
Some of you may be unfamiliar with Namespace packages, so here's an excerpt from the official documentation to explain.
Namespace packages are made up of multiple parts, each of which adds a child to the parent package. The parts may be in different locations on the filesystem. Sections may also be in zip files, on the network, or elsewhere that Python can search for during import. Namespace packages do not necessarily correspond directly to objects in the filesystem; they may be virtual modules with no physical representation.
The __path__ attribute of a namespace package is not an ordinary list. Instead it is a custom iterable type which, if the path of its parent package (or sys.path for a top-level package) changes, automatically performs a fresh search for package portions on the next import attempt within that package.
A namespace package has no parent/__init__.py file. In fact, multiple parent directories may be found during the import search, each provided by a different portion, so parent/one is not necessarily physically located next to parent/two. In this case Python creates a namespace package for the top-level parent package whenever it, or one of its subpackages, is imported.
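To make this concrete, here is a minimal sketch of a namespace package; the directory and module names (path1, path2, parent, one, two) are made up for illustration:

```python
# Directory layout -- note there is no __init__.py anywhere:
#
#   path1/parent/one.py
#   path2/parent/two.py

import sys
sys.path.extend(['path1', 'path2'])

import parent.one   # both portions merge into one namespace package
import parent.two

# __path__ is the custom iterable mentioned above, e.g.
# _NamespacePath(['path1/parent', 'path2/parent'])
print(parent.__path__)
```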
1.2 Relative/absolute import
When importing modules or packages, Python offers two styles of import:
- Relative import: from . import B or from .A import B, where the leading dot refers to the current package and .A is a module or subpackage inside it.
- Absolute import: import foo.bar or from foo import bar, spelling out the full path from the top-level package.
You can choose whichever suits you, but note that the default changed over time: in early versions Python used implicit relative imports by default, while in later versions (via the __future__ switch introduced around Python 2.6, and unconditionally in Python 3) absolute imports became the default.
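As a quick illustration, assume a made-up package foo containing bar.py (which defines Bar) and baz.py; inside foo/baz.py the two styles look like this:

```python
# foo/
#   __init__.py
#   bar.py    # defines class Bar
#   baz.py    # the file shown below

# relative import: the dot refers to the package that contains baz.py
from .bar import Bar

# absolute import: spell out the full package path
from foo.bar import Bar
```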
Both absolute and relative imports have advantages and disadvantages:
- When developing and maintaining your own project, relative imports let you avoid hard-coding the top-level package name, so renaming or moving the package is less painful.
- Absolute imports, on the other hand, make the module structure clearer and avoid import errors caused by name clashes with other packages.
1.3 PEP 8 conventions for imports
PEP 8 has requirements for module imports, and following them makes your code more readable. I'll list them here as well:
- Each import statement should be written on its own line:
```python
# bad
import os, sys

# good
import os
import sys
```
- Import statements should use absolute imports:
```python
# bad
from ..bar import Bar

# good
from foo import test
```
- Import statements should be placed at the top of the file, after the module description and docstring, and before global variables.
- Imports should be grouped, with a blank line between groups, in this order: built-in (standard library) modules, third-party modules, then local modules; within each group they should be sorted alphabetically.
```python
# built-in modules
import os
import sys

# third-party modules
import flask

# local modules
from foo import bar
```
1.4 A few useful sys variables
sys.path lists the directories in which Python looks for modules:
```python
>>> import sys
>>> from pprint import pprint
>>> pprint(sys.path)
['',
 '/Library/Frameworks/Python.framework/Versions/3.6/lib/python36.zip',
 '/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6',
 '/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/lib-dynload',
 '/Users/MING/Library/Python/3.6/lib/python/site-packages',
 '/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages']
>>>
```
sys.meta_path stores all the finders.
```python
>>> import sys
>>> from pprint import pprint
>>> pprint(sys.meta_path)
[<class '_frozen_importlib.BuiltinImporter'>,
 <class '_frozen_importlib.FrozenImporter'>,
 <class '_frozen_importlib_external.PathFinder'>]
```
sys.path_importer_cache is larger still, because it records a finder for every directory from which code has been loaded. That includes package subdirectories, which normally do not appear in sys.path.
```python
>>> import sys
>>> from pprint import pprint
>>> pprint(sys.path_importer_cache)
{'/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6': FileFinder('/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6'),
 '/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/collections': FileFinder('/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/collections'),
 '/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings': FileFinder('/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings'),
 '/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/lib-dynload': FileFinder('/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/lib-dynload'),
 '/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages': FileFinder('/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages'),
 '/Library/Frameworks/Python.framework/Versions/3.6/lib/python36.zip': None,
 '/Users/MING': FileFinder('/Users/MING'),
 '/Users/MING/Library/Python/3.6/lib/python/site-packages': FileFinder('/Users/MING/Library/Python/3.6/lib/python/site-packages')}
```
2. Making good use of __import__()
Using the import keyword is, so to speak, the most basic of basics.
But it is not the only way to bring in a module: there are also importlib.import_module() and __import__().
Unlike import, __import__ is a function, and precisely because of that it is more flexible and is often used by frameworks to load plugins dynamically.
In fact, when we use import to bring in a module, __import__ is being called internally as well. The following two forms of import are equivalent:
```python
# using import
import os

# using __import__
os = __import__('os')
```
By extension, the following two forms are also equivalent:
```python
# using import ... as ...
import pandas as pd

# using __import__
pd = __import__('pandas')
```
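One caveat worth adding (this detail comes from the standard library documentation, not from the snippets above): when the name contains dots, __import__ returns the top-level package by default, which is why importlib.import_module() is usually the more convenient choice for dotted names:

```python
import importlib

pkg = __import__('os.path')                # returns the top-level package: os
print(pkg.__name__)                        # -> os

mod = importlib.import_module('os.path')   # returns the submodule itself
print(mod.__name__)                        # -> posixpath (or ntpath on Windows)
```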
As mentioned above, __import__ is often used for dynamic plugin loading, something the bare import statement cannot do, because import requires the module name to be written literally in the code.
Plugins usually live in a specific folder; you may not want to use all of them, and new plugins may be added at any time.
If you hard-code them with the import keyword, that is clearly inelegant: every time you add or change a plugin you have to edit the code. A better approach is to list the plugins in a configuration file and have the code read that configuration and dynamically import the plugins you want. That is flexible, convenient and less error-prone.
Say I have a project with four plugins, plugin01, plugin02, plugin03 and plugin04, each of which implements a core method run(). Sometimes I only want to use plugin02 and plugin04, so I write just those two into the configuration file.
```python
# configuration file
custom_plugins = ['plugin02', 'plugin04']
```
So how do I load and run them dynamically?
```python
import sys

# conf is whatever object holds the configuration shown above
for plugin in conf.custom_plugins:
    __import__(plugin)
    sys.modules[plugin].run()
```
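An equivalent and arguably cleaner version uses importlib.import_module(), which returns the module object directly; this is only a sketch and assumes the same conf object as above:

```python
import importlib

for plugin in conf.custom_plugins:
    mod = importlib.import_module(plugin)  # import by name and get the module back
    mod.run()
```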
3. Understanding module caching
Importing the same module repeatedly within a module does not actually import it twice. When a module is imported with the import keyword, Python first checks whether the module has already been loaded; if it has, it is not imported again, otherwise the module is searched for and imported.
To experiment, in the my_mod02 module I import my_mod01 twice. If each import re-executed the module, the code in my_mod01 (the print statement) would run twice, but in fact it is printed only once.
```
$ cat my_mod01.py
print('in mod01')

$ cat my_mod02.py
import my_mod01
import my_mod01

$ python my_mod02.py
in mod01
```
This behaviour is explained by sys.modules, a dictionary (key: module name, value: module object) that caches every module imported in the current interpreter.
```python
# test_module.py
import sys

print(sys.modules.get('json', 'NotFound'))

import json

print(sys.modules.get('json', 'NotFound'))
```
The result is as follows. You can see that the json module object is available only after the json module is imported.
```
$ python test_module.py
NotFound
<module 'json' from 'C:\Python27\lib\json\__init__.pyc'>
```
Because of this cache, importing a module again will not reload it.
But if reloading is exactly what you want, importlib, a wonderful library, can help. For example, while debugging, after finding and fixing a problem in the code you would normally have to restart the service to load the program again. With module reloading this becomes incredibly convenient: you can keep debugging after changing the code without restarting the service.
Still using the example above, my_mod02.py is rewritten as follows
```python
# my_mod02.py
import importlib
import my_mod01

importlib.reload(my_mod01)
```
Running this module with python3, unlike before, my_mod01.py is executed twice:
```
$ python3 my_mod02.py
in mod01
in mod01
```
4. Finders and loaders
If a module with the given name is not found in sys.modules, Python's import protocol is invoked to find and load it.
The protocol consists of two conceptual objects: finders and loaders.
The import of a Python module can actually be subdivided into two processes:
- Module lookup implemented by a finder
- Module loading implemented by loaders
4.1 What is a Finder?
Put simply, a finder defines a lookup mechanism that tells the program how to find the corresponding module.
Python actually has several default finders built in, which exist in sys.meta_path.
But these finders matter little to ordinary users, so before Python 3.3 the Python interpreter hid them; we call these implicit finders.
```python
# Python 2.7
>>> import sys
>>> sys.meta_path
[]
>>>
```
Since this is not conducive to a deeper understanding of the import mechanism, as of Python 3.3 the finders used for all module imports are exposed via sys.meta_path, and there are no more implicit finders.
```python
# Python 3.6
>>> import sys
>>> from pprint import pprint
>>> pprint(sys.meta_path)
[<class '_frozen_importlib.BuiltinImporter'>,
 <class '_frozen_importlib.FrozenImporter'>,
 <class '_frozen_importlib_external.PathFinder'>]
```
Looking at Python's default finders, you can see that there are three kinds (a quick check follows the list):
- BuiltinImporter, which knows how to import built-in modules;
- FrozenImporter, which knows how to import frozen modules;
- PathFinder, which knows how to import modules from the import path (i.e. the path based finder).
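As a quick, abridged interactive check of who handles what (the exact reprs vary between Python versions and platforms):

```python
>>> from importlib import machinery
>>> machinery.BuiltinImporter.find_spec('sys')
ModuleSpec(name='sys', loader=<class '_frozen_importlib.BuiltinImporter'>, origin='built-in')
>>> machinery.PathFinder.find_spec('json')
ModuleSpec(name='json', loader=<_frozen_importlib_external.SourceFileLoader object at 0x...>, origin='.../json/__init__.py')
```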
Can we define our own finder? Of course we can. You only need to:
- Define a class that implements the find_module method (Python 2 and 3) or the find_loader method (Python 3 only). If the module is found, it must return a loader object or a ModuleSpec object (more on that later); if not, it must return None.
- Once defined, register the finder by inserting it at the front of sys.meta_path so that it is tried first.
```python
import sys

class MyFinder(object):
    @classmethod
    def find_module(cls, name, path, target=None):
        print("Importing", name, path, target)
        # MyLoader will be defined later
        return MyLoader()

# Finders are consulted in order, so ours must be inserted first
sys.meta_path.insert(0, MyFinder)
```
Finders can be categorized into two types:
```
object
 +-- Finder (deprecated)
      +-- MetaPathFinder
      +-- PathEntryFinder
```
Note that before version 3.4, the finder returned the Loader object directly, whereas after version 3.4, the finder returns the ModuleSpec, which contains the loader.
As for what a loader is and what a module spec is, read on.
4.2 What is a loader?
The finder is only responsible for finding and locating the module, and it is the loader that is really responsible for loading the module.
A normal loader must define a method called load_module().
The reason for saying "normal" here is that there are several different kinds of loaders:
```
object
 +-- Finder (deprecated)
 |    +-- MetaPathFinder
 |    +-- PathEntryFinder
 +-- Loader
      +-- ResourceLoader --------+
      +-- InspectLoader          |
           +-- ExecutionLoader --+
                +-- FileLoader
                +-- SourceLoader
```
A look at the source code shows that the abstract methods vary from loader to loader.
A loader is usually returned by a finder. See PEP 302 for details; the abstract base classes live in importlib.abc.
So how do we customize our own loader?
All you have to do is
- Define a class that implements the load_module method.
- Validate the import-related attributes.
- Create a module object and bind all import-related attributes to it.
- Save the module into sys.modules (do this before executing it, to avoid problems with recursive imports).
- Then load (execute) the module; this is the core step.
- Handle errors by raising ImportError if loading fails.
- If loading is successful, the module object is returned
- A minimal skeleton that follows these steps is sketched below; for a complete real-world example, keep reading.
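Here is a minimal, simplified skeleton of those steps. It follows the old load_module() protocol; get_source() is a hypothetical helper that stands in for wherever your source code actually comes from (a file, the network, a database, and so on):

```python
import sys
import types

class MySourceLoader:
    def load_module(self, fullname):
        # reuse the cached module if it has already been imported
        if fullname in sys.modules:
            return sys.modules[fullname]

        # create the module object and bind import-related attributes
        mod = types.ModuleType(fullname)
        mod.__loader__ = self

        # put it into sys.modules *before* executing the body,
        # so recursive imports see the partially initialised module
        sys.modules[fullname] = mod

        try:
            # execute the module's code in its own namespace
            exec(self.get_source(fullname), mod.__dict__)
        except Exception as e:
            # on failure, clean up and raise ImportError
            del sys.modules[fullname]
            raise ImportError(fullname) from e

        # on success, return the module object
        return mod
```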
4.3 Module specification
The import mechanism uses a variety of information about each module during import, especially before loading. Most of this information is common to all modules. The purpose of the module specification is to encapsulate this import-related information on a per-module basis.
The module specification is exposed to the public as the __spec__ attribute of the module object. See ModuleSpec for details on module specifications.
From Python 3.4 on, instead of returning a loader, the finder returns a ModuleSpec object, which carries more information:
- the module name
- the loader
- the absolute path of the module (origin)
How can I see the ModuleSpec of a module?
Here's an example.
```
$ cat my_mod02.py
import my_mod01
print(my_mod01.__spec__)

$ python3 my_mod02.py
in mod01
ModuleSpec(name='my_mod01', loader=<_frozen_importlib_external.SourceFileLoader object at 0x000000000392DBE0>, origin='/home/MING/my_mod01.py')
```
Since the ModuleSpec contains the loader, doesn't that give us another way to reload a module?
Let's verify it.
There are now two files:
One is my_info.py
```python
# my_info.py
name = 'wangbm'
```
The other one is:
```python
import my_info

print(my_info.name)

# add a breakpoint
import pdb; pdb.set_trace()

# load the module again
my_info.__spec__.loader.load_module()

print(my_info.name)
```
The breakpoint is there so that, while execution is paused, I can change name in my_info.py to ming and then confirm that the reload picks up the change.
```
$ python3 ...
wangbm
> /home/MING/...(9)<module>()
-> my_info.__spec__.loader.load_module()
(Pdb) c
ming
```
The result shows that the reload works.
4.4 What is an importer?
You may have come across the term importer in other articles, but it is not really a new concept.
An importer is simply an object that implements both the finder and the loader interfaces, so it is at once a finder and a loader.
5. Remote import module
Python's default finders and loaders only support importing local modules; importing modules from a remote machine is not supported out of the box.
To give you a better understanding of the Python import hook mechanism, I'll demonstrate below, with a working example, how to implement an importer that imports modules remotely.
5.1 Hands-on implementation of the importer
When importing a package, the Python interpreter first gets the list of finders from sys.meta_path.
The default order is: Built-in Module Finder -> Frozen Module Finder -> Third-Party Module Path (Local) Finder
If, after these three finders, the required module is still not found, an ImportError exception is thrown.
So to implement a remote import module, there are two ways to think about it.
- One is to implement your own meta-path importer;
- The other is to write a hook that is added to sys.path_hooks that recognizes a specific directory naming pattern.
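For a rough idea of what the second approach involves, here is a sketch only; UrlPathFinder is a hypothetical path entry finder class, not something implemented in this article:

```python
import sys

# A path hook is a callable that receives one entry from sys.path (or a
# package __path__); it returns a path entry finder for that entry, or
# raises ImportError to say "this entry is not mine".
def url_hook(entry):
    if not entry.startswith(('http://', 'https://')):
        raise ImportError()
    return UrlPathFinder(entry)   # hypothetical PathEntryFinder

sys.path_hooks.insert(0, url_hook)
sys.path.append('http://localhost:12800/')
```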
I've chosen the first method here as an example.
To implement the importer, we need separate finders and loaders.
First the finder.
From the source code we know there are two kinds of finders:
- MetaPathFinder
- PathEntryFinder
MetaPathFinder is used here for finder writing.
Before Python 3.4 a finder had to implement the find_module() method; from Python 3.4 on the find_spec() method is recommended. That doesn't mean find_module() can no longer be used: if find_spec() is absent, the import protocol still falls back to find_module().
I'll start with an example of what to write using find_module().
```python
from importlib import abc

class UrlMetaFinder(abc.MetaPathFinder):
    def __init__(self, baseurl):
        self._baseurl = baseurl

    def find_module(self, fullname, path=None):
        if path is None:
            baseurl = self._baseurl
        else:
            # if the path is not under the configured url, report "not found"
            if not path.startswith(self._baseurl):
                return None
            baseurl = path
        try:
            loader = UrlMetaLoader(baseurl)
            loader.load_module(fullname)
            return loader
        except Exception:
            return None
```
If you use find_spec(), be aware that this method needs to be called with two or three arguments.
The first is the fully qualified name of the module being imported, e.g. foo.bar.baz. The second argument is the path entries to use for the module search. For top-level modules the second argument is None, but for submodules or subpackages it is the value of the parent package's __path__ attribute; if the corresponding __path__ cannot be accessed, a ModuleNotFoundError is raised. The third argument is an existing module object that will be the target of loading later; the import system only passes in a target module during a reload.
```python
from importlib import abc
from importlib.machinery import ModuleSpec

class UrlMetaFinder(abc.MetaPathFinder):
    def __init__(self, baseurl):
        self._baseurl = baseurl

    def find_spec(self, fullname, path=None, target=None):
        if path is None:
            baseurl = self._baseurl
        else:
            # if the path is not under the configured url, report "not found"
            if not path.startswith(self._baseurl):
                return None
            baseurl = path
        try:
            loader = UrlMetaLoader(baseurl)
            return ModuleSpec(fullname, loader, is_package=loader.is_package(fullname))
        except Exception:
            return None
```
Next is the loader
From the source code we know there are two loaders that are natural base classes for our purposes:
- FileLoader
- SourceLoader
It stands to reason that both loaders can do what we want, but I'm going to use SourceLoader for my demonstration.
The abstract class SourceLoader has a few important methods you should know about when writing your own loader:
- get_filename(fullname): returns the path of the module's source file;
- get_data(path): returns the raw contents of the file at the given path;
- get_code(fullname): builds the module's code object (by default on top of the two methods above);
- exec_module(module): executes the code object in the module's namespace, i.e. in module.__dict__.
In some older blog posts you will often see loaders that implement load_module(), which was deprecated in Python 3.4; for compatibility reasons it still works, so using load_module() is not wrong.
```python
import sys
import imp
import urllib.request as urllib2
from importlib import abc

class UrlMetaLoader(abc.SourceLoader):
    def __init__(self, baseurl):
        self._baseurl = baseurl

    def get_code(self, fullname):
        f = urllib2.urlopen(self.get_filename(fullname))
        return f.read()

    def load_module(self, fullname):
        code = self.get_code(fullname)
        mod = sys.modules.setdefault(fullname, imp.new_module(fullname))
        mod.__file__ = self.get_filename(fullname)
        mod.__loader__ = self
        mod.__package__ = fullname
        exec(code, mod.__dict__)
        return None

    def get_data(self):
        pass

    def exec_module(self, module):
        pass

    def get_filename(self, fullname):
        return self._baseurl + fullname + '.py'
```
When you implement loading with this old pattern, there are two points to be aware of:
- exec_module() must be overridden to do nothing, even though it is not an abstract method;
- load_module() has to be called manually in the finder for the module to actually be loaded.
With the new pattern you use exec_module() and create_module() instead. Since exec_module() and create_module() are already implemented in the base classes and cover our scenario, there is no need to implement them again, and unlike the old pattern there is no need to call exec_module() manually in the finder.
```python
import urllib.request as urllib2
from importlib import abc

class UrlMetaLoader(abc.SourceLoader):
    def __init__(self, baseurl):
        self._baseurl = baseurl

    def get_code(self, fullname):
        f = urllib2.urlopen(self.get_filename(fullname))
        return f.read()

    def get_data(self):
        pass

    def get_filename(self, fullname):
        return self._baseurl + fullname + '.py'
```
With the finder and loader in place, don't forget to register our custom finder (UrlMetaFinder) with sys.meta_path.
```python
def install_meta(address):
    finder = UrlMetaFinder(address)
    sys.meta_path.append(finder)
```
Now that all the code has been walked through, we put it together in one module (my_importer.py):
```python
# my_importer.py
import sys
import importlib
import urllib.request as urllib2
from importlib import abc


class UrlMetaFinder(abc.MetaPathFinder):
    def __init__(self, baseurl):
        self._baseurl = baseurl

    def find_module(self, fullname, path=None):
        if path is None:
            baseurl = self._baseurl
        else:
            # if the path is not under the configured url, report "not found"
            if not path.startswith(self._baseurl):
                return None
            baseurl = path
        try:
            loader = UrlMetaLoader(baseurl)
            return loader
        except Exception:
            return None


class UrlMetaLoader(abc.SourceLoader):
    def __init__(self, baseurl):
        self._baseurl = baseurl

    def get_code(self, fullname):
        f = urllib2.urlopen(self.get_filename(fullname))
        return f.read()

    def get_data(self):
        pass

    def get_filename(self, fullname):
        return self._baseurl + fullname + '.py'


def install_meta(address):
    finder = UrlMetaFinder(address)
    sys.meta_path.append(finder)
```
5.2 Build Remote Server
At the very beginning I said to implement a way to import modules remotely.
All that's missing is a server on the remote end to host the modules. To keep things simple I use Python's built-in http.server module, which can be started with a single command.
```
$ mkdir httpserver && cd httpserver
$ cat > my_info.py << EOF
name='wangbm'
print('ok')
EOF
$ cat my_info.py
name='wangbm'
print('ok')
$ python3 -m http.server 12800
Serving HTTP on 0.0.0.0 port 12800 (http://0.0.0.0:12800/) ...
```
Everything is ready for us to verify.
```python
>>> from my_importer import install_meta
>>> install_meta('http://localhost:12800/')  # register the finder in sys.meta_path
>>> import my_info  # prints ok, so the import succeeded
ok
>>> my_info.name  # and the variable is accessible
'wangbm'
```
At this point, I have implemented a simple importer that can import modules from a remote server.