About Python's import

2022-07-14 18:02:19

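Python's relative and absolute imports have confused me for a long time. After a shallow look last year I thought I understood them completely, but a recent mismatch between sys.path[0] and os.getcwd() made me revisit what I knew...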

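The official Python documentation, 5. The import system — Python 3.10.5 documentation, is of course the best guide, but reading all of it is still a bit much for me. Here I only discuss a few key points.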

from import

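In import A, import A as B, and from A import B, the finest-grained thing A can name is a module. So only the from import form can pull a single attribute out of a module on its own. Also, relative imports must use the from import form.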
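A small sketch of the two points above, using os.path from the standard library only as a convenient stand-in: a dotted import whose last part is a function fails, the from form succeeds, and a relative name is rejected by the plain import form at the syntax level.

```python
# `import` can only name modules; os.path.join is a function, so this fails.
try:
    import os.path.join
except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
    dotted_import_error = exc

# `from ... import ...` can pull a single attribute out of a module.
from os.path import join

# A relative name is only accepted by the `from` form; `import .x` does not parse.
try:
    compile("import .sibling", "<demo>", "exec")
    relative_plain_import_ok = True
except SyntaxError:
    relative_plain_import_ok = False

print(type(dotted_import_error).__name__)
print(callable(join), relative_plain_import_ok)
```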

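from module import * imports every member of module except those whose names begin with an underscore (single or double). A package can define __all__ = ["module", "module", ...] in its __init__.py to control by hand what actually gets imported.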
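To make the star-import rule concrete, here is a minimal sketch. The module names star_demo_plain and star_demo_all are hypothetical and written to a temporary directory just for the demo. It uses __all__ in a plain module, which as far as I know works the same way as in a package's __init__.py.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module body used only for this demo.
source = "public = 1\n_single = 2\n__double = 3\n"

tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "star_demo_plain.py"), "w") as f:
    f.write(source)
with open(os.path.join(tmp, "star_demo_all.py"), "w") as f:
    f.write(source + '__all__ = ["_single"]\n')

sys.path.insert(0, tmp)
importlib.invalidate_caches()

# Run each star import in a fresh namespace so we can see what it binds.
ns_plain, ns_all = {}, {}
exec("from star_demo_plain import *", ns_plain)
exec("from star_demo_all import *", ns_all)

names_plain = sorted(k for k in ns_plain if k != "__builtins__")
names_all = sorted(k for k in ns_all if k != "__builtins__")
print(names_plain)  # without __all__: only names with no leading underscore
print(names_all)    # with __all__: exactly the names it lists
```

Note that once __all__ is defined it fully decides the result, so even an underscore-prefixed name is imported if it is listed.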

Packages and __init__.py

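Since Python 3.3, a package no longer strictly requires __init__.py: a plain directory without one is treated as a namespace package, which for simple imports behaves much like a package whose __init__.py is empty. (For the differences between regular packages and namespace packages, see 5. The import system — Python 3.10.5 documentation.)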
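The two kinds of package are similar but not identical. A rough sketch of one visible difference, using two hypothetical directories (ns_demo_pkg without __init__.py, reg_demo_pkg with an empty one) created in a temporary location:

```python
import importlib
import os
import sys
import tempfile

tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, "ns_demo_pkg"))    # no __init__.py: namespace package
os.makedirs(os.path.join(tmp, "reg_demo_pkg"))   # gets an empty __init__.py below
open(os.path.join(tmp, "reg_demo_pkg", "__init__.py"), "w").close()

sys.path.insert(0, tmp)
importlib.invalidate_caches()

import ns_demo_pkg
import reg_demo_pkg

# A regular package has a file behind it; a namespace package does not.
ns_file = getattr(ns_demo_pkg, "__file__", None)
reg_file = getattr(reg_demo_pkg, "__file__", None)
print(ns_file)
print(os.path.basename(reg_file))

# __path__ is a plain list for a regular package, a special object otherwise.
print(type(reg_demo_pkg.__path__).__name__, type(ns_demo_pkg.__path__).__name__)
```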

What __init__.py does: when we import a package directly, what actually runs is its __init__.py. In other words, importing a package directly treats it as a module whose logic lives in __init__.py.

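Note that for an import of the form A.B.C, the __init__.py files of A, A.B, and (if it is a package) A.B.C are all executed on first import. In other words, whenever the import path passes through a package, that package's __init__.py is executed.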
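A minimal sketch of this behaviour, assuming a hypothetical three-level package init_demo_a.b.c built in a temporary directory, where every __init__.py just prints its own name:

```python
import contextlib
import importlib
import io
import os
import sys
import tempfile

tmp = tempfile.mkdtemp()
top = os.path.join(tmp, "init_demo_a")
mid = os.path.join(top, "b")
leaf = os.path.join(mid, "c")
os.makedirs(leaf)

# Every package level announces itself when its __init__.py runs.
for pkg_dir in (top, mid, leaf):
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write('print("running", __name__)\n')

sys.path.insert(0, tmp)
importlib.invalidate_caches()

# Capture what the __init__.py files print during a single import.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    import init_demo_a.b.c

executed = buf.getvalue().splitlines()
print(executed)
```

Importing the same name again prints nothing, because the modules are then served from sys.modules.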

Submodules

When a submodule is loaded using any mechanism (e.g. importlib APIs, the import or import-from statements, or built-in __import__()) a binding is placed in the parent module’s namespace to the submodule object. For example, if package spam has a submodule foo, after importing spam.foo, spam will have an attribute foo which is bound to the submodule.

...

Given Python’s familiar name binding rules this might seem surprising, but it’s actually a fundamental feature of the import system. The invariant holding is that if you have sys.modules['spam'] and sys.modules['spam.foo'] (as you would after the above import), the latter must appear as the foo attribute of the former.

5. The import system — Python 3.10.5 documentation

That is, an imported submodule is bound as an attribute on its parent package's module object.

This property allows some odd-looking constructs: for example, a module that imports itself can be nested indefinitely as me.me.me.me... and so on.
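The self-import trick can be reproduced in a few lines. The module name selfimport_me is hypothetical. Strictly speaking, what makes the chain work here is that the import statement inside the module binds the module's own name in its own namespace.

```python
import importlib
import os
import sys
import tempfile

tmp = tempfile.mkdtemp()
# The whole module consists of one line: it imports itself.
with open(os.path.join(tmp, "selfimport_me.py"), "w") as f:
    f.write("import selfimport_me\n")

sys.path.insert(0, tmp)
importlib.invalidate_caches()

import selfimport_me as me

# Each attribute access lands on the very same module object again.
chain_is_same = me.selfimport_me.selfimport_me.selfimport_me is me
print(chain_is_same)
```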


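Also, for an import of the form import A.B.C, only the name A is bound in the importing module, with A.B and A.B.C reachable through it as attributes. In contrast, from A.B import C binds only C, and import A.B.C as D binds only D, even though A and A.B have both been executed and are both in sys.modules.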
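To check which names each form binds in the importing namespace, here is a small experiment. The package bind_demo_a.b.c and the helper bound_names are made up for this sketch; each statement runs in a fresh namespace via exec.

```python
import importlib
import os
import sys
import tempfile

tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, "bind_demo_a", "b"))
for parts in (("bind_demo_a", "__init__.py"),
              ("bind_demo_a", "b", "__init__.py"),
              ("bind_demo_a", "b", "c.py")):
    open(os.path.join(tmp, *parts), "w").close()

sys.path.insert(0, tmp)
importlib.invalidate_caches()


def bound_names(statement):
    """Run an import statement in an empty namespace and list what it bound."""
    ns = {}
    exec(statement, ns)
    return sorted(k for k in ns if k != "__builtins__")


plain = bound_names("import bind_demo_a.b.c")
from_form = bound_names("from bind_demo_a.b import c")
as_form = bound_names("import bind_demo_a.b.c as d")
print(plain, from_form, as_form)

# Regardless of the form used, every level ends up in sys.modules,
# and each submodule hangs off its parent as an attribute.
all_cached = all(
    name in sys.modules
    for name in ("bind_demo_a", "bind_demo_a.b", "bind_demo_a.b.c")
)
parent_has_child = sys.modules["bind_demo_a"].b is sys.modules["bind_demo_a.b"]
```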

sys.path

A list of strings that specifies the search path for modules. Initialized from the environment variable PYTHONPATH, plus an installation-dependent default.

As initialized upon program startup, the first item of this list, path[0], is the directory containing the script that was used to invoke the Python interpreter. If the script directory is not available (e.g. if the interpreter is invoked interactively or if the script is read from standard input), path[0] is the empty string, which directs Python to search modules in the current directory first. Notice that the script directory is inserted before the entries inserted as a result of PYTHONPATH.

A program is free to modify this list for its own purposes. Only strings and bytes should be added to sys.path; all other data types are ignored during import.

sys — System-specific parameters and functions — Python 3.10.5 documentation

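sys.path is the list of base directories Python searches for modules (that is, for absolute imports). It is built from the PYTHONPATH environment variable plus some installation-dependent defaults (see PYTHONHOME). When a script is run, the directory containing the script is inserted as sys.path[0]. If what is run is not a script (for example an interactive session, or code read from stdin), sys.path[0] is set to the empty string, which stands for the current working directory.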

sys.path is ordered by priority: earlier entries take precedence.
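The ordering can be observed by placing a module with the same name in two directories. In this sketch the module name prio_demo_mod and its WHERE constant are hypothetical:

```python
import importlib
import os
import sys
import tempfile

first = tempfile.mkdtemp()
second = tempfile.mkdtemp()

# The same module name exists in both directories, with different contents.
for directory, label in ((first, "first"), (second, "second")):
    with open(os.path.join(directory, "prio_demo_mod.py"), "w") as f:
        f.write(f"WHERE = {label!r}\n")

# After these two inserts, `first` sits in front of `second` in sys.path.
sys.path.insert(0, second)
sys.path.insert(0, first)
importlib.invalidate_caches()

import prio_demo_mod

winner = prio_demo_mod.WHERE
print(winner)  # the copy from the earlier sys.path entry
```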


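Pay special attention: the script's directory is not the current working directory. For example, when running

python path/to/file.py

from D:\test, sys.path[0] is D:\test\path\to, while the current working directory (that is, os.getcwd()) is D:\test.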

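The current working directory is the base path Python uses when resolving relative paths to other files, whereas every absolute import depends only on sys.path. The two are completely different things.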
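The difference is easy to reproduce with a child interpreter. This sketch recreates the path/to/file.py layout in a temporary directory (standing in for D:\test) and has the script report both values as JSON. It assumes the default interpreter settings, so no isolated or safe-path mode.

```python
import json
import os
import subprocess
import sys
import tempfile

tmp = tempfile.mkdtemp()                      # plays the role of D:\test
script_dir = os.path.join(tmp, "path", "to")
os.makedirs(script_dir)

with open(os.path.join(script_dir, "file.py"), "w") as f:
    f.write(
        "import json, os, sys\n"
        "print(json.dumps({'path0': os.path.realpath(sys.path[0]),\n"
        "                  'cwd': os.path.realpath(os.getcwd())}))\n"
    )

# Equivalent to typing `python path/to/file.py` while standing in tmp.
result = subprocess.run(
    [sys.executable, os.path.join("path", "to", "file.py")],
    cwd=tmp, capture_output=True, text=True, check=True,
)
info = json.loads(result.stdout)
print(info["path0"])  # the directory that contains file.py
print(info["cwd"])    # the directory the command was launched from
```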

The python -m case is slightly different; see below.

python -m

Search sys.path for the named module and execute its contents as the __main__ module.

Since the argument is a module name, you must not give a file extension (.py). The module name should be a valid absolute Python module name, but the implementation may not always enforce this (e.g. it may allow you to use a name that includes a hyphen).

Package names (including namespace packages) are also permitted. When a package name is supplied instead of a normal module, the interpreter will execute <pkg>.__main__ as the main module. This behaviour is deliberately similar to the handling of directories and zipfiles that are passed to the interpreter as the script argument.

Note

This option cannot be used with built-in modules and extension modules written in C, since they do not have Python module files. However, it can still be used for precompiled modules, even if the original source file is not available.

If this option is given, the first element of sys.argv will be the full path to the module file (while the module file is being located, the first element will be set to "-m"). As with the -c option, the current directory will be added to the start of sys.path.

-I option can be used to run the script in isolated mode where sys.path contains neither the current directory nor the user’s site-packages directory. All PYTHON* environment variables are ignored, too.

Many standard library modules contain code that is invoked on their execution as a script. An example is the timeit module:

python -m timeit -s 'setup here' 'benchmarked code here'
python -m timeit -h # for details

Raises an auditing event cpython.run_module with argument module-name.

See also

runpy.run_module()

Equivalent functionality directly available to Python code

PEP 338 – Executing modules as scripts

Changed in version 3.1: Supply the package name to run a __main__ submodule.

Changed in version 3.4: namespace packages are also supported

1. Command line and environment — Python 3.10.5 documentation

It searches the directories listed in sys.path for the named module and runs it as the __main__ module.

Do not append .py to the name, because the file being run is now treated as a module.

If a package name (that is, a directory name) is given, <pkg>.__main__ (that is, <pkg>/__main__.py) is executed.

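Also, with python -m a.b.module, the first element of sys.argv is set to the full path of the module file being run (by contrast, with python a/b/module.py, sys.argv[0] is the path as given relative to the current working directory, i.e. a/b/module.py); and the current working directory is added to the front of sys.path.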
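A sketch comparing the two launch styles, assuming a hypothetical a/b/module.py in a temporary directory that simply reports sys.argv[0] and sys.path[0]. The run helper is made up for this example.

```python
import json
import os
import subprocess
import sys
import tempfile

tmp = tempfile.mkdtemp()
pkg_dir = os.path.join(tmp, "a", "b")
os.makedirs(pkg_dir)

with open(os.path.join(pkg_dir, "module.py"), "w") as f:
    f.write(
        "import json, os, sys\n"
        "print(json.dumps({'argv0': sys.argv[0],\n"
        "                  'path0': os.path.realpath(sys.path[0])}))\n"
    )


def run(*args):
    """Launch a child interpreter from tmp and parse the JSON it prints."""
    out = subprocess.run([sys.executable, *args], cwd=tmp,
                         capture_output=True, text=True, check=True)
    return json.loads(out.stdout)


relative_script = os.path.join("a", "b", "module.py")
as_script = run(relative_script)       # python a/b/module.py
as_module = run("-m", "a.b.module")    # python -m a.b.module

print(as_script["argv0"])   # the relative path exactly as it was typed
print(as_module["argv0"])   # full path to module.py
```

In the script case sys.path[0] is the a/b directory, while in the -m case it is the directory the command was launched from.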


python -m A.B.module runs the __init__.py of A and then of A.B in order, even if the module itself performs no imports.
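This can be verified with a throwaway package. In the sketch below, A, A.B and module.py are created in a temporary directory and each file only prints a marker line; module.py contains no import statement at all.

```python
import os
import subprocess
import sys
import tempfile

tmp = tempfile.mkdtemp()
pkg_a = os.path.join(tmp, "A")
pkg_b = os.path.join(pkg_a, "B")
os.makedirs(pkg_b)

files = {
    os.path.join(pkg_a, "__init__.py"): 'print("init", __name__)\n',
    os.path.join(pkg_b, "__init__.py"): 'print("init", __name__)\n',
    os.path.join(pkg_b, "module.py"): 'print("main", __name__)\n',
}
for path, body in files.items():
    with open(path, "w") as f:
        f.write(body)

# Equivalent to typing `python -m A.B.module` while standing in tmp.
out = subprocess.run([sys.executable, "-m", "A.B.module"], cwd=tmp,
                     capture_output=True, text=True, check=True)
lines = out.stdout.splitlines()
print(lines)
```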

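python -m is necessary for directly running code that lives inside a package. If such a file is run as a script instead, any relative import, even one that stays within the script's own directory, makes Python raise the following error: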

ImportError: attempted relative import with no known parent package

Likewise, a module cannot relatively import anything above the top-level package named in the python -m argument; otherwise this error is raised:

ImportError: attempted relative import beyond top-level package
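Both errors can be triggered on purpose. This sketch uses a hypothetical package rel_demo_pkg with one module that imports a sibling and one that reaches above the top-level package, and runs them in child interpreters. It only checks the message text, since the exception type for the second case has varied between Python versions.

```python
import os
import subprocess
import sys
import tempfile

tmp = tempfile.mkdtemp()
pkg = os.path.join(tmp, "rel_demo_pkg")
os.makedirs(pkg)

files = {
    "__init__.py": "",
    "sibling.py": "VALUE = 42\n",
    "ok.py": "from . import sibling\nprint(sibling.VALUE)\n",
    "too_high.py": "from .. import anything\n",
}
for name, body in files.items():
    with open(os.path.join(pkg, name), "w") as f:
        f.write(body)


def run(*args):
    """Launch a child interpreter from tmp without raising on failure."""
    return subprocess.run([sys.executable, *args], cwd=tmp,
                          capture_output=True, text=True)


as_script = run(os.path.join("rel_demo_pkg", "ok.py"))  # fails: no parent package
as_module = run("-m", "rel_demo_pkg.ok")                # works
beyond_top = run("-m", "rel_demo_pkg.too_high")         # fails: above top level

print(as_script.stderr.strip().splitlines()[-1])
print(as_module.stdout.strip())
print(beyond_top.stderr.strip().splitlines()[-1])
```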

sys.modules

The first place checked during import search is sys.modules. This mapping serves as a cache of all modules that have been previously imported, including the intermediate paths. So if foo.bar.baz was previously imported, sys.modules will contain entries for foo, foo.bar, and foo.bar.baz. Each key will have as its value the corresponding module object.

During import, the module name is looked up in sys.modules and if present, the associated value is the module satisfying the import, and the process completes. However, if the value is None, then a ModuleNotFoundError is raised. If the module name is missing, Python will continue searching for the module.

sys.modules is writable. Deleting a key may not destroy the associated module (as other modules may hold references to it), but it will invalidate the cache entry for the named module, causing Python to search anew for the named module upon its next import. The key can also be assigned to None, forcing the next import of the module to result in a ModuleNotFoundError.

Beware though, as if you keep a reference to the module object, invalidate its cache entry in sys.modules, and then re-import the named module, the two module objects will not be the same. By contrast, importlib.reload() will reuse the same module object, and simply reinitialise the module contents by rerunning the module’s code.

5. The import system — Python 3.10.5 documentation

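sys.modules is a dict. Before importing, Python checks whether sys.modules already holds the module object for the requested module. If it does, that object is used directly; if the value is None, a ModuleNotFoundError is raised immediately; if the key is absent, the search continues. In short, sys.modules acts as a kind of cache.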

For an import of the form A.B.C, Python imports A, A.B, and A.B.C in order and adds them all to sys.modules.
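The cache behaviour described in the quoted documentation can be exercised directly. A sketch under the assumption of a hypothetical package cache_demo_foo.bar.baz created in a temporary directory:

```python
import importlib
import os
import sys
import tempfile

tmp = tempfile.mkdtemp()
bar_dir = os.path.join(tmp, "cache_demo_foo", "bar")
os.makedirs(bar_dir)
for parts in (("cache_demo_foo", "__init__.py"),
              ("cache_demo_foo", "bar", "__init__.py"),
              ("cache_demo_foo", "bar", "baz.py")):
    open(os.path.join(tmp, *parts), "w").close()

sys.path.insert(0, tmp)
importlib.invalidate_caches()

leaf = "cache_demo_foo.bar.baz"
import cache_demo_foo.bar.baz

# Every intermediate level gets its own cache entry.
cached = [n for n in ("cache_demo_foo", "cache_demo_foo.bar", leaf)
          if n in sys.modules]

# Dropping the entry forces a fresh search, which yields a new module object.
old = sys.modules.pop(leaf)
new = importlib.import_module(leaf)
fresh_object = new is not old

# reload() re-runs the module's code but keeps the same module object.
reloaded_same = importlib.reload(new) is new

# A None entry blocks the import outright.
sys.modules[leaf] = None
try:
    importlib.import_module(leaf)
    blocked = False
except ModuleNotFoundError:
    blocked = True
finally:
    del sys.modules[leaf]  # clean up so later imports work again

print(cached, fresh_object, reloaded_same, blocked)
```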

References