Python：對程式做效能分析及計時統計

1.對整個程式的效能分析

如果只是想簡單地對整個程式做計算統計，通常使用UNIX下的time命令就足夠了。

(base) ➜  Learn-Python time python someprogram.py       
python someprogram.py  0.10s user 0.01s system 98% cpu 0.117 total

由於我用的是Mac系統，和Linux系統的輸出可能有不同，不過關鍵都是這三個時間：

user: 執行使用者態程式碼所花費的時間，也即CPU實際用於執行該程序的時間，其他程序和程序阻塞的時間不計入此數位。
system: 在核心中執行系統呼叫（如I/O呼叫）所花費的CPU時間。
total(Linux下應該是real):即掛鐘時間(wall-clock time)，也稱響應時間(response time)、消逝時間(elapsed time)，是程序執行開始到結束所有經過的時間，包括了程序使用的時間片和程序阻塞的時間（例如等待I/O完成）。

請注意，若user + system > total，可能存在多個處理器並行工作；
若user + system < total，則可能在等待磁碟、網路或其它裝置的響應。

也就說上面這個程式的掛鐘時間為0.251s，CPU實際用於執行該程序的時間為0.24s，用於系統呼叫的時間為0.01s。

再來看看另外一個極端，如果想針對程式的行為產生一份詳細的報告，那麼可以使用cProfile模組：

(base) ➜  Learn-Python python -m cProfile someprogram.py
         7 function calls in 0.071 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.002    0.002    0.071    0.071 someprogram.py:1(<module>)
        1    0.039    0.039    0.068    0.068 someprogram.py:1(func1)
        1    0.029    0.029    0.029    0.029 someprogram.py:3(<listcomp>)
        1    0.000    0.000    0.001    0.001 someprogram.py:7(func2)
        1    0.000    0.000    0.000    0.000 someprogram.py:9(<listcomp>)
        1    0.000    0.000    0.071    0.071 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

可見我們上述程式碼的熱點是在於func1函數。

這裡再多說幾句，這裡傳入的-m -cProfile可選引數意為將Python的cPofile模組做為指令碼執行，實際上等價於：

python /Users/orion-orion/miniforge3/lib/python3.9/cProfile.py someprogram.py

當然，中間那個路徑取決於大家各自的環境。這也就是說我們將some_program.py做為cProfile.py程式的輸入引數，目的就是對其進行效能分析。

2.對特定程式碼段做效能分析

2.1 分析函數和語句塊

不過對於做程式碼效能分析而言，更常見的情況則處於上述兩個極端情況之間。

比如，我們可能已經知道了程式碼把大部分執行時間都花在幾個某幾個函數上了。要對函數進行效能分析，使用裝飾器就能辦到。範例如下：

import time
from functools import wraps

def timethis(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        r = func(*args, **kwargs)
        end = time.perf_counter()
        print("{}.{} : {}".format(func.__module__, func.__name__, end - start))
        return r
    
    return wrapper

要使用這個裝飾器，只要簡單地將其放在函數定義之前，就能得到對應函數的計時資訊了。範例如下：

@timethis
def countdown(n):
    while n > 0:
        n -= 1

countdown(10000000)

控制檯列印輸出：

__main__.countdown : 0.574160792

請注意，在進行效能統計時，任何得到的結果都是近似值。我們這裡使用的函數time.perf_counter()是能夠提供給定平臺上精度最高的計時器，它返回一個秒級的時間值。但是，它計算的仍然是掛鐘時間（牆上時間），這會受到許多不同因素的影響（例如機器當前的負載），且它會將程式等待中斷的sleep（休眠）時間也計算在內。

如果相對於掛鐘時間，我們更感興趣的是程序時間（包括在核心態和使用者態中所花費的CPU時間），那麼可以使用time.process_time()來替代。範例如下：

def timethis(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.process_time()
        r = func(*args, **kwargs)
        end = time.process_time()
        print("{}.{} : {}".format(func.__module__, func.__name__, end - start))
        return r
    
    return wrapper

接下來我們看如何對語句塊進行計算統計，這可以通過定義一個上下文管理器來實現。範例如下：

from contextlib import contextmanager

@contextmanager
def timeblock(label):
    start = time.perf_counter()
    try:
        yield
    finally:
        end = time.perf_counter()
        print("{} : {}". format(label, end - start))

下面這個例子演示了這個上下文管理器是如何工作的：

with timeblock("counting"):
    n = 10000000
    while n > 0:
        n -= 1

控制檯列印輸出如下所示：

counting : 0.7888195419999999

最後，我們來看一種一勞永逸的方案：在time模組中的函數之上構建一個更高層的介面來模擬秒錶，從而解決對函數、對程式碼塊的計時問題。

import time

class Timer:
    def __init__(self, func=time.perf_counter):
        self.elapsed = 0.0
        self._func = func
        self._start = None
    
    def start(self):
        if self._start is not None:
            raise RuntimeError("Already started!")
        self._start = self._func()
    
    def stop(self):
        if self._start is None:
            raise RuntimeError("Not started!")
        end = self._func()
        self.elapsed += end - self._start
        self._start = None
    
    def reset(self):
        self.elapsed = 0.0
    
    @property
    def running(self):
        return self._start is not None

    def __enter__(self):
        self.start()
        return self
    
    def __exit__(self, *args):
        self.stop()

這個類定義了一個定時器，可以根據使用者的需要啟動、停止和重置它。Timer類將總的花費時間記錄在elapsed屬性中。下面的範例展示瞭如何使用這個類：

t = Timer()

# Use 1: Explicit start/stop
t.start()
countdown(1000000)
t.stop()
print(t.elapsed)
# 0.058305625

# Use 2: As a context manager
with t:
    countdown(1000000)
print(t.elapsed)
# 0.11482683300000004

with Timer() as t2:
    countdown(1000000)
print(t2.elapsed)
# 0.056095916999999995

如同前面所展示的，由Timer類記錄的時間是掛鐘時間，其中包含了所有的sleeping時間。如果僅想獲取程序的CPU時間（包括在使用者態和核心態中的時間），可以用time.process_time()取代。範例如下：

t = Timer(time.process_time)
with t:
    countdown(1000000)
print(t.elapsed)
# 0.05993699999999999

2.2 分析單條程式碼片段

如果要對短小的程式碼片段做效能統計，timeit模組會很有幫助。範例如下：

from timeit import timeit

print(timeit("math.sqrt(2)", "import math"))
# 0.07840395799999997

print(timeit("sqrt(2)", "from math import sqrt"))
# 0.05943025000000002

timeit會執行第一個引數中指定的語句一百萬次，然後計算時間。第二個引數是一個設定字串，在執行測試之前會先執行以設定好環境。如果要修改需要迭代的次數，只需要提供一個number引數即可：

print(timeit("math.sqrt(2)", "import math", number=10000000))
# 0.7569702089999999

print(timeit("sqrt(2)", "from math import sqrt", number=10000000))
# 0.5865757500000002

最後但同樣重要的是，如果打算進行詳細的計時統計分析，請確保先閱讀time、timeit以及其他相關模組的檔案。這樣才能理解不同系統平臺之間的重要差異以及其他一些缺陷。

參考

[1] https://stackoverflow.com/questions/556405/what-do-real-user-and-sys-mean-in-the-output-of-time1
[2] https://stackoverflow.com/questions/7335920/what-specifically-are-wall-clock-time-user-cpu-time-and-system-cpu-time-in-uni
[3] https://stackoverflow.com/questions/22241420/execution-of-python-code-with-m-option-or-not
[4] https://stackoverflow.com/questions/66036844/time-time-or-time-perf-counter-which-is-faster
[5] Martelli A, Ravenscroft A, Ascher D. Python cookbook[M]. " O'Reilly Media, Inc.", 2015.