import requests
import time

def download_one(url):
    # Fetch one page and report how many bytes were read
    resp = requests.get(url)
    print('Read {} from {}'.format(len(resp.content), url))

def download_all(sites):
    # Download the sites one after another, in list order
    for site in sites:
        download_one(site)

def main():
    sites = [
        'http://c.biancheng.net',
        'http://c.biancheng.net/c',
        'http://c.biancheng.net/python'
    ]
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))

if __name__ == '__main__':
    main()

Output:
Read 52053 from http://c.biancheng.net
Read 30718 from http://c.biancheng.net/c
Read 34470 from http://c.biancheng.net/python
Download 3 sites in 0.3537296 seconds
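This approach is probably the most direct and the simplest. Note that the requests module used in this program is not part of the standard library and must be installed separately, which can be done by running the command pip install requests.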
import concurrent.futures
import requests
import time

def download_one(url):
    resp = requests.get(url)
    print('Read {} from {}'.format(len(resp.content), url))

def download_all(sites):
    # Hand the downloads to a pool of up to 5 worker threads
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        executor.map(download_one, sites)

def main():
    sites = [
        'http://c.biancheng.net',
        'http://c.biancheng.net/c',
        'http://c.biancheng.net/python'
    ]
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))

if __name__ == '__main__':
    main()

Output:
Read 52053 from http://c.biancheng.net
Read 30718 from http://c.biancheng.net/c
Read 34470 from http://c.biancheng.net/python
Download 3 sites in 0.1606366 seconds
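Comparing the two programs above, the main difference between the multithreaded version and the single-threaded version lies in the following code:

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    executor.map(download_one, sites)

This creates a thread pool with 5 threads available for work. executor.map() is similar to Python's built-in map() function covered earlier: it calls download_one() on every element of sites, but runs those calls concurrently.

Note that although you can choose the number of threads yourself, more threads are not always better. Creating, maintaining, and destroying threads has its own overhead, so a very large value can actually make the program slower. In practice, the best thread count is usually found by testing against the actual workload.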
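As a rough way to see how max_workers affects running time, the sketch below replaces the real network request with time.sleep(), so it runs without network access. The names fake_download and timed_download_all, the example.com URLs, and the 0.05-second delay are illustrative assumptions, not part of the original program.

```python
import concurrent.futures
import time


def fake_download(url):
    """Stand-in for requests.get(): wait briefly, then return a fake size."""
    time.sleep(0.05)  # assumed latency, simulating a blocking network call
    return len(url)


def timed_download_all(urls, max_workers):
    """Run fake_download over urls in a thread pool; return (results, seconds)."""
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map() yields results in the same order as the inputs
        results = list(executor.map(fake_download, urls))
    return results, time.perf_counter() - start


if __name__ == '__main__':
    urls = ['http://example.com/page{}'.format(i) for i in range(10)]
    for workers in (1, 2, 5, 10):
        results, seconds = timed_download_all(urls, workers)
        print('max_workers={:2d}: {:.3f} seconds'.format(workers, seconds))
```

With a task that only sleeps, extra workers keep helping until there is one worker per URL. Real downloads are also limited by bandwidth and by the remote server, so the numbers from a simulation like this are only a starting point for measuring the real workload.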
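To run the work in multiple processes instead, only the line that creates the pool needs to change:

with futures.ThreadPoolExecutor(workers) as executor:
# =>
with futures.ProcessPoolExecutor() as executor:

Here ProcessPoolExecutor() creates a process pool, so the program runs in parallel across several processes. The workers argument is usually omitted in this case, because by default the pool uses the number of CPUs on the machine as the number of worker processes.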
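Because both pool classes expose the same executor interface, the swap can be sketched with a helper that takes the pool class as a parameter. The function count_primes and the helper run_all are hypothetical names chosen for illustration. A small computation is used instead of a download, on the assumption that process pools are mostly chosen for CPU-bound work.

```python
import concurrent.futures


def count_primes(limit):
    """CPU-bound toy task: count the primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_all(limits, executor_cls=concurrent.futures.ProcessPoolExecutor):
    """Apply count_primes to every limit using the given executor class."""
    # No worker count is passed, so the pool falls back to its own default
    with executor_cls() as executor:
        return list(executor.map(count_primes, limits))


# The guard matters for process pools on platforms that start worker
# processes by importing this module again.
if __name__ == '__main__':
    limits = [1000, 2000, 3000]
    print(run_all(limits))
```

Passing executor_cls=concurrent.futures.ThreadPoolExecutor runs the same code on threads, which makes it easy to time both variants on the same input.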
import concurrent.futures
import requests
import time

def download_one(url):
    resp = requests.get(url)
    print('Read {} from {}'.format(len(resp.content), url))

def download_all(sites):
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        to_do = []
        for site in sites:
            # submit() schedules the call and returns a Future right away
            future = executor.submit(download_one, site)
            to_do.append(future)

        # as_completed() yields each future as soon as it has finished
        for future in concurrent.futures.as_completed(to_do):
            future.result()

def main():
    sites = [
        'http://c.biancheng.net',
        'http://c.biancheng.net/c',
        'http://c.biancheng.net/python'
    ]
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))

if __name__ == '__main__':
    main()

Output:
Read 52053 from http://c.biancheng.net
Read 34470 from http://c.biancheng.net/python
Read 30718 from http://c.biancheng.net/c
Download 3 sites in 0.2275894 seconds
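One reason to use submit() together with as_completed() rather than map() is that each future can be tied back to its input and its error handled individually. The sketch below assumes a simulated task, fake_download, with made-up delays and a deliberately failing URL. These names, along with download_all_tracked and the example.com URLs, are illustrative and do not come from the original program.

```python
import concurrent.futures
import time


def fake_download(url, delay):
    """Stand-in for a real request: sleep for delay seconds, then return a size."""
    time.sleep(delay)
    if 'bad' in url:
        raise ValueError('cannot fetch {}'.format(url))
    return len(url)


def download_all_tracked(jobs):
    """jobs maps url -> simulated delay. Returns (sizes, errors) keyed by url."""
    sizes, errors = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        # Remember which URL each future belongs to
        future_to_url = {
            executor.submit(fake_download, url, delay): url
            for url, delay in jobs.items()
        }
        # as_completed() yields futures in the order they finish
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                sizes[url] = future.result()  # re-raises the task's exception
            except ValueError as exc:
                errors[url] = str(exc)
    return sizes, errors


if __name__ == '__main__':
    jobs = {
        'http://example.com/slow': 0.2,
        'http://example.com/fast': 0.05,
        'http://example.com/bad': 0.1,
    }
    sizes, errors = download_all_tracked(jobs)
    print('finished:', sizes)
    print('failed:', errors)
```

Since results are recorded as each future completes, the faster task shows up first regardless of submission order, which matches the reordered lines in the output above.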