Problem description
I was experimenting with the shiny new concurrent.futures module introduced in Python 3.2, and I noticed that, with almost identical code, using the Pool from concurrent.futures is way slower than using multiprocessing.Pool.
This is the version using multiprocessing:
def hard_work(n):
    # Real hard work here
    pass

if __name__ == '__main__':
    from multiprocessing import Pool, cpu_count

    try:
        workers = cpu_count()
    except NotImplementedError:
        workers = 1
    pool = Pool(processes=workers)
    result = pool.map(hard_work, range(100, 1000000))
And this is using concurrent.futures:
def hard_work(n):
    # Real hard work here
    pass

if __name__ == '__main__':
    from concurrent.futures import ProcessPoolExecutor
    from multiprocessing import cpu_count

    try:
        workers = cpu_count()
    except NotImplementedError:
        workers = 1
    pool = ProcessPoolExecutor(max_workers=workers)
    result = pool.map(hard_work, range(100, 1000000))
Using a naïve factorization function taken from this Eli Bendersky article, these are the results on my computer (i7, 64-bit, Arch Linux):
[juanlu@nebulae]─[~/Development/Python/test]
└[10:31:10] $ time python pool_multiprocessing.py
real 0m10.330s
user 1m13.430s
sys 0m0.260s
[juanlu@nebulae]─[~/Development/Python/test]
└[10:31:29] $ time python pool_futures.py
real 4m3.939s
user 6m33.297s
sys 0m54.853s
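For reference, a naive trial-division factorizer along these lines can serve as the hard_work body (a sketch; the exact function from the article may differ):

```python
def factorize_naive(n):
    """Naive trial-division factorization; O(sqrt(n)) divisions per number."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        # Whatever remains is a prime factor larger than sqrt(original n)
        factors.append(n)
    return factors
```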
I cannot profile these with the Python profiler because I get pickle errors. Any ideas?
Recommended answer
When using map from concurrent.futures, each element from the iterable is submitted separately to the executor, which creates a Future object for each call. It then returns an iterator which yields the results returned by the futures. Future objects are rather heavyweight; they do a lot of work to allow all the features they provide (like callbacks, the ability to cancel, status checks, ...).
Compared to that, multiprocessing.Pool has much less overhead. It submits jobs in batches (reducing IPC overhead) and directly uses the result returned by the function. For big batches of jobs, multiprocessing is definitely the better option.
Futures are great if you want to submit long-running jobs where the overhead isn't that important, where you want to be notified by a callback, check from time to time whether they're done, or be able to cancel execution individually.
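For example, a minimal sketch of those per-job extras, using a thread pool so it stays self-contained (slow_task is an illustrative name):

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def slow_task(n):
    time.sleep(0.05)
    return n * 2

with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(slow_task, i) for i in range(4)]
    # done(), cancel(), and add_done_callback() are the per-job
    # extras that Pool.map does not offer
    futures[-1].add_done_callback(lambda f: print('last submitted job finished'))
    done_order = [f.result() for f in as_completed(futures)]

results = sorted(done_order)
```

as_completed yields futures as they finish, regardless of submission order, which is why the results are sorted afterwards.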
Personal note:
I can't really think of many reasons to use Executor.map - it doesn't give you any of the features of futures - except for the ability to specify a timeout. If you're just interested in the results, you're better off using one of multiprocessing.Pool's map functions.
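For completeness, a small sketch of those Pool map variants (triple and run are illustrative names): map blocks and preserves input order, while imap_unordered yields results lazily as workers finish them.

```python
from multiprocessing import Pool

def triple(n):
    return n * 3

def run():
    with Pool(processes=2) as pool:
        # map: blocks until all results are in, preserves input order
        ordered = pool.map(triple, range(5))
        # imap_unordered: an iterator yielding results as they complete;
        # sorted here only to make the output deterministic
        unordered = sorted(pool.imap_unordered(triple, range(5)))
    return ordered, unordered

if __name__ == '__main__':
    print(run())
```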