Problem description
I have written a small script that distributes a workload across 4 worker processes and tests whether the results stay ordered with respect to the order of the input:
from multiprocessing import Pool
import random
import time

import numpy as np

rows = 16
columns = 1000000

vals = np.arange(rows * columns, dtype=np.int32).reshape(rows, columns)


def worker(arr):
    # Sleep for a random amount of time so that the worker processes
    # finish at different times.
    time.sleep(random.random())
    for idx in np.ndindex(arr.shape):
        arr[idx] += 1
    return arr


if __name__ == "__main__":
    # Create the process pool.
    with Pool(4) as p:
        # Schedule one worker call for each row in the original data.
        q = p.map(worker, [row for row in vals])

    for idx, row in enumerate(q):
        print("[{:0>2}]: {: >8} - {: >8}".format(idx, row[0], row[-1]))
For me this always results in:
[00]: 1 - 1000000
[01]: 1000001 - 2000000
[02]: 2000001 - 3000000
[03]: 3000001 - 4000000
[04]: 4000001 - 5000000
[05]: 5000001 - 6000000
[06]: 6000001 - 7000000
[07]: 7000001 - 8000000
[08]: 8000001 - 9000000
[09]: 9000001 - 10000000
[10]: 10000001 - 11000000
[11]: 11000001 - 12000000
[12]: 12000001 - 13000000
[13]: 13000001 - 14000000
[14]: 14000001 - 15000000
[15]: 15000001 - 16000000
Question: So, does Pool really keep the original input's order when storing the results of each map call in q?
Sidenote: I am asking this because I need an easy way to parallelize work over several workers. In some cases the ordering is irrelevant. However, there are cases where the results (like those in q) have to be returned in the original order, because I'm using an additional reduce function that relies on ordered data.
Performance: On my machine this operation is about 4 times faster than normal execution in a single process (as expected, since I have 4 cores). Additionally, all 4 cores are at 100% usage while it runs.
Recommended answer
Pool.map results are ordered. If you need order, great; if you don't, Pool.imap_unordered may be a useful optimization.
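To see the difference between the two calls side by side, here is a minimal sketch. The names slow_square and compare_orderings are made up for illustration, and the sleep times are chosen only so that tasks tend to finish in reverse input order. The pool_factory parameter is also an assumption of this sketch: it lets the same function run with multiprocessing.pool.ThreadPool, which exposes the same map and imap_unordered methods, when starting processes is inconvenient.

```python
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool
import time


def slow_square(x):
    # Hypothetical task: smaller inputs sleep longer, so with the default
    # values the tasks tend to finish in reverse input order.
    time.sleep(0.05 * max(0, 4 - x))
    return x * x


def compare_orderings(pool_factory=Pool, values=(0, 1, 2, 3)):
    """Run the same tasks through map and imap_unordered on one pool."""
    with pool_factory(len(values)) as pool:
        # map returns a list whose order matches the order of `values`.
        ordered = pool.map(slow_square, values)
        # imap_unordered yields each result as soon as it is ready.
        unordered = list(pool.imap_unordered(slow_square, values))
    return ordered, unordered


if __name__ == "__main__":
    ordered, unordered = compare_orderings()
    print("map:           ", ordered)    # in input order
    print("imap_unordered:", unordered)  # in completion order, which may vary
```

Calling compare_orderings(ThreadPool) runs the same comparison with threads instead of processes. In both cases the imap_unordered list contains the same values as the map list, only possibly in a different order.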
Note that while the order in which you receive the results from Pool.map is fixed, the order in which they are computed is arbitrary.
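One way to observe this is to have each task report when it finished. This is only a sketch: timed_task and result_vs_completion_order are hypothetical names, the sleep times are arbitrary, and wall-clock timestamps from time.time() are assumed to be precise enough to compare across workers. As an assumption of this sketch, pool_factory can be swapped for multiprocessing.pool.ThreadPool, which has the same map method.

```python
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool
import time


def timed_task(x):
    # Hypothetical task: smaller inputs sleep longer, so later inputs
    # tend to finish first.
    time.sleep(0.05 * max(0, 4 - x))
    return x, time.time()  # the input and the time the task finished


def result_vs_completion_order(pool_factory=Pool, values=(0, 1, 2, 3)):
    """Return the order map delivers results in and the order tasks finished in."""
    with pool_factory(len(values)) as pool:
        results = pool.map(timed_task, values)
    # Order in which map hands the results back: matches the input order.
    result_order = [x for x, _ in results]
    # Order in which the tasks actually finished, by recorded timestamp.
    completion_order = [x for x, _ in sorted(results, key=lambda r: r[1])]
    return result_order, completion_order


if __name__ == "__main__":
    result_order, completion_order = result_vs_completion_order()
    print("returned by map:", result_order)
    print("finished in:    ", completion_order)
```

The first list always matches the input order, while the second depends on scheduling and on how long each task takes, so a reduce step that consumes the map result sees ordered data regardless of which worker finished first.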