Problem Description
Multiprocessing is a wonderful tool, but it is not so straightforward to use large memory chunks with it. You can load chunks in each process and dump the results to disk, but sometimes you need to store the results in memory, and, on top of that, use the fancy numpy functionality.
I have read/googled a lot and came up with some answers:
Use numpy array in shared memory for multiprocessing
Share Large, Read-Only Numpy Array Between Multiprocessing Processes
Python multiprocessing global numpy array
How do I pass large numpy arrays between python subprocesses without saving to disk?
Etc. etc. etc.
They all have drawbacks: not-so-mainstream libraries (sharedmem); storing variables globally; code that's not so easy to read; pipes; etc.
My goal was to seamlessly use numpy in my workers without worrying about conversions and such.
After many trials I came up with this. It works on my Ubuntu 16, Python 3.6, 16 GB, 8-core machine. I took a lot of "shortcuts" compared to previous approaches: no global shared state, no pure memory pointers that need to be converted to numpy inside the workers, large numpy arrays passed directly as process arguments, etc.
Pastebin link above, but I will put a few snippets here.
Some imports:
import numpy as np
import multiprocessing as mp
import multiprocessing.sharedctypes
import ctypes
Allocate some shared memory and wrap it into a numpy array:
def create_np_shared_array(shape, dtype, ctype):
    size = int(np.prod(shape))  # total number of elements to allocate
    shared_mem_chunck = mp.sharedctypes.RawArray(ctype, size)
    numpy_array_view = np.frombuffer(shared_mem_chunck, dtype).reshape(shape)
    return numpy_array_view
Create the shared arrays and put something in them:
src = np.random.rand(*SHAPE).astype(np.float32)
src_shared = create_np_shared_array(SHAPE, np.float32, ctypes.c_float)
dst_shared = create_np_shared_array(SHAPE, np.float32, ctypes.c_float)
src_shared[:] = src[:]  # some numpy ops accept an 'out' array in which to store the results
Spawn the process:
p = mp.Process(target=lengthly_operation,args=(src_shared, dst_shared, k, k + STEP))
p.start()
p.join()
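The body of lengthly_operation lives in the pastebin; a minimal sketch of what such a worker could look like (the np.sqrt op here is an assumption for illustration, not the pastebin's actual computation):

def lengthly_operation(src, dst, start, end):
    # Process rows [start, end) of src and write straight into the shared dst,
    # so nothing has to travel back to the parent through a pipe.
    for i in range(start, end):
        np.sqrt(src[i], out=dst[i])  # 'out=' stores the result in shared memory in place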
Here are some results (see the pastebin code for the full reference):
Serial version: allocate mem 2.3741257190704346 exec: 17.092209577560425 total: 19.46633529663086 Succes: True
Parallel with trivial np: allocate mem 2.4535582065582275 spawn process: 0.00015354156494140625 exec: 3.4581971168518066 total: 5.911908864974976 Succes: False
Parallel with shared mem np: allocate mem 4.535916328430176 (pure alloc:4.014216661453247 copy: 0.5216996669769287) spawn process: 0.00015664100646972656 exec: 3.6783478260040283 total: 8.214420795440674 Succes: True
I also did a cProfile run (why two extra seconds when allocating the shared memory?) and realized that there are some calls into tempfile.py, {method 'write' of '_io.BufferedWriter' objects}.
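For reference, the profiling can be reproduced with something like this (a sketch; the exact invocation in the pastebin may differ):

import cProfile

# run() executes the statement in __main__'s namespace, so SHAPE and
# create_np_shared_array must be defined there.
cProfile.run('create_np_shared_array(SHAPE, np.float32, ctypes.c_float)', sort='cumtime')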
Questions
- Am I doing something wrong?
- Are the (large) arrays pickled back and forth, so that I'm not gaining any speedup? Note that the 2nd run (using regular np arrays) fails the correctness test.
- Is there a way to further improve the timings, code clarity, etc.? (with respect to the multiprocessing paradigm)
Remarks
- I can't use a process Pool, because the memory has to be inherited at fork time and not sent as a parameter.
Recommended Answer
Allocation of the shared array is slow, because apparently it is written to disk first so that it can be shared through an mmap. For reference, see heap.py and sharedctypes.py. This is why tempfile.py shows up in the profiler. I think the advantage of this approach is that the shared memory is cleaned up in case of a crash, which cannot be guaranteed with POSIX shared memory.
There is no pickling happening in your code, thanks to fork; as you said, the memory is inherited. The reason the 2nd run doesn't work is that the child processes are not allowed to write into the parent's memory. Instead, private pages are allocated on the fly, only to be discarded when the child process ends.
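A minimal sketch of that difference, assuming the fork start method (the Linux default) and the create_np_shared_array helper from the question:

import ctypes
import multiprocessing as mp
import numpy as np

def child(regular, shared):
    regular[0] = 1.0  # lands in a private copy-on-write page, discarded at exit
    shared[0] = 1.0   # lands in the mmap-backed buffer, visible to the parent

if __name__ == '__main__':
    regular = np.zeros(4, dtype=np.float32)
    shared = create_np_shared_array((4,), np.float32, ctypes.c_float)
    p = mp.Process(target=child, args=(regular, shared))
    p.start()
    p.join()
    print(regular[0], shared[0])  # prints: 0.0 1.0 under fork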
I have only one suggestion: you don't have to specify a ctype yourself; the right type can be figured out from the numpy dtype through np.ctypeslib._typecodes. Or just use c_byte for everything and use the dtype's itemsize to figure out the size of the buffer; it will be cast by numpy anyway.
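For example, the allocator from the question could shrink to a c_byte version like this (a sketch, reusing the imports from the question):

def create_np_shared_array(shape, dtype):
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize  # buffer size from the dtype itemsize
    raw = mp.sharedctypes.RawArray(ctypes.c_byte, nbytes)  # one ctype for everything
    return np.frombuffer(raw, dtype=dtype).reshape(shape)  # numpy reinterprets the raw bytes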