Problem Description
I'm sorry if this is too simple for some people, but I still don't get the trick with Python's multiprocessing. I've read
http://docs.python.org/dev/library/multiprocessing
http://pymotw.com/2/multiprocessing/basics.html
and many other tutorials and examples that Google gives me... many of them from here too.
Well, my situation is that I have to compute many numpy matrices and I need to store them in a single numpy matrix afterwards. Let's say I want to use 20 cores (or that I can use 20 cores), but I haven't managed to use the pool resource successfully, since it keeps the processes alive till the pool "dies". So I thought of doing something like this:
from multiprocessing import Process, Queue
import numpy as np

def f(q, i):
    q.put(np.zeros((4, 4)))

if __name__ == '__main__':
    q = Queue()
    for i in range(30):
        p = Process(target=f, args=(q, i))
        p.start()
        p.join()
    result = q.get()
    while not q.empty():
        result += q.get()
    print(result)
but then it looks like the processes don't run in parallel but sequentially (please correct me if I'm wrong), and I don't know whether they die after they do their computation (so that, with more than 20 processes, the ones that finished their part leave the core free for another process). Plus, for a very large number (let's say 100,000), storing all those matrices (which may be really big too) in a queue will use a lot of memory, rendering the code useless, since the idea is to add every result into the final result on each iteration, like using a lock (and its acquire() and release() methods); but if this code isn't truly parallel, the lock is useless too...
I hope somebody may help me.
Thanks in advance!
Recommended Answer
You are correct, they are executing sequentially in your example.
p.join()
causes the current thread to block until that process is finished executing. You'll either want to join your processes individually outside of your for loop (e.g., by storing them in a list and then iterating over it, as in the sketch below) or use something like multiprocessing.Pool
and apply_async
with a callback. That will also let you add each result to your total directly rather than keeping the objects around.
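A minimal sketch of the first option (start all the processes, then collect and join afterwards), reusing f and the Queue from your code; note that the queue has to be drained before joining, since a child process doesn't exit until its queued data has been flushed:

from multiprocessing import Process, Queue
import numpy as np

def f(q, i):
    q.put(np.zeros((4, 4)))

if __name__ == '__main__':
    q = Queue()
    processes = []
    for i in range(30):
        p = Process(target=f, args=(q, i))
        p.start()                     # start everything before joining anything
        processes.append(p)
    result = np.zeros((4, 4))
    for _ in range(len(processes)):   # drain the queue first...
        result += q.get()
    for p in processes:
        p.join()                      # ...then wait for the workers to exit
    print(result)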
The Pool version with a callback would look like this:
from multiprocessing import Pool
import numpy as np

def f(i):
    return i * np.identity(4)

if __name__ == '__main__':
    p = Pool(5)
    result = np.zeros((4, 4))

    def adder(value):
        global result
        result += value               # callbacks run in the parent, one at a time

    for i in range(30):
        p.apply_async(f, args=(i,), callback=adder)
    p.close()
    p.join()
    print(result)
Closing and then joining the pool at the end ensures that the pool's processes have completed and the result
object is finished being computed (the callback runs in the parent process, so accumulating into result there is safe). You could also investigate using Pool.imap
as a solution to your problem. That particular solution would look something like this:
from multiprocessing import Pool
import numpy as np

def f(i):
    return i * np.identity(4)

if __name__ == '__main__':
    p = Pool(5)
    result = np.zeros((4, 4))
    im = p.imap_unordered(f, range(30), chunksize=5)
    for x in im:                      # results arrive as workers finish
        result += x
    print(result)
This is cleaner for your specific situation, but may not be for whatever you are ultimately trying to do.
As to storing all of your varied results: if I understand your question correctly, you can just add each one into a single result in the callback method (as above), or item-at-a-time using imap
/imap_unordered
(which still stores results internally, but you clear them as they arrive). Then nothing needs to be stored for longer than it takes to add it to the result.