Problem description
I have a simulation that is currently running, but the ETA is about 40 hours, so I'm trying to speed it up with multiprocessing.
It iterates over 3 values of one variable (L) and over 99 values of a second variable (a). For each pair of values, it runs a complex simulation and returns 9 different standard deviations. Thus (even though I haven't coded it that way yet) it is essentially a function that takes two values as inputs (L, a) and returns 9 values.
Here is the essence of the code I have:
STD_1 = []
STD_2 = []
# etc.
for L in range(0,6,2):
    for a in range(1,100):
        ### simulation code ###
        STD_1.append(value_1)
        STD_2.append(value_2)
        # etc.
Here is what I can modify it to:
master_list = []
def simulate(a,L):
    ### simulation code ###
    return (a, L, STD_1, STD_2)  # etc.
for L in range(0,6,2):
    for a in range(1,100):
        master_list.append(simulate(a,L))
Since each simulation is independent of the others, this seems like an ideal place to use some sort of multithreading or multiprocessing.
How exactly would I go about coding this?
Also, will everything be returned to the master list in order, or could it be out of order if multiple processes are working?
EDIT 2: This is my code, but it doesn't run correctly. It asks if I want to kill the program right after I run it.
import multiprocessing

data = []
for L in range(0,6,2):
    for a in range(1,100):
        data.append((L,a))
print (data)

def simulation(arg):
    # unpack the tuple
    a = arg[1]
    L = arg[0]
    STD_1 = a**2
    STD_2 = a**3
    STD_3 = a**4
    # simulation code #
    return((STD_1,STD_2,STD_3))

print("1")
p = multiprocessing.Pool()
print ("2")
results = p.map(simulation, data)
EDIT 3: Also, what are the limitations of multiprocessing? I've heard that it doesn't work on OS X. Is this correct?
Recommended answer
- Wrap the data for each iteration up into a tuple.
- Make a list, data, of those tuples.
- Write a function, f, that processes one tuple and returns one result.
- Create a p = multiprocessing.Pool() object.
- Call results = p.map(f, data).
This will run as many instances of f as your machine has cores, each in a separate process.
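As a rough illustration of the "one instance per core" behaviour, the sketch below prints the CPU count the operating system reports and the number of distinct worker processes that actually handled tasks. The helper name worker_pid is hypothetical and only used here; the exact default worker count can vary between Python versions, and the number of distinct PIDs seen depends on scheduling, so treat the output as indicative only.

```python
import os
from multiprocessing import Pool


def worker_pid(_):
    # Hypothetical helper: report which process handled this task.
    return os.getpid()


if __name__ == "__main__":
    # CPU count as reported by the operating system.
    print("CPUs reported:", os.cpu_count())

    # Pool() with no argument picks a default worker count based on the CPUs.
    with Pool() as pool:
        pids = set(pool.map(worker_pid, range(1000)))

    # Number of distinct worker processes that picked up at least one task.
    print("workers that did work:", len(pids))
```

Passing an explicit number, for example Pool(processes=4), overrides the default if you want to leave some cores free.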
Edit 1: Example:
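from multiprocessing import Pool

data = [('bla', 1, 3, 7), ('spam', 12, 4, 8), ('eggs', 17, 1, 3)]

def f(t):
    # Unpack one work item and return its name with the sum of its numbers.
    name, a, b, c = t
    return (name, a + b + c)

if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block on import.
    with Pool() as p:
        results = p.map(f, data)
    print(results)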
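Applied to the question's parameter grid, the same pattern might look like the sketch below. The names make_grid and run_all are hypothetical, and the arithmetic in simulate is only a placeholder for the real simulation code. Pool.map returns its results in the same order as the input list, so the output lines up with the (L, a) pairs even though the work is spread over several processes.

```python
from itertools import product
from multiprocessing import Pool


def make_grid():
    # Same ranges as the loops in the question: 3 values of L, 99 values of a.
    return list(product(range(0, 6, 2), range(1, 100)))


def simulate(args):
    # Placeholder arithmetic standing in for the real simulation code.
    L, a = args
    return (L, a, a**2, a**3, a**4)


def run_all(processes=None):
    # Pool.map blocks until every task is done and returns the results in
    # the same order as the input list, whichever process ran each task.
    with Pool(processes) as pool:
        return pool.map(simulate, make_grid())


if __name__ == "__main__":
    results = run_all()
    print(len(results))   # one result per (L, a) pair
    print(results[0])     # result for the first pair in the grid
```

Returning L and a alongside the computed values, as the question's simulate already does, makes each result self-describing even if you later switch to an unordered variant such as imap_unordered.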
Multiprocessing should work fine on UNIX-like platforms such as OS X. Only platforms that lack os.fork (mainly MS Windows) need special attention, but even there it still works. See the multiprocessing documentation.
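The "special attention" mostly comes down to keeping the worker function at module top level and putting the pool creation behind an if __name__ == "__main__" guard. The sketch below forces the "spawn" start method, which starts workers without os.fork, to show that the same pattern still works. The names square and run_with_spawn are hypothetical. Note that since Python 3.8 macOS also uses "spawn" by default, so the guard is worth having there too. Run this from a saved .py file rather than an interactive prompt, because spawned workers need to import the module.

```python
import multiprocessing as mp


def square(x):
    # Must live at module top level so a spawned worker can import it by name.
    return x * x


def run_with_spawn(values):
    # "spawn" starts a fresh interpreter for each worker instead of forking,
    # which is how workers are started on Windows.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    # Without this guard, each spawned worker would try to create its own
    # pool when it imports this module.
    print(run_with_spawn([1, 2, 3]))
```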