Problem description
I'm new to multiprocessing in Python and trying to figure out whether I should use Pool or Process to call two functions asynchronously. The two functions make curl calls and parse the information into two separate lists. Depending on the internet connection, each function could take about 4 seconds. I realize that the bottleneck is the ISP connection and multiprocessing won't speed it up much, but it would be nice to have them both kick off asynchronously. Plus, this is a great learning experience for getting into Python's multiprocessing, because I will be using it more later.
I have read "Python multiprocessing.Pool: when to use apply, apply_async or map?" and it was useful, but I still had my own questions.
So one way I could do it is:
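from multiprocessing import Process

def foo():
    pass

def bar():
    pass

if __name__ == "__main__":
    p1 = Process(target=foo, args=())
    p2 = Process(target=bar, args=())
    # Start both processes before waiting on either one.
    p1.start()
    p2.start()
    # Wait for each process to finish.
    p1.join()
    p2.join()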
The question I have about this implementation is: 1) Since join blocks until the process it is called on has completed, does this mean the p1 process has to finish before the p2 process is kicked off? I always understood .join() to be the same as pool.apply() and pool.apply_async().get(), where the parent process cannot launch another process (task) until the one currently running has completed.
Another option is:
from multiprocessing import Pool

def foo():
    pass

def bar():
    pass

if __name__ == "__main__":
    pool = Pool(processes=2)
    p1 = pool.apply_async(foo)
    p2 = pool.apply_async(bar)
The questions I have about this implementation are: 1) Do I need pool.close() and pool.join()? 2) Would pool.map() make them all complete before I could get results? And if so, are they still run asynchronously? 3) How would pool.apply_async() differ from doing each process with pool.apply()? 4) How would this differ from the previous implementation with Process?
Recommended answer
The two scenarios you listed accomplish the same thing but in slightly different ways.
The first scenario starts two separate processes (call them P1 and P2), starts P1 running foo and P2 running bar, and then waits until both processes have finished their respective tasks.
The second scenario starts two processes (call them Q1 and Q2), first starts foo on either Q1 or Q2, and then starts bar on either Q1 or Q2. Then the code waits until both function calls have returned.
So the net result is actually the same, but in the first case you're guaranteed to run foo and bar on different processes.
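To see which worker handles each call in the pool scenario, here is a minimal sketch that has foo and bar return the PID of the process they ran in. The helper name worker_pids is hypothetical, and the function bodies are placeholders for the real curl-and-parse work.

```python
import os
from multiprocessing import Pool


def foo():
    # Placeholder body: report which process ran this call.
    return os.getpid()


def bar():
    # Placeholder body: report which process ran this call.
    return os.getpid()


def worker_pids():
    """Submit foo and bar to a two-worker pool and return the PIDs that ran them."""
    with Pool(processes=2) as pool:
        r1 = pool.apply_async(foo)
        r2 = pool.apply_async(bar)
        # .get() blocks until the corresponding call has returned.
        return r1.get(), r2.get()


if __name__ == "__main__":
    pid_foo, pid_bar = worker_pids()
    print("parent:", os.getpid(), "foo ran in:", pid_foo, "bar ran in:", pid_bar)
```

With tasks this short, the two PIDs may well be identical, because one worker can pick up both calls. With two separate Process objects they would always differ.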
As for the specific questions you had about concurrency, the .join() method on a Process does indeed block until the process has finished, but because you called .start() on both P1 and P2 (in your first scenario) before joining, both processes will run asynchronously. The interpreter will, however, wait until P1 finishes before attempting to wait for P2 to finish.
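One way to check this is to time the start/join sequence. The sketch below assumes each function can be stood in for by a one-second sleep. DELAY and run_both are hypothetical names introduced only for this illustration.

```python
import time
from multiprocessing import Process

DELAY = 1.0  # assumed stand-in for the time one curl call takes


def foo():
    time.sleep(DELAY)


def bar():
    time.sleep(DELAY)


def run_both():
    """Start both processes, join both, and return (elapsed seconds, exit codes)."""
    started = time.perf_counter()
    p1 = Process(target=foo)
    p2 = Process(target=bar)
    p1.start()
    p2.start()  # nothing has blocked yet, so p2 starts right after p1
    p1.join()   # blocks until p1 exits; p2 keeps running in the meantime
    p2.join()   # p2 has been running alongside p1, so this wait is usually short
    elapsed = time.perf_counter() - started
    return elapsed, [p1.exitcode, p2.exitcode]


if __name__ == "__main__":
    elapsed, codes = run_both()
    print(f"elapsed: {elapsed:.2f}s, exit codes: {codes}")
```

If join() forced the processes to run one after the other, the elapsed time would be at least 2 * DELAY. Because both were started first, it should come out closer to DELAY plus process start-up overhead.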
For your questions about the pool scenario, you should technically use pool.close(), but it kind of depends on what you might need the pool for afterwards (if it just goes out of scope then you don't necessarily need to close it). pool.map() is a completely different kind of animal, because it distributes a bunch of arguments to the same function (asynchronously), across the pool processes, and then waits until all function calls have completed before returning the list of results.
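As a rough illustration of both points, the sketch below calls pool.map() with one function and several arguments, then shuts the pool down explicitly with close() and join(). The function fetch and the helper run_map are hypothetical, and squaring a number stands in for the real curl-and-parse step.

```python
from multiprocessing import Pool


def fetch(n):
    # Hypothetical stand-in for one curl-and-parse call.
    return n * n


def run_map(values):
    """Apply fetch to every value using a two-worker pool."""
    pool = Pool(processes=2)
    try:
        # The calls are spread across the workers, but map() itself blocks
        # until all of them have returned, then gives back one list whose
        # order matches the order of the inputs.
        results = pool.map(fetch, values)
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for the worker processes to exit
    return results


if __name__ == "__main__":
    print(run_map([1, 2, 3, 4]))
```

The Python documentation advises managing a pool explicitly, either with close()/terminate() or by using it as a context manager, rather than relying on garbage collection, so the explicit shutdown here is the more conservative choice.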