Question
I'm trying to decide if I should use multiprocessing or threading, and I've learned some interesting bits about the Global Interpreter Lock. In this nice blog post, it seems multithreading isn't suitable for busy tasks. However, I also learned that some functionality, such as I/O or numpy, is unaffected by the GIL.
Can anyone explain why, and how I can find out if my (probably quite numpy-heavy) code is going to be suitable for multithreading?
Recommended answer
Many numpy calculations are unaffected by the GIL, but not all.
In code that does not require the Python interpreter (e.g. C libraries), it is possible to explicitly release the GIL, allowing other code that depends on the interpreter to continue running. In the NumPy C codebase the macros NPY_BEGIN_THREADS and NPY_END_THREADS are used to delimit blocks of code that permit GIL release. You can see these in this search of the numpy source.
The NumPy C API documentation has more information on threading support. Note the additional macros NPY_BEGIN_THREADS_DESCR, NPY_END_THREADS_DESCR and NPY_BEGIN_THREADS_THRESHOLDED, which handle conditional GIL release, dependent on array dtypes and the size of loops.
Most core functions release the GIL. For example, Universal Functions (ufuncs) do so, as described in their documentation:
as long as no object arrays are involved, the Python Global Interpreter Lock (GIL) is released prior to calling the loops. It is re-acquired if necessary to handle error conditions.
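To make the quoted behaviour concrete, here is a minimal sketch of running one ufunc over chunks of an array from several threads. The helper name threaded_ufunc and the chunking strategy are my own illustration, not something from NumPy, and it assumes a unary ufunc whose output dtype matches the input dtype (e.g. np.sqrt on a float array). Object arrays are run serially, since per the quote the GIL is not released for them.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def threaded_ufunc(ufunc, arr, n_threads=4):
    """Apply a unary ufunc to `arr` in chunks, one task per chunk.

    Hypothetical helper for illustration. Assumes the ufunc's output
    dtype equals arr.dtype (true for np.sqrt on float input).
    """
    if arr.dtype.hasobject:
        # Object arrays keep the GIL, so threads would not help here.
        return ufunc(arr)

    out = np.empty_like(arr)
    # array_split slices along axis 0, so the chunks are views: each
    # worker writes straight into its own part of `out`.
    in_chunks = np.array_split(arr, n_threads)
    out_chunks = np.array_split(out, n_threads)

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [
            pool.submit(ufunc, a, out=o)
            for a, o in zip(in_chunks, out_chunks)
        ]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return out
```

Called as threaded_ufunc(np.sqrt, data), this gives the same values as np.sqrt(data). Whether it is actually faster depends on the array size and the function, so it is worth timing rather than assuming.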
With regard to your own code, the source code for NumPy is available. Check the functions you use (and the functions they call) for the above macros. Note also that the performance benefit is heavily dependent on how long the GIL is released - if your code is constantly dropping in/out of Python you won't see much of an improvement.
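As a rough illustration of "dropping in/out of Python", the two functions below compute the same row norms. The function names are made up for this example. The first returns to the interpreter after every tiny NumPy call, so the GIL is held for most of the run. The second spends nearly all its time inside three large NumPy calls, which is the pattern that gives threads a chance to overlap.

```python
import numpy as np


def row_norms_loop(a):
    # One Python-level iteration per row: the interpreter (and the GIL)
    # is involved between every small NumPy call.
    return np.array([np.sqrt(np.dot(row, row)) for row in a])


def row_norms_vectorized(a):
    # Three large calls (multiply, sum, sqrt); almost all of the time is
    # spent inside NumPy's C loops rather than in the interpreter.
    return np.sqrt((a * a).sum(axis=1))
```

Both return the same result; only the second is a plausible candidate for multithreading.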
The other option is to just test it. However, bear in mind that functions using the conditional GIL macros may exhibit different behaviour with small and large arrays. A test with a small dataset may therefore not be an accurate representation of performance for a larger task.
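A simple way to run such a test is to time the same work serially and in a thread pool at several array sizes. The sketch below is only one possible harness: the function name, the np.sqrt(a).sum() workload and the defaults are placeholders, so substitute the operations your own code actually performs.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def time_serial_vs_threaded(size, n_threads=4, repeats=3):
    """Return (serial_seconds, threaded_seconds), best of `repeats` runs."""
    rng = np.random.default_rng(0)
    arrays = [rng.random(size) for _ in range(n_threads)]

    def work(a):
        # Placeholder workload; replace with your own NumPy calls.
        return np.sqrt(a).sum()

    def serial():
        return [work(a) for a in arrays]

    def threaded():
        with ThreadPoolExecutor(max_workers=n_threads) as pool:
            return list(pool.map(work, arrays))

    def best_of(fn):
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn()
            best = min(best, time.perf_counter() - start)
        return best

    return best_of(serial), best_of(threaded)
```

Calling it for, say, sizes 1_000, 100_000 and 2_000_000 and comparing the two numbers at each size should show whether the threaded version only pays off once the arrays are large.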
There is some additional information on parallel processing with numpy available on the official wiki and a useful post about the Python GIL in general over on Programmers.SE.