

How to parallelize this Python for loop when using Numba

問(wèn)題描述


I'm using the Anaconda distribution of Python, together with Numba, and I've written the following Python function that multiplies a sparse matrix A (stored in a CSR format) by a dense vector x:

import numpy
from numba import jit

@jit
def csrMult( x, Adata, Aindices, Aindptr, Ashape ):
    # Computes Ax = A*x for a CSR matrix A, given by its data, indices,
    # and indptr arrays together with its shape, and a dense vector x.

    numRowsA = Ashape[0]
    Ax       = numpy.zeros( numRowsA )

    for i in range( numRowsA ):
        Ax_i = 0.0
        for dataIdx in range( Aindptr[i], Aindptr[i+1] ):
            # Aindptr[i]:Aindptr[i+1] is the slice of nonzeros belonging to row i.
            j     = Aindices[dataIdx]
            Ax_i +=    Adata[dataIdx] * x[j]

        Ax[i] = Ax_i

    return Ax 

where A is a large scipy sparse matrix,

>>> A.shape
( 56469, 39279 )
#                  having ~ 142,258,302 nonzero entries (so about 6.4% )
>>> type( A[0,0] )
dtype( 'float32' )
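
As a quick sanity check on the "about 6.4%" figure, the density follows directly from the nonzero count and the shape. A minimal sketch, using scipy.sparse's nnz attribute on the same matrix A:

>>> A.nnz / ( A.shape[0] * A.shape[1] )   # 142258302 / (56469 * 39279)
0.0641...                                 # about 6.4% of the entries are nonzero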


and x is a numpy array. Here is a snippet of code that calls the above function:

x       = numpy.random.randn( A.shape[1] )
Ax      = A.dot( x )   
AxCheck = csrMult( x, A.data, A.indices, A.indptr, A.shape )


Notice the @jit-decorator that tells Numba to do a just-in-time compilation for the csrMult() function.


In my experiments, my function csrMult() is about twice as fast as the scipy .dot() method. That is a pretty impressive result for Numba.
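
For reference, a comparison along these lines can be measured with a small timing sketch like the one below. This is a minimal sketch, not the benchmark behind the numbers above; it reuses A and x from the snippet earlier, the extra warm-up call keeps Numba's one-time compilation cost out of the measurement, and numpy.allclose confirms the two routines agree:

import time

csrMult( x, A.data, A.indices, A.indptr, A.shape )     # warm-up call: triggers the JIT compilation

t0 = time.perf_counter()
Ax = A.dot( x )                                         # scipy's sparse matrix-vector product
t1 = time.perf_counter()
AxCheck = csrMult( x, A.data, A.indices, A.indptr, A.shape )
t2 = time.perf_counter()

print( 'scipy A.dot(x):', t1 - t0, 'seconds' )
print( 'csrMult():     ', t2 - t1, 'seconds' )
print( 'results agree: ', numpy.allclose( Ax, AxCheck ) )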


However, MATLAB still performs this matrix-vector multiplication about 6 times faster than csrMult(). I believe that is because MATLAB uses multithreading when performing sparse matrix-vector multiplication.


How can I parallelize the outer for-loop when using Numba?


Numba used to have a prange() function, which made it simple to parallelize embarrassingly parallel for-loops. Unfortunately, Numba no longer has prange() [actually, that is false, see the edit below]. So what is the correct way to parallelize this for-loop now that Numba's prange() function is gone?


When prange() was removed from Numba, what alternative did the developers of Numba have in mind?


Edit 1:
I updated to the latest version of Numba, which is 0.35, and prange() is back! It was not included in version 0.33, the version I had been using.
That is good news, but unfortunately I am getting an error message when I attempt to parallelize my for loop using prange(). Following the parallel for loop example in the Numba documentation (see section 1.9.2, "Explicit Parallel Loops"), here is my new code:

import numpy as np
from numba import njit, prange

@njit( parallel=True )
def csrMult_numba( x, Adata, Aindices, Aindptr, Ashape ):

    numRowsA = Ashape[0]    
    Ax       = np.zeros( numRowsA )

    for i in prange( numRowsA ):
        Ax_i = 0.0        
        for dataIdx in range( Aindptr[i],Aindptr[i+1] ):

            j     = Aindices[dataIdx]
            Ax_i +=    Adata[dataIdx] * x[j]

        Ax[i] = Ax_i            

    return Ax 


When I call this function, using the code snippet given above, I receive the following error:


AttributeError: Failed at nopython (convert to parfors) 'SetItem' object has no attribute 'get_targets'



Given that the above attempt to use prange crashes, my question stands:

What is the correct way ( using prange or an alternative method ) to parallelize this Python for-loop?


As noted below, it was trivial to parallelize a similar for loop in C++ and obtain an 8x speedup running on 20 OpenMP threads. There must be a way to do it using Numba, since the for loop is embarrassingly parallel (and since sparse matrix-vector multiplication is a fundamental operation in scientific computing).


Edit 2:
Here is my C++ version of csrMult(). Parallelizing the for() loop in the C++ version makes the code about 8x faster in my tests. This suggests to me that a similar speedup should be possible for the Python version when using Numba.

#include <vector>
#include <Eigen/Dense>   // VectorXd (assumed here to be Eigen's dense vector type)

using std::vector;
using Eigen::VectorXd;

void csrMult(VectorXd& Ax, VectorXd& x, vector<double>& Adata, vector<int>& Aindices, vector<int>& Aindptr)
{
    // This code assumes that the size of Ax is numRowsA.
    #pragma omp parallel num_threads(20)
    {       
        #pragma omp for schedule(dynamic,590) 
        for (int i = 0; i < Ax.size(); i++)
        {
            double Ax_i = 0.0;
            for (int dataIdx = Aindptr[i]; dataIdx < Aindptr[i + 1]; dataIdx++)
            {
                Ax_i += Adata[dataIdx] * x[Aindices[dataIdx]];
            }

            Ax[i] = Ax_i;
        }
    }
}

Recommended answer


Numba has been updated and prange() works now! (I'm answering my own question.)


The improvements to Numba's parallel computing capabilities are discussed in this blog post, dated December 12, 2017. Here is a relevant snippet from the blog:


Long ago (more than 20 releases!), Numba used to have support for an idiom to write parallel for loops called prange(). After a major refactoring of the code base in 2014, this feature had to be removed, but it has been one of the most frequently requested Numba features since that time. After the Intel developers parallelized array expressions, they realized that bringing back prange would be fairly easy


Using Numba version 0.36.1, I can parallelize my embarrassingly parallel for-loop using the following simple code:

import numpy as np
import numba

@numba.jit(nopython=True, parallel=True)
def csrMult_parallel(x,Adata,Aindices,Aindptr,Ashape): 

    numRowsA = Ashape[0]    
    Ax = np.zeros(numRowsA)

    for i in numba.prange(numRowsA):
        Ax_i = 0.0        
        for dataIdx in range(Aindptr[i],Aindptr[i+1]):

            j = Aindices[dataIdx]
            Ax_i += Adata[dataIdx]*x[j]

        Ax[i] = Ax_i            

    return Ax
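
For completeness, a minimal usage sketch mirroring the call snippet from the question (same A as before; note that the first call includes Numba's compilation time, so timings are usually taken from the second call onward):

x  = np.random.randn( A.shape[1] )

Ax = csrMult_parallel( x, A.data, A.indices, A.indptr, A.shape )   # first call: compiles, then runs
Ax = csrMult_parallel( x, A.data, A.indices, A.indptr, A.shape )   # later calls: run the cached machine code
print( np.allclose( Ax, A.dot( x ) ) )                             # sanity check against scipy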


In my experiments, parallelizing the for-loop made the function execute about eight times faster than the version I posted at the beginning of my question, which was already using Numba but was not parallelized. Moreover, in my experiments the parallelized version is about 5x faster than the command Ax = A.dot(x), which uses scipy's sparse matrix-vector multiplication function. Numba has crushed scipy, and I finally have a Python sparse matrix-vector multiplication routine that is as fast as MATLAB.


