
Efficiently applying a function to a grouped pandas DataFrame in parallel



Problem description


                I often need to apply a function to the groups of a very large DataFrame (of mixed data types) and would like to take advantage of multiple cores.

                I can create an iterator from the groups and use the multiprocessing module, but it is not efficient because every group and the results of the function must be pickled for messaging between processes.
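For concreteness, that baseline might look something like the sketch below (process_group is a hypothetical stand-in for the real per-group function); every group DataFrame is pickled on its way to a worker and every result is pickled on the way back, which is exactly the overhead in question.

import pandas as pd
from multiprocessing import Pool

def process_group(group):
    # hypothetical per-group computation standing in for the real function
    return group.sum(numeric_only=True)

if __name__ == '__main__':
    df = pd.DataFrame({'a': [1, 2, 1, 2, 1, 1, 0], 'b': range(7)})
    # each group is materialized as its own DataFrame (already a copy)
    groups = [group for _, group in df.groupby('a')]
    with Pool(4) as pool:
        # every group is pickled to a worker; every result is pickled back
        results = pool.map(process_group, groups)
    print(pd.DataFrame(results))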

Is there any way to avoid the pickling, or even avoid copying the DataFrame completely? It looks like the shared memory functionality of the multiprocessing module is limited to numpy arrays. Are there any other options?

Recommended answer

From the comments above, it seems that this is planned for pandas at some point (there's also an interesting-looking rosetta project, which I just noticed).

However, until all of this parallel functionality is incorporated into pandas, I've noticed that it is very easy to write efficient, non-memory-copying parallel augmentations to pandas directly using Cython + OpenMP and C++.

                Here's a short example of writing a parallel groupby-sum, whose use is something like this:

import pandas as pd
import para_group_demo

df = pd.DataFrame({'a': [1, 2, 1, 2, 1, 1, 0], 'b': range(7)})
print(para_group_demo.sum(df.a, df.b))
                

The output is:

     sum
key
0      6
1     11
2      4
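For reference, this matches what the single-threaded pandas computation produces:

print(df.b.groupby(df.a).sum())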
                

Note: doubtless, this simple example's functionality will eventually be part of pandas. Some things, however, will be more natural to parallelize in C++ for some time, and it's important to be aware of how easy it is to combine this into pandas.

                To do this, I wrote a simple single-source-file extension whose code follows.

It starts with some imports and type definitions:

                from libc.stdint cimport int64_t, uint64_t
                from libcpp.vector cimport vector
                from libcpp.unordered_map cimport unordered_map
                
                cimport cython
                from cython.operator cimport dereference as deref, preincrement as inc
                from cython.parallel import prange
                
                import pandas as pd
                
                ctypedef unordered_map[int64_t, uint64_t] counts_t
                ctypedef unordered_map[int64_t, uint64_t].iterator counts_it_t
                ctypedef vector[counts_t] counts_vec_t
                

                The C++ unordered_map type is for summing by a single thread, and the vector is for summing by all threads.

                Now to the function sum. It starts off with typed memory views for fast access:

                def sum(crit, vals):
                    cdef int64_t[:] crit_view = crit.values
                    cdef int64_t[:] vals_view = vals.values
                

The function continues by dividing the rows semi-equally among the threads (here hardcoded to 4), and having each thread sum the entries in its range:

                    cdef uint64_t num_threads = 4
                    cdef uint64_t l = len(crit)
                    cdef uint64_t s = l / num_threads + 1
                    cdef uint64_t i, j, e
                    cdef counts_vec_t counts
                    counts = counts_vec_t(num_threads)
                    counts.resize(num_threads)
                    with cython.boundscheck(False):
                        for i in prange(num_threads, nogil=True): 
                            j = i * s
                            e = j + s
                            if e > l:
                                e = l
                            while j < e:
                                counts[i][crit_view[j]] += vals_view[j]
                                inc(j)
                

                When the threads have completed, the function merges all the results (from the different ranges) into a single unordered_map:

                    cdef counts_t total
                    cdef counts_it_t it, e_it
                    for i in range(num_threads):
                        it = counts[i].begin()
                        e_it = counts[i].end()
                        while it != e_it:
                            total[deref(it).first] += deref(it).second
                            inc(it)        
                

                All that's left is to create a DataFrame and return the results:

                    key, sum_ = [], []
                    it = total.begin()
                    e_it = total.end()
                    while it != e_it:
                        key.append(deref(it).first)
                        sum_.append(deref(it).second)
                        inc(it)
                
                    df = pd.DataFrame({'key': key, 'sum': sum_})
                    df.set_index('key', inplace=True)
                    return df
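The answer doesn't show the build step. As a minimal sketch, assuming the code above is saved as para_group_demo.pyx, a setup.py along the following lines compiles it with OpenMP enabled (the -fopenmp flags are for gcc/clang; MSVC uses /openmp instead):

# setup.py -- minimal build sketch; the file name para_group_demo.pyx is assumed
from setuptools import Extension, setup
from Cython.Build import cythonize

ext = Extension(
    'para_group_demo',
    sources=['para_group_demo.pyx'],
    language='c++',                    # required for unordered_map and vector
    extra_compile_args=['-fopenmp'],   # enable OpenMP so prange runs in parallel
    extra_link_args=['-fopenmp'],
)

setup(ext_modules=cythonize([ext]))

Building in place with python setup.py build_ext --inplace produces the para_group_demo module imported in the usage example above.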
                
