
Share a dictionary of pandas DataFrames across multiprocessing in Python



Question


I have a dictionary of pandas DataFrames in Python. The total size of this dictionary is about 2GB. However, when I share it across 16 processes (the subprocesses only read the dict's data and never modify it), it takes 32GB of RAM. So I would like to ask whether it is possible to share this dictionary across multiprocessing without copying it. I tried converting it to a manager.dict(), but that seems to take too long. What would be the most standard way to achieve this? Thank you.
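For context, here is a minimal sketch (with hypothetical names) of the pattern that runs into this: even read-only access from a Pool can leave each worker holding its own copy of the inherited data, since fork's copy-on-write pages are touched by Python's reference counting and pickled arguments are copied outright.

import multiprocessing as mp
import numpy as np
import pandas as pd

# Hypothetical stand-in for the ~2GB dictionary of DataFrames.
frames = {i: pd.DataFrame(np.random.rand(1000, 10)) for i in range(100)}

def work(key):
    # Read-only, yet each of the 16 workers can still end up with its own copy.
    return frames[key].to_numpy().sum()

if __name__ == '__main__':
    with mp.Pool(16) as pool:
        print(sum(pool.map(work, list(frames))))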

Answer

The best solution I've found (and it only works for some types of problems) is a client/server setup using Python's BaseManager and SyncManager classes. To do this, you first set up a server that serves up a proxy class for the data.

                  DataServer.py

#!/usr/bin/python
from multiprocessing.managers import SyncManager
import numpy

# Global for storing the data to be served
gData = {}

# Proxy class to be shared with different processes.
# Don't put big data in here since that will force it to be piped to the
# other process when instantiated there; instead just return a portion of
# the global data when requested.
class DataProxy(object):
    def __init__(self):
        pass

    def getData(self, key, default=None):
        return gData.get(key, default)

if __name__ == '__main__':
    port = 5000

    print('Simulate loading some data')
    for i in range(1000):
        gData[i] = numpy.random.rand(1000)

    # Start the server on address (host, port)
    print('Serving data. Press <ctrl>-c to stop.')
    class myManager(SyncManager): pass
    myManager.register('DataProxy', DataProxy)
    mgr = myManager(address=('', port), authkey=b'DataProxy01')
    server = mgr.get_server()
    server.serve_forever()
                  

Run the above once and leave it running. Below is the client class you use to access the data.

                  DataClient.py

from multiprocessing.managers import BaseManager
import psutil  # third-party module for process info (not strictly required)

# Grab the shared proxy class. All methods in that class will be available here.
class DataClient(object):
    def __init__(self, port):
        assert self._checkForProcess('DataServer.py'), 'Must have DataServer running'
        class myManager(BaseManager): pass
        myManager.register('DataProxy')
        self.mgr = myManager(address=('localhost', port), authkey=b'DataProxy01')
        self.mgr.connect()
        self.proxy = self.mgr.DataProxy()

    # Verify the server is running (not required)
    @staticmethod
    def _checkForProcess(name):
        for proc in psutil.process_iter():
            try:
                # The process name is usually just 'python', so look for the
                # script name in the command line instead.
                if name in ' '.join(proc.cmdline()):
                    return True
            except (psutil.NoSuchProcess, psutil.AccessDenied):
                pass
        return False
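With the server running, any script can pull individual entries on demand; a short usage sketch (assuming DataServer.py above is already running on port 5000):

from DataClient import DataClient

client = DataClient(5000)          # connect to the manager on localhost:5000
array = client.proxy.getData(42)   # fetch one entry, not the whole store
print(array[:5])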
                  

Below is the test code to try this out with multiprocessing.

                  TestMP.py

#!/usr/bin/python
import time
import multiprocessing as mp
import numpy
from DataClient import *

# Confusing, but the "proxy" will be global to each subprocess;
# it's not shared across all processes.
gProxy = None
gMode = None
gDummy = None

def init(port, mode):
    global gProxy, gMode, gDummy
    gProxy = DataClient(port).proxy
    gMode = mode
    gDummy = numpy.random.rand(1000)  # Same as the dummy in the server
    #print('Init proxy', id(gProxy), 'in', mp.current_process())

def worker(key):
    global gProxy, gMode, gDummy
    if 0 == gMode:    # get from proxy
        array = gProxy.getData(key)
    elif 1 == gMode:  # bypass retrieval to test the difference
        array = gDummy
    else:
        assert 0, 'unknown mode: %s' % gMode
    for i in range(1000):
        x = sum(array)
    return x

if __name__ == '__main__':
    port = 5000
    maxkey = 1000
    numpts = 100

    for mode in [1, 0]:
        for nprocs in [16, 1]:
            if 0 == mode: print('Using client/server and %d processes' % nprocs)
            if 1 == mode: print('Using local data and %d processes' % nprocs)
            keys = [numpy.random.randint(0, maxkey) for k in range(numpts)]
            pool = mp.Pool(nprocs, initializer=init, initargs=(port, mode))
            start = time.time()
            ret_data = pool.map(worker, keys, chunksize=1)
            print('   took %4.3f seconds' % (time.time() - start))
            pool.close()
                  

When I run this on my machine I get...

                  Using local data and 16 processes
                     took 0.695 seconds
                  Using local data and 1 processes
                     took 5.849 seconds
                  Using client/server and 16 processes
                     took 0.811 seconds
                  Using client/server and 1 processes
                     took 5.956 seconds
                  

Whether this works for your multiprocessing system depends on how often you have to grab the data. There's a small overhead associated with each transfer. You can see this if you turn down the number of iterations in the x = sum(array) loop: at some point you'll spend more time getting data than working on it.
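If the per-transfer overhead dominates, one way to amortize it (my own sketch, not part of the original answer) is to fetch several keys in one round trip, e.g. by adding a hypothetical getMany method to DataProxy in DataServer.py:

class DataProxy(object):
    def getData(self, key, default=None):
        return gData.get(key, default)

    def getMany(self, keys):
        # One round trip for many keys instead of one per key.
        return {k: gData.get(k) for k in keys}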

Besides multiprocessing, I also like this pattern because I only have to load my big array data once in the server program, and it stays loaded until I kill the server. That means I can run a bunch of separate scripts against the data and they execute quickly; there's no waiting for the data to load.
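For example, a throwaway analysis script (hypothetical; it assumes the server is already up) starts essentially instantly, because only the entries it asks for cross the wire:

from DataClient import DataClient

proxy = DataClient(5000).proxy
# No load time: the big data stays in the server; only these arrays transfer.
print([proxy.getData(k).mean() for k in range(10)])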

While the approach here is somewhat similar to using a database, it has the advantage of working with any type of Python object, not just simple DB tables of strings, ints, etc. I've found that a DB is a bit faster for those simple types, but for me it tends to be more work programmatically, and my data doesn't always port easily to a database.


Related articles

What exactly is Python multiprocessing Module's .join() Method Doing?
Passing multiple parameters to pool.map() function in Python
multiprocessing.pool.MaybeEncodingError: 'TypeError("cannot serialize '_io.BufferedReader' object",)'
Python Multiprocess Pool. How to exit the script when one of the worker process determines no more work needs to be done?
How do you pass a Queue reference to a function managed by pool.map_async()?
yet another confusion with multiprocessing error, 'module' object has no attribute 'f'
