Question
I have hundreds of thousands of text files that I want to parse in various ways. I want to save the output to a single file without synchronization problems. I have been using a multiprocessing pool to save time, but I can't figure out how to combine Pool and Queue.
The following code saves the infile name along with the maximum number of consecutive "x"s in that file. However, I want all processes to save their results to the same file, rather than to different files as in my example. Any help with this would be greatly appreciated.
import multiprocessing
import re

with open('infilenamess.txt') as f:
    filenames = f.read().splitlines()

def mp_worker(filename):
    with open(filename, 'r') as f:
        text = f.read()
    m = re.findall("x+", text)
    count = len(max(m, key=len))
    outfile = open(filename + '_results.txt', 'a')
    outfile.write(str(filename) + '|' + str(count) + '\n')
    outfile.close()

def mp_handler():
    p = multiprocessing.Pool(32)
    p.map(mp_worker, filenames)

if __name__ == '__main__':
    mp_handler()
Answer
Multiprocessing pools implement a queue for you: just use a pool method that returns the workers' return values to the caller. imap works well here:
import multiprocessing
import re

def mp_worker(filename):
    with open(filename) as f:
        text = f.read()
    m = re.findall("x+", text)
    # Guard against files with no "x" at all, where max() would raise ValueError
    count = len(max(m, key=len)) if m else 0
    return filename, count

def mp_handler():
    p = multiprocessing.Pool(32)
    with open('infilenamess.txt') as f:
        filenames = [line for line in (l.strip() for l in f) if line]
    with open('results.txt', 'w') as f:
        for result in p.imap(mp_worker, filenames):
            # (filename, count) tuples come back in input order;
            # only the parent process writes, so no locking is needed
            f.write('%s: %d\n' % result)

if __name__ == '__main__':
    mp_handler()
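Because each worker returns its own filename alongside the count, the output lines stay correctly labeled even if results arrive out of order. So when output order does not matter, imap_unordered can be swapped in to write each result as soon as any worker finishes. A minimal sketch of that variant, using in-memory (name, text) pairs as hypothetical stand-ins for reading real files:

```python
import multiprocessing
import re

def mp_worker(item):
    # item is a (name, text) pair; real code would open and read the named file.
    name, text = item
    runs = re.findall("x+", text)
    # Length of the longest run of "x", or 0 if the text contains none
    count = len(max(runs, key=len)) if runs else 0
    return name, count

if __name__ == '__main__':
    samples = [("a.txt", "xxxa"), ("b.txt", "axxaxxxx"), ("c.txt", "none")]
    with multiprocessing.Pool(2) as p:
        # imap_unordered yields (name, count) pairs in completion order,
        # not submission order; collecting into a dict re-associates them.
        results = dict(p.imap_unordered(mp_worker, samples))
    print(results)  # e.g. {'a.txt': 3, 'b.txt': 4, 'c.txt': 0}
```

This trades ordered output for slightly better throughput when file sizes vary widely, since fast files no longer wait behind slow ones.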