Problem description
I'm having this problem in Python:
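- I have a queue of URLs that I need to check from time to time
- If the queue has been filled, I need to process each item in the queue
- Each item in the queue must be processed by a single process (multiprocessing)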
So far I have managed to achieve this "manually" like this:
while 1:
    self.updateQueue()

    while not self.mainUrlQueue.empty():
        domain = self.mainUrlQueue.get()

        # if we didn't launched any process yet, we need to do so
        if len(self.jobs) < maxprocess:
            self.startJob(domain)
            #time.sleep(1)
        else:
            # If we already have process started we need to clear the old process in our pool and start new ones
            jobdone = 0

            # We circle through each of the process, until we find one free ; only then leave the loop
            while jobdone == 0:
                for p in self.jobs :
                    #print "entering loop"
                    # if the process finished
                    if not p.is_alive() and jobdone == 0:
                        #print str(p.pid) + " job dead, starting new one"
                        self.jobs.remove(p)
                        self.startJob(domain)
                        jobdone = 1
However, that leads to tons of problems and errors. I wondered whether I would be better off using a pool of processes. What would be the right way to do this?
However, much of the time my queue is empty, and then it can be filled with 300 items in a second, so I'm not too sure how to do things here.
Recommended answer
You could use the blocking capabilities of queue: spawn multiple processes at startup (using multiprocessing.Pool) and let them sleep until some data is available on the queue to process. If you're not familiar with that, you could try to "play" with this simple program:
import multiprocessing
import os
import time

the_queue = multiprocessing.Queue()


def worker_main(queue):
    print os.getpid(), "working"
    while True:
        item = queue.get(True)  # block until an item is available
        print os.getpid(), "got", item
        time.sleep(1)  # simulate a "long" operation


# worker_main is passed as the pool initializer, so each of the 3 workers runs it at startup
# don't forget the trailing comma in (the_queue,): the init arguments must be a tuple
the_pool = multiprocessing.Pool(3, worker_main, (the_queue,))

for i in range(5):
    the_queue.put("hello")
    the_queue.put("world")

time.sleep(10)
Tested with Python 2.7.3 on Linux
This will spawn 3 processes (in addition to the parent process). Each child executes the worker_main function. It is a simple loop that gets a new item from the queue on each iteration. Workers will block if nothing is ready to process.
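To get a feel for what "block" means here, the following is a minimal Python 3 sketch (the program above uses Python 2 syntax). The helper name try_get and the timeout values are hypothetical, chosen only for illustration. It relies on the documented behaviour that a blocking get with a timeout raises queue.Empty when nothing arrives in time.

```python
import multiprocessing
import queue  # only needed for the queue.Empty exception


def try_get(q, timeout):
    """Wait up to `timeout` seconds for an item; return None if none arrives.

    Hypothetical helper for illustration. Returning None is ambiguous if
    None itself can be queued, so treat this as a sketch only.
    """
    try:
        return q.get(True, timeout)  # block=True, give up after `timeout`
    except queue.Empty:
        return None


if __name__ == "__main__":
    q = multiprocessing.Queue()
    print(try_get(q, 0.2))  # the queue is empty, so this waits, then gives up
    q.put("hello")
    print(try_get(q, 1.0))  # an item is available, so this returns promptly
```

Without a timeout, as in worker_main, the call simply waits until another process puts something on the queue.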
At startup, all 3 processes will sleep until the queue is fed with some data. When data becomes available, one of the waiting workers gets that item and starts to process it. After that, it tries to get another item from the queue, waiting again if nothing is available...
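The same wait-then-process cycle can also be sketched in Python 3 with plain multiprocessing.Process workers and an explicit stop signal. This is a variation on the answer's program, not the program itself. The names SENTINEL, handle, run_demo and the result queue are assumptions added for illustration, and the example.com URLs are placeholders.

```python
import multiprocessing
import os

SENTINEL = None  # hypothetical marker: tells a worker to stop


def handle(url):
    """Placeholder for the real per-URL work (hypothetical)."""
    return "checked " + url


def worker_main(tasks, results):
    """Sleep on the queue, process each item, stop on the sentinel."""
    while True:
        url = tasks.get()  # blocks until an item is available
        if url is SENTINEL:
            break
        results.put((os.getpid(), handle(url)))


def run_demo(urls, num_workers=3):
    tasks = multiprocessing.Queue()
    results = multiprocessing.Queue()
    workers = [
        multiprocessing.Process(target=worker_main, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()
    for url in urls:
        tasks.put(url)
    for _ in workers:
        tasks.put(SENTINEL)  # one stop marker per worker
    # Collect results before joining so no worker is left holding queued data.
    collected = [results.get() for _ in urls]
    for w in workers:
        w.join()
    return collected


if __name__ == "__main__":
    for pid, outcome in run_demo(["http://example.com/%d" % i for i in range(6)]):
        print(pid, outcome)
```

A long-running service would keep the workers alive and only send the sentinels at shutdown. Here they are sent right away so that the sketch terminates.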