Problem Description
How can I do a bulk upsert in pymongo? I want to update a bunch of entries, and doing them one at a time is very slow.
The answer to an almost identical question is here: Bulk update/upsert in MongoDB?
The accepted answer doesn't actually answer the question; it simply links to the mongo CLI for doing imports/exports.
I would also be open to someone explaining why a bulk upsert is not possible / not a best practice, but please explain what the preferred solution to this sort of problem is.
Recommended Answer
Modern releases of pymongo (greater than 3.x) wrap bulk operations in a consistent interface that downgrades where the server release does not support bulk operations. This is now consistent across the officially supported MongoDB drivers.
So the preferred method for coding is to use bulk_write(), where you supply an UpdateOne or other appropriate operation action. And now of course it is preferred to use natural language lists rather than a specific builder.
The direct translation of the old documentation:
from pymongo import MongoClient, UpdateOne

# Assumes a local mongod on the default port; adjust the URI and
# database/collection names to match your deployment
client = MongoClient()
collection = client.test.collection

# Each UpdateOne carries its own filter, update document, and upsert flag
operations = [
    UpdateOne({"field1": 1}, {"$push": {"vals": 1}}, upsert=True),
    UpdateOne({"field1": 1}, {"$push": {"vals": 2}}, upsert=True),
    UpdateOne({"field1": 1}, {"$push": {"vals": 3}}, upsert=True)
]

result = collection.bulk_write(operations)
Or the classic document transformation loop:
import random
from pymongo import UpdateOne

random.seed()

operations = []

for doc in collection.find():
    # Set a random number on every document update
    operations.append(
        UpdateOne({"_id": doc["_id"]}, {"$set": {"random": random.randint(0, 10)}})
    )

    # Send once every 1000 in batch
    if len(operations) == 1000:
        collection.bulk_write(operations, ordered=False)
        operations = []

# Flush any operations still queued after the loop
if len(operations) > 0:
    collection.bulk_write(operations, ordered=False)
The returned result is a BulkWriteResult, which will contain counters for matched and updated documents as well as the returned _id values for any "upserts" that occur.
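As a minimal sketch, assuming result is the BulkWriteResult from the bulk_write() call above, those counters can be read directly:

# Inspect the BulkWriteResult returned by bulk_write()
print(result.matched_count)   # documents matched by the update filters
print(result.modified_count)  # documents actually changed
print(result.upserted_count)  # documents inserted because of upsert=True
print(result.upserted_ids)    # dict of operation index -> upserted _id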
There is a bit of a misconception about the size of the bulk operations array. The actual request as sent to the server cannot exceed the 16MB BSON limit, since that limit also applies to the "request" sent to the server, which uses BSON format as well.
However, that does not govern the size of the request array that you can build, as the actual operations will only be sent and processed in batches of 1000 anyway. The only real restriction is that those 1000 operation instructions themselves do not actually create a BSON document greater than 16MB, which is indeed a pretty tall order.
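If you would rather not hold a huge list in client memory while the driver splits it, a hedged sketch of manual chunking follows; bulk_write_in_chunks is a hypothetical helper, not part of pymongo, and the 1000 figure simply mirrors the batch size described above:

def bulk_write_in_chunks(collection, operations, chunk_size=1000):
    # Send a long list of write models in fixed-size batches so only
    # chunk_size operations are ever in flight at once
    results = []
    for i in range(0, len(operations), chunk_size):
        results.append(
            collection.bulk_write(operations[i:i + chunk_size], ordered=False)
        )
    return results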
The general concept of bulk methods is "less traffic", as a result of sending many things at once and dealing with only one server response. The reduction of that overhead attached to every single update request saves lots of time.
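To see the effect, a rough sketch comparing the round trips (illustrative only; absolute numbers depend on network latency and server load, and collection is the same handle used in the examples above):

import time
from pymongo import UpdateOne

# 1000 round trips: the driver waits on the server once per document
start = time.monotonic()
for i in range(1000):
    collection.update_one({"n": i}, {"$set": {"flag": True}}, upsert=True)
print("one at a time:", time.monotonic() - start)

# One batched request: the per-request overhead is paid once
start = time.monotonic()
collection.bulk_write(
    [UpdateOne({"n": i}, {"$set": {"flag": True}}, upsert=True) for i in range(1000)],
    ordered=False,
)
print("bulk_write:", time.monotonic() - start)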