Question
I am curious what the fastest way to consume an iterator would be, and what the most Pythonic way would be.
For example, say that I want to create an iterator with the map builtin that accumulates something as a side effect. I don't actually care about the result of the map, just the side effect, so I want to blow through the iteration with as little overhead or boilerplate as possible. Something like:
my_set = set()
my_map = map(lambda x, y: my_set.add((x, y)), my_x, my_y)
In this example, I just want to blow through the iterator to accumulate things in my_set, and my_set is just an empty set until I actually run through my_map. Something like:
for _ in my_map:
    pass
or a bare
[_ for _ in my_map]
works, but they both feel clunky. Is there a more Pythonic way to make sure an iterator is consumed quickly so that you can benefit from some side effect?
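To make the laziness concrete, here is a minimal sketch, assuming Python 3 (where map returns an iterator) and using small made-up lists in place of the real my_x and my_y. It only illustrates that my_set stays empty until my_map is actually consumed:

```python
# Illustrative stand-ins; the question itself uses NumPy arrays here.
my_x = [1, 2, 3]
my_y = [4, 5, 6]

my_set = set()
my_map = map(lambda x, y: my_set.add((x, y)), my_x, my_y)

# map() has only built a lazy iterator, so no side effect has happened yet.
size_before = len(my_set)

# Consuming the iterator is what triggers the calls to my_set.add.
for _ in my_map:
    pass

size_after = len(my_set)
print(size_before, size_after)  # prints: 0 3
```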
I tested the two methods above on the following:
import numpy as np

my_x = np.random.randint(100, size=int(1e6))
my_y = np.random.randint(100, size=int(1e6))
with my_set and my_map as defined above. I got the following results with timeit:
for _ in my_map:
    pass
468 ms ± 20.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
[_ for _ in my_map]
476 ms ± 12.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
No real difference between the two, and they both feel clunky.
Note that I got similar performance with list(my_map), which was a suggestion in the comments.
Recommended answer
While you shouldn't be creating a map object just for side effects, there is in fact a standard recipe for consuming iterators in the itertools docs:
import collections
from itertools import islice

def consume(iterator, n=None):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)
For just the "consume entirely" case, this can be simplified to
def consume(iterator):
    collections.deque(iterator, maxlen=0)
Using collections.deque this way avoids storing all the elements (because maxlen=0) and iterates at C speed, without bytecode interpretation overhead. There's even a dedicated fast path in the deque implementation for using a maxlen=0 deque to consume an iterator.
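As a rough usage sketch of the recipe above, the following shows both branches applied to a side-effecting map. The names seen, after_two, and after_all are made up for illustration and are not part of the recipe:

```python
import collections
from itertools import islice

def consume(iterator, n=None):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    if n is None:
        # The zero-length deque discards every item as it arrives.
        collections.deque(iterator, maxlen=0)
    else:
        # The empty slice [n:n] still advances the iterator by n items.
        next(islice(iterator, n, n), None)

seen = []                         # hypothetical side-effect target
it = map(seen.append, range(5))

consume(it, 2)                    # run only the first two calls
after_two = list(seen)

consume(it)                       # run whatever is left
after_all = list(seen)

print(after_two)  # prints: [0, 1]
print(after_all)  # prints: [0, 1, 2, 3, 4]
```

Nothing is retained by the deque itself, so memory use stays constant regardless of how long the iterator is.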
Timing:
In [1]: import collections
In [2]: x = range(1000)
In [3]: %%timeit
   ...: i = iter(x)
   ...: for _ in i:
   ...:     pass
   ...:
16.5 μs ± 829 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [4]: %%timeit
   ...: i = iter(x)
   ...: collections.deque(i, maxlen=0)
   ...:
12 μs ± 566 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
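Outside IPython, a comparable measurement could be sketched with the standard timeit module. The helper names loop_consume and deque_consume and the repeat count are arbitrary choices for illustration. Wrapping each approach in a function adds call overhead, so the absolute numbers will not match the session above and will vary by machine:

```python
import collections
import timeit

x = range(1000)
REPEATS = 10_000  # arbitrary repeat count for this sketch

def loop_consume():
    # Plain Python loop: each step goes through the bytecode interpreter.
    for _ in iter(x):
        pass

def deque_consume():
    # Zero-length deque: the iteration happens inside C code in CPython.
    collections.deque(iter(x), maxlen=0)

loop_time = timeit.timeit(loop_consume, number=REPEATS)
deque_time = timeit.timeit(deque_consume, number=REPEATS)

print(f"for loop: {loop_time / REPEATS * 1e6:.2f} us per call")
print(f"deque:    {deque_time / REPEATS * 1e6:.2f} us per call")
```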
Of course, this is all based on CPython. The entire nature of interpreter overhead is very different on other Python implementations, and the maxlen=0 fast path is specific to CPython. See abarnert's answer for other Python implementations.
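On the remark at the start of this answer that a map object shouldn't be created just for side effects: one possible reading, for the question's specific case, is to build the set directly from the paired values. This is only a sketch with small made-up lists, not something the question or the itertools docs prescribe:

```python
# Illustrative stand-ins for the question's my_x and my_y.
my_x = [1, 2, 2]
my_y = [4, 5, 5]

my_set = set()
# zip pairs the values up; set.update consumes the pairs without a lambda.
my_set.update(zip(my_x, my_y))

print(sorted(my_set))  # prints: [(1, 4), (2, 5)]
```

Here no iterator needs to be drained by hand, because set.update consumes the zip object itself.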