Problem description
I have a multi-line string defined like this:
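The definition itself did not survive in this copy. A stand-in along these lines is enough to follow the rest of the discussion; the contents are made up, and only the name foo is taken from the answer further down.

```python
# Hypothetical sample text (an assumption, not the asker's actual string).
# The triple-quoted literal starts and ends with a newline.
foo = """
this is
a multi-line string.
"""
```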
This string is used as test input for a parser I am writing. The parser function receives a file object as input and iterates over it. It also calls the next() method directly to skip lines, so I really need an iterator as input, not just an iterable. I need an iterator that iterates over the individual lines of that string, as a file object would over the lines of a text file. I could of course do it like this:
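The snippet is missing here, but the next paragraph says the string is traversed once for the splitting, so it was presumably something like the following. This is a sketch in Python 3 under that assumption; the name line_iterator and the sample text are illustrative.

```python
# Hypothetical sample text standing in for the asker's string.
foo = """
this is
a multi-line string.
"""

# splitlines() builds a list of lines without their line endings; iter()
# turns that list into an iterator, so next() can be called on it directly.
line_iterator = iter(foo.splitlines())

first = next(line_iterator)      # skip one line, as the parser would
remaining = list(line_iterator)  # the rest is consumed by normal iteration
```

A plain list would not do for the parser, since next() only works on an iterator.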
Is there a more direct way of doing this? In this scenario the string has to be traversed once for the splitting, and then again by the parser. It doesn't matter in my test case, since the string is very short there; I am just asking out of curiosity. Python has so many useful and efficient built-ins for this kind of thing, but I could find nothing that suits this need.
Recommended answer
Here are three possibilities:
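The code for the three functions is missing from this copy. Going by the descriptions in the rest of the answer (a naive splitlines version, a low-level version that builds lines with +=, and a version based on find), they may have looked roughly like this. It is a Python 3 reconstruction, not the author's exact code; only the name f3 appears in the text, so f1, f2 and the sample string are assumptions.

```python
# Hypothetical sample text.
foo = """
this is
a multi-line string.
"""

def f1(foo=foo):
    """Naive approach: split first, then iterate over the resulting list."""
    return iter(foo.splitlines())

def f2(foo=foo):
    """Low-level approach: build each line character by character with +=."""
    line = ''
    for char in foo:
        if char == '\n':
            yield line
            line = ''
        else:
            line += char
    if line:  # last line when the text does not end with a newline
        yield line

def f3(foo=foo):
    """find-based approach: locate each newline and yield the slice before it."""
    prevnl = -1
    while True:
        nextnl = foo.find('\n', prevnl + 1)
        if nextnl < 0:
            break
        yield foo[prevnl + 1:nextnl]
        prevnl = nextnl
    if prevnl + 1 < len(foo):  # last line when there is no trailing newline
        yield foo[prevnl + 1:]

if __name__ == '__main__':
    for f in (f1, f2, f3):
        print(list(f()))
```

Note that f2 and f3 only split on '\n', whereas splitlines also recognises other line boundaries such as '\r\n', so the three agree only on text that uses plain '\n' line endings.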
Running this as the main script confirms the three functions are equivalent. With timeit (and a * 100 for foo to get substantial strings for more precise measurement):
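The timing commands and their output are not preserved in this copy, and no numbers are reproduced here. As a rough sketch of how such a comparison could be run from Python 3, the following uses timeit.timeit on reconstructed versions of the three functions; the function bodies, the sample text and the repeat count are all assumptions.

```python
import timeit

# Hypothetical sample text, repeated 100 times to get a substantial string.
foo = """
this is
a multi-line string.
""" * 100

def f1(foo=foo):
    return iter(foo.splitlines())

def f2(foo=foo):
    line = ''
    for char in foo:
        if char == '\n':
            yield line
            line = ''
        else:
            line += char
    if line:
        yield line

def f3(foo=foo):
    prevnl = -1
    while True:
        nextnl = foo.find('\n', prevnl + 1)
        if nextnl < 0:
            break
        yield foo[prevnl + 1:nextnl]
        prevnl = nextnl
    if prevnl + 1 < len(foo):
        yield foo[prevnl + 1:]

# list() forces each iterator to be fully consumed inside the timed call.
timings = {
    f.__name__: timeit.timeit(lambda f=f: list(f()), number=100)
    for f in (f1, f2, f3)
}

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s for 100 runs")
```

Absolute figures depend on the machine and the Python version, so only the relative ordering is worth reading off.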
Note we need the list() call to ensure the iterators are traversed, not just built.
In other words, the naive implementation is so much faster it isn't even funny: 6 times faster than my attempt with find calls, which in turn is 4 times faster than a lower-level approach.
Lessons to retain: measurement is always a good thing (but must be accurate); string methods like splitlines are implemented in very fast ways; putting strings together by programming at a very low level (especially with loops of += over very small pieces) can be quite slow.
Edit: added @Jacob's proposal, slightly modified to give the same results as the others (trailing blanks on a line are kept), i.e.:
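The code for this proposal is also missing. From the later remarks about stri, a while loop and stripping the trailing newline, it was presumably a readline loop over an in-memory file object. A Python 3 sketch under that assumption (io.StringIO is my choice of file-like object, and the sample text is made up):

```python
import io

# Hypothetical sample text.
foo = """
this is
a multi-line string.
"""

def f4(foo=foo):
    stri = io.StringIO(foo)
    while True:
        nl = stri.readline()
        if nl != '':
            # Remove only the newline, so trailing blanks on a line are kept.
            yield nl.rstrip('\n')
        else:
            # readline() returns '' at end of input; end the generator.
            return
```

In Python 3.7 and later a generator should finish with return; raising StopIteration inside the generator body is turned into a RuntimeError.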
Measuring shows this is not quite as good as the .find-based approach. Still, it is worth keeping in mind because it might be less prone to small off-by-one bugs: any loop where you see occurrences of +1 and -1, like my f3 above, should automatically trigger off-by-one suspicions, and so should many loops which lack such tweaks and should have them. That said, I believe my code is also right, since I was able to check its output against the other functions.
But the split-based approach still rules.
An aside: possibly better style for f4 would be:
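The restyled version is not preserved either. Since the following remark still refers to a while loop, one plausible reading is the same readline loop with an early break instead of an if/else. This is a guess at the shape rather than the original code, written for Python 3 with io.StringIO and a made-up sample string:

```python
import io

# Hypothetical sample text.
foo = """
this is
a multi-line string.
"""

def f4(foo=foo):
    stri = io.StringIO(foo)
    while True:
        nl = stri.readline()
        if nl == '':  # end of input
            break
        yield nl.rstrip('\n')
```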
At least, it's a bit less verbose. The need to strip trailing \n characters unfortunately prohibits the clearer and faster replacement of the while loop with return iter(stri) (the iter part of which is redundant in modern versions of Python, I believe since 2.3 or 2.4, but it's also innocuous). Maybe worth trying, also:
or variations thereof. But I'm stopping here, since it's pretty much a theoretical exercise with respect to the split-based one, which is the simplest and fastest.