Problem description
I thought about the following while writing an answer to this question.
Suppose I have a deeply nested XML file like this (but much more nested and much longer):
<section name="1">
<subsection name="foo">
<subsubsection name="bar">
<deeper name="hey">
<much_deeper name="yo">
<li>Some content</li>
</much_deeper>
</deeper>
</subsubsection>
</subsection>
</section>
<section name="2">
... and so forth
</section>
The problem with len(soup.find_all("section")) is that, while running find_all("section"), BS keeps descending into tags that I know won't contain any other section tag.
So, two questions:
- Is there a way to make BS not search recursively into an already found tag?
- If the answer to the first question is yes, will it be more efficient, or is it the same internal process?
Recommended answer
BeautifulSoup cannot give you just a count of the tags it found.
What you can improve, though, is to stop BeautifulSoup from searching for sections inside other sections by passing recursive=False:
len(soup.find_all("section", recursive=False))
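As a minimal sketch of that call, assuming the document is parsed with Python's built-in "html.parser" so that the top-level section tags end up as direct children of the soup object (the sample markup and variable names below are illustrative, not from the original file):

```python
from bs4 import BeautifulSoup

# Illustrative sample; the real file is assumed to be much deeper and longer.
xml = """
<section name="1">
  <subsection name="foo">
    <li>Some content</li>
  </subsection>
</section>
<section name="2">
  <subsection name="baz">
    <li>More content</li>
  </subsection>
</section>
"""

# Assumption: "html.parser" keeps the sections as direct children of `soup`.
soup = BeautifulSoup(xml, "html.parser")

# recursive=False restricts the search to direct children of `soup`,
# so the contents of each section are never visited.
top_level = soup.find_all("section", recursive=False)
count = len(top_level)
print(count)
```

Note that recursive=False only looks at direct children of the object it is called on. If the parser wraps the content in other elements (the lxml HTML parser, for instance, adds html and body), the call would have to be made on the element that actually contains the sections.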
Aside from that improvement, lxml would do the job faster:
tree.xpath('count(//section)')
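The answer does not show how tree is built. A minimal sketch, assuming the sections sit under a single wrapper element (named root here purely for illustration, since well-formed XML needs one root element):

```python
from lxml import etree

# Illustrative sample; the <root> wrapper is an assumption, not part of the original file.
xml = b"""<root>
<section name="1"><subsection name="foo"><li>Some content</li></subsection></section>
<section name="2"><subsection name="baz"><li>More content</li></subsection></section>
</root>"""

tree = etree.fromstring(xml)

# XPath count() yields a float in lxml, so convert it for use as an integer.
count = int(tree.xpath("count(//section)"))

# If the sections are known to be direct children of the root, a non-recursive
# path avoids descending into them (same idea as recursive=False above).
top_level = int(tree.xpath("count(/root/section)"))
print(count, top_level)
```

Here the counting happens inside the XPath engine, so no Python list of matching elements is built just to take its length.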