Problem description
I am new to XML parsing. This XML file has the following tree:
FHRSEstablishment
|--> Header
| |--> ...
|--> EstablishmentCollection
| |--> EstablishmentDetail
| | |-->...
| |--> Scores
| | |-->...
|--> EstablishmentCollection
| |--> EstablishmentDetail
| | |-->...
| |--> Scores
| | |-->...
but when I access it with ElementTree and look for the child tags and attributes,
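import xml.etree.ElementTree as ET
import urllib2

# Python 2 code. ET.parse() takes the file object as its first positional argument.
tree = ET.parse(
    urllib2.urlopen('http://ratings.food.gov.uk/OpenDataFiles/FHRS408en-GB.xml'))
root = tree.getroot()
# Iterating over an element visits only its direct children.
for child in root:
    print child.tag, child.attrib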
I only get:
Header {}
EstablishmentCollection {}
which I assume means that their attributes are empty. Why is that, and how can I access the children nested inside EstablishmentDetail and Scores?
Edit
Thanks to the answers below I can get inside the tree, but if I want to retrieve values such as those in Scores, this fails:
for node in root.find('.//EstablishmentDetail/Scores'):
    rating = node.attrib.get('Hygiene')
    print rating
and produces
None
None
None
Why is that?
Recommended answer
You have to iter() over your root. That is, root.iter() would do the trick!
import xml.etree.ElementTree as ET
import urllib2

tree = ET.parse(urllib2.urlopen('http://ratings.food.gov.uk/OpenDataFiles/FHRS408en-GB.xml'))
root = tree.getroot()
# iter() walks the root and every descendant, not just the direct children.
for child in root.iter():
    print child.tag, child.attrib
Output:
FHRSEstablishment {}
Header {}
ExtractDate {}
ItemCount {}
ReturnCode {}
EstablishmentCollection {}
EstablishmentDetail {}
FHRSID {}
LocalAuthorityBusinessID {}
...
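The difference between looping over root and calling root.iter() can be reproduced offline on a small inline document. This is only a sketch: the XML is a made-up miniature of the feed (the element names come from the output above, the values are invented), and it is written for Python 3, so print is a function.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the feed; the values are invented for illustration.
SAMPLE = """
<FHRSEstablishment>
  <Header>
    <ItemCount>1</ItemCount>
  </Header>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <FHRSID>1</FHRSID>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>
"""

root = ET.fromstring(SAMPLE)

# Iterating an element directly yields only its direct children.
direct_children = [child.tag for child in root]

# iter() yields the element itself and then every descendant, in document order.
all_elements = [elem.tag for elem in root.iter()]

print(direct_children)
print(all_elements)
```

The first list stops at Header and EstablishmentCollection, which matches what the question saw; the second also reaches the nested tags.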
- To get all tags inside EstablishmentDetail, you need to find that tag and then loop through its children!
That is, for example:
for child in root.find('.//EstablishmentDetail'):
    print child.tag, child.attrib
Output:
FHRSID {}
LocalAuthorityBusinessID {}
BusinessName {}
BusinessType {}
BusinessTypeID {}
RatingValue {}
RatingKey {}
RatingDate {}
LocalAuthorityCode {}
LocalAuthorityName {}
LocalAuthorityWebSite {}
LocalAuthorityEmailAddress {}
Scores {}
SchemeType {}
NewRatingPending {}
Geocode {}
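Every attrib dictionary above is empty, which suggests that this feed stores its values as element text rather than as XML attributes. A minimal sketch of reading them through .text, assuming Python 3 and an invented sample record (only the tag names are taken from the output above):

```python
import xml.etree.ElementTree as ET

# Hypothetical record; tag names follow the output above, values are invented.
SAMPLE = """
<FHRSEstablishment>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <FHRSID>1</FHRSID>
      <BusinessName>Example Cafe</BusinessName>
      <RatingValue>5</RatingValue>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>
"""

root = ET.fromstring(SAMPLE)

# find() returns the first matching element (or None if nothing matches).
detail = root.find('.//EstablishmentDetail')

# Each child carries its value in .text; .attrib stays empty because the
# sample elements, like those in the output above, have no XML attributes.
fields = {child.tag: child.text for child in detail}
attribs = [child.attrib for child in detail]

print(fields)
print(attribs)
```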
- To get the score for Hygiene that you mentioned in the comments:
What you have done gets only the first Scores tag, which has the Hygiene, ConfidenceInManagement and Structural tags as its children. When you write for each in root.find('.//Scores'): rating = each.get('Hygiene'), you loop over those three children and ask each one for an attribute named Hygiene. Hygiene is a child tag, not an attribute, so none of the three children has it and you get None three times.
You need to first
- find all Scores tags.
- find Hygiene in every tag found!
for each in root.findall('.//Scores'):
    rating = each.find('.//Hygiene')
    # find() returns None when a Scores tag has no Hygiene child.
    print '' if rating is None else rating.text
Output:
5
5
5
0
5
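Both behaviours, the three None values from the question and the working lookup, can be checked side by side without downloading the feed. The sketch below assumes Python 3 and an invented two-record sample; the empty Scores tag in the second record is an assumption, added only to exercise the None check.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; tag names follow the answer, values are invented.
SAMPLE = """
<FHRSEstablishment>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <BusinessName>Example Cafe</BusinessName>
      <Scores>
        <Hygiene>5</Hygiene>
        <Structural>5</Structural>
        <ConfidenceInManagement>0</ConfidenceInManagement>
      </Scores>
    </EstablishmentDetail>
    <EstablishmentDetail>
      <BusinessName>Example Kiosk</BusinessName>
      <Scores />
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>
"""

root = ET.fromstring(SAMPLE)

# The question's approach: find() returns only the first Scores element, and
# looping over it visits its three children, none of which has a Hygiene attribute.
first_scores = root.find('.//EstablishmentDetail/Scores')
attempt = [node.attrib.get('Hygiene') for node in first_scores]

# The answer's approach: visit every Scores element and read the text of its
# Hygiene child, falling back to '' when that child is missing.
ratings = []
for scores in root.findall('.//Scores'):
    hygiene = scores.find('Hygiene')
    ratings.append('' if hygiene is None else hygiene.text)

print(attempt)
print(ratings)
```

Note that the values come back as strings; convert them with int() if numeric scores are needed.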