Problem description
When parsing XML documents in the following format:
<Car>
<Color>Blue</Color>
<Make>Chevy</Make>
<Model>Camaro</Model>
</Car>
I use the following code:
carData = element.xpath('//Root/Foo/Bar/Car/node()[text()]')
parsedCarData = [{field.tag: field.text for field in carData} for action in carData]
print parsedCarData[0]['Color'] #Blue
This code does not work if a tag is empty, such as:
<Car>
<Color>Blue</Color>
<Make>Chevy</Make>
<Model/>
</Car>
Using the same code as above:
carData = element.xpath('//Root/Foo/Bar/Car/node()[text()]')
parsedCarData = [{field.tag: field.text for field in carData} for action in carData]
print parsedCarData[0]['Model'] # raises KeyError
How would I parse this blank tag?
Recommended answer
You're putting in a [text()] filter, which explicitly asks only for elements that have text nodes in them... and then you're unhappy when it doesn't give you elements without text nodes?
Leave that filter out, and you'll get your Model element:
>>> s='''
... <root>
... <Car>
... <Color>Blue</Color>
... <Make>Chevy</Make>
... <Model/>
... </Car>
... </root>'''
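>>> import lxml.etree
>>> e = lxml.etree.fromstring(s)
>>> # '*' selects child elements only; node() would also return the whitespace text nodes between them
>>> carData = e.xpath('Car/*')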
>>> carData
[<Element Color at 0x23a5460>, <Element Make at 0x23a54b0>, <Element Model at 0x23a5500>]
>>> dict(((e.tag, e.text) for e in carData))
{'Color': 'Blue', 'Make': 'Chevy', 'Model': None}
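The session above builds a single dict for a single car. For a document with several Car elements, the same idea can be sketched as one dict per Car. This is only an illustration written for Python 3: the names parse_cars, default, and SAMPLE are made up for this example, and the stdlib fallback is an assumption added so the snippet still runs where lxml is not installed (xml.etree.ElementTree shares the small API subset used here).

```python
try:
    from lxml import etree  # the library discussed in this answer
except ImportError:
    # Assumption for portability: the stdlib module offers the same
    # fromstring()/iter()/child-iteration calls used below.
    import xml.etree.ElementTree as etree


def parse_cars(xml_text, default=None):
    """Return one dict per <Car>, mapping each child tag to its text.

    Empty children such as <Model/> are kept; their text is None,
    which is replaced by `default` (hypothetical parameter).
    """
    root = etree.fromstring(xml_text)
    cars = []
    for car in root.iter('Car'):
        # Iterating an element yields its child elements, empty ones included.
        cars.append({
            field.tag: field.text if field.text is not None else default
            for field in car
        })
    return cars


SAMPLE = '<root><Car><Color>Blue</Color><Make>Chevy</Make><Model/></Car></root>'

if __name__ == '__main__':
    print(parse_cars(SAMPLE))
    print(parse_cars(SAMPLE, default=''))
```

Passing default='' is one way to avoid None checks downstream; whether an empty string or None is the better marker for a blank tag depends on what the caller does with the data.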
That said, if your immediate goal is to iterate over the nodes in the tree, you might consider using lxml.etree.iterparse() instead. It hands you elements while the document is still being parsed, so, provided you clear the elements you have finished with, it avoids holding a full DOM tree in memory and can be much more efficient than building a tree and then iterating over it with XPath. (Think SAX, but without the insane and painful API.)
An implementation with iterparse could look like this:
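import cStringIO
import lxml.etree

def get_cars(infile):
    in_car = False
    current_car = {}
    for (event, element) in lxml.etree.iterparse(infile, events=('start', 'end')):
        if event == 'start':
            if element.tag == 'Car':
                # A new <Car> opens: start collecting its fields.
                in_car = True
                current_car = {}
            continue
        if not in_car:
            continue
        if element.tag == 'Car':
            # </Car> reached: emit the finished record.
            in_car = False
            yield current_car
            element.clear()  # release the finished subtree
            continue
        # 'end' of a child element: its text (None if empty) is now complete.
        current_car[element.tag] = element.text

for car in get_cars(infile=cStringIO.StringIO('''<root><Car><Color>Blue</Color><Make>Chevy</Make><Model/></Car></root>''')):
    print car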
...it's more code, but (if we weren't using StringIO for the example) it could process a file much larger than would fit in memory.
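For comparison, here is a shorter variant of the same streaming idea, sketched for Python 3. It listens only for 'end' events and clears each Car once it has been copied out. The name iter_cars is made up for this example, and the stdlib fallback is again an assumption so the snippet runs without lxml installed.

```python
import io

try:
    from lxml import etree  # the library discussed in this answer
except ImportError:
    # Assumption for portability: the stdlib iterparse() accepts the
    # same source and events arguments used below.
    import xml.etree.ElementTree as etree


def iter_cars(source):
    """Yield one dict per <Car> from a file path or binary file object."""
    for _event, element in etree.iterparse(source, events=('end',)):
        if element.tag != 'Car':
            continue
        # At the 'end' of <Car> all of its children are fully parsed,
        # including empty ones such as <Model/> (text is None).
        car = {child.tag: child.text for child in element}
        element.clear()  # drop the children that were just copied out
        yield car


if __name__ == '__main__':
    data = io.BytesIO(
        b'<root><Car><Color>Blue</Color><Make>Chevy</Make><Model/></Car></root>'
    )
    for car in iter_cars(data):
        print(car)
```

Note that clear() empties each finished Car but the root element still keeps a reference to the emptied element, so memory use grows slowly with the number of records rather than staying strictly constant.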