
How to get the whole content between two XML tags in Python?

Problem description

I am trying to get the whole content between an opening XML tag and its closing counterpart.

Getting the content in straightforward cases like title below is easy, but how can I get the whole content between the tags when mixed content is used and I want to preserve the inner tags?

<?xml version="1.0" encoding="UTF-8"?>
<review>
  <title>Some testing stuff</title>
  <text sometimes="attribute">Some text with <extradata>data</extradata> in it.
  It spans <sometag>multiple lines: <tag>one</tag>, <tag>two</tag> 
  or more</sometag>.</text>
</review>

What I want is the content between the two text tags, including any inner tags:
Some text with <extradata>data</extradata> in it. It spans <sometag>multiple lines: <tag>one</tag>, <tag>two</tag> or more</sometag>.

For now I use regular expressions, but it gets kind of messy and I don't like this approach. I lean towards a solution based on an XML parser. I looked at minidom, etree, lxml and BeautifulSoup but couldn't find a solution for this case (the whole content, including inner tags).

Recommended answer

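from lxml import etree

# lxml rejects str input that carries an encoding declaration, so pass bytes
t = etree.XML(
b"""<?xml version="1.0" encoding="UTF-8"?>
<review>
  <title>Some testing stuff</title>
  <text>Some text with <extradata>data</extradata> in it.</text>
</review>"""
)
# encoding='unicode' makes tostring() return str instead of bytes (Python 3);
# t.text is None when the element starts directly with a child tag
((t.text or '') + ''.join(etree.tostring(child, encoding='unicode') for child in t)).strip()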

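The trick here is that t is iterable and, when iterated, yields all of its child elements. Because etree has no separate text nodes, the text before the first child tag has to be recovered from t.text, while the text after each child is stored as that child's tail. etree.tostring() serializes an element together with its tail by default (with_tail=True), so joining the serialized children recovers everything after the first child tag.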
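To make the text/tail model concrete, here is a small sketch that uses the standard library's xml.etree.ElementTree instead of lxml (an assumption for portability; both expose the same .text/.tail model, and the stdlib tostring() also writes the element's tail). It inspects a trimmed copy of the question's <text> element and shows where each fragment of character data lives.

```python
import xml.etree.ElementTree as ET

# Trimmed, single-line version of the question's <text> element
doc = ET.fromstring(
    "<text>Some text with <extradata>data</extradata> in it.</text>"
)

# Text before the first child is stored on the parent element itself
leading = doc.text

# Text after a child's closing tag is stored on that child, as its tail
fragments = [(child.tag, child.text, child.tail) for child in doc]

# tostring() serializes a child *including* its tail text
serialized = ET.tostring(doc[0], encoding="unicode")

print(repr(leading))  # 'Some text with '
print(fragments)      # [('extradata', 'data', ' in it.')]
print(serialized)     # <extradata>data</extradata> in it.
```

This is why the answer only needs the parent's .text plus the serialized children: no fragment of text is attached anywhere else.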

In [50]: ((t.text or '') + ''.join(etree.tostring(child, encoding='unicode') for child in t)).strip()
Out[50]: '<title>Some testing stuff</title>\n  <text>Some text with <extradata>data</extradata> in it.</text>'

Or, to get the content of the <text> element itself:

In [6]: e = t.xpath('//text')[0]

In [7]: ((e.text or '') + ''.join(etree.tostring(child, encoding='unicode') for child in e)).strip()
Out[7]: 'Some text with <extradata>data</extradata> in it.'
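The same idea can be wrapped in a reusable helper. The sketch below is one possible version that assumes only the standard library (xml.etree.ElementTree instead of lxml); inner_xml is a hypothetical name, not an existing API. It also escapes the leading text, which the one-liner concatenates raw: .text is stored unescaped, so an &amp; before the first child would otherwise come back as a bare &, while tostring() already escapes the children and their tails.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def inner_xml(elem):
    """Return everything between elem's start and end tags as an XML string.

    Hypothetical helper for illustration; not part of any library.
    """
    # .text is unescaped and may be None (element starts with a child tag)
    head = escape(elem.text or "")
    # tostring() escapes each child and appends that child's tail text
    body = "".join(ET.tostring(child, encoding="unicode") for child in elem)
    return head + body


# The question's sample document (bytes, so the encoding declaration is fine)
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<review>
  <title>Some testing stuff</title>
  <text sometimes="attribute">Some text with <extradata>data</extradata> in it.
  It spans <sometag>multiple lines: <tag>one</tag>, <tag>two</tag>
  or more</sometag>.</text>
</review>"""

root = ET.fromstring(SAMPLE)
text_elem = root.find("text")  # direct child named "text"
print(inner_xml(text_elem))

# An entity in the leading text round-trips instead of becoming a bare "&"
print(inner_xml(ET.fromstring("<p>a &amp; b <i>c</i></p>")))  # a &amp; b <i>c</i>
```

Unlike the one-liner, the helper does not call .strip(), so whitespace right after the opening tag or before the closing tag is kept; add .strip() if you prefer the answer's behavior.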
