Problem description
I have a huge XML file (1 GB). I want to move some of the elements (entries) to another file with the same header and specifications.
Let's say the original file contains this entry with tag <to_move>:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE some SYSTEM "some.dtd">
<some>
...
<to_move date="somedate">
<child>some text</child>
...
...
</to_move>
...
</some>
I use lxml.etree.iterparse to iterate through the file, which works fine. When I find an element with tag <to_move> (let's assume it is stored in the variable element), I do:
new_file.write(etree.tostring(element))
But this results in
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE some SYSTEM "some.dtd">
<some>
...
<to_move xmlns:="some" date="somedate"> # <---- Here is the problem. I don't want the namespace.
<child>some text</child>
...
...
</to_move>
...
</some>
So the question is: how do I tell etree.tostring() not to write the xmlns:="some"? Is this possible? I struggled with the API documentation of lxml.etree, but I couldn't find a satisfying answer.
This is what I found for etree.tostring:
tostring(element_or_tree, encoding=None, method="xml",
xml_declaration=None, pretty_print=False, with_tail=True,
standalone=None, doctype=None, exclusive=False, with_comments=True)
Serialize an element to an encoded string representation of its XML tree.
None of the parameters of tostring() seems to help. Any suggestions or corrections?
I often grab a namespace to make an alias for it like this:
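import lxml.etree
# someString holds the XML text; ns is assumed to be None until the namespace is known
someXML = lxml.etree.XML(someString)
if ns is None:
    # the root tag looks like "{namespace-uri}localname"; keep the URI between the braces
    ns = {"m": someXML.tag.split("}")[0][1:]}
someid = someXML.xpath('.//m:ImportantThing//m:ID', namespaces=ns)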
You could do something similar to grab the namespace and build a regex that cleans it up after using tostring.
Or you could clean up the input string: find the first space and check whether it is followed by xmlns. If it is, delete the whole xmlns declaration up to the next space; otherwise, skip ahead to the next space. Repeat until there are no more spaces or xmlns declarations, but don't go past the first >.
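The string cleanup could be sketched as follows. This version swaps the manual space-by-space scan for a regular expression that is confined to the text before the first >. It assumes the string begins with the element's start tag (no XML declaration in front) and that no attribute value contains a literal >. The function name strip_xmlns_from_first_tag is hypothetical.

```python
import re

# matches xmlns="..." and xmlns:prefix="..." together with the leading whitespace
XMLNS_ATTR = re.compile(r'''\s+xmlns(?::[\w.-]+)?\s*=\s*(?:"[^"]*"|'[^']*')''')

def strip_xmlns_from_first_tag(xml_text):
    """Remove namespace declarations from the first start tag only."""
    # everything before the first ">" is the first start tag
    head, sep, rest = xml_text.partition(">")
    # caution: removing a prefixed declaration leaves that prefix undeclared
    return XMLNS_ATTR.sub("", head) + sep + rest

sample = (
    '<to_move xmlns="urn:example:some" date="somedate">'
    '<child xmlns="keep">some text</child></to_move>'
)
cleaned = strip_xmlns_from_first_tag(sample)
```

Declarations on nested elements, such as the one on <child> in the sample, are deliberately left alone because the scan stops at the first >.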