iterparse fails to parse a field, while other similar ones are fine

Problem description

I use Python's iterparse to parse the XML result of a nessus scan (.nessus file). The parsing fails on unexpected records, while similar ones have been parsed correctly.

The general structure of the XML file is a lot of records like the one below:

<ReportHost>
  <ReportItem>
    <foo>9.3</foo>
    <bar>hello</bar>
  </ReportItem>
  <ReportItem>
    <foo>10.0</foo>
    <bar>world</bar>
  </ReportItem>
</ReportHost>
<ReportHost>
   ...
</ReportHost>

In other words, there are many hosts (ReportHost), each with many items to report (ReportItem), and each item has several characteristics (foo, bar). I want to generate one line per item, with its characteristics.

The parsing fails in the middle of the file at a simple line (foo in that case being cvss_base_score):

<cvss_base_score>9.3</cvss_base_score>

even though about 200 similar lines before it were parsed without problems.

The relevant piece of code is below. It sets context markers (inReportHost and inReportItem, which tell me where in the structure of the XML file I am) and either assigns or prints a value, depending on the context:

import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9
inReportHost = False
inReportItem = False

for event, elem in ET.iterparse("test2.nessus", events=("start", "end")):
    if event == 'start' and elem.tag == "ReportHost":
        inReportHost = True
    if event == 'end' and elem.tag == "ReportHost":
        inReportHost = False
        elem.clear()
    if inReportHost:
        if event == 'start' and elem.tag == 'ReportItem':
            inReportItem = True
            cvss = ''
        if event == 'start' and inReportItem:
            if event == 'start' and elem.tag == 'cvss_base_score':
                cvss = elem.text
        if event == 'end' and elem.tag == 'ReportItem':
            print(cvss)
            inReportItem = False

cvss sometimes has the value None (after the cvss = elem.text assignment), even though identical entries have been parsed properly earlier in the file.

If I add, below the assignment, something along the lines of

if cvss is None: cvss = "0"

then many of the later cvss entries are assigned their proper values (and some others are still None).

When I take the <ReportHost>...</ReportHost> block that is parsed incorrectly and run it through the program on its own, it works fine (i.e. cvss is assigned 9.3 as expected).

I am lost as to where the mistake in my code is: within a large set of similar records, some are processed correctly and some are not (some of the records are identical, and are still processed differently). I also cannot find anything particular about the records that fail; identical ones earlier and later are fine.

Recommended answer

From the iterparse() documentation:

Note: iterparse() only guarantees that it has seen the ">" character of a starting tag when it emits a "start" event, so the attributes are defined, but the contents of the text and tail attributes are undefined at that point. The same applies to the element children; they may or may not be present. If you need a fully populated element, look for "end" events instead.
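A minimal sketch of why the failure looks random: iterparse() reads its input in chunks, so whether the text is already there at "start" depends on where a chunk boundary happens to fall. The snippet below is only an illustration, not the asker's code. It uses XMLPullParser to place the boundary by hand, and the two-element document is made up for the demonstration.

```python
import xml.etree.ElementTree as ET

parser = ET.XMLPullParser(events=("start", "end"))

# First chunk ends right after the start tag, before any text has arrived.
parser.feed("<ReportItem><cvss_base_score>")
start_texts = {elem.tag: elem.text
               for event, elem in parser.read_events() if event == "start"}

# Second chunk delivers the text and the closing tags.
parser.feed("9.3</cvss_base_score></ReportItem>")
end_texts = {elem.tag: elem.text
             for event, elem in parser.read_events() if event == "end"}
parser.close()

print(start_texts["cvss_base_score"])  # None
print(end_texts["cvss_base_score"])    # 9.3
```

When the whole element fits inside one chunk, the text may already be present at "start", which would explain why identical records earlier in the file appear to work.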

Drop the inReport* variables and process ReportHost only on "end" events, when it has been fully parsed. Use the ElementTree API to get the necessary information, such as cvss_base_score, from the current ReportHost element.

To preserve memory, do:

import xml.etree.ElementTree as etree  # cElementTree was removed in Python 3.9

def getelements(filename_or_file, tag):
    context = iter(etree.iterparse(filename_or_file, events=('start', 'end')))
    _, root = next(context) # get root element
    for event, elem in context:
        if event == 'end' and elem.tag == tag:
            yield elem
            root.clear() # preserve memory

for host in getelements("test2.nessus", "ReportHost"):
    for cvss_el in host.iter("cvss_base_score"):
        print(cvss_el.text)
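If the goal is one line per ReportItem rather than per score, a variation along these lines might work. It is a sketch under assumptions: the helper name iter_scores, the in-memory SAMPLE document, and the "0" default (borrowed from the workaround in the question) are all illustrative and not part of the original answer.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for a .nessus file.
SAMPLE = b"""<Report>
  <ReportHost>
    <ReportItem><cvss_base_score>9.3</cvss_base_score></ReportItem>
    <ReportItem><bar>no score here</bar></ReportItem>
  </ReportHost>
  <ReportHost>
    <ReportItem><cvss_base_score>10.0</cvss_base_score></ReportItem>
  </ReportHost>
</Report>"""

def iter_scores(source, default="0"):
    """Yield one cvss_base_score string per ReportItem, using only 'end' events."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "ReportItem":
            # The item is complete here, so its children and their text exist.
            yield elem.findtext("cvss_base_score", default=default)
        elif elem.tag == "ReportHost":
            # All items of this host have been yielded; drop its children.
            elem.clear()

scores = list(iter_scores(io.BytesIO(SAMPLE)))
print(scores)  # ['9.3', '0', '10.0']
```

Because only "end" events are requested, there is no state to track and no point at which a half-populated element can be read.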
