Python implements XML file parsing

admin · Posted on 6/5/2018 2:27:09 PM

1. XML Introduction

XML (eXtensible Markup Language) refers to an extensible markup language, designed to transmit and store data, and has become the core of many new technologies today, with different applications in different fields. It is an inevitable product of the development of the web to a certain stage, which has the core characteristics of SGML, the simple characteristics of HTML, and many new features such as clear and well-structured.

test. XML file

<?xml version="1.0" encoding="utf-8"?>
<catalog>
<maxid>4</maxid>
<login username="pytest" passwd='123456'>
      <caption>Python</caption>
      <item id="4">
         <caption>test</caption>
      </item>
</login>
<item id="2">
      <caption>Zope</caption>
</item>
</catalog>

For a detailed introduction to XML, please refer to:http://www.w3school.com.cn/xmldom/dom_nodetype.asp

2. XML document parsing

There are three common methods for python to parse XML: one is the xml.dom.* module, which is an implementation of the W3C DOM API, which is very suitable if you need to process the DOM API; The second is the xml.sax.* module, which is an implementation of the SAX API, which sacrifices convenience for speed and memory footprint, and SAX is an event-based API, which means it can process a huge number of documents "in the air" without having to load it completely into memory; The third is the xml.etree.ElementTree module (ET for short), which provides a lightweight Python-style API, which is much faster than the DOM, and there are many pleasant APIs that can be used, compared to SAX, ET's ET.iterparse also provides a "in the air" processing method, there is no need to load the entire document into memory, and the average performance of ET is similar to SAX. However, the API is a little more efficient and easy to use.

2.1 xml.dom.*

The Document Object Model (DOM) is a standard programming interface recommended by the W3C organization to handle extensible markup languages. When a DOM parser parses an XML document, it reads the entire document at once, stores all the elements in the document in a tree structure in memory, and then you can use different functions provided by the DOM to read or modify the content and structure of the document, or write the modified content to the xml file. Use xml.dom.minidom in Python to parse the xml file.

a. Get subtags
b. Labels that distinguish the same tag name
c. Get the tag attribute value
d. Get data between label pairs

Login is visible.

Output:

Login is visible.

2.2 xml.etree.ElementTree

ElementTree was born to handle XML, and there are two implementations in the Python standard library: one is a pure Python implementation, such as xml.etree.ElementTree, and the other is a faster xml.etree.cElementTree. Note: Try to use the C language to implement it, as it is faster and consumes less memory.

a. Traverse the next layer of the root node
b. Subscript access to individual tags, attributes, and text
c. Find the specified tag under root
d. Traversing XML files
e. Modify the XML file

Login is visible.

Output:

Login is visible.

Attached:

Login is visible.

2.3 xml.sax.*

SAX is an event-driven API that uses SAX to parse XML in two parts: the parser and the event handler.

The parser is responsible for reading the XML document and sending events to the event handler, such as element start and element end events

The event processor is responsible for responding to the events and processing the XML data passed

Common scenarios:

(1) Process large documents

(2) Only part of the document is required, or specific information is only obtained from the file

(3) I want to build my own object model

[Source] Python implements XML file parsing

Related Posts

Sections viewed