Writing an application for a SAX-compliant XML parser
A SAX application must in principle contain the handler classes DocumentHandler, DTDHandler, EntityResolver, and ErrorHandler, which should implement the methods prescribed by the SAX specification. A SAX-compliant parser expects these methods and makes calls to them.
The SAX library comes with dummy implementations of these classes and their methods. Almost every method consists of the single statement 'pass'.
Therefore you need not specify all these methods yourself. You can let your handlers inherit from these dummy classes, so that they automatically implement dummy versions of the methods. Then you only write the few handler methods from which you wish to see more action.
You can also write a single class that implements all handler methods, and register it with the parser for all four handler roles. The SAX library also comes with a dummy implementation of such a class, called HandlerBase, from which you can inherit. HandlerBase is simply a subclass of all four dummy handler classes.
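The inheritance pattern described above can be sketched as follows. The saxlib classes named in this text belong to the original Python SAX library and are not assumed to be installed here. As a stand-in, the sketch uses `xml.sax.handler.ContentHandler` from the current standard library, which takes the place of DocumentHandler there; note that its `characters` method receives a single string rather than `ch, start, length`. The class name `TitleCollector` and the sample document are hypothetical.

```python
import xml.sax
from xml.sax.handler import ContentHandler  # stand-in for the dummy DocumentHandler


class TitleCollector(ContentHandler):
    """Override only the events of interest; all other methods stay no-ops."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Character data may arrive in several pieces, so collect it.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


# Hypothetical sample document.
SAMPLE = b"<books><book><title>SAX</title></book><book><title>DOM</title></book></books>"

collector = TitleCollector()
xml.sax.parseString(SAMPLE, collector)
print(collector.titles)
```

Methods that are not overridden, such as `processingInstruction`, are inherited from the base class and do nothing.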
Note that the dummy implementation of the ErrorHandler lets warnings, errors and fatal errors pass without message or action. Probably this is not what you want, and you should write your own methods. Or you can use one of the ErrorHandler classes that are available in the saxutils module:
ErrorRaiser: Simply raises the exception it receives from the parser, i.e., a SAXParseException.
ErrorPrinter: Prints the error message but does not raise an exception.
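Writing your own error methods might look like the sketch below. It is an assumption-laden illustration rather than the saxutils code: it subclasses `xml.sax.handler.ErrorHandler` from the current standard library, and the class name `RecordingErrorHandler` is hypothetical. It records every report and re-raises only fatal errors, which sits between the two behaviours listed above.

```python
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler


class RecordingErrorHandler(ErrorHandler):
    """Record every report; re-raise only non-recoverable errors."""

    def __init__(self):
        self.reports = []

    def _record(self, level, exception):
        # SAXParseException carries the message and the location.
        self.reports.append(
            (level, exception.getLineNumber(), exception.getMessage())
        )

    def warning(self, exception):
        self._record("warning", exception)

    def error(self, exception):
        self._record("error", exception)

    def fatalError(self, exception):
        self._record("fatal", exception)
        raise exception


errors = RecordingErrorHandler()
aborted = False
try:
    # Deliberately malformed input: <b> is never closed.
    xml.sax.parseString(b"<a><b></a>", ContentHandler(), errors)
except xml.sax.SAXParseException as exc:
    aborted = True
    print("parse aborted:", exc.getMessage())

print(errors.reports)
```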
A SAX parser may have information about the location in the XML source document of the reported event or transferred data. To make this information available to the application, the parser registers a locator object with the DocumentHandler, through the DocumentHandler's setDocumentLocator method. When called, the application's DocumentHandler methods may call back to the parser with one of the Locator methods to obtain this information.
There is no guarantee that the parser registers a locator. The DocumentHandler's methods should allow for this, e.g., by wrapping the Locator calls in a try statement that catches an AttributeError.
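The guarded locator call can be sketched as follows, again assuming the standard library's `xml.sax.handler.ContentHandler` as a stand-in for DocumentHandler. The class name `LineTracker` and the attribute name `locator` are hypothetical. The attribute is deliberately not created in `__init__`, so it exists only once a parser has called `setDocumentLocator`.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class LineTracker(ContentHandler):
    """Record the line number of each start tag, if a locator is available."""

    def __init__(self):
        super().__init__()
        self.seen = []  # (element name, line number or None)

    def setDocumentLocator(self, locator):
        # Called by the parser, if at all, before any other event.
        self.locator = locator

    def startElement(self, name, attrs):
        try:
            line = self.locator.getLineNumber()
        except AttributeError:
            # No locator was registered: carry on without location data.
            line = None
        self.seen.append((name, line))


# Case 1: the parser registers a locator.
tracker = LineTracker()
xml.sax.parseString(b"<a>\n<b/>\n</a>", tracker)
print(tracker.seen)

# Case 2: no locator was ever registered (the handler is driven by hand).
orphan = LineTracker()
orphan.startElement("x", {})
print(orphan.seen)
```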
Short list of required methods:
# --- EntityResolver
def resolveEntity(self, publicId, systemId):
    "Resolve the system identifier of an entity."

# --- ErrorHandler
def error(self, exception):
    """Handle a recoverable error.
    exception is an instance of SAXParseException"""
def fatalError(self, exception):
    """Handle a non-recoverable error.
    exception is an instance of SAXParseException"""
def warning(self, exception):
    """Handle a warning.
    exception is an instance of SAXParseException"""

# --- DTDHandler
def notationDecl(self, name, publicId, systemId):
    "Handle a notation declaration event."
def unparsedEntityDecl(self, name, publicId, systemId, ndata):
    "Handle an unparsed entity declaration event."

# --- DocumentHandler
def characters(self, ch, start, length):
    """Handle a character data event.
    The data are contained in the substring of ch
    starting at position start and of length length."""
def endDocument(self):
    "Handle an event for the end of a document."
def endElement(self, name):
    "Handle an event for the end of an element."
def ignorableWhitespace(self, ch, start, length):
    """Handle an event for ignorable whitespace in element content.
    The data are contained in the substring of ch
    starting at position start and of length length."""
def processingInstruction(self, target, data):
    "Handle a processing instruction event."
def setDocumentLocator(self, locator):
    """Receive an object for locating the origin of SAX document events.
    locator is an object of type Locator."""
def startDocument(self):
    "Handle an event for the beginning of a document."
def startElement(self, name, atts):
    """Handle an event for the beginning of an element.
    atts is an object of type AttributeList."""
For a description of the types Locator, AttributeList, and SAXParseException, see the same section, subsections 4.1, 4.6, 4.9. Note that these types are implemented in the SAX library, and that the parser passes an object of these types to your handler. Your handler can use the methods of these types in its action.
You may select any XML parser that implements the SAX interface.
Most Python XML parsers do not implement the SAX interface natively. To make them SAX-compliant, so-called driver modules have been written. Each of these modules contains a parser class that translates the parser's native interface to a SAX interface. The parser class of such a driver module can be used as a SAX parser. The driver modules use the saxlib module.
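To illustrate what a driver does, here is a minimal sketch that wraps the standard library's `xml.parsers.expat` parser and forwards its callbacks to handler methods with the signatures listed earlier. It is not one of the actual driver modules: the names `ExpatDriver`, `parseString`, and `EventLogger` are hypothetical, and attributes are passed as a plain dictionary instead of an AttributeList.

```python
from xml.parsers import expat  # the native, non-SAX parser interface


class ExpatDriver:
    """Hypothetical minimal driver: expat callbacks in, SAX-style calls out."""

    def __init__(self):
        self._handler = None

    def setDocumentHandler(self, handler):
        self._handler = handler

    def parseString(self, data):
        native = expat.ParserCreate()
        native.buffer_text = True  # deliver adjacent text in one piece
        native.StartElementHandler = self._handler.startElement
        native.EndElementHandler = self._handler.endElement
        native.CharacterDataHandler = self._on_text
        self._handler.startDocument()
        native.Parse(data, True)
        self._handler.endDocument()

    def _on_text(self, text):
        # Translate expat's single string into the (ch, start, length) form.
        self._handler.characters(text, 0, len(text))


class EventLogger:
    """Hypothetical handler that records the calls it receives."""

    def __init__(self):
        self.events = []

    def startDocument(self):
        self.events.append(("startDocument",))

    def endDocument(self):
        self.events.append(("endDocument",))

    def startElement(self, name, atts):
        self.events.append(("startElement", name))

    def endElement(self, name):
        self.events.append(("endElement", name))

    def characters(self, ch, start, length):
        self.events.append(("characters", ch[start:start + length]))


driver = ExpatDriver()
logger = EventLogger()
driver.setDocumentHandler(logger)
driver.parseString(b"<a>hi<b/></a>")
print(logger.events)
```

The application only ever sees the handler methods, so a different native parser could be swapped in by replacing the driver alone.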
If you do not know which SAX XML parsers are available, you can make use of a so-called parser factory:
import xml.sax.saxexts
SAXparser = xml.sax.saxexts.make_parser()
Register all four handlers or the single combined handler with the parser using the parser's setDocumentHandler, setDTDHandler, setEntityResolver, and setErrorHandler methods.
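Creating a parser and registering one combined handler can be sketched as below. The sketch assumes the current standard library rather than the saxexts module: `xml.sax.make_parser()` acts as the factory, and `setContentHandler` takes the place of `setDocumentHandler`. The class `CombinedHandler` is hypothetical and only comparable in spirit to HandlerBase.

```python
import xml.sax
from xml.sax.handler import (
    ContentHandler,
    DTDHandler,
    EntityResolver,
    ErrorHandler,
)


class CombinedHandler(ContentHandler, DTDHandler, EntityResolver, ErrorHandler):
    """Hypothetical all-in-one handler inheriting every default method."""


parser = xml.sax.make_parser()  # picks an available SAX parser
handler = CombinedHandler()

# One object serves in all four handler roles.
parser.setContentHandler(handler)  # setDocumentHandler in the API described here
parser.setDTDHandler(handler)
parser.setEntityResolver(handler)
parser.setErrorHandler(handler)
```

Keep in mind that the inherited error methods of the standard library's `ErrorHandler` are not silent: `error` and `fatalError` raise the exception and `warning` prints it.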
Start the parse by calling the parser's parse method, which requires the system identifier (file name) of the XML file as its argument.
Now the parser and the application interact, driven by the events and data in the XML document and by the actions of the application's handlers. The parser reports events and transfers data by calling the handler methods. The DocumentHandler methods may call back to the parser for location information. The ErrorHandler methods can obtain location information as well, from the SAXParseException they receive as their argument.
At the end of the parse, the parser returns from the parse or parseFile method. If your application is done with the data it obtained from the parse, it may now stop. Alternatively, your document handler may have stored the data in memory, and your application may now start processing them.
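The second option, storing data during the parse and processing it afterwards, might look like this sketch. It assumes the standard library's `xml.sax` module. The handler name `PriceCollector`, the element and attribute names, and the in-memory document (used in place of a file name so that the example is self-contained) are all hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class PriceCollector(ContentHandler):
    """Store the data of interest in memory while the parse runs."""

    def __init__(self):
        super().__init__()
        self.prices = []

    def startElement(self, name, attrs):
        if name == "item":
            self.prices.append(float(attrs.getValue("price")))


parser = xml.sax.make_parser()
collector = PriceCollector()
parser.setContentHandler(collector)

# parse() also accepts a file name; a byte stream is used here instead.
source = io.BytesIO(b'<order><item price="2.50"/><item price="4.00"/></order>')
parser.parse(source)

# The parser has returned, so the stored data can now be processed.
total = sum(collector.prices)
print(total)
```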