Overview of Java XML APIs
The most important decision you'll make at the start of an XML project is the application programming interface (API) you'll use. Many APIs are implemented by multiple vendors, so if a specific parser gives you trouble you can swap in an alternative, often without even recompiling your code. However, if you choose the wrong API, changing to a different one may well involve redesigning and rebuilding the entire application from scratch. Of course, as Fred Brooks taught us, “In most projects, the first system built is barely usable. It may be too slow, too big, awkward to use, or all three. There is no alternative but to start again, smarting but smarter, and build a redesigned version in which these problems are solved.… Hence plan to throw one away; you will, anyhow.” [1] Still, it is much easier to change parsers than APIs.
There are two major standard APIs for processing XML documents with Java, the Simple API for XML (SAX) and the Document Object Model (DOM), each of which comes in several versions. In addition, there are a host of other, somewhat idiosyncratic APIs including JDOM, dom4j, ElectricXML, and XMLPULL. Finally, each specific parser generally has a native API that it exposes below the level of the standard APIs. For instance, the Xerces parser has the Xerces Native Interface (XNI). However, picking such a native API limits your choice of parser, and indeed may even tie you to one particular version of the parser, since parser vendors tend not to worry a great deal about maintaining native compatibility between releases. Each of these APIs has its own strengths and weaknesses.
SAX
SAX, the Simple API for XML, is the gold standard of XML APIs. It is the most complete and correct by far. Given a fully validating parser that supports all its optional features, there is very little you can’t do with it. It has one or two holes, but they're really off in the weeds of the XML specifications, and you have to look pretty hard to find them. SAX is an event-driven API. The SAX classes and interfaces model the parser, the stream from which the document is read, and the client application receiving data from the parser. However, no class models the XML document itself. Instead, the parser feeds content to the client application through a callback interface, much like the ones used in Swing and the AWT.
This makes SAX very fast and very memory efficient (since it doesn’t have to store the entire document in memory). However, SAX programs can be harder to design and code because you normally need to develop your own data structures to hold the content from the document. SAX works best when your processing is fairly local; that is, when all the information you need to use is close together in the document. For example, you might process one element at a time. Applications that require access to the entire document at once in order to take useful action would be better served by one of the tree-based APIs like DOM or JDOM. Finally, because SAX is so efficient, it’s the only real choice for truly huge XML documents. Of course, “truly huge” has to be defined relative to available memory. However, if the documents you're processing are in the gigabyte range, you really have no choice but to use SAX.
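To make the callback model concrete, here is a minimal sketch of a SAX client that counts elements with a given name. The class names `SaxElementCounter` and `CountingHandler` and the sample document are invented for illustration; only the `javax.xml.parsers` and `org.xml.sax` types come from the standard library. Notice that the handler stores nothing but a running count, which is why memory use stays flat no matter how large the document is.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; not part of SAX itself.
class SaxElementCounter {

    // The handler plays the role of the client application: the parser
    // calls back into it as it reads, and no document tree is ever built.
    static class CountingHandler extends DefaultHandler {
        private final String targetName;
        private int count = 0;

        CountingHandler(String targetName) {
            this.targetName = targetName;
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            // Invoked once per start-tag, in document order.
            if (qName.equals(targetName)) {
                count++;
            }
        }

        int getCount() {
            return count;
        }
    }

    // Parses the XML text and returns how many elements have the given name.
    static int countElements(String xml, String elementName) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            CountingHandler handler = new CountingHandler(elementName);
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.getCount();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<orders><order id='1'/><order id='2'/><note>rush</note></orders>";
        System.out.println(countElements(xml, "order")); // prints 2
    }
}
```

Anything beyond a simple tally (say, collecting each order's line items) would require you to design your own data structure inside the handler, which is exactly the extra design work described above.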
DOM
DOM, the Document Object Model, is a fairly complex API that models an XML document as a tree. Unlike SAX, DOM is a read-write API. It can both parse existing XML documents and create new ones. Each XML document is represented as a Document object. Documents are searched, queried, and updated by invoking methods on this Document object and the objects it contains. This makes DOM much more convenient when random access to widely separated parts of the original document is required. However, it is quite memory intensive compared to SAX, and not nearly as well suited to streaming applications.
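As a rough illustration of the read-write tree model, the following sketch parses a small document, appends a new element through the `Document` object, and then queries the updated tree. The class name `DomReadWrite`, its helper methods, and the `list`/`item` vocabulary are assumptions made up for this example; the `org.w3c.dom` and `javax.xml.parsers` calls are standard.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical example class; not part of DOM itself.
class DomReadWrite {

    // Reads the whole document into an in-memory tree.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
    }

    // Updates the tree: nodes are created by factory methods on Document
    // and then attached to a parent.
    static void addItem(Document doc, String name) {
        Element item = doc.createElement("item");
        item.setAttribute("name", name);
        doc.getDocumentElement().appendChild(item);
    }

    // Queries the tree: any part of the document is reachable at any time.
    static int countItems(Document doc) {
        return doc.getElementsByTagName("item").getLength();
    }

    public static void main(String[] args) {
        Document doc = parse("<list><item name='a'/></list>");
        addItem(doc, "b");
        System.out.println(countItems(doc)); // prints 2
    }
}
```

The convenience comes at the cost noted above: `parse` does not return until the entire document has been read and held in memory.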
JAXP
JAXP, the Java API for XML Processing, bundles SAX and DOM together along with some factory classes and the TrAX XSLT API. (TrAX is not a general purpose XML API like SAX and DOM. I'll get to it in Chapter 17.) It is a standard part of Java 1.4 and later. However, it is not really a different API. When starting a new program, you ask yourself whether you should choose SAX or DOM. You don’t ask yourself whether you should use SAX or JAXP, or DOM or JAXP. SAX and DOM are part of JAXP.
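The sketch below shows what the JAXP factory classes contribute: the code names only the standard `javax.xml.parsers` and `org.xml.sax` types, and the factories decide at run time which vendor's parser sits behind them. The class name `JaxpFactories` and its methods are invented for illustration. The class names it prints depend on which parser is installed, so no particular output should be expected.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical example class; not part of JAXP itself.
class JaxpFactories {

    // Obtains a SAX parser without naming any vendor class.
    static XMLReader newSaxReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("No usable SAX parser found", e);
        }
    }

    // Obtains a DOM parser the same way.
    static DocumentBuilder newDomBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable DOM parser found", e);
        }
    }

    public static void main(String[] args) {
        // The concrete classes printed here vary by JDK and classpath.
        System.out.println("SAX implementation: " + newSaxReader().getClass().getName());
        System.out.println("DOM implementation: " + newDomBuilder().getClass().getName());
    }
}
```

Because the vendor class never appears in the source, a different parser can be selected at launch time, for example by setting the `javax.xml.parsers.SAXParserFactory` system property to another factory's class name, and the code above runs unchanged.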
JDOM
JDOM is a Java-native tree-based API that attempts to remove a lot of DOM’s ugliness. The JDOM mission statement is, “There is no compelling reason for a Java API to manipulate XML to be complex, tricky, unintuitive, or a pain in the neck,” and for the most part JDOM delivers. Like DOM, JDOM reads the entire document into memory before it begins to work on it; and the broad outline of JDOM programs tends to be the same as for DOM programs. However, the low-level code is a lot less tricky and ugly than the DOM equivalent. JDOM uses concrete classes and constructors rather than interfaces and factory methods. It uses standard Java coding conventions, methods, and classes throughout. JDOM programs often flow a lot more naturally than the equivalent DOM program. I think JDOM often does make the easy problems easier; but in my experience JDOM also makes the hard problems harder. Its design shows a very solid understanding of Java, but the XML side of the equation feels much rougher. It’s missing some crucial pieces like a common node interface or superclass for navigation. JDOM works well (and much better than DOM) on fairly simple documents with no recursion, limited mixed content, and a well-known vocabulary. It begins to show some weakness when asked to process arbitrary XML. When I need to write programs that operate on any XML document, I tend to find DOM simpler despite its ugliness.