XML / Database Links
Copyright 2001-2011 by Ronald Bourret
1.0 Technical Papers
1.1 Overview of XML and Databases
XML and Databases by Ronald Bourret. A widely read paper describing how XML is used with both relational and native XML databases.
Storing XML in Databases (PDF) by Michael Champion. A well-balanced discussion of when it is appropriate to use relational and native XML databases. Includes a nice argument as to why non-normalization of XML documents is appropriate in some cases.
XML and Databases? Follow Your Nose by Leigh Dodds. An excellent summary of some of the issues involved in deciding between an XML-enabled database and a native XML database.
XML and Relational Storage - Are they mutually exclusive? by George Lapis. An excellent article describing "hybrid databases" - that is, databases that offer both XML-enabled and native XML storage - as well as when it is appropriate to use each type of storage.
Teamgeist! Entwicklung neuer Standards fuer XML-Datenbanken by Lars Martin. A brief description of XML and databases. In German.
Putting XML in context with hierarchical, relational, and object-oriented models by David Mertz. A description of hierarchical, relational, and object-oriented databases, with a discussion of how XML relates to each.
An Exploration of XML in Database Management Systems by Dare Obasanjo. A general overview of XML and databases, with product-specific discussions about SQL Server, Oracle, DB2, Tamino, dbXML, and others. Also discusses XPath and XQuery.
XML database tools for Linux by Uche Ogbuji. A brief introduction to how XML and databases work together, followed by a summary of tools available on Linux.
What Do You Know About Databases And XML?
With XML, is the Time Right for Hierarchical DBs?
Discussions on slashdot.org. A couple of useful discussions about XML and databases, focusing on the relative usefulness of native XML databases and relational databases. Without the usual slashdot flames.
1.2 Relational databases and XML
1.2.1 General papers
Mapping Objects To Relational Databases (PDF) by Scott W. Ambler. An excellent paper on object-relational mappings, which are often used when mapping XML to a relational database.
Mapping DTDs to Databases by Ronald Bourret. Describes two ways to map DTDs to databases. The table-based mapping requires XML documents to have tabular structure. The object-relational mapping treats XML documents as object serializations and then maps these objects to the database.
Mapping W3C Schemas to Object Schemas to Relational Schemas by Ronald Bourret. Similar to the previous paper, but discusses W3C XML Schemas instead.
Modeling Relational Data in XML by Lee Buck. Describes an object-relational mapping from a relational database to XML.
Integrating XML and Databases (PDF) by Thomas Erl. An overview of using XML with relational databases, with a little bit about native XML databases. Very nice sections on the differences between XML and relational databases and the architectures of applications that use XML and relational databases.
XML APIs for databases by Ramnivas Laddad. Describes how to implement SAX and DOM over result sets (the "table model"). Includes source code.
XML Structures for Existing Databases: Eleven rules for moving a relational database to XML by Kevin Williams, et al. An overview of the issues involved in exposing relational data as XML. From the Wrox Press book 'Professional XML Databases'.
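To make the "table model" mentioned in several of the papers above more concrete, here is a minimal sketch in Python. It is not taken from any of the listed papers. It assumes a hypothetical document layout in which the root element names a table, each child is a row, and each grandchild is a column. The sample document, the function names load_table and result_set_to_sax, and the use of SQLite are all illustrative assumptions. The first function loads such a document into a table; the second walks a result set and fires SAX events for it, roughly the idea behind implementing SAX over result sets.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET
from xml.sax.saxutils import XMLGenerator

# Hypothetical document with tabular structure: table / row / column.
DOC = """<orders>
  <order><id>1</id><customer>Acme</customer></order>
  <order><id>2</id><customer>Bolt</customer></order>
</orders>"""


def load_table(conn, xml_text):
    """Table-based mapping: root = table, children = rows, grandchildren = columns."""
    root = ET.fromstring(xml_text)
    rows = [{col.tag: col.text for col in row} for row in root]
    columns = list(rows[0])
    # Identifiers come from the trusted sample document above;
    # real code would have to validate or quote them.
    conn.execute(f"CREATE TABLE {root.tag} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {root.tag} VALUES ({placeholders})",
        [[r[c] for c in columns] for r in rows],
    )
    return root.tag


def result_set_to_sax(cursor, handler, table="table", row="row"):
    """Fire SAX events for a result set: one element per row and per column."""
    names = [d[0] for d in cursor.description]
    handler.startDocument()
    handler.startElement(table, {})
    for record in cursor:
        handler.startElement(row, {})
        for name, value in zip(names, record):
            handler.startElement(name, {})
            handler.characters("" if value is None else str(value))
            handler.endElement(name)
        handler.endElement(row)
    handler.endElement(table)
    handler.endDocument()


conn = sqlite3.connect(":memory:")
table = load_table(conn, DOC)

# XMLGenerator is a stock SAX handler that serializes the events it receives.
out = io.StringIO()
result_set_to_sax(
    conn.execute(f"SELECT id, customer FROM {table} ORDER BY id"),
    XMLGenerator(out, encoding="utf-8"),
    table="orders",
    row="order",
)
xml_out = out.getvalue()
print(xml_out)
```

Any SAX ContentHandler could be passed in place of XMLGenerator, so the same walker could feed a transformation pipeline instead of producing text. The object-relational mapping described above is more involved and is not shown here.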
1.2.2 Product-specific papers
Kiss the Middle-tier Goodbye with SQL Server Yukon by Klaus Aschenbrenner
An article describing the XML features in SQL Server Yukon (aka SQL Server 2005), including a native XML data type, XQuery support (including use of relational columns in XQuery queries), and extensions to XQuery for updates.
Integrating XML and Relational Database Technologies by Shaku Atre and Giovanni Guardalben (registration required). A white paper discussing XML and relational databases, focusing on the support in Oracle, SQL Server, DB2, and HiT Allora. (The latter is a middleware product from the company that commissioned the paper.) The discussion of Allora is a bit biased, although not unreasonably so. The discussions of Oracle, SQL Server, and DB2 are excellent, describing both the strengths and weaknesses of those products.
XML and Database Mapping in .NET by Niel Bornstein. An example-driven description of the XML and database capabilities in .NET. Not very deep, but a good starting point.
Storing XML in Relational Databases by Igor Dayen. A survey of the techniques used to transfer data between XML documents and various relational databases (Oracle, SQL Server, DB2, etc.), including sample code.
Using XML and Relational Databases with Perl by Kip Hampton. Discusses the DBIx::XML_RDB and XML::XMLtoDBMS modules.
DB2 9 XML performance characteristics by Irina Kogan, Matthias Nicola, and Berni Schiefer. Benchmark results for the XML data type in DB2 v9 using a simulated brokerage house. Very well written.
Oracle XML DB from Oracle
A description of the database features in Oracle, with links to a number of papers and other XML technologies.
1.3 Native XML databases
1.3.1 General papers
Native XML database overview by Scott Carroll (Bluestream Database Software). An e-mail summary of what native XML databases are and how they can be used.
Going Native by Marc Cyrenne. A brief description of what a native XML database is, followed by a discussion of when to use one.
XML Databases (PDF) by Michael Kay. A very nice article discussing what native XML databases are and why you might want to use them.
Age of the XML Database (PDF) by David McGoveran. Another nice article discussing what native XML databases are and why you might want to use them.
Normalizing XML, Part 1
Normalizing XML, Part 2 by Will Provost
A discussion of how to apply the rules of normalization to XML schemas. This is relevant when designing schemas for storing data in native XML databases. I don't agree with everything Will says, but there is surprisingly little information about this topic on the Web, so I think this article is a useful introduction.
All XML Databases are Equal by John Snelson. A nice summary of the kinds of features found in native XML databases and the variations in how those features are actually implemented.
Introduction to Native XML Databases by Kimbro Staken. A very nice introduction to native XML databases.
Native XML databases: a bad idea for data? by Kevin Williams. A brief discussion about why structured data fits best in relational databases. The subtitle "Taking a look at the pros and cons of native XML databases" is entirely misleading.
1.3.2 Use case papers
Going Native: Use cases for native XML databases by Ronald Bourret
An article describing the most common use cases for native XML databases, based on interviews with roughly half the native XML database vendors, as well as a handful of customers.
Medlane/XMLMARC Update: From MARC to XML Database by Kevin Clarke. A discussion about storing MARC (bibliographic) records as XML in a variety of databases. Interesting discussion of the pros and cons of various systems.
Use Cases for Native XML Servers by Bryan Quinn. A brief introduction to some of the main use cases for native XML databases: managing document-centric XML, data integration, and mid-tier data cache.
Use XML databases to empower Java Web services by Robert Smik, Ash Parikh, and Ajay Ramachandran. A discussion of using native XML databases for enterprise information integration (EII) via XQuery and Web services, and as a mid-tier data store. Includes detailed examples from the health field. Note that the syntax for calling Web services in XQuery is proprietary.
1.3.3 Product-specific papers
Getting Reacquainted with dbXML 2.0 by Tom Bradford. An overview of the features in dbXML 2.0, which is a complete overhaul/rewrite of dbXML 1.0. (dbXML 1.0 is now known as Xindice.)
Berkeley DB XML: An Embedded XML Database by Paul Ford. A nice introduction to Berkeley DB XML. Includes code samples.
An Introduction to the XML:DB API by Kimbro Staken. Another nice piece from Kimbro, this time introducing the XML:DB API, a JDBC-like API for native XML databases. Includes code samples.
1.4 Query languages
See also XML Query engines on the XML Database Products page.
What is XQuery? by Per Bothner. A comprehensive, if brief, introduction to XQuery.
XQuery 1.0: Primer (PDF) by Julianne Harbarth. An in-depth introduction to XQuery with lots of examples.
X Is for XQuery by Jason Hunter. An introduction that covers the main parts of XQuery.
XQuery Tricks and Traps by Jason Hunter. A nice list of ways to avoid tripping yourself up in XQuery while working with sequences, data types, effective boolean values, and sorting. Advanced.
An introduction to XQuery by Howard Katz. More history and overview than details, but interesting reading nonetheless.
Comparing XSLT and XQuery by Michael Kay. A nice comparison of XSLT and XQuery.
XQuery API for Java (XQJ) from Oracle, IBM, and others. JSR for an API for using XQuery. That is, JSR 225 is to XQuery what JDBC is to SQL.
XML with Virtuoso and SQLX by Tom Bradford. A nice, example-driven introduction to SQL/XML. Not specific to Virtuoso.
SQL in, XML out by Jonathan Gennick. A brief introduction to the XMLELEMENT, XMLATTRIBUTES, XMLFOREST, and XMLAGG functions in SQL/XML, with a bit of Oracle-specific material at the end.
SQL/XML Tutorial: SQL/XML, XQuery, and Native XML Programming Languages by Jonathan Robie. A well-written introduction to SQL/XML, with additional discussions of XQuery.
1.5 Service-oriented architectures (SOAs)
The power behind the SOA repository by Ash Parikh, Robert Smik, and Premal Parikh. A discussion of the use of native XML databases in SOAs to store both application data and service metadata. Includes XQuery examples.
1.6 Academic papers
There are almost as many academic papers about XML and databases as there are characters in Unicode. Here are a few of the ones I liked around 2000-2005. For more papers, search DBLP for the keyword "XML".
A Transaction Model for XML Databases (PDF) by Stijn Dekeyser, Jan Hidders, and Jan Paredaens. Two related proposals for node-level locking in native XML databases. Both schemes essentially annotate locks with the query defining the path from the locked node to the target node of the query. This allows other transactions to determine whether they conflict with transactions already holding locks. (The actual locking scheme is somewhat more limited than this, but the idea is roughly the same.) For related papers, see Concurrency Control for Semi-Structured Data and XML and the Publications 2004 page of the DBMS group at TU Kaiserslautern.
Structured Information Retrieval using XML by Daniel Egnor and Robert Lord. A description of how XYZFind indexes XML documents for later querying. Interesting if you're curious how a native XML database can be built using indexed files.
Updating XML (PDF) by Patrick Lehti. A proposal for an update syntax for XQuery. The proposal "is based on an update extension proposal from members of the XQuery working group" and clearly builds on the work by Tatarinov et al. (see below).
XML Parsing: A Threat to Database Performance (PDF) by Matthias Nicola. An interesting paper discussing the current performance limitations of XML parsers and how this is likely to affect high-performance database applications.
Efficiently Publishing Relational Data as XML Documents (PDF) by Shanmugasundaram, Shekita, Barr, Carey, Lindsay, Pirahesh, and Reinwald. A very complete paper discussing strategies for retrieving relational data and building XML documents from it. Includes experimental results to verify the conclusions.
A General Technique for Querying XML Documents using a Relational Database System (PDF) by Shanmugasundaram, Shekita, Kiernan, Krishnamurthy, Viglas, Naughton, and Tatarinov. "A technique for querying XML documents using a relational database system, which (a) enables the same query processor to be used with most [relational-to-XML mappings], and (b) allows users to query seamlessly across relational data and XML documents." Includes implementation experience.
XTABLES: Bridging Relational Technology and XML (PDF) by J. Funderburk, G. Kiernan, J. Shanmugasundaram, E. Shekita, C. Wei. Discusses an implementation of XQuery over relational databases using a default table-based mapping. Using a default mapping saves the user from having to define virtual XML documents over the database and (presumably) allows the system to process queries more efficiently, since joins must be expressed explicitly in the query. Note that XTABLES was formerly known as XPERANTO.
Updating XML (PDF) by Tatarinov, Ives, Halevy, and Weld. A nice proposal to add an UPDATE clause to FLWOR statements in XQuery.
Analysis and Evaluation of a Native XML Database by Ken Wenker. An in-depth description of how the NeoCore native XML database works. Very briefly, NeoCore constructs strings corresponding to all paths in the document, then hashes these strings. It also hashes all attribute values and PCDATA values. Queries (including retrieving whole documents) are therefore executed as a series of hashtable lookups. (NeoCore uses a proprietary hash technology, which is also described in the paper.)
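The path-hashing idea in the NeoCore description above can be illustrated with a short Python sketch. This is not NeoCore's implementation: Python's built-in dictionaries stand in for the proprietary hash technology, and the sample document, the build_index function, and the index layout are assumptions made only for illustration. The sketch builds one path string per node, indexes those strings and all attribute and text values in hash tables, and then answers simple queries with plain lookups.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document.
DOC = """<library>
  <book year="2004"><title>XQuery</title><author>Walmsley</author></book>
  <book year="2000"><title>Data on the Web</title><author>Abiteboul</author></book>
</library>"""


def build_index(xml_text):
    """Index every root-to-node path string and every attribute/text value."""
    paths = defaultdict(list)   # path string -> elements reached by that path
    values = defaultdict(list)  # value -> path strings where the value occurs

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        paths[path].append(elem)
        for name, value in elem.attrib.items():
            values[value].append(f"{path}/@{name}")
        text = (elem.text or "").strip()
        if text:
            values[text].append(path)
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths, values


paths, values = build_index(DOC)

# A path query becomes a single hash table lookup.
titles = [e.text for e in paths["/library/book/title"]]
# A value query is likewise a lookup that returns the matching paths.
where_2004 = values["2004"]

print(titles)
print(where_2004)
```

A real engine would also need to store document order and node identity so that whole documents can be reassembled from the index; the sketch leaves that out.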
JSR 170: Java Content Repository. An API for accessing content management systems. The API views content as a graph (hierarchy with links) of data, where branch nodes may be folders, documents, or document fragments, and data is stored in leaf nodes. While data may be XML, it is not required to be. The API provides methods for connecting to content management systems and executing XPath queries against the content.
JSR 225: XQuery API for Java (XQJ). An API for executing XQuery queries and retrieving their results. This fulfills a role similar to JDBC. Public draft now available. Proposed by Oracle and IBM.
XML representation of a relational database. A complete description of the "table model" of an XML document.
1.8 Collections of papers
XML Query. Home page of the W3C XML Query Working Group. Contains links to papers about XQuery and to products that implement it.
QL '98 Position Papers. A collection of papers from the 1998 Query Languages workshop. For history buffs.
For a long time, there was a surprising lack of books about XML and databases. That is true no more, so be sure to check the shelves of your local or Web bookstore for titles I've missed.
1.9.1 Books: General
Data on the Web: From Relations to Semistructured Data and XML by Abiteboul, Suciu, and Buneman. A discussion of semi-structured databases, especially as they relate to XML.
XML and SQL: Developing Web Applications by Daniel Appelquist. How to build database-driven XML Web sites. Appears to focus on rolling your own software, but does include a chapter on SQL Server.
XML Data Management: Native XML and XML-Enabled Database Systems by Chaudhri, Rashid, Zicari, et al. Based on the preface, this appears to be an excellent discussion of XML and databases, including sections on native XML databases and XML-enabled databases. Includes both practical and theoretical discussions by many of the leading players in the XML/database world.
Succeeding with Object Databases: A Practical Look at Today's Implementations with Java and XML by Chaudhri and Zicari (editors). A collection of academic papers about object-oriented and object-relational databases and XML. Probably of most interest to people who want to know what's going on under the covers.
Document Engineering by Robert J. Glushko and Tim McGrath. An interesting book on designing business documents, such as those that use XML to exchange data. The basic premise of the book is two-fold: talk to everybody and look at everything before designing your business model, then construct documents as views over that model. [Ed. -- I was one of the technical reviewers of this book.]
Designing XML Databases by Mark Graves. Appears to discuss both native XML databases and XML-enabled databases. The emphasis appears to be on providing enough background information that you can roll your own, although commercial systems are discussed as well. Also covers schema design.
Open Source XML Database Toolkit: Resources and Techniques for Improved Development by Liam Quin. A discussion of Open Source tools that you can use to integrate XML and databases. Also discusses some popular commercial tools.
XML Databases and the Semantic Web by Bhavani Thuraisingham. Discusses XML, semi-structured databases, and the semantic Web. I assume the semi-structured databases that are discussed include native XML databases.
1.9.2 Books: Oracle
Professional Oracle 8i Application Programming with Java, PL/SQL and XML by Awai, et al. Another look at how to use XML with Oracle 8i. From Wrox Press.
Building Oracle XML Applications by Steve Muench. An in-depth look at how to use XML with Oracle 8i. Written by Oracle's XML evangelist.
1.9.3 Books: SQL Server
Professional SQL Server 2000 XML by Burke, Ferguson, Gosnell, et al. A detailed look at how to use XML with SQL Server 2000, including the FOR XML extension, the OPENXML function, XDR and XML Schemas, XPath, XML bulk loading, and case studies.
Pro SQL Server 2008 XML (Expert's Voice) by Michael Coles. A detailed description of the XML features in SQL Server 2008, including the FOR XML clause, the XML data type, XML Schema collections, XQuery, and indexing XML.
XML and SQL Server 2000 by John Griffin. A detailed look at how to use XML with SQL Server 2000, including the FOR XML extension, the OPENXML function, XDR Schemas, XPath, XSLT, and IIS.
The Guru's Guide to SQL Server Stored Procedures, XML, and HTML by Ken Henderson. Primarily discusses SQL Server stored procedures, but does have several chapters on the XML features of SQL Server.
Programming Microsoft SQL Server 2000 with XML by Graeme Malcolm. An in-depth look at how to use XML with SQL Server 2000, including using XML with ADO.
SQL Server 2000 XML Distilled by Kevin Williams, Jeni Tennison, et al. A detailed look at how to use XML with SQL Server 2000, including the FOR XML extension, the OPENXML function, annotated schemas, XPath, XML bulk loading, and future support.
1.9.4 Books: Other products
Integrating XML with DB2 XML Extender and DB2 Text Extender (PDF) by IBM Redbooks. Good discussion of how to use XML with DB2 through the XML Extender and Text Extender.
XML for DB2 Information Integration by IBM Redbooks. Another discussion of how to use XML with DB2. This book covers the XML Extender, Net Search (Text) Extender, SQL/XML, the XML Wrapper, MQ Series, and WebSphere Studio. Also covered are XML and DBMS schema design issues and bulk processing of XML documents, neither of which is specific to DB2. [Editor's note: I am one of the authors of this book.]
DB2 pureXML Cookbook: Master the Power of the IBM Hybrid Data Server by Matthias Nicola and Pav Kumar-Chatterjee. A comprehensive guide to pureXML -- the XML capabilities in IBM DB2.
eXist: A NoSQL Document Database and Application Platform by Erik Siegel and Adam Retter. A complete guide to eXist, the most popular open source native XML database, by one of the leading developers and a consultant who uses eXist extensively.
XML-Based Integration with XAware by Kirstan Vandersluis. XA-Suite from XAware is a data integration product that uses XML as its data transport. The first part of the book discusses data integration issues, such as Enterprise Application Integration (EAI), Business Process Management (BPM), Enterprise Information Integration (EII), Service-Oriented Architectures (SOA), and Data Warehousing, and applies to all readers. (Chapter 4 gives a particularly good overview of the entire data integration landscape, including message queues, RPC, transaction monitors, Web Services, and ETL tools.) The second part of the book discusses XA-Suite as a tool for data integration.
XQuery from the Experts: A Guide to the W3C XML Query Language by Katz, Chamberlin, Draper, Fernandez, Kay, Robie, Rhys, Simeon, Tivy, and Wadler. A set of essays about XQuery and the thoughts behind it, from the minds of the people who developed it. Includes some tutorial material, but not designed as a definitive way to learn the language.
XQuery: Search Across a Variety of XML Data by Priscilla Walmsley. A very complete introduction to XQuery. Buy it.
1.9.6 Books: In German
XML und Datenbanken. Die Schnittstellen von Access und SQL Server professionell nutzen by Uwe Hess. Describes how to use the XML features in SQL Server and Access.
Datenbanken und XML. Konzepte, Anwendungen, Systeme by Kazakos, Schmidt, and Tomczyk. An overview of XML and databases. I have not seen the book, but it was recommended to me. The table of contents includes both theoretical and practical sections. The book also describes products: relational databases with XML support, native XML databases, and XML/database middleware.
XML und Datenbanken. XML-Dokumente effizient speichern und verarbeiten by Meike Klettke and Holger Meyer. A complete overview of XML and databases. The book describes how to use XML with relational databases, how native XML databases are built, and XML query languages. It is written at a conceptual level, but includes useful short descriptions of many products. Clearly and understandably written.
XML und Datenbanken. Konzepte und Systeme by Harald Schoening. Illustrates how XML and databases work together using products: Oracle, DB2, and SQL Server (relational databases) and Tamino and eXcelon (native XML databases). Also has chapters on XML schema languages and XML query languages.
Evaluierung der SQL/XML:2006-Standardkonformitaet von ausgewaehlten Datenbanksystemen by Michael Wagner. An explanation of SQL/XML and a brief look at the XML support in Oracle, Microsoft SQL Server, and MySQL.
2.0 Non-Technical Papers
2.1 Magazine articles
Native XML databases boost e-business transaction speeds by Maggie Biggs. A high-level article discussing the use of native XML databases on the middle tier as an application integration tool.
XML-Native Databases by David F. Carr. A discussion of the differences between native XML databases and XML-enabled databases.
XML Enters the DBMS Arena by Edmund X. DeJesus. A high-level discussion about XML and databases, focusing on the difference between native XML databases and XML-enabled databases.
Find a home for your XML data by Mark Leon. An overview of native XML databases.
Benchmark characteristics courtesy of Matthias Nicola. Micro-benchmarks are designed to exercise a particular part of a language, while application benchmarks are designed to simulate real-world applications.
MemBeR: XQuery Micro-Benchmark Repository by Afanasiev, Manolescu, Michiels, and others. Repository of XQuery micro-benchmarks. Users can submit results and new benchmarks. Includes document generator and engine for running benchmarks. Characteristics:
- Contains 34 benchmarks (parameterized queries) as of January, 2007
- Queries test XPath, XQuery, and scalability
- Most document sizes about 11MB
The Michigan Benchmark by Kanda Runapongsa, Jignesh M. Patel, and H.V. Jagadish. A micro-benchmark designed to help developers improve/tune XML processing engines. Characteristics:
- "Wisconsin-like" micro-benchmark
- 1 large XML document with recursive structure
- 45 queries
- Tests loading, insert/update/delete, and point, bulk, and structural updates
- Methodically defined fanout, node, and data distributions
- Systematically exercises navigation and predicate evaluation
TPoX by Nicola, Kogan, Raghu, Liu, and Schiefer. Application benchmark for databases that support XQuery or SQL/XML. Includes a document generator, XML Schemas, queries, driver for running the benchmark, and documentation. The driver currently supports DB2, but can be modified to support other databases. Characteristics:
- Models data-centric financial transactions using FIXML documents
- Eight document sets between 100MB and 1PB
- Document sizes from 1 to 20KB
- Transactions are 70% reading and 30% insert/update/delete
- 7 read-only queries (XQuery or SQL/XML)
- 2 insert, 2 delete, and 6 update queries (XQuery Update syntax)
- Driver uses threads to simulate one to 1 million users
XBench - A Family of Benchmarks for XML DBMSs by Benjamin Bin Yao, M. Tamer Ozsu, and John Keenleyside. A family of application benchmarks based on real-world XML documents. Characteristics:
- Considers data-centric vs. text-centric XML and single vs. multiple documents. This results in four variations -- data-centric/single document, data-centric/multiple documents, text-centric/single document, and text-centric/multiple documents.
- ~20 XQuery statements per variation
- Total database sizes between 10MB and 10GB
- No update or load tests
XMach-1: A Benchmark for XML Data Management by Timo Boehme and Erhard Rahm. Application benchmark for native XML databases and XML-enabled databases. Characteristics:
- 10,000 to 10,000,000 documents
- Documents are text-centric
- 8 XQuery statements
- Tests both database and application server
XMark: An XML Benchmark Project by Busse, Carey, Florescu, Kersten, Manolescu, Schmidt, and Waas. A benchmark for native XML databases. Characteristics:
- 1 large XML document (up to 1GB)
- 20 XQuery statements
- Models an online auction
- Read-only workload
- No performance metric defined
XOO7 by Bressan, Dobbie, Lacroix, Lee, Li, Nambiar, and Wadhwa. Application benchmark. An XML version of the OO7 benchmark. Includes a paper about applying XOO7 to Lore, Kweelt, and "an ORDBMS" and a paper comparing XOO7, XMach-1, and XMark. Characteristics:
- 23 XQuery statements
- Single recursive XML document
XPathMark by Massimo Franceschet. Micro-benchmark for XPath engines, plus a set of functional tests for testing completeness and correctness of XPath coverage. Characteristics:
- Performance test uses XML documents from XMark
- Performance test has six groups of up to 15 XPath expressions
- Performance tests available as XPath, XQuery, and XSLT
- Functional test uses small, data-centric document
- Functional test has five groups of up to 26 XPath expressions
Performance Comparison of Xyleme Zone Server and Oracle 9i XML Support (PDF) by Thierry Bigaignon. A brief comparison of the performance of Xyleme Zone Server (a native XML database) and Oracle 9i, using Oracle's test data and queries. (Xyleme Zone Server is 5.8 to 19.5 times faster.)
3.2 Mailing lists
For: General discussion of XQuery. A good place to ask syntax questions.
To (un)subscribe: See http://www.x-query.com/mailman/listinfo/talk
To post: mailto:email@example.com
Native XML Database by Matthias Nicola. An ongoing discussion of XML support in DB2, as well as related subjects.
3.4 Product lists
XML Database Products by Ronald Bourret. A (now out-of-date) list of more than 150 products you can use with XML and databases.
XML Data Binding Resources by Ronald Bourret (with lots of help from Sean Sullivan and Brendan Macmillan). A (now out-of-date) list of almost 50 products you can use for XML data binding, along with links to papers on the subject.
Free XML Software by Lars Garshol.
kleffel's datenbank datenbank (in German). A list of more than 200 databases, including XML databases.
3.5 Other link pages
The XML Cover Pages: XML and Databases by Robin Cover. A page with links to papers and software for XML and databases.