XML and Databases

Ronald Bourret

Consulting, writing, and research in XML and databases

XML / Database Links

Copyright 2001-2010 by Ronald Bourret


1.0 Technical Papers


1.1 Overview of XML and Databases

1.2 Relational databases and XML

1.2.1 General papers

1.2.2 Product-specific papers

  • Kiss the Middle-tier Goodbye with SQL Server Yukon by Klaus Aschenbrenner
    XML IN YUKON by Bob Beauchemin
    XML Best Practices for Microsoft SQL Server "Yukon" by Shankar Pal
    Three good articles describing the XML features in SQL Server Yukon (aka SQL Server 2005), including a native XML data type, XQuery support (including use of relational columns in XQuery queries), and extensions to XQuery for updates. Lots of code examples.

  • Integrating XML and Relational Database Technologies by Shaku Atre and Giovanni Guardalben (registration required). A white paper discussing XML and relational databases, focusing on the support in Oracle, SQL Server, DB2, and HiT Allora. (The latter is a middleware product from the company that commissioned the paper.) The discussion of Allora is a bit biased, although not unreasonably so. The discussions of Oracle, SQL Server, and DB2 are excellent, describing both the strengths and weaknesses of those products.

  • XML and Database Mapping in .NET by Niel Bornstein. An example-driven description of the XML and database capabilities in .NET. Not very deep, but a good starting point.

  • Storing XML in Relational Databases by Igor Dayen. A survey of the techniques used to transfer data between XML documents and various relational databases (Oracle, SQL Server, DB2, etc.), including sample code.

  • Using XML and Relational Databases with Perl by Kip Hampton. Discusses the DBIx::XML_RDB and XML::XMLtoDBMS modules.

  • DB2 9 XML performance characteristics by Irina Kogan, Matthias Nicola, and Berni Schiefer. Benchmark results for the XML data type in DB2 v9 using a simulated brokerage house. Very well written.

  • Oracle XML FAQ by Frank Naude. Brief FAQ describing the XML capabilities in Oracle 8i and Oracle 9i. Includes code examples. Note that this FAQ is not from Oracle.

  • Oracle XML DB
    Oracle SQL/XML
    Oracle XQuery
    Oracle XML Developer's Kit from Oracle
    The first page is a collection of papers describing Oracle's support for XML, including the XMLType data type, storage options, indexes, and the Oracle XML DB Repository. The latter three pages contain papers that go into more detail about Oracle's support for SQL/XML (including Oracle-specific extensions), XQuery, and the Oracle XML Developer's Kit, which includes a parser, XSLT processor, XML Schema validator, XML data binding tool, and the XML SQL Utility.
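
Several of the papers above survey how data is transferred from XML documents into relational tables. As a rough illustration of the simplest case, here is a minimal sketch that shreds a small document into one table. It uses Python's standard sqlite3 and xml.etree.ElementTree modules rather than any of the products discussed above, and the document, element names, and table layout are all invented for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are illustrative only.
ORDERS_XML = """
<orders>
  <order id="1"><customer>Ada</customer><total>19.50</total></order>
  <order id="2"><customer>Grace</customer><total>5.25</total></order>
</orders>
"""

def shred_orders(xml_text, conn):
    """Map each <order> element to one row of an 'orders' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
    )
    root = ET.fromstring(xml_text)
    # One row per <order>: the id attribute and two child elements become columns.
    rows = [
        (int(o.get("id")), o.findtext("customer"), float(o.findtext("total")))
        for o in root.findall("order")
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return len(rows)

conn = sqlite3.connect(":memory:")
inserted = shred_orders(ORDERS_XML, conn)
result = conn.execute(
    "SELECT id, customer, total FROM orders ORDER BY id"
).fetchall()
```

The mapping is hard-coded here. The products covered in the papers above generally let you declare it instead, and they handle nesting, data types, and round-tripping in ways this sketch does not attempt.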

1.3 Native XML databases

1.3.1 General papers

  • Native XML database overview by Scott Carroll. A nice e-mail summary of what native XML databases are and how they can be used.

  • Going Native by Marc Cyrenne. A brief description of what a native XML database is, followed by a discussion of when to use one.

  • XML Databases (PDF) by Michael Kay. A very nice article discussing what native XML databases are and why you might want to use them.

  • Age of the XML Database (PDF) by David McGoveran. Another nice article discussing what native XML databases are and why you might want to use them.

  • Normalizing XML, Part 1
    Normalizing XML, Part 2 by Will Provost
    A discussion of how to apply the rules of normalization to XML schemas. This is relevant when designing schemas for storing data in native XML databases. I don't agree with everything Will says, but there is surprisingly little information about this topic on the Web, so I think this article is a useful introduction.

  • All XML Databases are Equal by John Snelson. A nice summary of the kinds of features found in native XML databases and the variations in how those features are actually implemented.

  • Introduction to Native XML Databases by Kimbro Staken. A very nice introduction to native XML databases.

  • Native XML databases: a bad idea for data? by Kevin Williams. A brief discussion about why structured data fits best in relational databases. The subtitle "Taking a look at the pros and cons of native XML databases" is entirely misleading.

  • XML Persistence. A discussion of the problems of using native XML databases for data-centric XML documents.

1.3.2 Use case papers

  • Going Native: Use cases for native XML databases by Ronald Bourret
    An article describing the most common use cases for native XML databases, based on interviews with roughly half the native XML database vendors, as well as a handful of customers.

  • Medlane/XMLMARC Update: From MARC to XML Database by Kevin Clarke. A discussion about storing MARC (bibliographic) records as XML in a variety of databases. Interesting discussion of the pros and cons of various systems.

  • Use Cases for Native XML Servers by Bryan Quinn. A brief introduction to some of the main use cases for native XML databases: managing document-centric XML, data integration, and mid-tier data cache.

  • Use XML databases to empower Java Web services by Robert Smik, Ash Parikh, and Ajay Ramachandran. A discussion of using native XML databases for enterprise information integration (EII) via XQuery and Web services, and as a mid-tier data store. Includes detailed examples from the health field. Note that the syntax for calling Web services in XQuery is proprietary.

1.3.3 Product-specific papers

1.4 Query languages

1.4.1 XQuery

See also XML Query engines on the XML Database Products page.

1.4.2 SQL/XML

1.5 Service-oriented architectures (SOAs)

  • The power behind the SOA repository by Ash Parikh, Robert Smik, and Premal Parikh. A discussion of the use of native XML databases in SOAs to store both application data and service metadata. Includes XQuery examples.

1.6 Academic papers

There are almost as many academic papers about XML and databases as there are characters in Unicode. Here are a few of the ones I like. For more papers, search DBLP for the keyword "XML".

  • A Transaction Model for XML Databases (PDF) by Stijn Dekeyser, Jan Hidders, and Jan Paredaens. Two related proposals for node-level locking in native XML databases. Both schemes essentially annotate locks with the query defining the path from the locked node to the target node of the query. This allows other transactions to determine whether they conflict with transactions already holding locks. (The actual locking scheme is somewhat more limited than this, but the idea is roughly the same.) For related papers, see Concurrency Control for Semi-Structured Data and XML and the Publications 2004 page of the DBMS group at TU Kaiserslautern.

  • Persistent DOM: An architecture for XML repositories in relational databases (PDF) by Richard Edwards and Sian Hope. A simple technique for mapping the DOM to relational tables and retrieving documents or fragments with a minimal number of joins. Interesting if you're curious how a native XML database can be built on a relational database.

  • Structured Information Retrieval using XML by Daniel Egnor and Robert Lord. A description of how XYZFind indexes XML documents for later querying. Interesting if you're curious how a native XML database can be built using indexed files.

  • PDOM: Lightweight Persistency Support for the Document Object Model (PDF) by Huck, Macherius, and Fankhauser. A description of the internal architecture used by the PDOM (Infonyte) native XML database. Interesting if you're curious how a native XML database can be built on a proprietary, model-based storage engine (as opposed to an object-oriented or relational database).

  • Updating XML (PDF) by Patrick Lehti. A proposal for an update syntax for XQuery. The proposal "is based on an update extension proposal from members of the XQuery working group" and clearly builds on the work by Tatarinov et al. (see below).

  • XML Parsing: A Threat to Database Performance (PDF) by Matthias Nicola. An interesting paper discussing the current performance limitations of XML parsers and how this is likely to affect high-performance database applications.

  • Jayavel Shanmugasundaram's Publications. A very nice collection of papers, including the following:

    • Efficiently Publishing Relational Data as XML Documents (PDF) by Shanmugasundaram, Shekita, Barr, Carey, Lindsay, Pirahesh, and Reinwald. A very complete paper discussing strategies for retrieving relational data and building XML documents from it. Includes experimental results to verify the conclusions.

    • A General Technique for Querying XML Documents using a Relational Database System (PDF) by Shanmugasundaram, Shekita, Kiernan, Krishnamurthy, Viglas, Naughton, and Tatarinov. "A technique for querying XML documents using a relational database system, which (a) enables the same query processor to be used with most [relational-to-XML mappings], and (b) allows users to query seamlessly across relational data and XML documents." Includes implementation experience.

    • XTABLES: Bridging Relational Technology and XML (PDF) by J. Funderburk, G. Kiernan, J. Shanmugasundaram, E. Shekita, C. Wei. Discusses an implementation of XQuery over relational databases using a default table-based mapping. Using a default mapping saves the user from having to define virtual XML documents over the database and (presumably) allows the system to process queries more efficiently, since joins must be expressed explicitly in the query. Note that XTABLES was formerly known as XPERANTO.

  • Updating XML (PDF) by Tatarinov, Ives, Halevy, and Weld. A nice proposal to add an UPDATE clause to FLWOR statements in XQuery.

  • Analysis and Evaluation of a Native XML Database by Ken Wenker. An in-depth description of how the NeoCore native XML database works. Very briefly, NeoCore constructs strings corresponding to all paths in the document, then hashes these strings. It also hashes all attribute values and PCDATA values. Queries (including retrieving whole documents) are therefore executed as a series of hashtable lookups. (NeoCore uses a proprietary hash technology, which is also described in the paper.)
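
The path-hashing idea described in the last entry can be sketched in a few lines. This is only an illustration of the general technique under my own assumptions, not NeoCore's implementation: Python's built-in dict stands in for the proprietary hash technology, and the function name and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_path_index(root):
    """Index every root-to-element path string and every text/attribute value."""
    paths = defaultdict(list)   # path string -> elements reached by that path
    values = defaultdict(list)  # text or attribute value -> paths where it occurs

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        paths[path].append(elem)
        text = (elem.text or "").strip()
        if text:
            values[text].append(path)
        for name, value in elem.attrib.items():
            values[value].append(f"{path}/@{name}")
        for child in elem:
            walk(child, path)

    walk(root, "")
    return paths, values

# Hypothetical sample document.
doc = ET.fromstring(
    "<lib>"
    "<book id='b1'><title>XML</title></book>"
    "<book id='b2'><title>SQL</title></book>"
    "</lib>"
)
paths, values = build_path_index(doc)

# A path query becomes a single hash lookup rather than a tree traversal.
titles = [e.text for e in paths["/lib/book/title"]]
# A value lookup reports where that value occurs in the document.
where_b2 = values["b2"]
```

Both lookups are dictionary accesses, which is the point of the design: the cost of walking the tree is paid once, at load time, instead of on every query.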

1.7 Specifications

  • XQuery. The main XQuery specification. Contains links to other XQuery specifications.

  • SQL/XML (PDF). Final committee draft (July, 2004) of the ISO effort to integrate SQL and XML. For more information, see the SQLX.org Web site.

  • XML:DB API. A language- and database-independent API for accessing XML documents stored in native XML databases. This fulfills a role similar to ODBC or JDBC and uses a separate query language.

  • JSR 170: Java Content Repository. An API for accessing content management systems. The API views content as a graph (hierarchy with links) of data, where branch nodes may be folders, documents, or document fragments, and data is stored in leaf nodes. While data may be XML, it is not required to be. The API provides methods for connecting to content management systems and executing XPath queries against the content.

  • JSR 225: XQuery API for Java (XQJ). An API for executing XQuery queries and retrieving their results. This fulfills a role similar to JDBC. Public draft now available. Proposed by Oracle and IBM.

  • XUpdate - XML Update Language. An XML language for updating XML documents. Used in several native XML databases, but not limited to use in any particular environment. See also XUpdate Use Cases.

  • SiXDML (Simple XML Data Manipulation Language). A language and API for performing both DDL and DML in native XML and XML-enabled databases. Links to other documents and a reference implementation built on top of Xindice are available on the SiXDML SourceForge page.

  • XML representation of a relational database. A complete description of the "table model" of an XML document.
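
To give a feel for the table model mentioned in the last entry, here is a minimal sketch that serializes one relational table as a table element containing row elements, each of which contains one child element per column. The exact element names and nesting used here are my assumption for illustration and may differ from the linked description. The sketch uses Python's standard sqlite3 and xml.etree.ElementTree modules, and the "parts" table is hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

def table_to_xml(conn, table):
    """Serialize one table as <table><row><column>value</column>...</row>...</table>."""
    # The table name is interpolated directly, so pass only trusted identifiers.
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cursor.description]
    table_elem = ET.Element(table)
    for record in cursor:
        row = ET.SubElement(table_elem, "row")
        for name, value in zip(columns, record):
            # NULL is written as an empty element in this sketch.
            ET.SubElement(row, name).text = "" if value is None else str(value)
    return table_elem

# Hypothetical sample table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO parts VALUES (?, ?)", [(1, "bolt"), (2, "nut")])

xml_text = ET.tostring(table_to_xml(conn, "parts"), encoding="unicode")
```

Column names become element names, so this works only when the column names are also legal XML names. A real tool would need an escaping or renaming rule for the rest.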

1.8 Collections of papers

1.9 Books

For a long time, there was a surprising lack of books about XML and databases. That is true no more, so be sure to check the shelves of your local or Web bookstore for titles I've missed. Also, check the relevant categories at XMLBookstore.com and All the XML Books in Print.

1.9.1 Books: General

  • Data on the Web: From Relations to Semistructured Data and XML by Abiteboul, Suciu, and Buneman. A discussion of semi-structured databases, especially as they relate to XML.

  • XML and SQL: Developing Web Applications by Daniel Appelquist. How to build database-driven XML Web sites. Appears to focus on rolling your own software, but does include a chapter on SQL Server.

  • XML Data Management by Chaudhri, Rashid, Zicari, et al. Based on the preface, this appears to be an excellent discussion of XML and databases, including sections on native XML databases and XML-enabled databases. Includes both practical and theoretical discussions by many of the leading players in the XML/database world.

  • Succeeding with Object Databases: A Practical Look at Today's Implementations with Java and XML by Chaudhri and Zicari (editors). A collection of academic papers about object-oriented and object-relational databases and XML. Probably of most interest to people who want to know what's going on under the covers.

  • Document Engineering by Robert J. Glushko and Tim McGrath. An interesting book on designing business documents, such as those that use XML to exchange data. The basic premise of the book is two-fold: talk to everybody and look at everything before designing your business model, then construct documents as views over that model. [Ed. -- I was one of the technical reviewers of this book.]

  • Designing XML Databases by Mark Graves. Appears to discuss both native XML databases and XML-enabled databases. The emphasis appears to be on providing enough background information that you can roll your own, although commercial systems are discussed as well. Also covers schema design.

  • Open Source XML Database Toolkit: Resources and Techniques for Improved Development by Liam Quin. A discussion of Open Source tools that you can use to integrate XML and databases. Also discusses some popular commercial tools.

  • XML Databases and the Semantic Web by Bhavani Thuraisingham. Discusses XML, semi-structured databases, and the semantic Web. I assume the semi-structured databases that are discussed include native XML databases.

  • Professional XML Databases by Kevin Williams (editor). A detailed look at how to integrate XML into relational databases. Well written.

1.9.2 Books: Oracle

1.9.3 Books: SQL Server

  • Professional SQL Server 2000 XML by Burke, Ferguson, Gosnell, et al. A detailed look at how to use XML with SQL Server 2000, including the FOR XML extension, the OPENXML function, XDR and XML Schemas, XPath, XML bulk loading, and case studies.

  • XML and SQL Server 2000 by John Griffin. A detailed look at how to use XML with SQL Server 2000, including the FOR XML extension, the OPENXML function, XDR Schemas, XPath, XSLT, and IIS.

  • The Guru's Guide to SQL Server Stored Procedures, XML, and HTML by Ken Henderson. Primarily discusses SQL Server stored procedures, but does have several chapters on the XML features of SQL Server.

  • Programming Microsoft SQL Server 2000 with XML by Graeme Malcolm. An in-depth look at how to use XML with SQL Server 2000, including using XML with ADO.

  • SQL Server 2000 XML Distilled by Kevin Williams, Jeni Tennison, et al. A detailed look at how to use XML with SQL Server 2000, including the FOR XML extension, the OPENXML function, annotated schemas, XPath, XML bulk loading, and future support.

1.9.4 Books: Other products

  • Integrating XML with DB2 XML Extender and DB2 Text Extender by IBM Redbooks. Good discussion of how to use XML with DB2 through the XML Extender and Text Extender.

  • XML for DB2 Information Integration by IBM Redbooks. Another discussion of how to use XML with DB2. This book covers the XML Extender, Net Search (Text) Extender, SQL/XML, the XML Wrapper, MQ Series, and WebSphere Studio. Also covered are XML and DBMS schema design issues and bulk processing of XML documents, neither of which is specific to DB2. [Editor's note: I am one of the authors of this book.]

  • XML-Based Integration with XAware by Kirstan Vandersluis. XA-Suite from XAware is a data integration product that uses XML as its data transport. The first part of the book discusses data integration issues, such as Enterprise Application Integration (EAI), Business Process Management (BPM), Enterprise Information Integration (EII), Service-Oriented Architectures (SOA), and Data Warehousing, and applies to all readers. (Chapter 4 gives a particularly good overview of the entire data integration landscape, including message queues, RPC, transaction monitors, Web Services, and ETL tools.) The second part of the book discusses XA-Suite as a tool for data integration.

1.9.5 Books: In German


2.0 Non-Technical Papers


2.1 Magazine articles

2.2 Analyst reports


3.0 Miscellany


3.1 Benchmarks

Benchmark characteristics courtesy of Matthias Nicola. Micro-benchmarks are designed to exercise a particular part of a language, while application benchmarks are designed to simulate real-world applications.

  • MemBeR: XQuery Micro-Benchmark Repository by Afanasiev, Manolescu, Michiels, and others. Repository of XQuery micro-benchmarks. Users can submit results and new benchmarks. Includes document generator and engine for running benchmarks. Characteristics:

    • Contains 34 benchmarks (parameterized queries) as of January, 2007
    • Queries test XPath, XQuery, and scalability
    • Most document sizes about 11MB

  • The Michigan Benchmark by Kanda Runapongsa, Jignesh M. Patel, and H.V. Jagadish. A micro-benchmark designed to help developers improve/tune XML processing engines. Characteristics:

    • "Wisconsin-like" micro-benchmark
    • 1 large XML document with recursive structure
    • 45 queries
    • Tests loading, insert/update/delete, and point, bulk, and structural updates
    • Methodically defined fanout, node, and data distributions
    • Systematically exercises navigation and predicate evaluation

  • TPoX by Nicola, Kogan, Raghu, Liu, and Schiefer. Application benchmark for databases that support XQuery or SQL/XML. Includes a document generator, XML Schemas, queries, driver for running the benchmark, and documentation. The driver currently supports DB2, but can be modified to support other databases. Characteristics:

    • Models data-centric financial transactions using FIXML documents
    • Eight document sets between 100MB and 1PB
    • Document sizes from 1 to 20KB
    • Transactions are 70% reading and 30% insert/update/delete
    • 7 read-only queries (XQuery or SQL/XML)
    • 2 insert, 2 delete, and 6 update queries (XQuery Update syntax)
    • Driver uses threads to simulate from one to one million users

  • XBench - A Family of Benchmarks for XML DBMSs by Benjamin Bin Yao, M. Tamer Ozsu, and John Keenleyside. A family of application benchmarks based on real-world XML documents. Characteristics:

    • Considers data-centric vs. text-centric XML and single vs. multiple documents. This results in four variations -- data-centric/single document, data-centric/multiple documents, text-centric/single document, and text-centric/multiple documents.
    • ~20 XQuery statements per variation
    • Total database sizes between 10MB and 10GB
    • No update or load tests

  • XMach-1: A Benchmark for XML Data Management by Timo Boehme and Erhard Rahm. Application benchmark for native XML databases and XML-enabled databases. Characteristics:

    • 10,000 to 10,000,000 documents
    • Documents are text-centric
    • 8 XQuery statements
    • Tests both database and application server

  • XMark: An XML Benchmark Project by Busse, Carey, Florescu, Kersten, Manolescu, Schmidt, and Waas. A benchmark for native XML databases. Characteristics:

    • 1 large XML document (up to 1GB)
    • 20 XQuery statements
    • Models an online auction
    • Read-only workload
    • No performance metric defined

  • XOO7 by Bressan, Dobbie, Lacroix, Lee, Li, Nambiar, and Wadhwa. Application benchmark. An XML version of the OO7 benchmark. Includes a paper about applying XOO7 to Lore, Kweelt, and "an ORDBMS" and a paper comparing XOO7, XMach-1, and XMark. Characteristics:

    • 23 XQuery statements
    • Single recursive XML document

  • XPathMark by Massimo Franceschet. Micro-benchmark for XPath engines, plus a set of functional tests for testing completeness and correctness of XPath coverage. Characteristics:

    • Performance test uses XML documents from XMark
    • Performance test has six groups of up to 15 XPath expressions
    • Performance tests available as XPath, XQuery, and XSLT
    • Functional test uses small, data-centric document
    • Functional test has five groups of up to 26 XPath expressions

  • Performance Comparison of Xyleme Zone Server and Oracle 9i XML Support (PDF) by Thierry Bigaignon. A brief comparison of the performance of Xyleme Zone Server (a native XML database) and Oracle 9i, using Oracle's test data and queries. (Xyleme Zone Server is 5.8 to 19.5 times faster.)
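
To make the notion of a micro-benchmark concrete, here is a minimal sketch that times a few individual path expressions against a generated document. It is not taken from any of the benchmarks listed above. The document structure, the queries, and the function names are invented for illustration, and it uses the limited XPath subset supported by Python's standard xml.etree.ElementTree module.

```python
import time
import xml.etree.ElementTree as ET

def make_document(n_items):
    """Generate a synthetic document; the structure is invented for illustration."""
    root = ET.Element("site")
    for i in range(n_items):
        item = ET.SubElement(root, "item", id=str(i))
        ET.SubElement(item, "price").text = str(i % 100)
    return root

def time_query(root, xpath, repeats=5):
    """Return (match count, best wall-clock seconds) for one path expression."""
    best = float("inf")
    count = 0
    for _ in range(repeats):
        start = time.perf_counter()
        count = len(root.findall(xpath))
        best = min(best, time.perf_counter() - start)
    return count, best

doc = make_document(10_000)
queries = ("./item", "./item/price", "./item[@id='42']")
results = {q: time_query(doc, q) for q in queries}

for query, (count, seconds) in results.items():
    print(f"{query}: {count} matches, best of 5 runs {seconds:.6f}s")
```

Each query isolates one feature (child steps, a two-step path, an attribute predicate), which is what distinguishes this style of test from an application benchmark that mixes reads and updates to imitate a real workload.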

3.2 Mailing lists

3.3 Blogs

These blogs all discuss XML, databases, and XML query languages to some extent.

3.4 Product lists

3.5 Audio/video

3.6 Other link pages

