XML and Databases

Ronald Bourret

Consulting, writing, and research in XML and databases

XML Database Products:

Native XML Databases

Copyright 2000-2010 by Ronald Bourret

Overview

As defined by the members of the XML:DB mailing list, a native XML database is one that:

  • Defines a (logical) model for an XML document -- as opposed to the data in that document -- and stores and retrieves documents according to that model. At a minimum, the model must include elements, attributes, PCDATA, and document order. Examples of such models are the XPath data model, the XML Infoset, and the models implied by the DOM and the events in SAX 1.0.
  • Has an XML document as its fundamental unit of (logical) storage, just as a relational database has a row in a table as its fundamental unit of (logical) storage.
  • Is not required to have any particular underlying physical storage model. For example, it can be built on a relational, hierarchical, or object-oriented database, or use a proprietary storage format such as indexed, compressed files.

Native XML databases fall into two broad categories:

  • Document-based storage: Store the entire document in text or binary form and provide some sort of database functionality in accessing the document. A simple strategy for this might store the document as a BLOB in a relational database or as a file in a file system and provide XML-aware indexes over the document. A more sophisticated strategy might store the document in a custom, optimized data store with indexes, transaction support, and so on.

  • Node-based storage: Store individual nodes of the document (such as the DOM or a variant thereof) in an existing or custom data store. For example, this might map the DOM to relational tables such as Elements, Attributes, Entities or store the DOM in pre-parsed form in a data store written specifically for this task. This includes the category formerly known as "Persistent DOM Implementations".

There are two major differences between the two strategies. First, document-based storage can exactly round-trip the document, down to such trivialities as whether single or double quotes surround attribute values. Node-based storage can only round-trip documents at the level of the underlying document model. This should be adequate for most applications but applications with special needs in this area should check to see exactly what the database supports.
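
The round-tripping difference can be illustrated with a minimal sketch of node-based storage. It assumes Python's standard sqlite3 and xml.etree.ElementTree modules; the Elements and Attributes table layouts and the store_nodes and load_nodes helpers are hypothetical and are not taken from any product listed here.

```python
import sqlite3
import xml.etree.ElementTree as ET

def store_nodes(conn, xml_text):
    """Shred a document into hypothetical Elements and Attributes tables."""
    conn.execute("CREATE TABLE Elements (id INTEGER PRIMARY KEY, parent INTEGER, "
                 "pos INTEGER, name TEXT, text TEXT, tail TEXT)")
    conn.execute("CREATE TABLE Attributes (element INTEGER, name TEXT, value TEXT)")

    def insert(elem, parent, pos):
        cur = conn.execute(
            "INSERT INTO Elements (parent, pos, name, text, tail) VALUES (?, ?, ?, ?, ?)",
            (parent, pos, elem.tag, elem.text, elem.tail))
        elem_id = cur.lastrowid
        for name, value in elem.attrib.items():
            conn.execute("INSERT INTO Attributes VALUES (?, ?, ?)", (elem_id, name, value))
        # The pos column preserves document order among siblings.
        for child_pos, child in enumerate(elem):
            insert(child, elem_id, child_pos)
        return elem_id

    return insert(ET.fromstring(xml_text), None, 0)

def load_nodes(conn, elem_id):
    """Rebuild an element tree from the stored rows."""
    name, text, tail = conn.execute(
        "SELECT name, text, tail FROM Elements WHERE id = ?", (elem_id,)).fetchone()
    elem = ET.Element(name)
    elem.text, elem.tail = text, tail
    for attr_name, value in conn.execute(
            "SELECT name, value FROM Attributes WHERE element = ? ORDER BY rowid",
            (elem_id,)).fetchall():
        elem.set(attr_name, value)
    for (child_id,) in conn.execute(
            "SELECT id FROM Elements WHERE parent = ? ORDER BY pos", (elem_id,)).fetchall():
        elem.append(load_nodes(conn, child_id))
    return elem

# The original mixes single and double quotes around attribute values.
original = "<order id='42'><item sku=\"A1\">Widget</item><note>rush</note></order>"
conn = sqlite3.connect(":memory:")
root_id = store_nodes(conn, original)
rebuilt = ET.tostring(load_nodes(conn, root_id), encoding="unicode")
print(rebuilt)
```

The rebuilt document has the same elements, attributes, text, and order as the original, but the serializer chooses its own quoting, so the single quotes around the id value are not recovered. Storing the original string unchanged (the document-based approach) would return it byte for byte.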

The second major difference is speed. Document-based storage obviously has the advantage in returning entire documents or fragments. Node-based storage probably has the advantage in combining fragments from different documents, although this does depend on factors such as document size, parsing speed (for document-based storage), and retrieval speed (for node-based storage). Whether it is faster to return an entire document as a DOM tree or SAX events probably depends on the individual database, again with parsing speed competing against retrieval speed.

Native XML databases differ from XML-enabled databases in three main ways:

  • Native XML databases can preserve physical structure (entity usage, CDATA sections, etc.) as well as comments, PIs, DTDs, etc. While XML-enabled databases can do this in theory, this is generally not (never?) done in practice.

  • Native XML databases can store XML documents without knowing their schema (DTD), assuming one even exists. Although XML-enabled databases could generate schemas on the fly, this is impractical, especially when dealing with schema-less documents.

  • Native XML databases are accessed through XML-based technologies, such as XQuery, XPath, or the DOM, and use XML-specific APIs, such as XQJ or XML:DB API. XML-enabled databases access data through non-XML technologies, such as SQL and ODBC.

For more information about native XML databases, see "Native XML Databases".

Related categories

  • Content (Document) Management Systems: Applications built on top of native XML databases and/or the file system for content/document management. Include features such as check-in/check-out, versioning, and editors.
  • XML-Enabled Databases: Databases with extensions for transferring data between XML documents and themselves. Some of these also support native XML storage.

Products


4Suite, 4Suite Server

Developer: FourThought
URL: http://4suite.org/index.xhtml
License: Open Source
Database type: Object-oriented
Entry last updated: November, 2001

From the Web site:

"4Suite is a collection of Python tools for XML processing and object database management. It is an integrated package of several components: 4DOM, DbDom, 4XPath, 4XSLT, 4XPointer, 4XLink, 4RDF and 4ODS. All the tools are designed for maximum extensibility through custom Python code."

"DbDOM is a DOM implementation that is stored persistently in a 4ODS object database in order to support arbitrarily large documents and applications with specialized persistent needs."

"4XPath is a library implementing the W3C's full XPath 1.0 specification for indicating and selecting portions of an XML document."

"4Suite Server is a platform for XML processing. It is an XML data repository with a rules-based engine. It supports DOM access, XSLT transformation, XPath and RDF-based indexing and query, XLink resolution and many other XML services. It also provides support services such as distributed transactions, and access control lists. It supports remote, cross-platform and cross-language access through CORBA, SOAP and HTTP GET."

BaseX

Developer: University of Konstanz
URL: http://basex.org/
License: Open Source
Database type: Proprietary
Entry last updated: February, 2010

BaseX is a native XML database written in Java. It stores XML documents using a proprietary format, in which information about the document's node tree is stored in a set of tables. Documents are stored in a "database", which can contain one or more documents. A database provides the context for evaluating queries, such as where to find the documents specified by the XQuery functions fn:doc and fn:collection.

BaseX supports XQuery, the XQuery Update Facility, and XQuery Full Text. Extensions to XQuery include executing dynamically constructed XQuery expressions, try/catch expressions, and directly calling Java functions. Extensions to XQuery Full Text include fuzzy queries, which perform approximate text matches.

BaseX supports four types of indexes. Text and attribute indexes index PCDATA and attribute values and are used to resolve comparisons in predicates. Both support exact matches and ranges. Full-text indexes are used during full-text searches and can be configured for the types of searches to be used, such as case sensitivity, stemming, and diacritics. Path indexes are used to resolve location path searches. Note that applications must explicitly update indexes after an update query; this is to balance the need for updates with the need for access.
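
As a rough illustration of how text and path indexes differ, and of why an index that is not refreshed after an update goes stale, here is a minimal sketch using Python's standard library. The build_indexes function and its dictionary layout are assumptions made for this example; they do not describe BaseX's actual storage format or API.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_indexes(root):
    """Build a hypothetical text index and path index over one document."""
    text_index = defaultdict(list)  # text value    -> elements containing it
    path_index = defaultdict(list)  # location path -> elements at that path

    def walk(elem, parent_path):
        path = f"{parent_path}/{elem.tag}"
        path_index[path].append(elem)
        value = (elem.text or "").strip()
        if value:
            text_index[value].append(elem)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return text_index, path_index

doc = ET.fromstring(
    "<library><book><title>XML</title><year>2001</year></book>"
    "<book><title>SQL</title><year>1999</year></book></library>")
text_index, path_index = build_indexes(doc)

# A path index resolves a location path without walking the tree.
titles = [e.text for e in path_index["/library/book/title"]]
# A text index resolves an equality predicate such as [year = "2001"].
hits = [e.tag for e in text_index["2001"]]

# After an update the indexes are stale until they are rebuilt explicitly.
doc[0][1].text = "2002"
stale = "2002" not in text_index
text_index, path_index = build_indexes(doc)
fresh = "2002" in text_index
```

Rebuilding on demand rather than on every write is the trade-off the paragraph above describes: updates stay cheap, at the cost of queries seeing old index entries until the application refreshes them.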

BaseX can be run as a standalone query application or in client/server mode. As a server, it supports transactions, user management, and logging. It can be run from the command line, through a GUI tool, or by calling an API. Three different APIs are supported: XQJ, XML:DB, and a proprietary API that accepts command-line commands programmatically.

Berkeley DB XML

Developer: Oracle (formerly owned by Sleepycat Software)
URL: http://www.oracle.com/us/products/database/berkeley-db/xml/index.html
License: Open Source
Database type: Key-value
Entry last updated: January, 2009

Berkeley DB XML is a native XML database built on top of Berkeley DB, adding an XML parser, XML indexes, and an XQuery engine. From Berkeley DB it inherits a storage engine, transaction support (including XA), automatic recovery, and other features.

Berkeley DB XML stores XML documents in logical groupings called containers, which are the same as collections in other native XML databases. Users can specify a number of properties on a per-container basis, including whether to validate documents, whether to store documents whole or as individual nodes, and what indexes to create (element, attribute, or metadata). It is worth noting that schemas are specified through schemaLocation hints in documents, rather than being associated with the container as a whole.

In addition to storing XML documents, Berkeley DB XML can store non-XML documents (in the underlying Berkeley DB data store) as well as metadata for XML documents. The latter take the form of user-specified property-value pairs and can be queried as if they were child elements of the root element, although they do not actually appear in stored XML documents.

Berkeley DB XML supports XQuery as its query language. It provides an API for updating documents that uses XQuery to identify a set of nodes to update and allows users to append a new child to a target node, insert a new node before or after a target node, remove a target node, rename a target node, or change the value of a target node. Berkeley DB XML supports the XQuery Update Facility and performs updates at the node level.
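
The node-level update operations listed above (append, insert before or after, remove, rename, change value) can be sketched with Python's xml.etree.ElementTree. This is only an illustration of the operations themselves; it does not use or imitate the Berkeley DB XML API, and the sample inventory document is invented.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<inventory><item id='1'><name>bolt</name></item>"
    "<item id='2'><name>nut</name></item></inventory>")

# Identify the target node with a path expression (ElementTree's XPath subset).
target = doc.find("item[@id='2']")
parent = doc  # ElementTree keeps no parent pointers, so track the parent here

# Append a new child to the target node.
ET.SubElement(target, "qty").text = "10"

# Insert a new node before the target node.
parent.insert(list(parent).index(target), ET.Element("item", {"id": "1a"}))

# Insert a new node after the target node.
parent.insert(list(parent).index(target) + 1, ET.Element("item", {"id": "3"}))

# Rename the target node.
target.tag = "part"

# Change the value of a node.
target.find("name").text = "hex nut"

# Remove a node.
parent.remove(doc.find("item[@id='1']"))

child_tags = [child.tag for child in doc]
child_ids = [child.get("id") for child in doc]
print(ET.tostring(doc, encoding="unicode"))
```

Each operation touches only the selected node and its immediate neighbors, which is what makes node-level updates cheaper than replacing a whole document.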

Like Berkeley DB, Berkeley DB XML is a library that is linked directly to applications, rather than being used in client-server mode. It has a command-line interface as well as APIs for C++, Java, Tcl, Perl, Python, and PHP. Third-party APIs for other languages are available as well.

DBDOM

Developer: K. Ari Krupnikov
URL: http://dbdom.sourceforge.net/
License: Open Source
Database type: Relational
Entry last updated: November, 2000

DBDOM is an implementation of the DOM over a relational database, using a fixed set of tables to store the DOM tree in the database. DOM methods are implemented as stored procedures, and a set of adapters is included so that these can be called from Java. The initial version will run on PostgreSQL, with later versions planned for Oracle, DB2, and Microsoft SQL Server.

dbXML

Developer: dbXML Group
URL: http://sourceforge.net/projects/dbxml-core/
License: Open Source
Database type: Proprietary
Entry last updated: March, 2004

dbXML is a native XML database that supports four different data stores. The first of these is a proprietary data store that uses B trees. The second is an in-memory data store, which is used for temporary storage and whose contents are deleted when the database is stopped. The third is the file system. And the fourth is a mapping to a relational database (it is not known what mapping is used). Which data store to use is specified on a per-collection basis.

dbXML has a directory-like collection model. Collections can be nested and can store documents that match any XML schema, although it is suggested that a single collection contain documents that match a single XML schema to simplify indexing and querying. Collections can also contain binary streams (such as JPEG files), although a collection cannot contain both binary streams and XML documents.

dbXML supports XPath, XSLT, XUpdate, and full-text searches. XPath and XSLT have been extended for use against collections, and both XSLT and full-text searches can be run against the results of an XPath query.

dbXML supports three different types of indexes. Name indexes index element and attribute names. Value indexes index element and attribute values and support strings, characters, bytes, integers, real numbers, and booleans. Full text indexes index tokens in element and attribute values. They are case insensitive and actually index word stems; for example, both "happening" and "happen" have the same stem. Individual indexes are associated with a particular collection and users specify what to index according to an XPath-like expression.
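
A case-insensitive full-text index over word stems can be sketched as follows. The crude_stem function is a deliberately simple suffix stripper invented for this example, not the stemmer dbXML uses, and the index layout is likewise an assumption.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

def crude_stem(word):
    """Lower-case a word and strip a few common suffixes (illustrative only)."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_fulltext_index(documents):
    """Map each stem to the set of document ids whose element or attribute values contain it."""
    index = defaultdict(set)
    for doc_id, xml_text in documents.items():
        for elem in ET.fromstring(xml_text).iter():
            values = [elem.text or ""] + list(elem.attrib.values())
            for token in re.findall(r"[A-Za-z]+", " ".join(values)):
                index[crude_stem(token)].add(doc_id)
    return index

docs = {
    "a.xml": "<news><title>What is Happening</title></news>",
    "b.xml": "<news title='It can happen'><body>Nothing here</body></news>",
}
index = build_fulltext_index(docs)

# The query term is stemmed the same way as the indexed tokens.
matches = sorted(index[crude_stem("HAPPEN")])
```

Because both "Happening" and "happen" reduce to the same stem, a search for either form finds both documents regardless of case.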

dbXML supports triggers. These are user-specified Java classes that can be fired before or after an insert, update, delete, or data retrieval. They can be used to do such things as validating documents on insertion or modifying documents on retrieval. dbXML also supports extensions to the server through Java classes.
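
The trigger mechanism can be sketched in a few lines. dbXML triggers are Java classes; the Python below only illustrates the idea of callbacks fired around insertion and retrieval, and the Collection class, its event names, and both trigger functions are hypothetical.

```python
import xml.etree.ElementTree as ET

class Collection:
    """A toy collection that fires user-supplied triggers around operations."""

    def __init__(self):
        self._docs = {}
        self._triggers = {"before_insert": [], "after_retrieve": []}

    def on(self, event, callback):
        self._triggers[event].append(callback)

    def insert(self, key, xml_text):
        # A trigger may reject the document by raising, or return a modified copy.
        for trigger in self._triggers["before_insert"]:
            xml_text = trigger(key, xml_text)
        self._docs[key] = xml_text

    def retrieve(self, key):
        xml_text = self._docs[key]
        for trigger in self._triggers["after_retrieve"]:
            xml_text = trigger(key, xml_text)
        return xml_text

def require_well_formed(key, xml_text):
    """Validate on insertion: ET.fromstring raises ParseError for malformed input."""
    ET.fromstring(xml_text)
    return xml_text

def stamp_source(key, xml_text):
    """Modify on retrieval: add a source attribute to the root element."""
    root = ET.fromstring(xml_text)
    root.set("source", key)
    return ET.tostring(root, encoding="unicode")

coll = Collection()
coll.on("before_insert", require_well_formed)
coll.on("after_retrieve", stamp_source)
coll.insert("a.xml", "<a/>")
retrieved = coll.retrieve("a.xml")
```

The stored document is left unchanged by the retrieval trigger; only the copy handed back to the caller carries the extra attribute.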

dbXML supports transactions and security. Security options are no security, a single user name and password for the entire database, and role-based security (the default).

dbXML has four different APIs: the direct API, the client API, XML:DB, and Web services. The direct API allows applications to work directly with dbXML. The client API allows applications to use dbXML in client-server fashion. This can be done where both client and server are in the same process, or through XML-RPC. The Web services interface supports both XML-RPC and REST (URL encoding).

dbXML comes with a set of command line tools for connecting to the database, managing collections, indexes, security, triggers, and extensions, and storing and retrieving documents.

NOTE: dbXML is a complete rewrite of the code that became Xindice and is therefore different from that product.

Dieselpoint

Developer: Dieselpoint, Inc.
URL: http://www.dieselpoint.com/xmlsearch.html
License: Commercial
Database type: None (indexes only)
Entry last updated: January, 2007

Dieselpoint is a search engine, not a native XML database. It indexes documents and data specified by the user and then executes queries against those indexes. Dieselpoint is written in Java and will run in any J2EE-compliant application server. It is designed to be called from a user-written application and its API is designed with such applications in mind. For example, it returns metadata about search results so applications can dynamically create user interfaces relevant to those results. Applications can call Dieselpoint through a Java API, a JSP front end, JDBC, or XML. For users who do not want to write their own application, Dieselpoint ships with a number of sample applications (including a product catalog application) and a generic, JSP-based user interface that is "suitable for common uses".

Dieselpoint indexes documents and data retrieved by a crawler from Web sites, directories, and databases. It can index documents (XML, HTML, PDF, Microsoft Office), databases (via JDBC), and flat files (comma-separated, tab-separated, and so on). Data in other formats can be indexed via calls to a user-implemented API. The indexer extracts data in the form of attributes, such as document metadata, XML elements and attributes, and database columns. A preprocessor allows user-written code to modify, categorize, or reject items before they are indexed.

Dieselpoint uses a proprietary query language, which supports full-text and parametric searching. (Parametric searching limits a search to a particular attribute, such as a title, part number, or description.) Search clauses can be joined in any way by AND, OR, NOT, and parentheses, and can include comparisons (=, >, >=, <, <=, <>), wildcards, and regular expressions. Full-text features include stemming, thesauruses, stop words, misspellings, relevance, hit highlighting, and support for 40 languages and 140 dialects. Search results can be returned as a JDBC result set or XML document and can be sorted by relevance or attribute value.

XML-specific features include searching by element or attribute and by XML path. (The indexer preserves the XML hierarchy.) The query engine can return complete documents or fragments, and can also treat fragments of a document (headed by a particular element name) as separate documents. Dieselpoint understands both ECCMA (an XML language for catalogs) and Dublin Core and provides special processing for both. In addition, it can handle XMP metadata (RDF documents) embedded in PDF documents.

Dieselpoint includes an administrator for performing such tasks as managing indexes, defining data sources, and scheduling the crawler. It also contains a Web server and servlet container.

DOMSafeXML

Developer: Ellipsis
URL: http://www.ellipsis.nl/content/products.htm (English)
http://www.ellipsis.nl/mambo/index.php?option=com_content&task=view&id=29&Itemid=68&lang=NL (Dutch)
License: Commercial
Database type: File system(?)
Entry last updated: June, 2004

DOMSafeXML is a main-memory native XML database that stores XML files on disk and monitors them "for external changes". It supports XPath, SAX, DOM level 2, and the XML:DB API, with language bindings for COM, C++, Java, and C#. DOMSafeXML supports multi-user access through transactions and node-level locking and comes with a built-in Web server.

EMC Documentum xDB (formerly X-Hive/DB)

Developer: EMC Corporation
URL: http://www.emc.com/products/detail/software/documentum-xdb.htm
License: Commercial
Database type: Proprietary
Entry last updated: May, 2005

[Ed. -- EMC acquired X-Hive in 2007. Other than name changes, this description has not been updated for any changes that may have been made since that acquisition.]

EMC Documentum xDB is a native XML database that includes support for XQuery, XPath, XML Schemas, DOM Level 3, XSLT, and XSL-FO, as well as transactions, user- and group-level access control, JAAS (Java Authentication and Authorization Service), replication, load balancing across multiple servers, and BLOB storage. Additional features include:

o Indexes. EMC Documentum xDB supports element name, value, full-text, and custom indexes, as well as "library, ID attribute, and context-conditioned" indexes. Full-text indexes use a proprietary indexing mechanism; these indexes can be searched from XQuery through the xhive:fts (full-text search) function. In addition, users can integrate their own full-text index engines. Custom indexes are based on a user-implemented DOM NodeFilter.

o Linking. A link engine that implements XLink and XPointer supports bi-directional links, link-bases, and link management.

o External data. The JDBC Bridge can retrieve a snapshot of relational data through JDBC. The data is converted to XML using a table model and can be integrated into other documents.

o WebDAV. Remote clients can directly access collections and documents in the database through WebDAV.

o SOAP. Applications can store and retrieve documents, execute XQuery queries, retrieve XML schemas, and so on through SOAP.

o Custom JSP tags. A tag library for calling EMC Documentum xDB through Java Server Pages.

o J2EE Resource Adapter. An implementation of J2EE Resource Adapter allows EMC Documentum xDB applications to use the transaction management facilities of an EJB application server.

o Versioning. Both linear and branched versioning (multiple versions of the same document) are supported.

In addition, an implementation of XUpdate (from the XML:DB Initiative) that uses Lexus may be downloaded from the X-Hive Web site.

eXist

Developer: Wolfgang Meier, et al
URL: http://exist.sourceforge.net
License: Open Source
Database type: Proprietary
Entry last updated: January, 2009

eXist is a native XML database that uses a proprietary data store (B+ trees and paged files). It can be run as a standalone database server, as an embedded Java library, or in the servlet engine of a Web application. Documents are stored in a hierarchy of collections. Collections can contain child collections and do not constrain documents to any particular schema or document type.

eXist supports all of XQuery (including the optional axes) except for schema import/validation and wildcard searches on the following and preceding axes. Extensions to XQuery include full-text searches, calls to the XML:DB API (such as to store query results in the database), executing dynamically constructed XQuery statements, applying XSLT stylesheets to a node, working with HTTP, and executing arbitrary Java methods. eXist ships with a number of user-written function modules, including modules for compression; date/time and math operations; working with images, files, and geospatial data; using HTTP, email, XSL-FO, and JNDI; working with relational databases; and performing XML differencing. Updates are supported through the extensions to XQuery proposed by Patrick Lehti and XUpdate. eXist also provides partial support for XInclude and XPointer.

eXist supports the XML:DB API, with additional services for preparing and executing XQuery statements, managing users, managing multiple database instances, and querying indexes. DOM and SAX are supported for documents returned through the XML:DB API and live (read-only) DOM trees are available when eXist is used as an embedded database. eXist can also be called via XML-RPC, a REST-style Web services API, SOAP, WebDAV, and the Atom Publishing Protocol.

eXist automatically indexes all element and attribute structure. By default, it creates full-text indexes over all text and attribute values, but users can turn this off for selected parts of a document. It supports concurrent read/write access for multiple users. Security is provided through Unix-style access permissions for both users and groups, which can be applied to both collections and individual documents. XQuery access control is supported through the eXtensible Access Control Markup Language (XACML). Transactions are supported, but "limited to the functionality needed for crash recovery"; they are not visible to applications.

Of note, eXist has complete documentation.

eXtc

Developer: M/Gateway Developments Ltd.
URL: http://gradvs1.mgateway.com/main/index.html?path=extcMenu
License: Free
Database type: Post-relational (Cache, GT.M)
Entry last updated: January, 2009

eXtc is a native XML database that runs on top of the Cache and GT.M databases. (While eXtc and GT.M are free, Cache is not.) It provides an implementation of the DOM, storing documents as DOM objects. Because eXtc is written in Cache ObjectScript (Cache's extension of the MUMPS programming language), the DOM implementation inherits the features of that language, such as integral support for transactions, multi-user access, and remote access. DOM level 2 is supported, with additional support for the Abstract Schemas and Load and Save features of DOM level 3.

eXtc supports XPath queries over individual documents, as well as SQL queries against the Cache tables used to store DOM trees. The latter is useful for SQL-like queries -- for example, finding the value of all CustomerName elements -- as well as finding the IDs of documents that match certain criteria.

eXtc also supports XSL-FO, SVG (through a library of functions for creating SVG documents), WebDAV, and HTTP access through Cache's WebLink module. In addition, it includes client and server implementations of SOAP and WSDL, which allow Cache applications to be exposed as Web services and to be integrated with Web services.

Extraway

Developer: 3D Informatica
URL: http://cms.3di.it/tecnologia/highway-xml (Italian)
http://www.3di.it/nuovo/html/extraway.pdf (English, PDF)
License: Commercial
Database type: Files plus indexes
Entry last updated: August, 2005

From the company:

"Extraway is a native XML database that is designed to preserve data as "Information Units", which are objects defined by the database administrator and which use an XML data model. By default, information units correspond to the root element of a document, but can also correspond to lower-level elements. For example, a single document may contain multiple bibliographic records where each record is considered to be a single information unit."

"Extraway supports synchronous and asynchronous use cases. In the synchronous case, client applications create and retrieve information units. The engine receives XML information units and stores them on a private area of the file system. The storage process is configured by the database administrator, who determines the aggregation policy, the directory, and the file name settings. For example, the administrator can organize the file system by year, department, and classification and arrange information units of a given type in the same XML document."

"During the aggregation process, Extraway adds system metadata for each unit: time/username of submission/modification, an integrity hash, and the current versions of the organization's structure, DTD/XML Schema, classification plan, and client software."

"Extraway also manages multimedia objects, storing them in the same directory as the corresponding information unit. For the most common formats, the text is extracted and indexed, and metadata like size, resolution, compression, duration, and hash code are extracted and added to the information unit."

"In the asynchronous case, Extraway monitors XML files at a given network address and simply indexes them."

"Indexes are built relative to the root of each information unit. Each index has a specific type (string, number, or date) and can index the entire content of an element or attribute or individual tokens within that content. Indexes can also concatenate individual values, or be created from custom code that is run at index time. Indexes can be built on demand, at regular intervals, or, by default, in response to events such as adding, deleting, and modifying information units."

"Extraway has a proprietary query language that allows users to combine path expressions with boolean operators. Path expressions can be declared and aliased at design time. This allows path details to be abstracted, which is useful when merging different paths in the same model and in handling different DTD / XML Schema versions. The language supports equality, arithmetic, and full-text operators. Query results are returned as a named result set, which can be browsed, refined, referenced in other queries, or made persistent."

"Extraway can also be queried with SQL, which is used for its joining operators: when the selected columns return non-repetitive values, the result is a trivial table having information unit as rows; in the other cases the result is an array of information unit identifiers."

"Extraway includes a GUI-based DTD editor, an administration console, Java, .Net, and Web services APIs, and a JDBC driver. Other features include support for thesauruses and encryption of XML units stored on the file system."

Infonyte DB (formerly PDOM)

Developer: Infonyte
URL: http://www.infonyte.com/prod_db.html
License: Commercial
Database type: Proprietary (Model-based)
Entry last updated: February, 2002

Infonyte is a native XML database built from two components: Infonyte PDOM (Persistent DOM) and Infonyte XQL (which can be purchased separately). Infonyte PDOM is a storage engine for storing the XML documents in indexed, binary files. The PDOM engine provides an implementation of the DOM over these files. The DOM implementation can handle arbitrarily large documents because it swaps DOM nodes to disk as needed. It includes defragmentation and garbage collection facilities, commit points (for writing the in-memory tree to disk), file compression with gzip, and thread-safe operation.

Infonyte XQL is an implementation of XQL with extensions for variables, multi-document queries, restructuring of query results, full-text search, result construction, and sequencing. It is addressable through HTTP.

Version 2.0 (feature complete and in beta as of December, 2001) includes support for XPath, DOM Level 2, and XSLT. XSLT support is provided by Xalan, which has been extended to work directly on data in the database.

Ipedo XML Database

Developer: Ipedo
URL: http://www.ipedo.com/html/ipedo_xml_db.html
License: Commercial
Database type: Proprietary
Entry last updated: January, 2009

Ipedo XML Database is a native XML database written in Java that uses a proprietary data store. It can store both XML and non-XML content. Documents are arranged into collections.

Ipedo XML Database supports XQuery, with proprietary extensions for performing updates, full-text searches, and calling Java functions. It also supports XSLT and the DOM. It supports value (date, string, integer, and float), path, and full-text indexes (including "full or partial document indexing, keyword and phrase searches, Boolean operators, fielded search, parametric search, word stemming and breaking, stop word lists and term weighting, proximity search, and wildcard search"). Users can specify which fields to index using XPath.

Schemas are supported through the use of XML Schemas and DTDs. (DTDs are converted to XML Schemas when added to the database.) Schemas can be applied to individual documents(?) or collections, and documents are validated when they are inserted, updated, or migrated to a new schema. A Schema Manager tracks schemas and supports schema versioning, including the ability to roll back schemas.

In addition to schema versioning, Ipedo XML Database supports document versioning, which can be applied on a per-document basis. Users can check versioned documents in and out, and retrieve older versions of documents by version number, date, or label. They can also view the entire version history of a document.

Other features include Java, .NET, SOAP, and WebDAV interfaces, a cache manager, transactions (including two-phase commit), security (including user and group access control lists and support for LDAP), replication, clustering and load-balancing, and GUI-based development and administration tools.

Lore

Developer: Stanford University
URL: http://www-db.stanford.edu/lore/home/index.html
License: Research
Database type: Semi-structured
Entry last updated: November, 2000

Semi-structured data is data with more structure than a conversation, but less structure than a telephone book. A good example is a resume (curriculum vitae). While virtually all resumes include a name, address, and telephone number, only some will include an email address, Web site, or FAX number. Most will include a list of previous jobs, but others might include only a list of university courses. Depending on the profession, there might be a list of software used or licenses held.

XML is well-suited to storing semi-structured data and shares a feature common to many semi-structured data models: it is self-describing. That is, it carries a certain amount of metadata with the data. In the case of XML, this is in the form of element type and attribute names. The legality of well-formed documents mirrors another feature found in many semi-structured data models: the data model is not required to have a definitive schema, and the model can be extended at will by the addition of new fields.

Lore is a database designed for storing semi-structured data. Although it predates XML, it has recently been migrated for use as an XML database. It includes a query language (Lorel), multiple indexing techniques, a cost-based query optimizer, multi-user support, logging, and recovery, as well as the ability to import external data. Because Lore is designed for use with semi-structured data, XML documents without DTDs can be easily stored.

An interesting feature of Lore is a DataGuide, which is a "structural summary of all paths in the database". Unlike structured databases, in which the structure is specified first and data is added according to that structure, data is entered first into Lore and the structure is then summarized. The resulting information is useful for query processing.
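
The idea of summarizing structure after the data has been loaded can be sketched as follows. This is a simplified path summary over XML trees, written with Python's standard library; it is not Lore's DataGuide algorithm, and the path_summary function and sample resumes are invented for illustration.

```python
import xml.etree.ElementTree as ET

def path_summary(xml_documents):
    """Collect every distinct element and attribute path found in the documents."""
    paths = set()

    def walk(elem, prefix):
        path = prefix + "/" + elem.tag
        paths.add(path)
        for name in elem.attrib:
            paths.add(path + "/@" + name)
        for child in elem:
            walk(child, path)

    for xml_text in xml_documents:
        walk(ET.fromstring(xml_text), "")
    return sorted(paths)

# Two resumes with different optional fields and no schema.
resumes = [
    "<resume><name>Ann</name><phone>555-0100</phone>"
    "<job><title>Editor</title></job></resume>",
    "<resume><name>Bo</name><email>bo@example.org</email>"
    "<course>Databases</course></resume>",
]
summary = path_summary(resumes)
for path in summary:
    print(path)
```

A query processor could consult such a summary to reject a path that occurs nowhere in the data, or to tell a user which fields actually exist, without scanning the documents themselves.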

The Lore executables are "available for public use". Source code may be available in some circumstances.

MarkLogic Server

Developer: Mark Logic Corp.
URL: http://www.marklogic.com/product/marklogic-server.html
License: Commercial
Database type: Proprietary
Entry last updated: February, 2009

MarkLogic Server is a native XML database that stores data in a compressed, preparsed form. It can store XML, text, and binary documents. During input, it converts Word, PowerPoint, Excel, PDF, and HTML documents to XML and can correct "common markup errors." It can also use built-in or third-party tools to mark up phrases in the text, such as people, places, and dates. In addition, predefined or custom XML-based metadata can be associated with each document.

Documents can be organized into directories and collections. Directories provide hierarchical organization, similar to a file system. (On most native XML databases, these are called collections.) Collections are sets of documents that have something in common, such as all documents about dogs or all documents written by a particular author. The topic for a given collection is specified by a URI, which is associated with all documents in the collection. Documents can belong to a directory and to zero or more collections.
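
The distinction between directories and collections can be made concrete with a small sketch. The DocumentStore class, its methods, and the sample URIs are hypothetical and do not reflect MarkLogic Server's API; the point is only that a document has one place in the hierarchy but may carry any number of collection URIs.

```python
from collections import defaultdict
from posixpath import dirname

class DocumentStore:
    """Toy store: one directory per document, zero or more collections."""

    def __init__(self):
        self.documents = {}                  # document URI   -> content
        self.collections = defaultdict(set)  # collection URI -> document URIs

    def insert(self, uri, content, collections=()):
        self.documents[uri] = content
        for collection_uri in collections:
            self.collections[collection_uri].add(uri)

    def in_directory(self, directory):
        # The directory is derived from the document URI, like a file path.
        return sorted(u for u in self.documents if dirname(u) + "/" == directory)

    def in_collection(self, collection_uri):
        return sorted(self.collections.get(collection_uri, ()))

store = DocumentStore()
store.insert("/articles/2009/dogs.xml", "<article/>",
             collections=["http://example.org/topics/dogs",
                          "http://example.org/authors/smith"])
store.insert("/articles/2009/cats.xml", "<article/>",
             collections=["http://example.org/authors/smith"])
store.insert("/drafts/untagged.xml", "<article/>")

by_directory = store.in_directory("/articles/2009/")
by_author = store.in_collection("http://example.org/authors/smith")
by_topic = store.in_collection("http://example.org/topics/dogs")
```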

MarkLogic Server supports XQuery, with extensions that include try/catch, private functions and variables, functions with side effects (necessary for updates), and a binary node type. It ships with a number of function modules. A full-text search module includes support for querying words and phrases, using wildcards, stemming, spell checking, thesauruses, and diacritics. Other modules include functions for working with geospatial data, HTTP, XInclude, XPointer, mathematics, and more. User-written XQuery modules can be pre-parsed and optimized for better performance.

Updates are supported through a module containing functions for inserting, updating, and deleting documents and nodes. Updates are stored as changes to existing document fragments. Older fragments can be deleted during database cleanup or saved to allow older versions of a document to be queried or recreated.

MarkLogic Server has a single index that combines full-text, values, and XML structure. The index is constructed at load time, and the entire document is indexed. In addition, range indexes can be constructed to improve performance of certain types of queries, such as whether a value is less than 100.
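To show why a range index helps with a test such as "value less than 100", here is a minimal Python sketch. It is not MarkLogic's implementation; it assumes only that values are kept in sorted order so that a range test becomes a binary search instead of a scan. The class name RangeIndex and the document names are hypothetical.

```python
import bisect


class RangeIndex:
    """Keeps (value, doc_id) pairs sorted by value."""

    def __init__(self):
        self._entries = []

    def add(self, value, doc_id):
        # insort keeps the list ordered as entries arrive.
        bisect.insort(self._entries, (value, doc_id))

    def less_than(self, limit):
        # (limit,) sorts before every (limit, doc_id) pair, so this finds
        # the first entry whose value is >= limit.
        position = bisect.bisect_left(self._entries, (limit,))
        return [doc_id for _, doc_id in self._entries[:position]]


index = RangeIndex()
for value, doc_id in [(150, "a.xml"), (42, "b.xml"), (99.5, "c.xml"), (100, "d.xml")]:
    index.add(value, doc_id)

print(index.less_than(100))
```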

A content processing framework allows developers to create pipelines for processing documents. Pipelines are event-driven, and support operations such as conversion, classification, automated markup, Web services integration, and rendering output. They support both sequential and conditional processing and can be parallelized. For example, pipelines could be used to convert Microsoft Word documents to DocBook or vice versa.

MarkLogic Server has Java and .NET APIs and includes such common database functionality as transactions, triggers, journaling, role-based security, clustering, load balancing, failover, and backup. It includes built-in HTTP and WebDAV servers and supports SOA (SOAP and REST) and Web 2.0 technologies such as AJAX, JSON, and RSS.

MarkLogic Server ships with development, administration, and navigation tools. The latter support navigation through "facets, tag clouds, heat maps, node edge graphs, temporal exploration, geospatial navigation and traditional pie and bar charts."

A free version of MarkLogic Server with limited storage and processing capabilities is available for personal and non-commercial use.

M/DB:X

Developer: M/Gateway Developments Ltd.
URL: http://gradvs1.mgateway.com/main/index.html?path=mdbx
License: Open Source
Database type: Hierarchical (GT.M)
Entry last updated: July, 2009

M/DB:X is a native XML database built on the GT.M schemaless hierarchical database. It stores documents as DOM trees. (Documents can be created in the database by passing in an XML document or a JSON string; JSON strings are converted to XML documents, then stored in the database.) Each document in the database has a unique, user-assigned name and a system-generated ID. Each node in a document has a unique, system-generated ID. These names and IDs are used to identify documents and nodes when accessing the database.

M/DB:X provides a REST-based API that is based on (and extends) the DOM API. The API can be used to create, access, modify, and query documents. Queries are made with XPath, which can be used to query a single document, all documents, all documents starting with a particular prefix, or a list of documents. Responses are returned as an XML document or a JSON string. The response can contain information about a request (such as whether the request succeeded or failed), or the requested data itself (such as when the request is for the contents of a document). Note that M/DB:X does not support XML namespaces.

Security is provided by a mechanism that is similar to that used by Amazon's SimpleDB. This uses a user ID and digital signatures.

MonetDB/XQuery

Developer: CWI Database Group
URL: http://monetdb.cwi.nl/XQuery/index.html
License: Open Source
Database type: Proprietary
Entry last updated: June, 2010

MonetDB is a database built on a proprietary data store. The data store vertically fragments data, storing data in Binary Association Tables (BATs); each BAT holds the data for a single column. BATs are designed to be held in main memory to increase query speed. The data store is accessed with the MonetDB Assembly Language (MAL). Because queries are compiled into MAL instructions, MonetDB is able to support multiple query languages. Currently, it supports SQL (MonetDB/SQL) and XQuery (MonetDB/XQuery).
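Vertical fragmentation can be sketched in a few lines of Python. This is an illustration of the general idea only, not MonetDB's storage format: it assumes each column is held as its own list of (object ID, value) pairs and that object IDs are dense integers starting at 0. The function names fragment and reconstruct are hypothetical.

```python
def fragment(rows, columns):
    """Split row-oriented records into one two-column table per attribute."""
    tables = {name: [] for name in columns}
    for oid, row in enumerate(rows):
        for name, value in zip(columns, row):
            tables[name].append((oid, value))
    return tables


def reconstruct(tables, columns, oid):
    """Reassemble one record from the per-column tables."""
    # OIDs are dense and start at 0 in this sketch, so position equals OID.
    return tuple(tables[name][oid][1] for name in columns)


COLUMNS = ["name", "age"]
tables = fragment([("Ann", 31), ("Bob", 27)], COLUMNS)

print(tables["age"])
print(reconstruct(tables, COLUMNS, 1))
```

A query that touches only the age column reads only that one table, which is the usual argument for column-wise storage held in main memory.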

MonetDB/XQuery organizes XML documents into collections. Each collection is stored in a fixed set of tables: a main table for XML nodes and supporting tables for text and attribute values, QNames, and so on. Because of the storage overhead required by each collection, it generally requires less memory to store documents in a small set of collections, rather than many collections. However, there is a trade-off with respect to locking and index maintenance, which are performed on a per-collection basis.

By default, collections are read-only. This allows collections to use fully ordered inverted lists as indexes, which increases query performance. Because such indexes need to be rebuilt each time a new document is added, read-only collections are a poor fit when documents are added frequently. Updatable collections use hash tables for indexes. These have better maintenance speed, but poorer query performance. The decision whether to use read-only or updatable collections must be made at design time.

MonetDB/XQuery supports XQuery and the XQuery Update Facility. It has incomplete support for the XQuery standard functions, notably the date/time functions. MonetDB/XQuery has a number of extensions to XQuery. These include a pragma for caching expression results; the use of fn:put() to store query results as temporary documents; language extensions for recursion and calls to XRPC (see below); and new XPath axes for stand-off queries. Proprietary functions include functions to add and delete documents, query database metadata (such as collection and document names), retrieve internal IDs that can be used to retrieve nodes directly, and perform full-text queries through PF/Tijah. In addition, MonetDB/XQuery can precompile queries that consist of a call to a single function with atomic parameters.

XRPC is a way to call XQuery functions (including updating functions) on remote servers using SOAP. MonetDB/XQuery has a SOAP client that sends XRPC requests and a SOAP server that interprets XRPC messages, an XQuery extension for invoking XRPC calls from XQuery expressions, and a SOAP schema for XRPC messages. It ships with a Java class that acts as an XRPC server for other XQuery implementations, such as Saxon or Galax. XRPC allows MonetDB/XQuery users to construct XQuery expressions that distribute function calls across multiple MonetDB/XQuery installations or other XQuery implementations. It also allows other applications, such as Web servers, to access data in MonetDB/XQuery through XRPC calls to MonetDB/XQuery.

Applications can call MonetDB/XQuery through three different APIs:

  • MonetDB API (MAPI). This is a low-level API upon which the other APIs are built. It is not to be confused with Microsoft MAPI.

  • XRPC. This is described above. MonetDB/XQuery includes libraries so Java and JavaScript applications can use XRPC to call a single XQuery function.

  • JDBC. This has been extended to accept XQuery expressions. Expression results are returned in a result set that has a single row containing a single string column. The JDBC driver makes no attempt to interpret those results, such as parsing them into individual nodes.

MonetDB/XQuery ships with two clients: a command line client that executes XQuery expressions and a Web-based client that can be used for database administration.

MonetDB/XQuery supports ACID transactions and logging. However, it is not designed for high-performance OLTP applications. It has minimal security support. By default, it is accessible only from the machine on which it is running; it can be made available from other machines. There is a single user with a fixed password.

MonetDB/SQL supports SQL/XML. However, it does not use the capabilities of MonetDB/XQuery.

myXMLDB

Developer: Mladen Adamovic
URL: http://sourceforge.net/projects/myxmldb/
License: Open Source
Database type: MySQL
Entry last updated: January, 2005

myXMLDB is a native XML database implemented on top of MySQL. It stores documents as BLOBs and can store documents up to 256 MB in size. It supports XPath and XQuery through Saxon and provides a Java implementation of the XML:DB API. A GUI interface is provided through XMLdbGUI.

Natix

Developer: University of Mannheim
URL: http://db.informatik.uni-mannheim.de/natix.html
http://www.dataexmachina.de/natix.html
License: Free / non-commercial
Database type: Proprietary
Entry last updated: May, 2009

Natix is a native XML database that uses a proprietary data store. It indexes documents in two ways: a full-text index that stores XML node information and a structural index that stores parent/child and ancestor/descendant information.

Natix uses XPath as a query language and can return documents using SAX, DOM, WebDAV, or the file system. Node-level updates are supported, apparently through a live DOM tree. Natix supports multi-user access and transactions, with locking performed at a variety of levels (segment, document, or record) as needed. Natix has C++ and Java APIs.

Note that Natix is the engine behind Xyleme XML Server.

ozone

Developer: ozone-db.org
URL: http://ozone-db.org/frames/home/what.html
License: Open Source
Database type: Object-oriented
Entry last updated: March, 2004

From the Web site:

"ozone is a fully featured, object-oriented database management system completely implemented in Java ... ozone includes a fully W3C compliant DOM implementation that allows you to store XML data. You can use any XML tool to provide and access these data. Support classes for Apache Xerces-J and Xalan-J are included."

"Besides the native API, ozone provides a ODMG 3.0 interface. Although not fully ODMG compliant it helps you to port applications to/from ozone."

"ozone does not depend on any back-end database or mapping technology to actually save objects. It contains its own clustered storage and cache system to handle persistent Java objects."

"[ozone] includes the following features:
o multi-user, multi-thread support
o object level access rights
o fully transaction based
o JTA/XA support
o deadlock recognition
o BLOB support
o XML (DOM) support
o ODMG 3.0 support
o Garbage collection"

ozone is part of the Infozone framework.

Qizx

Developer: XMLMind, a division of Pixware
URL: http://www.xmlmind.com/qizx/
License: Commercial
Database type: Proprietary
Entry last updated: March, 2010

Qizx is a native XML database written in Java. It stores data using a representation that is based on the XPath 2.0 data model. The representation uses compression so that a document and its indexes often use less room than the original XML. A set of documents is stored in a library, which is organized as a hierarchical set of collections. Each collection contains XML documents.

By default, Qizx constructs four indexes for each XML document. The element index indexes element names and the hierarchical relationships between elements. Attribute indexes index attribute names and values; they convert attribute values to doubles or dates if possible. Simple content indexes index the names and values of elements that contain a single token, converting values to doubles or dates if possible. The full-text index indexes all tokens in text nodes. Indexes can be customized in a number of ways, such as using different formats for dates and numbers, specifying element- or attribute-specific conversions, providing user-written converters, and configuring full-text indexing.
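The "convert to a double or date if possible" step can be sketched as follows. Qizx's actual conversion rules and default formats are not described here, so this Python sketch makes its own assumptions: numbers are anything Python's float() accepts, dates are ISO 8601 calendar dates, and everything else is indexed as a string. The function name to_index_key is hypothetical.

```python
from datetime import date


def to_index_key(text):
    """Classify a raw value as ('double', x), ('date', d), or ('string', s)."""
    stripped = text.strip()
    try:
        return ("double", float(stripped))
    except ValueError:
        pass
    try:
        # Assumption: dates use the ISO form YYYY-MM-DD.
        return ("date", date.fromisoformat(stripped))
    except ValueError:
        pass
    return ("string", text)


for raw in ["3.5", "2010-03-15", "hello"]:
    print(raw, "->", to_index_key(raw))
```

Typed keys let a comparison such as "price less than 10" use numeric order rather than string order, under which "9" would sort after "10".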

Libraries, collections, and documents can have metadata associated with them. Metadata is in the form of name/value pairs, and values can have a data type of boolean, integer, double, string, date, XQuery node, or serializable Java object. Qizx defines a number of system properties; users can define properties as well. Properties can be queried through XQuery using Qizx-specific functions.

Qizx supports XQuery, the XQuery Update Facility, and XQuery Full-Text. Qizx includes functions to serialize query results, call XSLT stylesheets, evaluate XQuery expressions constructed at query time, perform pattern matching, parse XML strings, handle documents and collections, and perform transactions. It also extends XQuery to include a try/catch construct. Finally, it can call Java functions directly from XQuery.

Qizx supports concurrent ACID transactions, document and collection locking, backups, and journaling. It has a Java API and can also be run from the command line or a GUI tool.

Qizx Free Engine contains all of the functionality of Qizx, but is limited to 1 GB of storage. Qizx/open is an Open Source version of Qizx that is basically an XQuery engine -- it does not support persistent storage or transactions.

Sedna XML DBMS

Developer: Management Of Data & Information Systems, Institute for System Programming of the Russian Academy of Sciences
URL: http://modis.ispras.ru/sedna/index.htm
License: Free
Database type: Proprietary
Entry last updated: June, 2004

From the Web site:

"Sedna XML DBMS is a native full-featured data management system. It is designed having the following main goals in mind:

o Support for all traditional DBMS features (such as update and query languages, query optimization, fine-grain concurrency control, various indexing techniques, recovery and security),

o Efficient support for unlimited volumes of document-centric and data-centric XML documents that may have a complex and irregular structure,

o Full support for the W3C XQuery language in such a way that the system can be efficiently used for solving problems from different domains such as XML data querying, XML data transformations and even business logic computation (in this case XQuery is regarded as a general-purpose functional programming language)."

"[Features include:]

o Support for the W3C XQuery language

o Support for a declarative update language

o Native XML data storage structures designed for efficient support for both queries and updates (no underlying relational or another DBMS). The XML data storage is based on descriptive schema (also called DataGuide)

o JAVA API and Scheme API for application development

o Open client/server protocol over sockets that allows implementing APIs for other programming languages

o Administration via easy-to-use command line utilities"

[Ed. -- The declarative update language is based on the extensions to XQuery proposed by the W3C and Patrick Lehti.]

Sekaiju (known as Yggdrasill in Japan)

Developer: Media Fusion
URL: http://www.mediafusion.co.jp/usa/seihin/sekaiju/index.html
License: Commercial
Database type: Proprietary
Entry last updated: February, 2002

Sekaiju is a native XML database that has a proprietary data store designed to store well-formed XML documents. This uses "baskets" and "pockets" (the latter are "like a table" in a relational database), supports two-byte characters, and can store documents that are up to 2 GB in size.

Sekaiju has local and remote COM interfaces, making it accessible via Visual Basic, as well as an HTTP interface. Its query language is XBath, a proprietary language based on XQL. Indexes are automatically built for all nodes (element tags, attributes, and PCDATA) in version 1.0 and for user-specified nodes in version 1.5. Updates are supported only by replacing entire documents.

Transactions are supported through a versioning (log file) mechanism which is designed to minimize conflicts due to reading and writing the same document at the same time. Locking is done at the pocket level in version 1.0 and at the pocket or document level in version 1.5. Rollbacks occur automatically when problems occur in version 1.0; users can also request them directly in version 1.5. Security features include 256-bit encryption and password protection, with access controllable at the pocket level.

Tools include a forms editor, a GUI-based management tool, backup/restore tools, and a toolkit for parallel processing.

SQL/XML-IMDB

Developer: QuiLogic
URL: http://www.quilogic.cc/
License: Commercial
Database type: Proprietary XML store plus relational store
Entry last updated: February, 2003

SQL/XML-IMDB is an in-memory database with both native XML and relational data stores. While both data stores organize data in tables, a "table" in the XML data store is what most other native XML databases refer to as a collection, with one XML document per "row". Tables can be created as either local to a particular process or shared among processes and use compression to minimize memory use. Both types of tables are indexed with TST-trees, which "combine the speed advantage of a hash table with the ordered access of a binary tree", and XML tables are also indexed with "Reverse-Lookup" and "Token-Segment-Build-Up" mechanisms. While there does not appear to be a way to directly store the entire database to disk, individual relational tables can be saved as text files and individual XML tables can be saved as XML documents.

SQL/XML-IMDB supports both XQuery and a "significant subset" of SQL92. This allows XML queries against XML data and SQL queries against relational data. In addition, it extends XQuery so that users can mix XML and relational data. To do this, it allows SQL statements in "any part of [an] XQuery statement where an expression is allowed". From a practical standpoint, it appears that this means SELECT statements are used anywhere except in a RETURN clause and INSERT, UPDATE, and DELETE statements are used in RETURN clauses.

When a SELECT statement is used, the returned result set is mapped to an XML document with a table-based mapping. That is, each row in the result set is mapped to a <row> element and each column is mapped to a child of that <row> element. This allows XQuery variables to be bound to individual rows or columns in the result set. When any type of SQL statement is used, it can include XQuery variables. For example, these can be used in the WHERE clause of a SELECT statement to correlate relational and XML data, or in the VALUES clause of an INSERT statement to transfer data from XML documents to relational tables.
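The table-based mapping described above can be sketched with Python's standard library. This is not SQL/XML-IMDB code: sqlite3 stands in for the relational store, and the wrapper element name "result" and the use of column names as child element names are assumptions made for the example. The function name result_set_to_xml is hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def result_set_to_xml(cursor, root_name="result"):
    """Map each row to a <row> element and each column to a child of it."""
    root = ET.Element(root_name)
    names = [description[0] for description in cursor.description]
    for record in cursor:
        row = ET.SubElement(root, "row")
        for name, value in zip(names, record):
            ET.SubElement(row, name).text = "" if value is None else str(value)
    return root


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE authors (id INTEGER, name TEXT)")
connection.executemany("INSERT INTO authors VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])

cursor = connection.execute("SELECT id, name FROM authors ORDER BY id")
xml_text = ET.tostring(result_set_to_xml(cursor), encoding="unicode")
print(xml_text)
```

Once the result set has this shape, a path such as row/name selects a single column across all rows, which is what makes binding query variables to rows or columns straightforward.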

SQL/XML-IMDB also extends XQuery with operators to update XML documents. Supported operations include deleting nodes, renaming nodes, updating node values, replacing nodes, and inserting new nodes before or after existing nodes. Note that these operations cannot be performed inside a transaction.

SQL/XML-IMDB has a proprietary API for interacting with the database. This includes functions for preparing and executing SQL and XQuery statements, beginning, committing, and rolling back transactions, transferring data between internal tables and external files or application variables, and bi-directional iteration over result sets. It is worth noting that XQuery results are returned in result sets just like SQL results. Each item in an XQuery sequence is returned as a separate column, with atomic values mapped to columns of the appropriate data type and nodes mapped to XML strings. When an XQuery statement returns multiple sequences, these are mapped to multiple rows in the result set.

SQL/XML-IMDB can be used from Microsoft .NET, Visual C++, Visual Basic, Office, and IIS/ASP, Borland C++ and Delphi, Perl, and PHP.

Sonic XML Server (formerly eXcelon)

Developer: Sonic Software (who bought eXcelon Corp.)
URL: http://www.sonicsoftware.com/products/sonic_xml_server/index.ssp
License: Commercial
Database type: Object-oriented (ObjectStore). Relational and other data through Data Junction
Entry last updated: April, 2003

[Note: The following is a description of eXcelon's eXtensible Information Server (XIS). Sonic Software bought eXcelon and renamed XIS as Sonic XML Server. It is not known whether the following description is still accurate, since the Sonic Web site has little technical information about Sonic XML Server. Ed. -- 4/04]

eXtensible Information Server is a native XML database built on top of ObjectStore. Documents are parsed on import, with individual nodes stored as hierarchically linked objects. This means that documents do not have to be parsed at run time and large documents can be processed without having to read the entire document into memory. Documents are not required to have a DTD or conform to a predescribed schema. They can be indexed using both value and structural indexes. (Value indexes index element and attribute values; structural indexes index element and attribute names.) They can also be arranged in collections; these can be nested, resulting in a file system metaphor.

eXtensible Information Server supports queries through XQuery, XPath with extension functions, and a proprietary update language (updategrams). Updategrams consist of an XPath to a node, an operation on that node (insert before/after, update, delete), and any data needed to carry out the operation. As an add-on, eXtensible Information Server supports full-text search through the Verity engine. Users pass queries (using Verity's query language) to eXtensible Information Server, which passes them to Verity. Verity executes the queries (using its own indexes) and returns pointers to the relevant documents in eXtensible Information Server.
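The three parts of an updategram (a path to a node, an operation, and any data the operation needs) can be modeled with a short Python sketch. The real updategram syntax is not shown here, so nothing below reflects it: the operation names are invented for the example, the function apply_update is hypothetical, and paths are limited to the XPath subset that ElementTree supports.

```python
import xml.etree.ElementTree as ET


def apply_update(root, path, operation, data=None):
    """Apply one (path, operation, data) update to every node the path matches."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for target in root.findall(path):  # findall returns a list, safe to mutate
        if operation == "update":
            target.text = data
            continue
        parent = parent_of[target]  # paths here never select the root itself
        position = list(parent).index(target)
        if operation == "delete":
            parent.remove(target)
        elif operation == "insert-before":
            parent.insert(position, ET.fromstring(data))
        elif operation == "insert-after":
            parent.insert(position + 1, ET.fromstring(data))
        else:
            raise ValueError(f"unknown operation: {operation}")


order = ET.fromstring("<order><item id='1'>pen</item><item id='2'>ink</item></order>")
apply_update(order, "./item[@id='2']", "update", "paper")
apply_update(order, "./item[@id='1']", "insert-after", "<item id='3'>clip</item>")
apply_update(order, "./item[@id='1']", "delete")
print(ET.tostring(order, encoding="unicode"))
```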

eXtensible Information Server supports two kinds of server-side functions, which can be written in Java, VB, or COM. The first, known as server extensions, run inside the current transaction and are commonly used in XPath expressions or as triggers associated with inserts, updates, or deletes. These can directly manipulate data in the cache using a server-side DOM implementation. The second, known as servlets, must define their own transaction boundaries and are generally used to implement extensions to the database as a whole, such as a JMS queue.

eXtensible Information Server also supports a concept called "Binder Documents". This allows users to link existing documents as well as to build virtual documents that consist of nothing but links. Links are traversed transparently during queries and update operations, which means that virtual documents can be used to perform queries and updates over multiple documents in a single operation. Note that the application must currently enforce the referential integrity of links (such as through triggers). That is, it must ensure that the document/fragment to which a link points actually exists.

eXtensible Information Server can integrate backend data through the XConnects Integration Engine, which uses the Data Junction Universal Translation Suite. This provides links to many different data formats, including relational databases. Because the links are two-way, backend data sources can be updated through eXtensible Information Server. Users can also write their own XConnects connectors with a Java API, a scripting language, and Stylus Studio (an IDE for XSLT and XML).

eXtensible Information Server supports transactions and can participate in XA transactions. However, it cannot currently manage XA transactions, so the application must coordinate any XA transactions that include eXtensible Information Server and other data sources, such as backend data stores. Other database features include distributed caching, partitioning, online backup and restore, and clustering support.

Finally, eXtensible Information Server comes with Java, COM, and .NET APIs, a JCA-compliant driver, a built-in XSLT processor, and a set of GUI development tools. These include an XML editor, an XSLT editor, a schema editor (XML Schemas and DTDs), an XSLT/Java debugger, an XML-to-XML mapping tool, and tools for mapping backend data to XML documents.

Tamino

Developer: Software AG
URL: http://www.softwareag.com/Corporate/products/wm/tamino/default.asp
License: Commercial
Database type: Proprietary. Relational through ODBC.
Entry last updated: November, 2002

Tamino XML Server is a suite of products built in three layers -- core services, enabling services, and solutions (third-party applications) -- which may be purchased in a variety of combinations. Core services include a native XML database, an integrated relational database, schema services, security, administration tools, and Tamino X-Tension, a service that allows users to write extensions that customize server functionality.

The XML engine uses the Data Map, which describes where the data in a given XML document is stored. This allows individual XML documents to be composed of data from multiple, heterogeneous sources, such as the native XML data store, relational databases, and the file system. Since the connections to external data (made through the X-Node module) are live and bidirectional, Tamino may thus be used to perform heterogeneous joins and updates.

Tamino's XML support includes the DOM, JDOM, SAX, and XML:DB APIs, an extended XPath implementation called X-Query (not to be confused with W3C XQuery, which it predates), full-text retrieval, processing of XML documents with server-side XSL and CSS, and limited support for SOAP. It can store schema-less documents and can use schema information (including a subset of XML Schemas) if it is available.

The internal SQL engine is directly addressable through ODBC, JDBC, and OLE DB. However, when addressed via these APIs, it cannot integrate data from the internal XML data store or from external data sources. (As noted above, the reverse is true. That is, with the help of the X-Node, the XML engine can integrate data from the XML data store and other databases, including the internal SQL engine.)

Enabling services include X-Port, X-Plorer, X-Application, various APIs (mentioned above), X-Node (also mentioned above), and the WebDAV Server. X-Port provides URL-based data transfer through various standard HTTP servers, X-Plorer is a browser-based navigation tool for documents stored in Tamino, and X-Application is a set of JSP tags for accessing Tamino through Web pages.

The WebDAV Server adds namespace management (nested collections or directories), additional properties (such as last-modified, content length or content type) and overwrite protection (persistent locking) to the existing Tamino XML Server functionality. This allows Tamino to serve as a virtual file system (Web folder) where the information can be stored and retrieved using a standard Web browser and the common drag and drop metaphor.

(Note: In spite of rumors to the contrary, Tamino is not built on top of Adabas, a hierarchical database from Software AG. Instead, the Tamino data store was built from the ground up as a native XML database, obviously drawing on the knowledge gained from developing Adabas.)

TeraText DBS (formerly SIM (Structured Information Manager))

Developer: TeraText Solutions (A Division of SAIC)
URL: http://www.teratext.com/get/page/browser/browser?category=Products/TeraText%20DBS,
http://www.saic.com/products/software/teratext/
License: Commercial
Database type: Proprietary
Entry last updated: August, 2002

From the Web site:

"TeraText DBS was designed specifically to store, retrieve and manipulate structured text. ... [It] also indexes all or part of the document using XML standards, enabling complex and comprehensive searching."

"[TeraText DBS is] designed to support XML, SGML, Unicode, Z39.50, HTTP and other industry standards, [and its] components are modular. They can be installed as a suite or as individual modules to work with existing database management and document-authoring systems."

"A content server enables searches on structural elements or document characteristics ... [It] also supports the ... worldwide industry standard protocol for information retrieval, Z39.50."

"A unique applications server provides immediate access to any TeraText database. TeraText DBS supports plug and play modules for complex value added Web services."

"Java , C++ and SOAP APIs as well as WebDav, LDAP, Microsoft Word, PDF and other plug-in adapters are available."

TEXTML Server

Developer: IXIASOFT, Inc.
URL: http://www.ixiasoft.com/default.asp?xml=/xmldocs/webpages/textml-server.xml
License: Commercial
Database type: Proprietary (Document-based)
Entry last updated: June, 2005

TEXTML Server is a native XML database that stores, indexes, and retrieves whole XML documents. A TEXTML Server installation consists of one or more document bases, each of which consists of a document repository and a set of indexes. The document repository is organized as a hierarchical set of collections and can store both XML and non-XML documents. All documents are stored intact. The major difference between XML and non-XML documents is that XML documents are parsed at insert time to create indexes. While non-XML documents are not parsed, they can be associated with an XML document that provides indexable metadata for the non-XML document.

Unlike most native XML databases, the indexes in TEXTML Server effectively form an additional schema layer on top of the documents stored in the database. This is because indexes are defined using one or more XPath expressions. Since these can refer to any document in the database, the effect is that a single index can refer to more than one field. For example, an author index might refer to the AuthorName element in one set of documents and the StoryAuthor attribute in another set of documents. Furthermore, because indexes are defined using XPath expressions, it is possible to transform values and index the transformed values. TEXTML Server supports five different types of indexes: word (token), string, numeric, date, and time.
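The author-index example can be sketched in Python. This is not TEXTML Server's index definition format, which is not shown here; it assumes an index is a list of path expressions, each optionally naming an attribute whose value is indexed instead of the element text. The names AUTHOR_INDEX and build_index and the sample documents are hypothetical, and paths are limited to the XPath subset that ElementTree supports.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# One logical index fed by two different locations in two document types.
AUTHOR_INDEX = [
    (".//AuthorName", None),                # index the element's text
    (".//*[@StoryAuthor]", "StoryAuthor"),  # index the attribute's value
]


def build_index(definition, documents):
    """Map each indexed value to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, xml_text in documents.items():
        # Wrap the root in a holder so that ".//" paths can match the root too.
        holder = ET.Element("holder")
        holder.append(ET.fromstring(xml_text))
        for path, attribute in definition:
            for node in holder.findall(path):
                value = node.get(attribute) if attribute else (node.text or "")
                index[value.strip()].add(doc_id)
    return index


DOCUMENTS = {
    "a.xml": "<article><AuthorName>Ann Lee</AuthorName></article>",
    "b.xml": "<story StoryAuthor='Ann Lee'><p>text</p></story>",
    "c.xml": "<story StoryAuthor='Bob Roy'/>",
}

author_index = build_index(AUTHOR_INDEX, DOCUMENTS)
print(sorted(author_index["Ann Lee"]))
```

A search on the author index for "Ann Lee" finds both the article and the story, even though the two document types record the author in different places.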

TEXTML Server has its own, XML-based query language. Queries are defined as a series of boolean tests over specific indexes or the full text of the documents. Tests are generally for equality. In addition, numeric, date, and time indexes support range tests, and word and string indexes support wild-card tests. Tests can then be joined with a number of operators, including And, Or, And Not, Near, adjacency, and frequency. Queries return whole documents and can sort results based on index values, document properties, and hit counts.

In addition to being able to associate XML documents with non-XML documents, TEXTML Server also has a Universal Converter that can convert more than 225 file formats (word processor, spreadsheet, presentation, drawing, bitmap, and so on) to XML. This uses Stellent's Outside In XML Export and extracts document "contents, presentation information, and metadata". Extracted information is stored in a document that uses the SearchML schema, also defined by Stellent. Converted documents can then be searched directly or associated with the original documents as indexing documents.

Other features of TEXTML Server include check-in/check-out, versioning, support for plug-ins that are run at insert time, and COM, Java, .NET, WebDAV, and OLE DB APIs. Security can be specified at the document, collection, or document-base level. System features include fault tolerance, replication, load management, and automated recovery.

TigerLogic XML Data Management Server (XDMS)

Developer: Raining Data
URL: http://www.rainingdata.com/products/tl/abouttl.html
License: Commercial
Database type: Pick
Entry last updated: January, 2003

TigerLogic XML Data Management Server (XDMS) is a database designed to store multiple kinds of data, including "structured, XML, and unstructured information". (Examples of the latter are office documents, email, and graphics.) Data is stored in the TigerLogic Native XML Data Store, which "leverages the Pick Universal Data Model". As XML documents are inserted into the database, an XML Profiler reads the incoming documents and gathers information to build indexes. These are used by the query processor, which supports XPath. TigerLogic XDMS also supports XSLT.

TigerLogic XDMS has a Java API and is also accessible over SOAP, HTTP, and JCA. It supports both DTDs and XML Schemas. Of interest, it supports XA transactions, and provides "on-line backup and recovery".

Timber

Developer: University of Michigan
URL: http://www.eecs.umich.edu/db/timber
License: Open Source (for non-commercial users)
Database type: Shore, Berkeley DB
Entry last updated: October, 2005

Timber is a native XML database that has an architecture "as close as possible to that of a relational database," in order to "reuse, where appropriate, the technologies developed for relational databases over the past several decades". The basis of Timber is "an XML algebra that manipulates sets of ordered, labeled trees". The primary difficulties of such an algebra include the "complex and variable structure of trees in a set, and issues of ordering."

By default, Timber uses Shore as its underlying data store. It can also use Berkeley DB. It supports a number of different types of indexes, including element, attribute, text, inverted, parent, and join indexes.

Timber supports a subset of XQuery. Users can enter queries either as XQuery expressions or as logical or physical query plans using Timber's logical or physical plan syntax. The latter allows advanced users to optimize queries by hand, as well as to perform some operations not supported through XQuery. Timber extends XQuery with functions for deleting nodes or their contents, updating the contents of a node, and inserting elements or attributes. In addition, Timber has a command line option for appending the contents of an XML document to a document already in the database.

Timber has command line, GUI, SOAP, and Web interfaces for performing both queries and administrative functions.

TOTAL XML (formerly Socrates XML)

Developer: Cincom
URL: http://tiger.cincom.com/pages/aboutTotalXML.html
License: Commercial
Database type: Object-relational, external relational through ODBC
Entry last updated: July, 2003

TOTAL XML is a native XML database that can store documents as objects or text. It can store data in its own object-relational data store, an external relational database, or a combination of the two. It is therefore possible to distribute the data for a document across multiple databases. In addition, TOTAL XML can store non-XML data, such as "standard relational data" and BLOBs.

Unlike other native XML databases, TOTAL XML stores XML documents in objects that are specific to each DTD but that inherit from an object model that supports the Infoset. Thus, TOTAL XML has characteristics similar to both native XML databases and XML-enabled relational databases. Like an XML-enabled relational database, it allows the data to be queried directly with SQL. However, documents cannot be stored until the user has defined a map from the DTD to the database. (A utility is available for generating maps for DTD-less documents.) Like a native XML database, it stores information about the full physical structure of a document and it is possible to round-trip documents.

TOTAL XML supports three different query languages. XML documents can be queried with XPath or an extended form of SQL, which can query relational data and BLOBs as well. Text data can be queried with regular expressions. TOTAL XML also supports the XML:DB API.

When XML documents are stored as DTD-specific objects, applications can access these objects through the object-oriented capabilities of JDBC 2.0. The objects can be used directly or converted to a DOM tree using the previously defined maps. The DOM is lazily populated, so data is retrieved from the database only when needed. When documents are stored as text, applications can access them with JDBC or ODBC and they are returned as text.
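Lazy population can be sketched in a few lines. The following Python is only an illustration of the general technique, not TOTAL XML's implementation: the LazyNode class, the STORE dictionary standing in for the database, and the load_children function are all hypothetical.

```python
# Hypothetical stand-in for the database: node id -> ids of its children.
STORE = {"root": ["a", "b"], "a": ["a1"], "b": [], "a1": []}

fetch_log = []  # records every simulated trip to the database


def load_children(node_id):
    """Simulate fetching the children of one node from the database."""
    fetch_log.append(node_id)
    return STORE[node_id]


class LazyNode:
    """A tree node whose children are fetched only on first access."""

    def __init__(self, node_id, loader):
        self.node_id = node_id
        self._loader = loader
        self._children = None  # nothing fetched yet

    @property
    def children(self):
        if self._children is None:
            child_ids = self._loader(self.node_id)
            self._children = [LazyNode(cid, self._loader) for cid in child_ids]
        return self._children


root = LazyNode("root", load_children)
fetches_before_access = len(fetch_log)      # no fetch has happened yet
first_child = root.children[0]              # fetches the children of "root"
grandchildren = first_child.children        # fetches the children of "a"
root.children                               # cached, so no new fetch
print(fetch_log)
```

The node "b" is never expanded, so its children are never requested. This is the saving that lazy population is meant to provide when an application touches only part of a large document.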

TOTAL XML can integrate data from legacy databases (including VSAM, IMS, IDM, and Adabas) using Striva DETAIL. The integrated data can be live or a copy stored in TOTAL XML.

TOTAL XML ships with a number of tools, including utilities to generate classes and maps from DTDs and administration tools.

Virtuoso

Developer: OpenLink Software
URL: http://www.openlinksw.com/virtuoso/
License: Commercial
Database type: Proprietary. Relational through ODBC
Entry last updated: November, 2000

Virtuoso is a heterogeneous join engine featuring security, transactions (including two-phase commit), and replication. Its query engine supports heterogeneous views, stored procedures, scrollable cursors, and full-text search. It accesses external data sources through ODBC, as well as having its own relational data store.

Virtuoso supports XML in a number of ways. First, it contains a native XML data store, which is non-relational and can store and index XML documents in parsed or unparsed form. Second, it can transfer data from relational databases to XML documents (although not the other direction), using the same mapping found in the FOR XML clause in Microsoft SQL Server. Third, it includes an implementation of XPath. Although this only works on "native" XML data, relational data can be included by first transferring it to XML. Finally, it includes support for XSLT, executing stored procedures through SOAP, and WebDAV.

[February, 2002] Virtuoso has a demo implementation of XQuery that runs over its database. Of interest, this can query virtual documents, such as those built at run time from a relational database.

Xindice (see also dbXML)

Developer: Apache Software Foundation
URL: http://xml.apache.org/xindice/
License: Open Source
Database type: Proprietary (Node-based)
Entry last updated: June, 2005

Xindice is a native XML database written in Java that is designed to store large numbers of small XML documents, as well as non-XML documents. It can index element and attribute values and compresses documents to save space. Documents are arranged into a hierarchy of collections and can be queried with XPath. (Collection names can be used as part of the XPath query syntax, meaning it is possible to perform XPath queries across documents.) For updates, Xindice supports the XUpdate language from the XML:DB Initiative. Finally, Xindice comes with an experimental linking language that is similar to XLink and allows users to replace or insert content in an XML document at query time.
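The idea of running one path query across every document in a collection can be sketched as follows. This Python uses the standard library's ElementTree, which supports only a subset of XPath, as a stand-in. It is not Xindice's API, and the collection contents and the query_collection helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection: document name -> XML text.
collection = {
    "order1.xml": "<order><item sku='A1'>2</item><item sku='B2'>1</item></order>",
    "order2.xml": "<order><item sku='A1'>5</item></order>",
}


def query_collection(docs, path):
    """Apply one path expression to every document; return (name, node) pairs."""
    results = []
    for name, text in docs.items():
        root = ET.fromstring(text)
        for node in root.findall(path):
            results.append((name, node))
    return results


matches = query_collection(collection, "./item[@sku='A1']")
quantities = [(name, node.text) for name, node in matches]
print(quantities)
```

Each result keeps the name of the document it came from, since a query that spans documents is of little use if the caller cannot tell where a node was found.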

Xindice supports three APIs: the XML:DB API (also from the XML:DB Initiative), a CORBA API, and an XML-RPC plugin which supports access from languages such as PHP, Perl, and Applescript. In addition, Xindice provides XMLObjects, which allows users to extend the server functionality.

Xindice comes with a set of command line tools for using and administering the database, as well as complete documentation.

xml.gax.com (formerly NaX Base)

Developer: GAX Technologies (bought from Naxoft)
URL: http://www.gax.com/web/en/html/nodes_main/4_1063_231/5_1070_54.htm
License: Commercial
Database type: Proprietary
Entry last updated: April, 2009

From the Web site:

"xml.gax.com is a native XML database. It is targeted at projects where high speed searches through very large amounts of data are required."

"xml.gax.com is based on the NaX Base database, the source code of which GAX Technologies acquired in November 2005. GAX Technologies intends to continue developing the features of xml.gax.com and will focus on adding intelligent search capabilities while maintaining its rapid search speed."

Xpriori XMS

Developer: Xpriori
URL: http://www.xpriori.com/index.html
License: Commercial
Database type: Proprietary
Entry last updated: Summer, 2001

Xpriori XMS is a native XML database built on a proprietary data store that uses Digital Pattern Processing, which is a form of hashing. It can store values as integers, doubles, strings, dates, or date-times, and it can store binary objects using XML-binary Optimized Packaging (XOP). XMS supports structural, full-text, and value indexes, all of which are optional and can be created for a given set of nodes. It also supports XML Schemas. Xpriori XMS supports XQuery, XPath, and XSLT. It supports node-level inserts, updates, and deletes, apparently using a proprietary syntax.

Other features include transactions, Web Services, and Web and command line administration tools. Applications can access XMS through Java, C++, and .NET APIs, a COM wrapper API, WebDAV, or by sending XML directly over HTTP(S)/SSL. Security is supported by access control at the group and user level. Users and groups can be added, deleted, or modified, with permissions applied to global, document, or node level access.

From the company:

"Xpriori XMS is a fully transactional native XML database system that serves as a bi-directional web server, accepting and returning XML documents and fragments via HTTP(S). It supports all basic database functions, including storage, delete, copy, and query for XML documents, and insert, modify, and query for XML data elements. Xpriori XMS is schema-independent and requires no database or schema design before using the system. That is, rows, columns, tables, fields, or indexing instructions do not need to be created before documents are added to the database. When new documents are added to the database, their structure and data - metadata and data - are derived and automatically indexed. Users then can change the structure of existing documents without database system redesign. Specific features of Xpriori XMS include XPath-based query support, access control, user-defined document data management, GUI-based administration, and session control."

"Xpriori XMS uses a variant of XPath as its query language; query responses return elements, document fragments, full documents, and multiple documents. Queries can be made without knowing the document structure - context can be queried for data, and data can be queried for context. Boolean and wildcard options are fully supported and all query results are well-formed XML. For query processing, Xpriori XMS uses Digital Pattern Processing, a patented technology that streamlines queries by using fixed-length icons."

"Xpriori XMS has built-in access control to set permissions at the document or fragment level, and at user or group levels. Xpriori XMS also supports access control by specifying IP addresses and supporting X.509 certificates via Netscape Server."

"Interfaces to Xpriori XMS include HTTP(S)/SSL, Java, C++, and Microsoft COM. Xpriori XMS also can interface with existing databases through the X-Aware data integration tool."

XQuantum XML Database Server

Developer: Cognetic Systems
URL: http://www.cogneticsystems.com/server.html
License: Commercial
Database type: Proprietary
Entry last updated: June, 2006

XQuantum XML Database Server is a native XML database built on a proprietary data store. It supports a subset of XQuery, a subset of the XQuery full-text specification, and XSLT.

XQuantum optimizes queries with a cost-based algorithm, which uses statistics about the data to optimize the search process. The query processor also relies on "recursive XML indexing" (a schemaless indexing method), lazy query evaluation, and stream processing of queries.

XQuantum supports static typing through its own typing mechanism, which "generalizes XQuery's sequence type syntax to include full regular expression types" and is used instead of XML Schemas. Types (effectively schemas for individual XML documents) can be declared in the prolog of an XQuery query or in external type modules. They are applied in the query through explicit validation and are used to provide type information to the query processor.

XQuantum includes a Web server, which allows it to use HTTP as its API. That is, queries are embedded in URLs and results are returned as an XML stream. Queries can also be placed in XQuery Server Pages. These are preferable for URLs exposed to the public, as they are more secure (the query is not exposed to the public) and less fragile (the query can be changed without changing the URL).
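Embedding a query in a URL mostly comes down to escaping it correctly. The Python sketch below shows the general technique only. The endpoint address and the "query" parameter name are assumptions made for illustration, not XQuantum's documented interface, and the query text itself is a made-up example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint and parameter name; not XQuantum's documented interface.
BASE_URL = "http://localhost:8080/xquery"


def build_query_url(base, xquery):
    """Percent-encode the query text and attach it as a URL parameter."""
    return base + "?" + urlencode({"query": xquery})


xq = 'for $b in doc("books.xml")//book where $b/price < 30 return $b/title'
url = build_query_url(BASE_URL, xq)

# Decoding the URL again recovers the original query text unchanged.
recovered = parse_qs(urlsplit(url).query)["query"][0]
print(url)
```

The full query text is visible in the resulting URL, which illustrates why a page that hides the query on the server side is the safer choice for public links.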

XQuantum is also available as the XQuantum XML Database Appliance, a dedicated server running Linux and XQuantum.

XStreamDB Native XML Database

Developer: Bluestream Database Software Corp.
URL: http://www.bluestream.com/products/xstreamdb32
License: Commercial
Database type: Proprietary (Node-based)
Entry last updated: May, 2003

From the company:

"Bluestream XStreamDB(tm) version 3.0 is a native XML database, built in pure Java with XQuery, full text search, Java API, and support for schemas, DTDs, and binary and other non-XML datatypes. XStreamDB is accessible using a JDBC-like Java API, the XStreamDB Explorer GUI application, scripter, or using WebDAV to reach documents exposed as URIs. Security is enforced using MD5 message digest authentication and a user permissions scheme."

"XML documents are stored in a compressed object representation, using Bluestream's Streamstore database storage engine (also available separately). The database has a full transactions architecture that meets the four ACID requirements: Atomic, Consistent, Isolated, and Durable. Transaction support includes read, write, and update locks, as well as deadlock detection and victim selection. Commits and rollbacks are supported so that the system can recover in the case of a crash."

"It supports multiple, concurrent sessions, as well as session pooling, and recycles free space automatically, so compaction is not required. In addition, it allows partial document updates and document fragment insertion."

"Documents are stored in 'roots' in 'databases' on the XStreamDB server. A root is equivalent to a collection. Schemas or DTDs can be loaded and stored in a collection of 'schemas', and users are kept in a collection of 'users'. Access permissions can be assigned on documents with the built-in user permission scheme. XStreamDB stores both XML and binary document types, with associated mimetypes."

"Collections of XML documents in document roots can be forced to be schema valid by attaching a schema to the root. XStreamDB supports both W3C XML Schemas and DTDs. The XStreamDB resource manager can assign resource information to documents to expose them as URI unique identifiers (Universal Resource Identifier) through WebDAV, or the Resources API. Databases and roots are exposed as 'categories', and documents are exposed as 'resources' within those categories. The resource manager supports mimetypes, created sub-categories, locking, and naming. Resources can also be checked out and checked in to the file system by users."

"XStreamDB supports the XQuery query language for XML data, and has extended it to support insert, update, and full text searching capabilities."

"XStreamDB supports both value indexes and full text indexing. XML document roots with value indexes will index on the value of data in a specified element or attribute. Full text indexes store a complete index of all content in all documents in the root."

"XQuery queries with full text expressions will find text within XML document content using wildcard matching, word proximity, and phrase matching. Results are matched to the element or attribute in matching documents, and can be automatically marked."

A note about the history of XStreamDB, also from the company:

"XStreamDB was introduced by Bluestream Database Software Corp. in the spring of 2000. Soon after its introduction, Bluestream was acquired by XML Global and its XML database product renamed GoXML DB. In September 2002, XML Global spun off the XML database division, reinstating the original company and product names. Bluestream XStreamDB version 3.0 is built by Bluestream and marketed by XML Global and other authorized resellers."

Xyleme Zone Server

Developer: Xyleme SA
URL: http://www.xyleme.com/xml_server
License: Commercial
Database type: Proprietary (Natix)
Entry last updated: July, 2002

Xyleme Zone Server is a native XML database that uses Natix as its engine. It supports XQuery and indexes documents at run time as they are added to the database. Xyleme Zone Server can run in clusters and can distribute queries across multiple machines. Local applications can access the server directly from C++ or Java, and remote applications can access it with SOAP. Security is provided on a per-document basis and the product ships with a set of administration tools.

Users can categorize documents according to their semantic type -- financial statements, product documentation, legal documents, etc. Each category is defined by an "abstract view", which is mapped to the schema of each class of documents in the category. This allows users to query all documents in a category by querying the view, rather than having to query each class of documents separately. The query processor translates the query against the view into queries against each schema and returns results that correspond to the view.
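The translation step can be sketched in miniature. The Python below is a simplified illustration of the idea, not Xyleme's mapping language or query processor: the view fields, the per-class paths, the sample documents, and the query_view function are all hypothetical, and ElementTree paths stand in for real queries.

```python
import xml.etree.ElementTree as ET

# Hypothetical abstract view: each view field maps to a path per document class.
VIEW_MAPPINGS = {
    "invoice": {"total": "./summary/amount", "date": "./header/issued"},
    "receipt": {"total": "./paid", "date": "./when"},
}

# Hypothetical documents, each tagged with its class.
DOCUMENTS = [
    ("invoice", "<invoice><header><issued>2002-07-01</issued></header>"
                "<summary><amount>120</amount></summary></invoice>"),
    ("receipt", "<receipt><when>2002-07-03</when><paid>45</paid></receipt>"),
]


def query_view(field):
    """Translate a view field into each class's own path and collect the values."""
    results = []
    for doc_class, text in DOCUMENTS:
        path = VIEW_MAPPINGS[doc_class][field]  # per-schema translation
        root = ET.fromstring(text)
        results.extend(node.text for node in root.findall(path))
    return results


totals = query_view("total")
dates = query_view("date")
print(totals, dates)
```

The caller names only the view field. The differences between the invoice and receipt structures stay inside the mapping table, which is the role the abstract view plays in the description above.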

Users can also subscribe to a service that notifies them of changes to documents. Individual subscriptions are defined as queries, using a proprietary language that (apparently) extends XQuery. Subscription queries run at specified intervals, and applications check the output of these queries to determine what has changed.
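One way an application might check successive outputs is to compare the result sets of two runs. The Python sketch below assumes that approach. It does not use Xyleme's subscription language, and the snapshots, the is_legal test, and both helper functions are hypothetical.

```python
def run_subscription(query, snapshot):
    """Stand-in for one scheduled run: ids of the documents the query selects."""
    return {doc_id for doc_id, doc in snapshot.items() if query(doc)}


def changes(previous, current):
    """Compare two runs and report which documents appeared or disappeared."""
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }


def is_legal(doc):
    """Hypothetical subscription query: select legal documents."""
    return doc["type"] == "legal"


# Hypothetical repository contents at two consecutive runs.
first_run_state = {"d1": {"type": "legal"}, "d2": {"type": "financial"}}
second_run_state = {"d2": {"type": "financial"}, "d3": {"type": "legal"}}

before = run_subscription(is_legal, first_run_state)
after = run_subscription(is_legal, second_run_state)
report = changes(before, after)
print(report)
```

A comparison of identifiers detects only documents entering or leaving the result. Detecting edits inside a document that stays in the result would need the query output to carry content or a version marker as well.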

Of interest, Xyleme SA provides an online repository of Web pages. This may be queried across the Web, presumably as part of queries that also query local data.

Copyright (c) 2010, Ronald Bourret