XML and Your Database. Or Not
The other day I watched an ODTUG webinar "Thinking Clearly about XML" presented by Marco Gralike. You can view an archived version of the presentation at https://www3.gotomeeting.com/register/660686966.
The majority of the presentation discussed how to store the data while retaining its XML structure. Marco discussed various methods, most of which were built using the XMLType container.
Now before I start in here, I want to make it clear that Marco was specifically tasked with covering Oracle's native XML functionality in this presentation. And before he got into the meat of things, he hinted that he would do things differently in his own projects. That said...
XML is used in a lot of different ways - most notably web API output, RSS, even application configuration files.
XML works reasonably well as a transport format - well enough, at least, that most APIs offer it as an option.
But when it comes to your DBMS, XML is a poor choice for data storage.
First of all, query performance is horrendous. In Marco's presentation, the second best performing option took 12 seconds to query an XMLType container populated from a 100MB XML file. (The insert performance is pretty bad, too. But since any given piece of data only gets inserted once, whereas the number of queries against that data is theoretically unlimited, insert performance is of less concern.)
Second, you need to query the data using Xpath or Xquery. Developers have enough trouble writing SQL. They don't need to be burdened with another query language/paradigm.
Third, one of the biggest complaints against XML is its verbosity. Indeed, it is often the case where the XML markup within a file is larger than the data itself - and not just in the trivial case. XML bloat is a significant contributing factor in the growing popularity of JSON as an alternative transport markup. If you've got a lot of data, you're going to end up wasting a lot of disk space storing markup.
Fourth, what do you do if the source party changes format - say, to JSON? What if you have to turn around and deliver this data to other parties in a format other than XML? (And just because you're not sharing the data today, doesn't mean that you won't have to tomorrow, or next month, or next year...)
If you are regularly receiving XML data that is consistently structured, the best way to source and store that data in your database is to:
1) develop an appropriate relational data model
2) write a load script in python (perl, java, ruby...) that strips out the formatting and stores the raw data in the relational model.
All of these languages have XML parsers that make this a fairly straightforward process.
This isn't just about XML, either. The same advice applies to data sourced in JSON.
I see this becoming a running theme - just because you can do something in the database, doesn't necessarily mean that you should do that thing in the database.
- Michael Reichner's blog
- Login to post comments