Is it "bad" to store XML in a database? Is it "bad" to store XML in a database? xml xml

Is it "bad" to store XML in a database?


There are some really stupid answers here - just because a database supports a data type does not mean you should be using it. These things are invariably added in as features because the competition have them, not because they are the right thing to do. Global variables? Triggers? Would anyone like to defend them too just because you can use them and they're there?

If you have multiple attributes, the best way to handle them in a relational database is with a one to many relationship. Parse out your useful data from the XML overhead. You then just store the ID (primary key) of the parent record with each of the rows stored in a second table, one row per attribute. You can have any number of attributes per parent record. It's database design 101, nothing clever. Storing it as unstructured XML just to store a variable number of attributes is not the way to go, it's a sledgehammer to crack a peanut. A one to many relationship between two tables is simpler, easier to understand, much faster to query, much less effort coding, and less storage (which means faster queries). Everyone wins, apart from the storage vendors.

XML is a data transfer protocol; as GolezTrol rightly said, "It is a way to export (and import) data" - i.e.: it is simply an overhead used to facilitate the communication of the structure of the data between different systems. Once received, the tags should be stripped out and the data (and only the data) stored in your database engine of choice, whatever that might be. Not the XML itself. The overhead for XML is ~10x that of the data it's describing. Want to tell your boss why that 100GB of data is occupying 1TB of space on your hyper expensive SAN? Or taking all night to back up over a saturated network link? Or causing performance problems in production? If you don't parse out the data from the now pointless tags, you will just push the problem and ongoing, daily support costs onto operational support for the next ten years. Sloppy, sloppy, sloppy. This keeps vendors like EMC in business.

XML is metadata. Nothing clever, just a schema descriptor. Once it's transferred and parsed it's lost its usefulness and is just clutter that clogs up whatever database you use. Get rid of it, unless you're compulsively addicted to hording yesterday's pointless crappy description metadata, stored many times over. Wake up. It's typical "Emperor's New Clothes" syndrome, stopped being conned by something simple and disposable. It's only metadata and it should not be stored or worshipped, it's junk once it's parsed. And what's better? To parse it once, or to uselessly parse it every time you need data from it? The answer's pretty darned obvious to me.


It's not bad at all. Microsoft SQL Server has an XML data type. One use case for storing XML is a situation we found ourselves in. For each row in a particular table, we needed to store a variable number of attributes related to that row. And the number of these attributes can change over time, and with each row. We found it more efficient to store these attributes, and their values in an XML format. In the future, each time we adjust the number of attributes, we don't need to make schema changes.


Storing XML, JSON, YAML, comma-seperated lists, binary blobs, or anything else in a database is not bad ... per se.

It can indicate a lack of understanding of what a database is for (storing data that is related to other data) and conjures up visions of databases with single column tables called data1, data2, etc. ... with each table row holding a +5 MB entry of XML encoded relational data.

On the other hand, there are many valid cases that can be made for such a structure -- rapidly changing configurations might be represented in JSON and stored in a two column table structured like this:

dbo.good_tableApplicationID (bigint)Configuration (varchar(max))

The difference between the above table and a table like this:

dbo.bad_tableApplicationID (bigint)ApplicationMembers(xml)

Is that good_table is enabling rapid access to a piece of data (the configuration), while the bad_table is using the database as an ofttimes expensive (and slow) hard disk.