What is the performance penalty of XML data type in SQL Server when compared to NVARCHAR(MAX)?
If you do have XML, and you can be sure it's always XML, I would definitely recommend going that way. SQL Server stores XML in an optimized format - you don't even need any XML indices on that to benefit from it.
If you insert 5000 rows of a 5KB XML into an XML column, you get roughly 1250 pages = 9 MB. Inserting the same 5000 rows with the same 5KB XML into NVARCHAR(MAX) uses over 3700 pages or 29 MB - quite a difference!
And that difference should be even more pronounced if you can associate your XML with a XML schema stored in SQL Server. Plus you're also guaranteed that the XML stored conforms to a schema - can be very helpful at times! Can't do that with a plain NVARCHAR(MAX) column...
And I don't agree that using XML over NVARCHAR(MAX) has any performance penalty - quite the contrary. Since you're potentially retrieving less data from SQL Server when you're about to display or fetch the content, I would argue it's even a tad faster than NVARCHAR(MAX).
This benchmark shows XML data type using less IO but a little more CPU than VARCHAR(MAX). I would think NVARCHAR(MAX) would take even more IO since it's unicode.
http://searchsqlserver.techtarget.com/tip/XML-data-type-in-SQL-Server-2005-vs-VARCHAR-MAX