Escape invalid XML characters in C#
As the way to remove invalid XML characters I suggest you to use XmlConvert.IsXmlChar method. It was added since .NET Framework 4 and is presented in Silverlight too. Here is the small sample:
void Main() { string content = "\v\f\0"; Console.WriteLine(IsValidXmlString(content)); // False content = RemoveInvalidXmlChars(content); Console.WriteLine(IsValidXmlString(content)); // True}static string RemoveInvalidXmlChars(string text) { var validXmlChars = text.Where(ch => XmlConvert.IsXmlChar(ch)).ToArray(); return new string(validXmlChars);}static bool IsValidXmlString(string text) { try { XmlConvert.VerifyXmlChars(text); return true; } catch { return false; }}
And as the way to escape invalid XML characters I suggest you to use XmlConvert.EncodeName method. Here is the small sample:
void Main() { const string content = "\v\f\0"; Console.WriteLine(IsValidXmlString(content)); // False string encoded = XmlConvert.EncodeName(content); Console.WriteLine(IsValidXmlString(encoded)); // True string decoded = XmlConvert.DecodeName(encoded); Console.WriteLine(content == decoded); // True}static bool IsValidXmlString(string text) { try { XmlConvert.VerifyXmlChars(text); return true; } catch { return false; }}
Update:It should be mentioned that the encoding operation produces a string with a length which is greater or equal than a length of a source string. It might be important when you store a encoded string in a database in a string column with length limitation and validate source string length in your app to fit data column limitation.
using System;using System.Security;class Sample { static void Main() { string text = "Escape characters : < > & \" \'"; string xmlText = SecurityElement.Escape(text);//output://Escape characters : < > & " ' Console.WriteLine(xmlText); }}
If you are writing xml, just use the classes provided by the framework to create the xml. You won't have to bother with escaping or anything.
Console.Write(new XElement("Data", "< > &"));
Will output
<Data>< > &</Data>
If you need to read an XML file that is malformed, do not use regular expression. Instead, use the Html Agility Pack.