How to change character encoding of XmlReader


To force .NET to read the file as ISO-8859-9, use one of the many XmlReader.Create overloads that accepts a TextReader, and hand it a StreamReader with the encoding set explicitly, e.g.

using (XmlReader r = XmlReader.Create(new StreamReader(fileName, Encoding.GetEncoding("ISO-8859-9"))))
{
    while (r.Read())
    {
        Console.WriteLine(r.Value);
    }
}

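However, I wasn't sure this would work: IIRC, the W3C XML standard says that once the XML declaration has been read, a compliant parser should switch to the encoding it names, regardless of what it was using before. As far as I can tell, that only matters when the reader decodes the bytes itself (from a Stream or a URI), in which case a file with no XML declaration and no byte order mark is treated as UTF-8 and would still fail. When the reader is built on a TextReader, as here, the StreamReader has already turned the bytes into characters, and the documentation for that overload says the encoding in the XML declaration is not used to decode the data. Try it and see. :-)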
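To make "try it and see" concrete, here is a minimal sketch that writes an ISO-8859-9 file with no XML declaration and reads it back through a StreamReader. The class name ForcedEncodingDemo, the helper ReadAllText and the sample document are made up for illustration. I'm assuming a modern .NET runtime, where ISO-8859-9 has to be enabled by registering CodePagesEncodingProvider first; on the classic .NET Framework that call should not be needed.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class ForcedEncodingDemo
{
    // Collects the text nodes of an XML file, decoding the bytes with the
    // encoding supplied by the caller rather than one detected by the reader.
    public static string ReadAllText(string fileName, Encoding encoding)
    {
        var text = new StringBuilder();
        using (var source = new StreamReader(fileName, encoding))
        using (XmlReader r = XmlReader.Create(source))
        {
            while (r.Read())
            {
                if (r.NodeType == XmlNodeType.Text)
                {
                    text.Append(r.Value);
                }
            }
        }
        return text.ToString();
    }

    public static void Main()
    {
        // Assumption: running on .NET Core / .NET 5+, where code page
        // encodings such as ISO-8859-9 must be registered before use.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding turkish = Encoding.GetEncoding("ISO-8859-9");

        // Hypothetical sample document with no XML declaration at all.
        string sample = "<name>\u015E\u00FCkr\u00FC</name>";
        string fileName = Path.GetTempFileName();
        File.WriteAllBytes(fileName, turkish.GetBytes(sample));

        Console.WriteLine(ReadAllText(fileName, turkish));
        File.Delete(fileName);
    }
}
```

If the text comes back intact, the StreamReader's encoding was the one that counted. What you see on the console also depends on the console's own output encoding, so compare the returned string rather than trusting the printed characters.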


The reader that the static Create method returns (XmlReader itself is an abstract base class, so you actually get a concrete text reader) is designed to detect the encoding automatically from the XML file itself when it is given a file name or a Stream; it has no property for setting the encoding manually.

Simply ensure that the file you are reading includes the following XML declaration:

<?xml version="1.0" encoding="ISO-8859-9"?>


If you can't ensure that the input file has the right header, you could look at one of the other 11 overloads of the XmlReader.Create method.

Some of these take an XmlReaderSettings argument, an XmlParserContext argument, or both. I haven't investigated these, but there is a possibility that setting the appropriate values might help here.
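One thing that may be worth trying along these lines: XmlParserContext has a constructor that takes an Encoding, and there is a Create overload that accepts a Stream, an XmlReaderSettings and an XmlParserContext. The sketch below passes the encoding that way. The class name ParserContextDemo, the helper ReadAllText and the sample document are made up for illustration, and I'm assuming (not asserting) that the reader uses the context's encoding when the stream carries no declaration of its own. As before, registering CodePagesEncodingProvider assumes a modern .NET runtime.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class ParserContextDemo
{
    // Collects the text nodes of an XML byte stream, offering the encoding
    // to the reader through an XmlParserContext instead of a TextReader.
    public static string ReadAllText(Stream input, Encoding encoding)
    {
        var settings = new XmlReaderSettings();
        // Name table, namespace manager and xml:lang are left unset here;
        // only the encoding is of interest for this experiment.
        var context = new XmlParserContext(null, null, null, XmlSpace.None, encoding);

        var text = new StringBuilder();
        using (XmlReader r = XmlReader.Create(input, settings, context))
        {
            while (r.Read())
            {
                if (r.NodeType == XmlNodeType.Text)
                {
                    text.Append(r.Value);
                }
            }
        }
        return text.ToString();
    }

    public static void Main()
    {
        // Assumption: running on .NET Core / .NET 5+, where code page
        // encodings such as ISO-8859-9 must be registered before use.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding turkish = Encoding.GetEncoding("ISO-8859-9");

        // Hypothetical sample document with no XML declaration.
        byte[] bytes = turkish.GetBytes("<name>\u015E\u00FCkr\u00FC</name>");
        using (var input = new MemoryStream(bytes))
        {
            Console.WriteLine(ReadAllText(input, turkish));
        }
    }
}
```

I have not checked how this behaves when the file does carry its own XML declaration, so treat it as an experiment rather than a fix.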

There is the XmlReaderSettings.CheckCharacters property - the help for this states:

Instructs the reader to check characters and throw an exception if any characters are outside the range of legal XML characters. Character checking includes checking for illegal characters in the document, as well as checking the validity of XML names (for example, an XML name may not start with a numeral).

So setting this to false might help. However, the help also states:

If the XmlReader is processing text data, it always checks that the XML names and text content are valid, regardless of the property setting. Setting CheckCharacters to false turns off character checking for character entity references.

So further investigation is warranted.