An invalid XML character (Unicode: 0xc) was found


A few characters are disallowed in XML documents, even when you wrap the data in CDATA sections.

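If you generated the document yourself, you need to strip these characters out before writing it. Encoding them as character references does not help: in XML 1.0 a reference such as &#xC; must also refer to an allowed character (XML 1.1 relaxes this, XML 1.0 does not). If you have received an erroneous document, strip these characters before trying to parse it.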
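To see that neither CDATA nor a character reference gets around the rule, here is a small sketch using the JDK's built-in DOM parser (javax.xml.parsers). The class and method names (FormFeedDemo, parses) are made up for this illustration. On a standard JDK the three inputs containing U+000C should all be rejected as not well-formed, and only the stripped one should parse.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class FormFeedDemo {
    // Returns true if the JDK's built-in XML 1.0 parser accepts the document.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Rethrow problems instead of letting the default handler print "[Fatal Error]" lines.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, e.g. an invalid XML character was found
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("<a>page\fbreak</a>"));             // raw U+000C (form feed)
        System.out.println(parses("<a><![CDATA[page\fbreak]]></a>")); // U+000C inside CDATA
        System.out.println(parses("<a>page&#xC;break</a>"));          // character reference
        System.out.println(parses("<a>pagebreak</a>"));               // character stripped
    }
}
```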

See dolmen's answer in the thread "Invalid Characters in XML", where he links to the character section of the XML specification: http://www.w3.org/TR/xml/#charsets

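Basically, all characters below 0x20 are disallowed except 0x9 (TAB), 0xA (LF) and 0xD (CR). The character in the error message, 0xC, is the form feed (FF), so it falls in the disallowed range. The specification also excludes the surrogate range 0xD800-0xDFFF and the code points 0xFFFE and 0xFFFF.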
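The rule can be written as a predicate over Unicode code points. This is a minimal sketch: the class name XmlCharRule and the method isXmlChar are hypothetical, and the ranges are copied from the Char production in section 2.2 of the XML 1.0 specification.

```java
public class XmlCharRule {
    // XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    public static void main(String[] args) {
        // SOH, TAB, LF, VT, FF, CR, space, a noncharacter, and an emoji outside the BMP.
        int[] samples = {0x1, 0x9, 0xA, 0xB, 0xC, 0xD, 0x20, 0xFFFE, 0x1F600};
        for (int cp : samples) {
            System.out.printf("U+%04X allowed: %b%n", cp, isXmlChar(cp));
        }
    }
}
```

Running main reports false for 0x1, 0xB (VT), 0xC (FF) and 0xFFFE, and true for the others, including U+1F600, which lies outside the Basic Multilingual Plane.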


    public String stripNonValidXMLCharacters(String in) {
        if (in == null || in.isEmpty()) return ""; // vacancy test.
        StringBuilder out = new StringBuilder(in.length()); // Used to hold the output.
        // Walk the string by code point, not by char: a Java char is a UTF-16 code unit,
        // so it can never reach 0x10000 and supplementary characters would be dropped.
        for (int i = 0; i < in.length(); ) {
            int current = in.codePointAt(i); // The current code point.
            if ((current == 0x9) ||
                (current == 0xA) ||
                (current == 0xD) ||
                ((current >= 0x20) && (current <= 0xD7FF)) ||
                ((current >= 0xE000) && (current <= 0xFFFD)) ||
                ((current >= 0x10000) && (current <= 0x10FFFF)))
                out.appendCodePoint(current);
            i += Character.charCount(current); // Skip one or two chars.
        }
        return out.toString();
    }
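A compact, self-contained variant to try the stripping approach, assuming Java 8 or later for String.codePoints(). It filters by code point rather than by char, because a Java char is a UTF-16 code unit and can never hold a value of 0x10000 or above; a char-by-char loop therefore drops every character outside the Basic Multilingual Plane, such as emoji. The class name StripDemo and the sample string are made up for illustration.

```java
public class StripDemo {
    // Keep only code points allowed by the XML 1.0 Char production.
    // Unpaired surrogates come through codePoints() as 0xD800-0xDFFF and are dropped.
    static String stripNonValidXMLCharacters(String in) {
        if (in == null || in.isEmpty()) return "";
        StringBuilder out = new StringBuilder(in.length());
        in.codePoints()
          .filter(cp -> cp == 0x9 || cp == 0xA || cp == 0xD
                  || (cp >= 0x20 && cp <= 0xD7FF)
                  || (cp >= 0xE000 && cp <= 0xFFFD)
                  || (cp >= 0x10000 && cp <= 0x10FFFF))
          .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        String dirty = "page\fbreak \u000B\u0001ok \uD83D\uDE00"; // FF, VT, SOH, then U+1F600
        String clean = stripNonValidXMLCharacters(dirty);
        System.out.println(clean); // "pagebreak ok " followed by the emoji
    }
}
```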


Whenever the XML contains an invalid character, you get this error. If you open the file in Notepad++, such characters show up as VT, SOH, FF and so on; these are control characters (0xB, 0x1, 0xC) that are invalid in XML. I'm using XML version 1.0, and I validate text data before inserting it into the database with this pattern:

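    // Backslashes are doubled so that the regex engine, not javac, interprets the escapes.
    // javac translates single-backslash Unicode escapes before parsing, so a line feed
    // escape would break the string literal; the supplementary range uses x{...} syntax.
    Pattern p = Pattern.compile("[^\\u0009\\u000A\\u000D\\u0020-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]+");
    retunContent = p.matcher(retunContent).replaceAll("");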

This ensures that no invalid character ends up in the XML.
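A runnable sketch of the same regex approach, assuming Java 7 or later (needed for the x{...} code point syntax in java.util.regex.Pattern). The backslashes are doubled so the regex engine, not the compiler, reads the escapes, and the upper bound is 0x10FFFF. The class name RegexStripDemo and the method strip are hypothetical.

```java
import java.util.regex.Pattern;

public class RegexStripDemo {
    // Matches runs of code points that are NOT allowed by the XML 1.0 Char production.
    // Pattern works on code points, so a valid surrogate pair (e.g. an emoji) is kept,
    // while an unpaired surrogate falls outside every range and is removed.
    private static final Pattern INVALID_XML_CHARS = Pattern.compile(
        "[^\\u0009\\u000A\\u000D\\u0020-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]+");

    static String strip(String content) {
        return INVALID_XML_CHARS.matcher(content).replaceAll("");
    }

    public static void main(String[] args) {
        // VT, FF and SOH are removed; the space and U+1F600 survive.
        System.out.println(strip("a\u000Bb\fc\u0001d \uD83D\uDE00"));
    }
}
```

Compiling the pattern once into a static final field avoids recompiling it for every value that is validated.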