How to validate an XML file using Java with an XSD having an include?


You need to use an LSResourceResolver for this to work. Take a look at the sample code below.

A validate method:

    import java.io.StringReader;
    import javax.xml.XMLConstants;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.Source;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.validation.Schema;
    import javax.xml.validation.SchemaFactory;
    import javax.xml.validation.Validator;
    import org.w3c.dom.Document;
    import org.xml.sax.InputSource;

    // note that if your XML already declares the XSD to which it has to conform,
    // then there's no need to declare the schemaName here
    void validate(String xml, String schemaName) throws Exception {
        DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
        builderFactory.setNamespaceAware(true);
        DocumentBuilder parser = builderFactory.newDocumentBuilder();

        // parse the XML string into a document object
        Document document = parser.parse(new InputSource(new StringReader(xml)));

        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

        // associate the schema factory with the resource resolver, which is
        // responsible for resolving the included/imported XSDs
        factory.setResourceResolver(new ResourceResolver());

        // note that if your XML already declares the XSD to which it has to conform,
        // then there's no need to create a validator from a Schema object
        Source schemaFile = new StreamSource(getClass().getClassLoader()
                .getResourceAsStream(schemaName));
        Schema schema = factory.newSchema(schemaFile);

        Validator validator = schema.newValidator();
        validator.validate(new DOMSource(document));
    }

The resource resolver implementation:

    import java.io.InputStream;
    import org.w3c.dom.ls.LSInput;
    import org.w3c.dom.ls.LSResourceResolver;

    public class ResourceResolver implements LSResourceResolver {

        public LSInput resolveResource(String type, String namespaceURI,
                String publicId, String systemId, String baseURI) {
            // note: in this sample, the XSDs are expected to be in the root of the classpath
            InputStream resourceAsStream = this.getClass().getClassLoader()
                    .getResourceAsStream(systemId);
            return new Input(publicId, systemId, resourceAsStream);
        }
    }

The Input implementation returned by the resource resolver:

    import java.io.BufferedInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.Reader;
    import java.nio.charset.StandardCharsets;
    import org.w3c.dom.ls.LSInput;

    public class Input implements LSInput {

        private String publicId;
        private String systemId;
        private BufferedInputStream inputStream;

        public Input(String publicId, String sysId, InputStream input) {
            this.publicId = publicId;
            this.systemId = sysId;
            this.inputStream = new BufferedInputStream(input);
        }

        public String getPublicId() { return publicId; }
        public void setPublicId(String publicId) { this.publicId = publicId; }

        public String getSystemId() { return systemId; }
        public void setSystemId(String systemId) { this.systemId = systemId; }

        public BufferedInputStream getInputStream() { return inputStream; }
        public void setInputStream(BufferedInputStream inputStream) { this.inputStream = inputStream; }

        // The schema content is handed to the parser as a string.
        public String getStringData() {
            synchronized (inputStream) {
                try {
                    // Read until end of stream; available() does not reliably
                    // report the total number of bytes.
                    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                    byte[] chunk = new byte[4096];
                    int read;
                    while ((read = inputStream.read(chunk)) != -1) {
                        buffer.write(chunk, 0, read);
                    }
                    // Assumption: the XSD files are UTF-8 encoded.
                    return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
                } catch (IOException e) {
                    e.printStackTrace();
                    System.out.println("Exception " + e);
                    return null;
                }
            }
        }

        // The remaining LSInput members are not used in this sample.
        public String getBaseURI() { return null; }
        public InputStream getByteStream() { return null; }
        public boolean getCertifiedText() { return false; }
        public Reader getCharacterStream() { return null; }
        public String getEncoding() { return null; }

        public void setBaseURI(String baseURI) {}
        public void setByteStream(InputStream byteStream) {}
        public void setCertifiedText(boolean certifiedText) {}
        public void setCharacterStream(Reader characterStream) {}
        public void setEncoding(String encoding) {}
        public void setStringData(String stringData) {}
    }


The accepted answer is perfectly ok, but does not work with Java 8 without some modifications. It would also be nice to be able to specify a base path from which the imported schemas are read.

With Java 8 I used the following code, which lets you specify a base path for the included schemas other than the classpath root:

    // DOMInputImpl is a JDK-internal class, not part of the public API
    import com.sun.org.apache.xerces.internal.dom.DOMInputImpl;
    import org.w3c.dom.ls.LSInput;
    import org.w3c.dom.ls.LSResourceResolver;

    import java.io.InputStream;
    import java.util.Objects;

    public class ResourceResolver implements LSResourceResolver {

        private String basePath;

        public ResourceResolver(String basePath) {
            this.basePath = basePath;
        }

        @Override
        public LSInput resolveResource(String type, String namespaceURI, String publicId, String systemId, String baseURI) {
            // note: the XSDs are looked up on the classpath under basePath
            // (or in the root of the classpath if basePath is null)
            InputStream resourceAsStream = this.getClass().getClassLoader()
                    .getResourceAsStream(buildPath(systemId));
            Objects.requireNonNull(resourceAsStream, String.format("Could not find the specified xsd file: %s", systemId));
            return new DOMInputImpl(publicId, systemId, baseURI, resourceAsStream, "UTF-8");
        }

        private String buildPath(String systemId) {
            return basePath == null ? systemId : String.format("%s/%s", basePath, systemId);
        }
    }

This implementation also gives the user a meaningful message if the schema file cannot be found.
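If depending on the JDK-internal DOMInputImpl (or maintaining a hand-written LSInput class) is a concern, the same idea can probably be expressed with the public DOM Load and Save API alone. The following is only a sketch under assumptions: the class name ClasspathResourceResolver and the explicit ClassLoader parameter are hypothetical and do not appear in the answers above, and it assumes a DOM implementation with the "LS" feature is available, as it is in the standard JDK.

```java
import java.io.InputStream;
import java.util.Objects;

import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;

// Hypothetical variant of the resolver above that uses only public APIs.
class ClasspathResourceResolver implements LSResourceResolver {

    private final ClassLoader classLoader;
    private final String basePath; // null means: look in the classpath root
    private final DOMImplementationLS domLs;

    ClasspathResourceResolver(ClassLoader classLoader, String basePath) {
        this.classLoader = Objects.requireNonNull(classLoader, "classLoader");
        this.basePath = basePath;
        try {
            // Ask the registry for a DOM implementation with Load and Save support.
            this.domLs = (DOMImplementationLS) Objects.requireNonNull(
                    DOMImplementationRegistry.newInstance().getDOMImplementation("LS"),
                    "No DOM implementation with the LS feature found");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not create the DOM implementation registry", e);
        }
    }

    @Override
    public LSInput resolveResource(String type, String namespaceURI,
            String publicId, String systemId, String baseURI) {
        if (systemId == null) {
            // Nothing to look up; returning null leaves resolution to the parser.
            return null;
        }
        String path = basePath == null ? systemId : basePath + "/" + systemId;
        InputStream stream = classLoader.getResourceAsStream(path);
        Objects.requireNonNull(stream, "Could not find the specified xsd file: " + path);

        // Let the DOM implementation supply the LSInput object.
        LSInput input = domLs.createLSInput();
        input.setPublicId(publicId);
        input.setSystemId(systemId);
        input.setBaseURI(baseURI);
        input.setByteStream(stream);
        return input;
    }
}
```

It would be registered the same way as the resolvers above, for example `factory.setResourceResolver(new ClasspathResourceResolver(getClass().getClassLoader(), "xsd"))`.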


As user "ulab" points out in a comment on another answer, the solution described in this answer (to a separate Stack Overflow question) will work for many. Here's the rough outline of that approach:

    SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    URL xsdURL = getClass().getResource("/xsd/my-schema.xsd");
    Schema schema = schemaFactory.newSchema(xsdURL);

The key to this approach is to avoid handing the schema factory a stream and to give it a URL instead. This way it knows the location of the XSD file and can resolve relative references from there.

One thing to keep in mind here: when the "schemaLocation" attribute on include and/or import elements uses a simple relative path such as "my-common.xsd" or "common/some-concept.xsd", it is treated as relative to the classpath location of the XSD file whose URL you've handed to the schema factory.

Notes:
- In the example above I've placed the schema file into a jar file under an "xsd" folder.
- The leading slash in the "getResource" argument tells Java to start at the root of the classloader instead of at the "this" object's package name.
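To round out the outline above, here is a minimal sketch of a complete validation helper built on the URL approach. The class name UrlSchemaValidator and its isValid method are hypothetical names chosen for illustration; only SchemaFactory.newSchema(URL) and the Validator calls come from the standard javax.xml.validation API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URL;
import java.util.Objects;

import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

import org.xml.sax.SAXException;

// Hypothetical helper: compiles a schema from a URL and validates XML strings against it.
final class UrlSchemaValidator {

    private final Schema schema;

    UrlSchemaValidator(URL xsdUrl) throws SAXException {
        Objects.requireNonNull(xsdUrl, "schema URL not found");
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // A URL (unlike a bare stream) tells the factory where the XSD lives,
        // so relative schemaLocation values can be resolved against it.
        this.schema = factory.newSchema(xsdUrl);
    }

    // Returns false for any SAX error, including malformed XML.
    boolean isValid(String xml) throws IOException {
        // Create a fresh Validator per call; Validator instances are not thread-safe.
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }
}
```

With the jar layout from the notes, it could be created as `new UrlSchemaValidator(getClass().getResource("/xsd/my-schema.xsd"))`. No resource resolver is registered here, since the factory resolves the included files relative to the URL.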