
EXI (efficient XML interchange) coming... Are XML APIs ready?


You don't need any new APIs to get the performance gains of EXI. All the EXI testing and performance measurements the W3C has conducted use the standard SAX APIs built into the JDK. For the latest tests, see http://www.w3.org/TR/exi-evaluation/#processing-results. In these tests, EXI parsing was on average 14.5 times faster than XML parsing, without any special APIs.
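The reason no new API is needed is that application code written against SAX only ever sees events, never the bytes they were decoded from. A minimal sketch in Python, whose `xml.sax` module mirrors the SAX interfaces (the W3C tests used the Java ones in the JDK). Here plain XML text drives the handler; an EXI decoder that reports SAX events is assumed, not shown, and is not a stdlib component.

```python
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Counts start-element events. It never sees the bytes the events came from."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


handler = ElementCounter()
# Plain XML text drives the handler here; an EXI decoder that reports SAX
# events (hypothetical in this sketch) could drive the very same class.
xml.sax.parseString(b"<example><bool1>true</bool1><bool2>false</bool2></example>", handler)
print(handler.counts)  # {'example': 1, 'bool1': 1, 'bool2': 1}
```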

One day, if people think it's worthwhile, we may see some typed XML APIs emerge. If and when that happens, you will get even better performance from EXI. However, this is not required to get excellent performance like that reported by the W3C.
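To make the "typed API" point concrete: in schema-informed EXI a boolean value can be carried as a single bit, but a string-based API such as SAX still hands it to the application as text that has to be parsed again. A minimal sketch of that conversion for the `xs:boolean` lexical space from XML Schema ("true", "false", "1", "0", with whitespace collapsed); it only illustrates the work a typed API could skip and is not part of any existing EXI API.

```python
def parse_xs_boolean(text: str) -> bool:
    """Parse the lexical form of xs:boolean: 'true', 'false', '1' or '0'.

    xs:boolean uses whitespace="collapse", so XML whitespace around the token is ignored."""
    token = text.strip(" \t\r\n")
    if token in ("true", "1"):
        return True
    if token in ("false", "0"):
        return False
    raise ValueError(f"not a valid xs:boolean: {text!r}")


# Every boolean delivered as text through characters() pays for this round trip.
print([parse_xs_boolean(s) for s in ("true", " 0 ", "1", "false")])  # [True, False, True, False]
```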


Think of EXI as a "better GZIP for XML". FYI, it has no impact on the APIs: you can still use all of them (DOM, SAX, StAX, JAXB...). The only difference is that to get EXI, you need a stream writer that writes it or a stream reader that reads it.
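The writer side works the same way. Python has no StAX, but its SAX `XMLGenerator` plays the writer role: it is a ContentHandler that turns events into XML text. In this minimal sketch the producing code only talks to the handler interface, so swapping in an EXI-encoding handler (hypothetical here, not a stdlib class) would change the output format without touching `write_example`.

```python
import io
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

NO_ATTRS = AttributesImpl({})


def write_example(handler, bool1, bool2):
    """Emit the document purely as SAX events; the handler decides the output bytes."""
    handler.startDocument()
    handler.startElement("example", NO_ATTRS)
    for name, value in (("bool1", bool1), ("bool2", bool2)):
        if value is not None:  # optional element: omit it when there is no value
            handler.startElement(name, NO_ATTRS)
            handler.characters("true" if value else "false")
            handler.endElement(name)
    handler.endElement("example")
    handler.endDocument()


out = io.StringIO()
write_example(XMLGenerator(out, encoding="utf-8"), True, None)
print(out.getvalue())
```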

The most efficient way to work with EXI is StAX. It is true that new APIs might arise because of EXI. But who said DOM was efficient or well designed for modern languages? ;-)
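StAX is a pull API: the application asks for the next event instead of receiving callbacks. Python's closest standard-library equivalent is `xml.etree.ElementTree.XMLPullParser`, so this minimal sketch uses it as a stand-in for StAX, on plain XML text rather than EXI. Data is pushed in arbitrary chunks (as it might arrive off the wire) and completed leaf elements are pulled out as soon as they are available.

```python
import xml.etree.ElementTree as ET


def pull_values(chunks):
    """Feed XML in chunks and pull (tag, text) pairs for each completed leaf element."""
    parser = ET.XMLPullParser(events=("end",))
    values = []

    def drain():
        for _event, elem in parser.read_events():
            if len(elem) == 0:  # leaf element: its text is complete at its end event
                values.append((elem.tag, elem.text))

    for chunk in chunks:
        parser.feed(chunk)  # push bytes in...
        drain()             # ...and pull whatever events are complete so far
    parser.close()
    drain()                 # collect anything still buffered at end of input
    return values


print(pull_values(["<example><bo", "ol1>true</bool1><bool2>fa", "lse</bool2></example>"]))
# [('bool1', 'true'), ('bool2', 'false')]
```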

If you are handling big XML files (I have some that are a few hundred MB), you definitely know why you need EXI: it saves tons of space, plus huge amounts of memory and processing time.

This is no different from the purpose of HTTP Content-Encoding: you are not required to use it, but if both parties understand it, it is a much more efficient way to perform the exchange.
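The negotiation half of that analogy is small. A minimal sketch of server-side Content-Encoding selection from an `Accept-Encoding` header, assuming `exi` as the content-coding token (the name EXI registered for this use, to the best of my knowledge; verify against the IANA registry). It is simplified: q-values are honored and `*` is a wildcard, but explicit `identity;q=0` refusals are not handled, and `None` means "send uncompressed".

```python
def parse_accept_encoding(header: str) -> dict:
    """Map each content-coding in an Accept-Encoding value to its q-value (default 1.0)."""
    prefs = {}
    for item in header.split(","):
        coding, *params = [part.strip() for part in item.split(";")]
        if not coding:
            continue
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[coding.lower()] = q
    return prefs


def choose_encoding(header: str, supported=("exi", "gzip")):
    """Return the supported coding the client rates highest, or None for identity."""
    prefs = parse_accept_encoding(header)
    best, best_q = None, 0.0
    for coding in supported:  # earlier entries win ties: server preference order
        q = prefs.get(coding, prefs.get("*", 0.0))
        if q > best_q:
            best, best_q = coding, q
    return best


print(choose_encoding("gzip, exi"))               # exi
print(choose_encoding("gzip;q=1.0, exi;q=0.5"))   # gzip
print(choose_encoding("br"))                      # None
```

Both parties benefit only when both list the coding; a client that never mentions `exi` simply keeps getting gzip or plain XML.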

By the way, IMHO EXI will become the preferred way to content-encode any XML over HTTP because of SOAP bloat ;-) As soon as EXI is settled in browsers, it could also benefit any end user: faster transfer + faster parsing = the best experience ever on the same machine!

EXI does not deprecate the string representation; it only makes it a bit different. Oh, and by the way, when you use UTF (think of the default UTF-8, for instance), you are already using a "compression encoding" for Unicode code points, which would take 32 bits each in UTF-32... meaning the data on the wire is already not the same as the real data ;-)
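A quick way to see that: UTF-8 spends 1 to 4 bytes per code point depending on its value, whereas UTF-32 always spends 4. A minimal Python sketch:

```python
# Latin capital A, e with acute accent, euro sign, grinning-face emoji
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    utf8 = ch.encode("utf-8")
    utf32 = ch.encode("utf-32-be")  # fixed width, no byte-order mark
    print(f"U+{ord(ch):04X}: {len(utf8)} byte(s) in UTF-8, {len(utf32)} in UTF-32")
```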


I'm dealing with EXI right now.

There's no good universal tool for processing EXI. Once you get into the guts of EXI, you realize there are a bunch of delimiters in the binary stream that are completely unnecessary when you have a schema. Some of it is humorous.

How do you think the following would be encoded in EXI if both values are specified?

    <xs:complexType name="example">
      <xs:sequence>
        <xs:element name="bool1" type="xs:boolean" minOccurs="0" />
        <xs:element name="bool2" type="xs:boolean" minOccurs="0" />
      </xs:sequence>
    </xs:complexType>

Would you guess a maximum of 4 bits? One bit to indicate whether bool1 is defined, then the value of bool1, followed by another bit to indicate whether bool2 is defined, then the value of bool2?
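For comparison, here is the scheme that question imagines, as a minimal sketch (bits shown as strings for readability, not packed into bytes). This is the hand-rolled encoding being guessed at, not what EXI does: 2 bits when neither element is present, 4 bits at most.

```python
from typing import Optional


def encode_optional_bool(value: Optional[bool]) -> str:
    """One presence bit, plus one value bit only when the element is present."""
    if value is None:
        return "0"
    return "1" + ("1" if value else "0")


def naive_encode(bool1: Optional[bool], bool2: Optional[bool]) -> str:
    return encode_optional_bool(bool1) + encode_optional_bool(bool2)


for b1 in (None, False, True):
    for b2 in (None, False, True):
        bits = naive_encode(b1, b2)
        print(f"bool1={b1!s:5} bool2={b2!s:5} -> {bits} ({len(bits)} bits)")
```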

Good golly no!

Well, let me tell you, boys and girls! This is how it's actually encoded:

    +---- A value of 0 means this element (bool1) is not specified,
    |       1 indicates it is specified
    |+--- A value of x means this element is undefined,
    ||      0 means the bool is set to false, 1 is set to true
    ||+-- A value of 0 means this element (bool2) is not specified,
    |||     1 indicates it is specified
    |||+- A value of x means this element is undefined,
    ||||    0 means the bool is set to false, 1 is set to true
    ||||
    0x0x  4 0100           # neither bool is specified
    0x10  8 00100000       # bool1 is not specified, bool2 is set to false
    0x11  8 00101000       # bool1 is not specified, bool2 is set to true
    100x  9 000000010      # bool1 is set to false, bool2 is not specified
    110x  9 000010010      # bool1 is set to true, bool2 is not specified
    1010 13 0000000000000  # bool1 is set to false, bool2 is set to false
    1011 13 0000000001000  # bool1 is set to false, bool2 is set to true
    1110 13 0000100000000  # bool1 is set to true, bool2 is set to false
    1111 13 0000100001000  # bool1 is set to true, bool2 is set to true
            ^           ^
            +-encoding--+

Which can be represented with this tree:

      0-0-0-0-0-0-0-0-0-0-0-0-0 (1010)
       \ \   \     \   \
        | |   |     |   1-0-0-0 (1011)
        | |   |     |
        | |   |     1-0 (100x)
        | |   |
        | |   1-0-0-0-0-0-0-0-0 (1110)
        | |        \   \
        | |         |   1-0-0-0 (1111)
        | |         |
        | |         1-0 (110x)
        | |
        | 1-0-0-0-0-0 (0x10)
        |    \
        |     1-0-0-0 (0x11)
        |
        1-0-0 (0x0x)
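That tree is a prefix code: no encoding is the beginning of another, so a decoder can walk it bit by bit and stop at the first complete match. A minimal sketch that takes the bit patterns straight from the table (they are the observed output described in this post, not values re-derived from the EXI spec):

```python
# Bit patterns copied from the table; keys map to (bool1, bool2), None = not specified.
OBSERVED = {
    "0100": (None, None),
    "00100000": (None, False),
    "00101000": (None, True),
    "000000010": (False, None),
    "000010010": (True, None),
    "0000000000000": (False, False),
    "0000000001000": (False, True),
    "0000100000000": (True, False),
    "0000100001000": (True, True),
}


def is_prefix_free(codes):
    """True if no code word is a proper prefix of another (a valid prefix code)."""
    return not any(a != b and b.startswith(a) for a in codes for b in codes)


def decode(bits):
    """Read bits one at a time, like walking down the tree, until a leaf matches."""
    prefix = ""
    for bit in bits:
        prefix += bit
        if prefix in OBSERVED:
            return OBSERVED[prefix], len(prefix)
    raise ValueError("bit string ended before reaching a leaf")


print(decode("0000100001000"))  # ((True, True), 13)
```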

A minimum of 4 bits, MINIMUM, just to say that neither one is defined. Now, I'm being a little unfair, because I'm including delimiters, and those delimiters are entirely unnecessary.

I understand how this works now. Here's the spec:

https://www.w3.org/TR/exi/

Have fun reading that! It was a GREAT DEAL OF FUN FOR ME!!!!@@##!@

Now, this is just with a schema, and the EXI spec specifically says that you can still encode XML that does NOT conform to the schema. Which is hilarious, because this is supposed to be for small web devices. What do you do on an embedded device with unexpected data that you have no provisions for handling?

Why, you just die, of course. There's no recovering from something you don't expect. It's not like these things have a screen; I'm lucky if I can log into one through a serial port.

I have used 4 different XSD generators/parsers/XML generators. 3 of them choke on the schema I have to use. Data marshaling for C and C++ (remember, this is for an EMBEDDED system with very little memory and CPU power) is awful.

XSD basically describes a structure or class architecture, and there isn't a single tool I can find that will just create the classes. The XSD example I gave above should create a structure with 4 bools: 2 hold the values, and 2 indicate whether they are even defined.
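For a schema this simple, getting to that 4-field structure is mechanical. A toy generator, written in Python but emitting C: it handles only named complex types whose children are elements of built-in simple types, assumes the `xs` prefix, and uses a hypothetical type mapping (`C_TYPES`). The generated struct assumes `<stdbool.h>` and `<stdint.h>`. Real schemas (imports, derivation, choices, repeated elements) need far more than this.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical XSD-to-C type mapping; assumes the schema uses the "xs" prefix.
C_TYPES = {"xs:boolean": "bool", "xs:int": "int32_t", "xs:unsignedByte": "uint8_t"}


def complex_types_to_c(schema_text: str) -> str:
    """Emit one C struct per named xs:complexType, with a has_ flag per optional element."""
    root = ET.fromstring(schema_text)
    lines = []
    for complex_type in root.iter(XS + "complexType"):
        type_name = complex_type.get("name")
        lines.append(f"typedef struct {type_name} {{")
        for element in complex_type.iter(XS + "element"):
            field = element.get("name")
            c_type = C_TYPES[element.get("type")]
            if element.get("minOccurs") == "0":
                lines.append(f"    bool has_{field};  /* 1 if {field} was present */")
            lines.append(f"    {c_type} {field};")
        lines.append(f"}} {type_name}_t;")
    return "\n".join(lines)


# The fragment from the post, wrapped in xs:schema so the prefix is declared.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="example">
    <xs:sequence>
      <xs:element name="bool1" type="xs:boolean" minOccurs="0" />
      <xs:element name="bool2" type="xs:boolean" minOccurs="0" />
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""

print(complex_types_to_c(SCHEMA))
```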

But does THAT exist? Well heck no.

I like XML for describing documents. Really, I do. But here is what I hate about XML: for a widely adopted standard, the available tools are absolutely terrible. Just reading a schema is difficult when it's spread across multiple namespaces and documents.
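The chasing, at least, can be automated. A minimal sketch that follows `xs:include`, `xs:import` and `xs:redefine` links from an entry schema and records each document's `targetNamespace`, guarding against cycles. It assumes schema locations are plain keys (no relative-URL resolution) and that a caller-supplied `load` function (hypothetical; a dict lookup here) returns the schema text. Chameleon includes without a `targetNamespace` are simply reported as `None`.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def collect_schemas(entry, load):
    """Follow xs:include / xs:import / xs:redefine links starting at `entry`.

    `load(location)` must return the schema text for a location (hypothetical
    callback: file, HTTP, catalog...). Returns {location: targetNamespace}."""
    found = {}
    pending = [entry]
    while pending:
        location = pending.pop()
        if location in found:  # already read: includes and imports may form cycles
            continue
        root = ET.fromstring(load(location))
        found[location] = root.get("targetNamespace")  # None for a no-namespace schema
        for kind in ("include", "import", "redefine"):
            for ref in root.findall(XS + kind):
                target = ref.get("schemaLocation")
                if target:  # xs:import may legally omit schemaLocation
                    pending.append(target)
    return found


DOCS = {
    "main.xsd": '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="urn:a">'
    '<xs:include schemaLocation="types.xsd"/>'
    '<xs:import namespace="urn:b" schemaLocation="b.xsd"/></xs:schema>',
    "types.xsd": '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="urn:a"/>',
    "b.xsd": '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="urn:b">'
    '<xs:import namespace="urn:a" schemaLocation="main.xsd"/></xs:schema>',
}
print(collect_schemas("main.xsd", DOCS.__getitem__))
```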

Rant rant, huff huff

The only reason we are using this is that some standards committee insisted on it. All it has done is create a monopoly for a small group of companies that had already implemented it; that's its only purpose.

EXI is not a widely adopted standard, XML is a poor container for numeric data, EXI is a pain to implement, and there are no decent tools for it. EXIP is at version 5.0, and anything open source that actually works is in Java; at least I have that.

For my field of work, EXI is just a bad design decision. I've worked on tons of communications protocols on various embedded systems. I worked on DOCSIS, which all modern cable modems use. It uses a simple, extensible Type/Length/Value protocol with provisions for dealing with unrecognized types, which is why the Length is always included. It's simple; implementing the entire stack takes literally just days.
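For contrast, the core of a TLV scheme really is small. A minimal sketch with a simplified 1-byte type / 1-byte length layout (an assumption: this is not the exact DOCSIS encoding, which also has longer forms), showing why always carrying the length matters: a decoder can skip types it does not recognize. The type codes are hypothetical.

```python
def encode_tlv(items):
    """Encode (type, value) pairs as 1-byte type, 1-byte length, then the value bytes."""
    out = bytearray()
    for tlv_type, value in items:
        if not 0 <= tlv_type <= 255 or len(value) > 255:
            raise ValueError("type and length must each fit in one byte")
        out += bytes((tlv_type, len(value))) + value
    return bytes(out)


def decode_tlv(data, known_types):
    """Return ({type: value} for known types, [unknown types that were skipped])."""
    fields, skipped, pos = {}, [], 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated TLV header")
        tlv_type, length = data[pos], data[pos + 1]
        value = data[pos + 2 : pos + 2 + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        if tlv_type in known_types:
            fields[tlv_type] = value
        else:
            skipped.append(tlv_type)  # unknown type: the length lets us jump over it
        pos += 2 + length
    return fields, skipped


BOOL1, BOOL2 = 1, 2  # hypothetical type codes for the two booleans
msg = encode_tlv([(BOOL1, b"\x01"), (99, b"future field"), (BOOL2, b"\x00")])
fields, skipped = decode_tlv(msg, known_types={BOOL1, BOOL2})
print(fields, skipped)  # {1: b'\x01', 2: b'\x00'} [99]
```

An older decoder that has never heard of type 99 still reads both booleans correctly, which is exactly the "provisions for unrecognized types" property.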

EXI is very difficult to hand code, there are no decent processors for it, and worst of all, all the processors I have found that actually work well with it just transform it between EXI and XML, which is totally useless.

I have resorted to writing my own XSD parser, which means I have to understand the XML specification, at least as far as this design uses it, and that's extensive. What would have taken me 2 weeks with any reasonable spec took me 10. Nobody in my world is going to use this unless it's shoved down their throat, and they shouldn't; it's a square peg in a round hole.