Length of an XML file
31 GB is a really big text file, and XML tends to compress very well; I would guess it shrinks to something like 1.5 GB. I would create these files in a compressed format to begin with, then stream a decompressed version of the file through wc (for example, gzip -dc piped into wc -l). This greatly reduces the amount of I/O needed to process the file, and because the data is streamed, memory use stays small. gzip can read and write compressed streams.
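If you would rather do the same thing from a script than from the shell, here is a rough sketch in Python using the standard gzip module. The function name count_lines_gz and the 1 MiB chunk size are my own choices for illustration, not anything your setup requires.

```python
import gzip


def count_lines_gz(path, chunk_size=1 << 20):
    """Count newline bytes in a gzip-compressed file, one chunk at a time.

    Only chunk_size bytes of decompressed data are held in memory at once,
    so this behaves like `gzip -dc file | wc -l`.
    """
    count = 0
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of stream
                break
            count += chunk.count(b"\n")
    return count
```

Like wc -l, this counts newline characters, so a final line without a trailing newline is not counted.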
But I would also make the following comments:
- Line counts are not really that informative for XML, as whitespace between elements is usually insignificant (except in mixed content), so the same document could be one line or millions. What do you really want to know about the dataset? I bet counting elements would be more useful.
- Make sure your XML file is not unnecessarily redundant. For example, are you repeating the same namespace declarations all over the document?
- Perhaps XML is not the best way to represent this document. If it is, try looking into something like Fast Infoset, a binary encoding of the XML Infoset.
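On the element-counting suggestion, here is a minimal sketch of how I might do it in Python with a streaming parser, so the whole tree is never built in memory. The function name count_elements is made up for this example, and I am assuming the document is well-formed, with one root element holding many smaller children.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(source):
    """Count elements per tag name while streaming through the document.

    `source` is a filename or a binary file object, for example the object
    returned by gzip.open(path, "rb") for a compressed file.
    """
    counts = Counter()
    depth = 0
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem  # the first start event is the document root
            counts[elem.tag] += 1  # namespaced tags look like "{uri}local"
            depth += 1
        else:
            depth -= 1
            elem.clear()  # free this element's text, attributes and children
            if depth == 1:
                # A direct child of the root just ended; detach it so the
                # root does not accumulate millions of empty children.
                root.clear()
    return counts
```

Calling count_elements on the decompressed stream gives a tag-to-count mapping, which says a lot more about the dataset than a line count does.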