How to work around lack of NUL terminator in strings returned from mmap()?


I would suggest undergoing a paradigm shift here.

You're looking at the entire universe as consisting of '\0'-delimited strings that define your text. Instead of looking at the world this way, why not try looking at a world where text is a sequence defined by a beginning and an ending iterator?

You mmap your file, then set the beginning iterator, call it beg_iter, to the start of the mmap-ed segment, and the ending iterator, call it end_iter, to the first byte following the last byte in the mmap-ed segment, or beg_iter+number_of_pages*pagesize. Then repeat the following steps:

A) if end_iter equals beg_iter, stop; or

B) if end_iter[-1] is not a null character, stop; otherwise

C) decrement end_iter, and go back to step A.

When you're done, you have a pair of iterators, the beginning iterator value, and the ending iterator value that define your text string.
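
The trimming loop described above can be sketched roughly as follows. The function name trim_trailing_nuls is only an illustrative choice; beg_iter and end_iter are assumed to delimit the mmap-ed region.

```cpp
#include <cstddef>

// Walks end_iter backwards past any trailing NUL bytes and returns the
// new end of the text. The range [beg_iter, returned pointer) is the text.
// trim_trailing_nuls is a hypothetical helper name, not a library function.
const char *trim_trailing_nuls(const char *beg_iter, const char *end_iter)
{
    // Step A: stop when the range is empty.
    // Step B: stop when the last byte in the range is not a NUL.
    while (end_iter != beg_iter && end_iter[-1] == '\0')
        --end_iter; // Step C: drop one trailing NUL and check again.
    return end_iter;
}
```

Note that end_iter[-1] is only read after the range has been checked to be non-empty, so the loop never reads before beg_iter.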

Of course, in this case, your iterators are plain char *, but that's really not very important. What is important is that you now have a rich set of algorithms and templates from the C++ standard library at your disposal, which let you implement many complicated operations, both modifying (like std::transform) and non-modifying (like std::find).
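
As a small illustration, here is how std::find and std::transform might be applied to such a pair of char * iterators. The helper names first_newline and to_upper are made up for this sketch, and the in-place transform assumes the region was mapped writable (for example with PROT_READ|PROT_WRITE).

```cpp
#include <algorithm>
#include <cctype>

// Non-modifying: returns a pointer to the first '\n' in [beg_iter, end_iter),
// or end_iter if there is none. No NUL terminator is needed.
const char *first_newline(const char *beg_iter, const char *end_iter)
{
    return std::find(beg_iter, end_iter, '\n');
}

// Modifying: upper-cases the range in place. This only works if the
// underlying memory is writable.
void to_upper(char *beg_iter, char *end_iter)
{
    std::transform(beg_iter, end_iter, beg_iter, [](unsigned char c) {
        return static_cast<char>(std::toupper(c));
    });
}
```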

Null-terminated strings are really a holdover from the days of plain C. In C++, they are somewhat archaic. Modern C++ code should use std::string objects and sequences defined by beginning and ending iterators.
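
If an owning copy of the text is needed, a std::string can be constructed directly from the iterator pair, so no terminator is required in the mapped memory. A minimal sketch, with copy_text as a hypothetical helper name:

```cpp
#include <string>

// Copies the bytes in [beg_iter, end_iter) into a std::string.
// The source range does not need to be NUL-terminated.
std::string copy_text(const char *beg_iter, const char *end_iter)
{
    return std::string(beg_iter, end_iter);
}
```

The copy costs memory proportional to the file size, so for large files it may be preferable to keep working on the iterator pair directly.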

One small footnote: instead of figuring out how much NUL padding you ended up mmap-ing, you might find it easier to fstat() the file and get its exact length, in bytes, before mmap-ing it. Then you'll know exactly how much got mmap-ed, and you don't have to reverse-engineer it by looking at the padding.