How to work around lack of NUL terminator in strings returned from mmap()?
I would suggest undergoing a paradigm shift here.
You're looking at the world as if it consisted entirely of '\0'-terminated strings that define your text. Instead of looking at the world this way, why don't you try looking at a world where text is a sequence delimited by a beginning iterator and an ending iterator?
You mmap your file, then initially set the beginning iterator, call it beg_iter, to the start of the mmap-ed segment, and the ending iterator, call it end_iter, to the first byte following the last byte in the mmap-ed segment, i.e. beg_iter+number_of_pages*pagesize. Then:

A) if end_iter equals beg_iter, stop;
B) if end_iter[-1] is not a null character, stop;
C) otherwise, decrement end_iter and go back to step A.
When you're done, you have a pair of iterators, a beginning value and an ending value, that define your text string.
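A minimal C++ sketch of steps A through C, assuming beg and end already span the whole mapped region; the function name trim_trailing_nuls is hypothetical. Because it only looks at byte values, it cannot tell mmap's zero fill apart from NUL bytes that genuinely sit at the end of the file.

```cpp
#include <cassert>

// Hypothetical helper: given [beg, end) covering the whole mapped region,
// move end back over trailing '\0' bytes and return the new end.
const char* trim_trailing_nuls(const char* beg, const char* end)
{
    assert(beg <= end);
    // Step A: stop when the range is empty.
    // Step B: stop when the last byte in the range is not '\0'.
    while (end != beg && end[-1] == '\0')
        --end; // Step C: drop one trailing NUL, then re-check
    return end;
}
```

Afterwards, [beg, trim_trailing_nuls(beg, end)) is the text, with no terminator required.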
Of course, in this case your iterators are plain char *, but that's really not very important. What is important is that you now have a rich set of algorithms and templates from the C++ standard library at your disposal, which let you implement many complicated operations, both mutating (like std::transform) and non-mutating (like std::find).
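For illustration, a few hypothetical helpers that run standard algorithms directly over a [beg, end) range of char. The in-place std::transform needs writable memory; for an mmap-ed file that assumes the mapping was created with PROT_WRITE (and MAP_PRIVATE if the file itself should stay unchanged).

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Non-mutating: count the '\n' characters in [beg, end).
std::ptrdiff_t count_newlines(const char* beg, const char* end)
{
    assert(beg <= end);
    return std::count(beg, end, '\n');
}

// Non-mutating: copy the first line of [beg, end), without its '\n'.
std::string first_line(const char* beg, const char* end)
{
    assert(beg <= end);
    const char* nl = std::find(beg, end, '\n'); // equals end if there is no newline
    return std::string(beg, nl);
}

// Mutating: upper-case [beg, end) in place (requires writable memory).
void to_upper_in_place(char* beg, char* end)
{
    assert(beg <= end);
    std::transform(beg, end, beg, [](unsigned char c) {
        return static_cast<char>(std::toupper(c));
    });
}
```

None of these helpers looks for a '\0'; the end iterator alone bounds every scan.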
Null-terminated strings are really a holdover from the days of plain C. In C++, null-terminated strings are somewhat archaic and mundane. Modern C++ code should use std::string objects, and sequences defined by beginning and ending iterators.
One small footnote: instead of figuring out how much NUL padding ended up in your mmap-ed region, you might find it easier to fstat() the file and get its exact length, in bytes, before mmap-ing it. Then you'll know exactly how much of the mapping is file data, and you won't have to reverse-engineer it by looking at the padding.
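A POSIX-only sketch of the fstat() approach, assuming a Linux or similar system. read_file_via_mmap is a hypothetical name, error handling is minimal, and copying into a std::string is just one way to consume the range. With the exact length from st_size, [beg, beg + st_size) ends at the file's last byte, so no trimming is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: maps the file at `path` read-only and copies exactly
// st_size bytes (no zero fill) into `out`. Returns false on any error.
bool read_file_via_mmap(const char* path, std::string& out)
{
    assert(path != nullptr);
    int fd = open(path, O_RDONLY);
    if (fd == -1) return false;

    struct stat st;
    if (fstat(fd, &st) == -1) { close(fd); return false; }

    std::size_t len = static_cast<std::size_t>(st.st_size);
    if (len == 0) { close(fd); out.clear(); return true; } // mmap() rejects length 0

    void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); // the mapping remains valid after the descriptor is closed
    if (p == MAP_FAILED) return false;

    const char* beg = static_cast<const char*>(p);
    const char* end = beg + len; // exact end of the file's data
    out.assign(beg, end);

    munmap(p, len);
    return true;
}
```

To avoid the copy, you would instead keep the mapping alive, work on beg and end directly, and call munmap() when finished.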