What is the actual impact of calling socket.recv with a bufsize that is not a power of 2?


I'm pretty sure the 'power of 2' advice is based on an error in editing, and should not be taken as a requirement.

That specific piece of advice was added to the Python 2.5 documentation (and backported to Python 2.4.3 docs), in response to Python issue #756104. The reporter was using an unreasonably large buffer size for socket.recv(), which prompted the update.

It was Tim Peters who introduced the 'power of 2' concept:

I expect you're the only person in history to try passing such a large value to recv() -- even if it worked, you'd almost certainly run out of memory trying to allocate buffer space for 1.9GB. sockets are a low-level facility, and it's common to pass a relatively small power of 2 (for best match with hardware and network realities).

(Bold emphasis mine). I've worked with Tim, and he has a huge amount of experience with network programming and hardware, so generally speaking I'd take him at his word when he makes a remark like that. He was particularly 'fond' of the Windows 95 stack; he called it his canary in a coalmine for its ability to fail under stress. But note that he says it is common, not that it is required, to use a power of 2.

It was that wording that then led to the documentation update:

This is a documentation bug; something the user should be "warned" about.

This caught me once, and two different persons asked about this in #python, so maybe we should put something like the following in the recv() docs.

"""
For best match with hardware and network realities, the
value of "buffer" should be a relatively small power of 2,
for example, 4096.
"""

If you think the wording is right, just assign the bug to me, I'll take care of it.

No one challenged the 'power of 2' assertion here, but the wording moved from "it is common" to "should be" in the space of a few replies.

To me, those proposing the documentation update were more concerned with making sure you use a small buffer than with whether or not its size is a power of 2. That's not to say it is bad advice, however; any low-level buffer that interacts with the kernel can benefit from alignment with the kernel data structures.

But although there may well be an esoteric stack where a power-of-2 buffer size matters even more, I doubt Tim Peters ever meant for his experience (that it is common practice) to be cast in such iron-clad terms. Just ignore it if a different buffer size makes more sense for your specific use case.
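To illustrate that recv() itself does not care, here is a minimal sketch that reads the same payload with buffer sizes that are and are not powers of 2. It assumes a local socket.socketpair() as a stand-in for a real network connection; the helper names read_all and demo are made up for this example, and it says nothing about relative performance on any particular stack.

```python
import socket


def read_all(sock, bufsize):
    """Read until EOF and return the list of chunks recv() handed back."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)  # returns at most bufsize bytes
        if not chunk:               # empty bytes object means the peer closed
            break
        chunks.append(chunk)
    return chunks


def demo(payload, bufsize):
    """Send payload over a local socket pair and read it back with bufsize."""
    sender, receiver = socket.socketpair()
    try:
        sender.sendall(payload)
        sender.shutdown(socket.SHUT_WR)  # signal EOF to the reader
        return read_all(receiver, bufsize)
    finally:
        sender.close()
        receiver.close()


payload = bytes(range(256)) * 16  # 4096 bytes of test data

for bufsize in (1000, 1500, 4096):  # only the last one is a power of 2
    chunks = demo(payload, bufsize)
    assert b"".join(chunks) == payload
    print(bufsize, "->", len(chunks), "chunk(s)")
```

The data arrives intact in every case; the only thing bufsize controls is the upper bound on how much a single recv() call returns.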


Regarding: "if you have a protocol where the incoming packet length is exactly known, it is obviously preferrable to only read "at most" what is needed for the packet you are dealing with, otherwise you could potentially eat into the next packet and that would be irritating."

This may be preferable for the application developer, but it is probably inefficient for the underlying network stack. First, it ties up socket buffer space that could be used for additional network I/O. Second, each recv() call is a system call, and there is a performance penalty for every transition into kernel space. It is generally preferable to get as much data as you can out of kernel space and into user space with as few system calls as possible, and to do your message parsing there. This adds complexity to the application code and message handling, but it is probably the most efficient approach.

That said, given the speed of today's processors and amount of available memory, this may not be an issue for most applications, but this was a common recommendation for network applications back in the "old days".
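The approach described above, reading large chunks and splitting messages in user space, can be sketched as follows. This is only an illustration under assumptions: the framing (a 4-byte big-endian length prefix) is a hypothetical protocol chosen for the example, and the names HEADER, frame and iter_messages are invented here, not part of the socket module.

```python
import socket
import struct

# Hypothetical framing for this example: 4-byte big-endian length, then payload.
HEADER = struct.Struct("!I")


def frame(payload):
    """Prefix payload with its length."""
    return HEADER.pack(len(payload)) + payload


def iter_messages(sock, bufsize=4096):
    """Yield complete messages, reading big chunks and parsing in user space."""
    buffer = bytearray()
    while True:
        # Drain every complete message already sitting in the user-space buffer.
        while len(buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(buffer)
            end = HEADER.size + length
            if len(buffer) < end:
                break  # message is incomplete; need more data
            yield bytes(buffer[HEADER.size:end])
            del buffer[:end]
        chunk = sock.recv(bufsize)  # one system call may carry many messages
        if not chunk:
            return  # peer closed; a trailing partial message is discarded
        buffer += chunk


sender, receiver = socket.socketpair()
messages = [b"hello", b"", b"x" * 300]
sender.sendall(b"".join(frame(m) for m in messages))
sender.shutdown(socket.SHUT_WR)
received = list(iter_messages(receiver))
sender.close()
receiver.close()
assert received == messages
```

Reading "into the next packet" is harmless here because the leftover bytes stay in the buffer and are consumed on the next iteration.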

I am not sure about the power of 2 recommendation from a user-space application. I have seen these types of requirements for drivers due to alignment and page size issues, etc., but it's not clear what effect this has from user space unless it somehow aids in copying data out of kernel buffers into user buffers. Maybe somebody with more OS development knowledge could comment.