Python Socket Receive Large Amount of Data
TCP/IP is a stream-based protocol, not a message-based protocol. There's no guarantee that every send() call by one peer results in a single recv() call by the other peer that receives exactly the data sent. The receiver might get the data piecemeal, split across multiple recv() calls, or get the data from several send() calls combined into one recv() result, because TCP is free to segment, buffer, and reassemble the byte stream along the way.
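A minimal sketch of the splitting effect. It assumes a local socket.socketpair() as a stand-in for a real TCP connection. One sendall() of 10 bytes is read back with recv(4), so it necessarily takes several recv() calls. The exact split the OS produces is not guaranteed. Only the concatenation of the pieces is.

```python
import socket

# socketpair() returns two connected stream sockets (a local stand-in for TCP).
a, b = socket.socketpair()

message = b"0123456789"      # 10 bytes, written with a single call
a.sendall(message)
a.close()                    # closing tells the reader no more data is coming

pieces = []
while True:
    piece = b.recv(4)        # ask for at most 4 bytes per call
    if not piece:            # b"" means the peer closed the connection
        break
    pieces.append(piece)
b.close()

print(pieces)                # several pieces, none longer than 4 bytes
```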
You need to define your own message-based protocol on top of TCP in order to mark message boundaries. Then, to read a message, you keep calling recv() until you've read an entire message, the connection is closed, or an error occurs.
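A length prefix is not the only way to mark boundaries. A delimiter byte also works. The following is a rough sketch of that alternative. It assumes messages never contain a newline (b"\n"). The names send_line and LineReader are hypothetical. The key detail is that the reader keeps leftover bytes between calls, because one recv() may already contain the start of the next message.

```python
import socket

def send_line(sock, msg: bytes) -> None:
    # Assumption: msg contains no b"\n"; the newline marks the end of a message.
    sock.sendall(msg + b"\n")

class LineReader:
    """Hypothetical helper: buffers recv() data and returns one message at a time."""

    def __init__(self, sock):
        self.sock = sock
        self.buf = bytearray()  # bytes received but not yet returned

    def read_line(self):
        # Keep calling recv() until a complete message (delimiter) is buffered.
        while b"\n" not in self.buf:
            chunk = self.sock.recv(4096)
            if not chunk:       # EOF before a complete message
                return None
            self.buf.extend(chunk)
        line, _, rest = self.buf.partition(b"\n")
        self.buf = bytearray(rest)  # keep the start of the next message
        return bytes(line)

a, b = socket.socketpair()
send_line(a, b"first")
send_line(a, b"second")
a.close()

reader = LineReader(b)
first = reader.read_line()
second = reader.read_line()
third = reader.read_line()      # None: the connection is closed and nothing is left
b.close()
print(first, second, third)
```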
One simple way of sending a message is to prefix each message with its length. Then to read a message, you first read the length, then you read that many bytes. Here's how you might do that:
import struct

def send_msg(sock, msg):
    # Prefix each message (a bytes object) with a 4-byte length (network byte order)
    msg = struct.pack('>I', len(msg)) + msg
    sock.sendall(msg)

def recv_msg(sock):
    # Read message length and unpack it into an integer
    raw_msglen = recvall(sock, 4)
    if not raw_msglen:
        return None
    msglen = struct.unpack('>I', raw_msglen)[0]
    # Read the message data
    return recvall(sock, msglen)

def recvall(sock, n):
    # Helper function to recv n bytes or return None if EOF is hit
    data = bytearray()
    while len(data) < n:
        packet = sock.recv(n - len(data))
        if not packet:
            return None
        data.extend(packet)
    return data
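To make the header concrete, here is what the '>I' prefix looks like for a 5-byte payload. '>' means big-endian (network byte order) and 'I' is an unsigned 32-bit integer. That also caps a single message at 2**32 - 1 bytes with this scheme.

```python
import struct

header = struct.pack('>I', 5)               # length prefix for a 5-byte payload
frame = header + b"hello"                   # the bytes send_msg(sock, b"hello") would write
length = struct.unpack('>I', frame[:4])[0]  # what recv_msg decodes from the first 4 bytes

print(header)              # b'\x00\x00\x00\x05'
print(len(frame), length)  # 9 5

# A length of 2**32 or more does not fit in an unsigned 32-bit field.
try:
    struct.pack('>I', 2**32)
except struct.error as exc:
    print("too long:", exc)
```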
Then you can use the send_msg and recv_msg functions to send and receive whole messages, and they won't have any problems with packets being split or coalesced on the network level.
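A usage sketch under a few assumptions. socket.socketpair() stands in for a real connection. The helpers are repeated so the snippet runs on its own. The 1 MiB payload is arbitrary test data. The sender runs in a thread because a large sendall() can block until the reader drains the socket buffer. The two messages are sent back to back and still come out as two separate messages.

```python
import socket
import struct
import threading

# Same framing helpers as in the answer, repeated so this snippet is self-contained.
def send_msg(sock, msg):
    sock.sendall(struct.pack('>I', len(msg)) + msg)

def recv_msg(sock):
    raw_msglen = recvall(sock, 4)
    if not raw_msglen:
        return None
    msglen = struct.unpack('>I', raw_msglen)[0]
    return recvall(sock, msglen)

def recvall(sock, n):
    data = bytearray()
    while len(data) < n:
        packet = sock.recv(n - len(data))
        if not packet:
            return None
        data.extend(packet)
    return data

big = bytes(range(256)) * 4096   # 1 MiB of arbitrary test data
small = b"done"
a, b = socket.socketpair()

def sender():
    # Sending happens concurrently with receiving so sendall() cannot deadlock.
    send_msg(a, big)
    send_msg(a, small)
    a.close()

t = threading.Thread(target=sender)
t.start()
first = recv_msg(b)
second = recv_msg(b)
third = recv_msg(b)              # None: the sender closed the connection
t.join()
b.close()

print(len(first), bytes(second), third)  # 1048576 b'done' None
```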
The accepted answer is fine, but it will be really slow with big files. Strings (and bytes, in Python 3) are immutable, so every time you use the + operator a new object is created and everything received so far is copied again. Appending the chunks to a list and joining them once at the end is much more efficient.
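A rough comparison sketch of three ways to accumulate received chunks. It uses synthetic 1 KiB chunks in place of real recv() results. The bytearray approach is the one recvall above already uses. Absolute timings depend on the machine and Python version, so none are claimed here. All three produce the same bytes.

```python
import timeit

chunks = [b"x" * 1024] * 1000   # stand-in for 1000 recv() results of 1 KiB each

def concat_bytes():
    data = b""
    for chunk in chunks:
        data += chunk            # bytes are immutable: builds a new object, copying all data so far
    return data

def join_list():
    fragments = []
    for chunk in chunks:
        fragments.append(chunk)  # cheap append; earlier data is not copied
    return b"".join(fragments)   # one final copy into the result

def extend_bytearray():
    data = bytearray()
    for chunk in chunks:
        data.extend(chunk)       # mutable buffer grows in place
    return bytes(data)

for fn in (concat_bytes, join_list, extend_bytearray):
    seconds = timeit.timeit(fn, number=3)
    print(f"{fn.__name__:>16}: {seconds:.3f} s")
```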
This should work better:
fragments = []
while True:
    chunk = s.recv(10000)
    if not chunk:  # empty result: the peer closed the connection
        break
    fragments.append(chunk)
print(b"".join(fragments))
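A self-contained version of the loop above, wrapped in a hypothetical recv_until_eof() function and exercised over a local socket.socketpair(). It only works when the sender closes (or shuts down writing) after sending everything. To send several messages over one connection, use a framing scheme such as the length prefix instead.

```python
import socket
import threading

def recv_until_eof(sock, bufsize=10000):
    """Hypothetical helper: read until the peer closes its side, return all bytes."""
    fragments = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:            # b"" means the peer closed or shut down writing
            break
        fragments.append(chunk)
    return b"".join(fragments)

payload = b"abc" * 500_000       # 1.5 MB of arbitrary test data
a, b = socket.socketpair()

def sender():
    a.sendall(payload)
    a.shutdown(socket.SHUT_WR)   # signal end of data to the reader
    a.close()

t = threading.Thread(target=sender)
t.start()
received = recv_until_eof(b)
t.join()
b.close()
print(len(received))             # 1500000
```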