Relation between input and ciphertext length in AES


The relation depends on the padding and the chaining mode you are using, and on the algorithm's block size (if it is a block cipher).

Some encryption algorithms are stream ciphers which encrypt data "bit by bit" (or "byte by byte"). Most of them produce a key-dependent stream of pseudo-random bytes, and encryption is performed by XORing that stream with the data (decryption is identical). With a stream cipher, the encrypted length is equal to the plain data length.

Other encryption algorithms are block ciphers. A block cipher, nominally, encrypts a single block of data of a fixed length. AES is a block cipher with 128-bit blocks (16 bytes). Note that AES-256 also uses 128-bit blocks; the "256" is about the key length, not the block length. The chaining mode is about how the data is to be split into several such blocks (this is not easy to do securely, but CBC mode is fine). Depending on the chaining mode, the data may require some padding, i.e. a few extra bytes added at the end so that the length is appropriate for the chaining mode. The padding must be such that it can be unambiguously removed when decrypting.

With CBC mode, the input data must have a length that is a multiple of the block length, so it is customary to add PKCS#5 padding (strictly speaking PKCS#7 for 16-byte blocks, since PKCS#5 is defined for 8-byte blocks; the rule is the same): if the block length is n, then at least 1 byte is added, at most n, such that the total size is a multiple of n, and all the added bytes have numerical value k, where k is the number of added bytes. Upon decryption, it suffices to look at the last decrypted byte to recover k and thus know how many padding bytes must be removed.
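To make the padding rule concrete, here is a minimal sketch in C. The names pkcs_pad and pkcs_unpad are made up for this illustration (they are not from mcrypt or any other library), and the caller is assumed to provide a buffer with at least 16 spare bytes. The unpadding routine checks every padding byte rather than only the last one, which is a little stricter than what the paragraph above describes.

```c
#include <stddef.h>
#include <string.h>

#define AES_BLOCK_SIZE 16  /* AES block length n, in bytes */

/* Hypothetical helper: appends k bytes of value k after the len data bytes
 * in buf, with 1 <= k <= AES_BLOCK_SIZE chosen so that len + k is a
 * multiple of the block size. buf must have room for len + AES_BLOCK_SIZE
 * bytes. Returns the padded length. */
size_t pkcs_pad(unsigned char *buf, size_t len)
{
    size_t k = AES_BLOCK_SIZE - (len % AES_BLOCK_SIZE);
    memset(buf + len, (int)k, k);
    return len + k;
}

/* Hypothetical helper: reads k from the last byte and strips k bytes.
 * Returns the unpadded length, or (size_t)-1 if the padding is malformed. */
size_t pkcs_unpad(const unsigned char *buf, size_t padded_len)
{
    if (padded_len == 0 || padded_len % AES_BLOCK_SIZE != 0)
        return (size_t)-1;

    size_t k = buf[padded_len - 1];
    if (k == 0 || k > AES_BLOCK_SIZE)
        return (size_t)-1;

    /* All k trailing bytes must carry the value k. */
    for (size_t i = 0; i < k; i++)
        if (buf[padded_len - 1 - i] != k)
            return (size_t)-1;

    return padded_len - k;
}
```

For a 5-byte input, pkcs_pad adds 11 bytes of value 11; for an input that is already 16 bytes long, it adds a full extra block of 16 bytes of value 16, which is what keeps the removal unambiguous.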

Hence, with CBC mode and AES, assuming PKCS#5 padding, if the input data has length d then the encrypted length is (d + 16) & ~15. I am using C-like notation here; in plain words, the length is between d+1 and d+16, and is a multiple of 16.
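As a quick check of that formula, here is a small C sketch. The function names are invented for this example; the second one uses the divide-and-multiply form, which gives the same result.

```c
#include <stddef.h>

/* Ciphertext length for AES-CBC with PKCS#5 padding, IV not included.
 * Adding 16 and then clearing the low four bits rounds d up to the next
 * multiple of 16 that is strictly greater than d. */
size_t aes_cbc_ciphertext_len(size_t d)
{
    return (d + 16) & ~(size_t)15;
}

/* Same value, written with integer division instead of bit masking. */
size_t aes_cbc_ciphertext_len_div(size_t d)
{
    return (d / 16 + 1) * 16;
}
```

So inputs of 0 to 15 bytes give 16 bytes of ciphertext, inputs of 16 to 31 bytes give 32, and so on. An input that is already a multiple of 16 still grows by one full block.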

There is a mode called CTR (as "counter") in which the block cipher encrypts successive values of a counter, yielding a stream of pseudo-random bytes. This effectively turns the block cipher into a stream cipher, and thus a message of length d is encrypted into d bytes.

Warning: almost all encryption systems (including stream ciphers) and modes require an extra value called the IV (Initialization Vector, sometimes "Initial Value"). Each message shall have its own IV, and no two messages encrypted with the same key shall use the same IV. Some modes have extra requirements; in particular, for both CBC and CTR, the IV shall be selected randomly and uniformly with a cryptographically strong pseudo-random number generator. The IV is not secret, but must be known by the decrypter. Since each message gets its own IV, it is often necessary to encode the IV along with the encrypted message. With CBC or CTR, the IV has length n, so, for AES, that's an extra 16 bytes. I do not know what mcrypt does with the IV, but, cryptographically speaking, the IV must be managed at some point.
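If the IV is stored in front of the ciphertext (a common layout, but an assumption here, not something mcrypt does for you as far as I know), the total size to reserve per message can be sketched as follows. The function names are hypothetical.

```c
#include <stddef.h>

#define AES_BLOCK_SIZE 16  /* also the IV length for CBC and CTR */

/* IV + PKCS#5-padded ciphertext, for AES-CBC. */
size_t stored_len_cbc(size_t d)
{
    size_t padded = (d + AES_BLOCK_SIZE) & ~(size_t)(AES_BLOCK_SIZE - 1);
    return AES_BLOCK_SIZE + padded;
}

/* IV + ciphertext, for AES-CTR (no padding, ciphertext length equals d). */
size_t stored_len_ctr(size_t d)
{
    return AES_BLOCK_SIZE + d;
}
```

For a 20-byte message this gives 48 bytes with CBC (16 for the IV, 32 for the padded ciphertext) and 36 bytes with CTR.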

As for Base64, it is good for transferring binary data over text-only media, but this should not be necessary for a proper database. Also, Base64 enlarges data by about 33%, so it should not be applied blindly. I think you are best avoiding Base64 here.
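For sizing a text column in case Base64 is used anyway, the encoded length can be computed as below. This is a sketch assuming standard Base64 with "=" padding and no line breaks; the function name is made up.

```c
#include <stddef.h>

/* Length of standard Base64 output (with '=' padding, no line breaks)
 * for n input bytes: every group of up to 3 bytes becomes 4 characters. */
size_t base64_encoded_len(size_t n)
{
    return 4 * ((n + 2) / 3);
}
```

A 48-byte blob (16-byte IV plus 32 bytes of ciphertext) becomes 64 characters, which is where the "about 33%" comes from.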


From my understanding, in block modes (cbc, ecb) the output length will be rounded up to a multiple of the block size, as returned by mcrypt_enc_get_block_size. Plus, you need to store the IV along with the data, so the total size will be strlen(data) rounded up to the block size, plus mcrypt_enc_get_iv_size().

As for the base64 encoding, I wouldn't bother (but make sure to use hex encoding when dumping your db).


For AES CBC block cipher with PKCS#5 padding,

    #define BLOCKSIZE 16
    size_t CipherTextLen = (PlainTxtLen / BLOCKSIZE + 1) * BLOCKSIZE;

This doesn't take the initialisation vector into account.