Would the same file from various Docker images be page-cached on a k8s node just once?



Upon further investigation, it became clear that content from unrelated containers/pods is shared on a node, which may be a security risk.

Each line in a Dockerfile can produce zero, one or many layers, as per https://docs.docker.com/storage/storagedriver/. For example, FROM ubuntu in my current build gives three base layers in my image:

# docker inspect ubuntu

"GraphDriver": {
    "Data": {
        "LowerDir": "/var/lib/docker/overlay2/22abb0d6b77061cc1e3a04de4d3c83be15e60b87adebf9b7b2fa9adc0fbb0f2d/diff:/var/lib/docker/overlay2/7ab02c0180d53cfa2f444a10650a688c8cebd0368ddb2cea1dba7f01b2008d37/diff:/var/lib/docker/overlay2/3ee0e4ab0518c76376a4023f7c438cc6a8d28121eba9cdbed9440cfc7474204e/diff",

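The LowerDir value is easier to read when split into one directory per line. Below is a minimal POSIX shell sketch; split_lowerdir is a hypothetical helper name. It relies on the overlayfs convention that lower directories stack from the rightmost entry (bottom) to the leftmost entry (top), so line 1 is the topmost lower layer.

```shell
#!/bin/sh
# Print each entry of a colon-separated LowerDir string on its own line,
# numbered from the topmost layer (leftmost entry) down to the
# bottom-most layer (rightmost entry).
split_lowerdir() {
    printf '%s\n' "$1" | tr ':' '\n' | awk 'NF { printf "%d %s\n", NR, $0 }'
}

# Example with shortened, hypothetical paths:
split_lowerdir "/var/lib/docker/overlay2/aaa/diff:/var/lib/docker/overlay2/bbb/diff"
```

If you pass it the LowerDir shown above, the 3ee0e4ab0518c76... directory comes out on the last line, which makes it the bottom-most of the listed layers.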
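If I further add RUN apt-get -y install python, Docker will create a layer containing all folders, files, and timestamps produced by that command. The layer is tar'd, and a sha256 digest (the layer's diff ID) is computed over the tar stream. In the ubuntu example above, the bottom-most layer (the rightmost LowerDir entry) is stored in the directory 3ee0e4ab0518c76376a4023f7c438cc6a8d28121eba9cdbed9440cfc7474204e. Note that with overlay2 these directory names are locally generated IDs rather than the layers' sha256 digests; the content digests themselves are listed under RootFS.Layers in the docker inspect output.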
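The sketch below shows how file contents and timestamps both feed into a tar digest. It assumes GNU tar and GNU coreutils. layer_digest is a hypothetical helper, and its output is not byte-identical to Docker's diff ID, because Docker builds its tar stream differently. It only shows that two directories with identical contents and metadata produce the same sha256, and that changing one mtime changes the digest.

```shell
#!/bin/sh
# Hypothetical helper: sha256 over a tar stream of directory $1.
# --sort/--owner/--group/--numeric-owner (GNU tar) remove ordering and
# ownership differences, so only contents, modes and mtimes matter.
layer_digest() {
    tar --sort=name --owner=0 --group=0 --numeric-owner \
        -C "$1" -cf - . | sha256sum | cut -d' ' -f1
}

demo=$(mktemp -d)
mkdir -p "$demo/a" "$demo/b"
printf 'hello\n' > "$demo/a/file"
printf 'hello\n' > "$demo/b/file"
# Pin every mtime so that both trees are identical, metadata included.
touch -d '2020-01-01 00:00:00' "$demo/a/file" "$demo/b/file" "$demo/a" "$demo/b"

d1=$(layer_digest "$demo/a")
d2=$(layer_digest "$demo/b")
# Change only the timestamp of one file; its contents stay the same.
touch -d '2021-01-01 00:00:00' "$demo/b/file"
d3=$(layer_digest "$demo/b")

echo "a:             $d1"
echo "b:             $d2"   # same as a
echo "b after touch: $d3"   # differs from b
rm -rf "$demo"
```

This is probably why rebuilding a RUN step without the build cache usually yields a layer with a different digest, even when the installed files are the same.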

Once my image is run by the Kubernetes cluster, its layers are extracted to a standard location on the node where the image runs. A symbolic link to each layer's diff folder is created under /var/lib/docker/overlay2/l. The only purpose of the link is to keep the paths passed to mount short, as explained here: https://docs.docker.com/storage/storagedriver/overlayfs-driver/. So a node running an image built from ubuntu will have something similar to:

# ls -l /var/lib/docker/overlay2/l |grep 3ee0e4ab0518c76
lrwxrwxrwx    1 root     root            72 Dec 13 15:40 VGN2ARTYLKI6LQWXSZSMKUQOQL -> ../3ee0e4ab0518c76376a4023f7c438cc6a8d28121eba9cdbed9440cfc7474204e/diff
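Running grep over ls -l output works, but it matches the hash fragment anywhere on the line. The sketch below is a little more robust: it compares each link target exactly using readlink. short_id_for_layer is a hypothetical helper. Its default root, /var/lib/docker/overlay2, matches the paths in this post. The demo runs against a throwaway directory with made-up names instead of a real node.

```shell
#!/bin/sh
# Print the name of every short link under ROOT/l whose target is exactly
# ../LAYER/diff. ROOT defaults to /var/lib/docker/overlay2.
short_id_for_layer() {
    layer=$1
    root=${2:-/var/lib/docker/overlay2}
    for link in "$root"/l/*; do
        [ -L "$link" ] || continue   # skip anything that is not a symlink
        if [ "$(readlink "$link")" = "../$layer/diff" ]; then
            basename "$link"
        fi
    done
}

# Demo against a fake overlay2 tree (hypothetical names, no Docker needed).
fake_root=$(mktemp -d)
mkdir -p "$fake_root/l" "$fake_root/3ee0e4ab/diff" "$fake_root/9f00aa/diff"
ln -s ../3ee0e4ab/diff "$fake_root/l/VGN2ARTYLKI6LQWXSZSMKUQOQL"
ln -s ../9f00aa/diff "$fake_root/l/OTHERSHORTID"
short_id_for_layer 3ee0e4ab "$fake_root"   # prints VGN2ARTYLKI6LQWXSZSMKUQOQL
```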

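Note that VGN2ARTYLKI6LQWXSZSMKUQOQL here is (a) unique within the node and (b) specific to that node. This identifier appears in the overlay mount options of every container that uses the layer. /proc/1/mounts lists the mounts visible in PID 1's mount namespace. On the host, that is the initial mount namespace, where the container runtime's overlay mounts normally appear. So the layer in question is shared like so: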

# grep `ls -l /var/lib/docker/overlay2/l |grep 3ee0e4ab0518c76 |awk '{print$9}'` /proc/1/mounts
overlay /var/lib/docker/overlay2/84ec5295eb902ab01b37451f9063987f5803a0ff4bc53ee27c1838f783f61f48/merged overlay rw,relatime,lowerdir=/var/lib/docker/overlay2/l/7RBRYLLCPECAY5IXIQWNNFMT4L:/var/lib/docker/overlay2/l/LK4X5JGJE327XH6STN6DHMQZUI:/var/lib/docker/overlay2/l/2RODCFKARIMWO2NUPHVP7HREVF:/var/lib/docker/overlay2/l/DH43WT4W2DPJTMMKHJL46IPIXM:/var/lib/docker/overlay2/l/DQBSRPR7QCKCXNT4QQHHC6L2TO:/var/lib/docker/overlay2/l/N3NL6BAOEKFZYIAXCCFEHMRJC2:/var/lib/docker/overlay2/l/VGN2ARTYLKI6LQWXSZSMKUQOQL,upperdir=/var/lib/docker/overlay2/84ec5295eb902ab01b37451f9063987f5803a0ff4bc53ee27c1838f783f61f48/diff,workdir=/var/lib/docker/overlay2/84ec5295eb902ab01b37451f9063987f5803a0ff4bc53ee27c1838f783f61f48/work 0 0
overlay /var/lib/docker/overlay2/89ce211716bd81100b99ecacc3c9da7af602029b2724d01db41d5efad37f43e6/merged overlay rw,relatime,lowerdir=/var/lib/docker/overlay2/l/SQEWZDFCQQX6EKH7IZHSFXKLBN:/var/lib/docker/overlay2/l/TJFM5IIGAQIKCMA5LDT6X4NUJK:/var/lib/docker/overlay2/l/DQBSRPR7QCKCXNT4QQHHC6L2TO:/var/lib/docker/overlay2/l/N3NL6BAOEKFZYIAXCCFEHMRJC2:/var/lib/docker/overlay2/l/VGN2ARTYLKI6LQWXSZSMKUQOQL,upperdir=/var/lib/docker/overlay2/89ce211716bd81100b99ecacc3c9da7af602029b2724d01db41d5efad37f43e6/diff,workdir=/var/lib/docker/overlay2/89ce211716bd81100b99ecacc3c9da7af602029b2724d01db41d5efad37f43e6/work 0 0

Two overlay mounts mean that the layer is shared between two running containers built from this version of the ubuntu image. Or, more concisely:

# grep `ls -l /var/lib/docker/overlay2/l |grep 3ee0e4ab0518c76 |awk '{print $9}'` /proc/1/mounts |wc -l
2
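The grep above counts every line of /proc/1/mounts that contains the short ID as a substring. The sketch below uses a hypothetical count_lower_users helper instead. It parses the lowerdir= option of each overlay mount and counts a mount only when the last path component matches the ID exactly. The demo feeds it a made-up mounts file that includes a VGN2X entry. A plain grep for VGN2 would count that entry too.

```shell
#!/bin/sh
# Count overlay mounts whose lowerdir list contains .../l/SHORT_ID exactly.
# MOUNTS_FILE defaults to /proc/1/mounts (PID 1's mount namespace).
count_lower_users() {
    short=$1
    mounts_file=${2:-/proc/1/mounts}
    awk -v id="$short" '
        $3 == "overlay" {                     # field 3 is the fs type
            n = split($4, opts, ",")          # field 4 holds the mount options
            for (i = 1; i <= n; i++) {
                if (opts[i] ~ /^lowerdir=/) {
                    # drop the 9-character "lowerdir=" prefix, split on ":"
                    m = split(substr(opts[i], 10), dirs, ":")
                    for (j = 1; j <= m; j++) {
                        k = split(dirs[j], parts, "/")
                        if (parts[k] == id) { count++; break }
                    }
                }
            }
        }
        END { print count + 0 }' "$mounts_file"
}

# Demo with a hypothetical mounts file.
mounts_fx=$(mktemp)
cat > "$mounts_fx" <<'EOF'
overlay /m1 overlay rw,lowerdir=/x/l/AAA:/x/l/VGN2,upperdir=/x/u1/diff,workdir=/x/u1/work 0 0
overlay /m2 overlay rw,lowerdir=/x/l/VGN2,upperdir=/x/u2/diff,workdir=/x/u2/work 0 0
overlay /m3 overlay rw,lowerdir=/x/l/BBB:/x/l/VGN2X,upperdir=/x/u3/diff,workdir=/x/u3/work 0 0
proc /proc proc rw 0 0
EOF
count_lower_users VGN2 "$mounts_fx"   # prints 2
```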

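This confirms that the content is shared between unrelated containers. It also explains the difference in page cache utilization I see: with overlay2, containers that read the same file from a shared lower layer share a single page cache entry for it. Is it a security risk? In theory, an adversary could plant malicious code in an ubuntu-based layer and find a nonce that yields the same sha256 as a legitimate layer. Is it a risk in practice? Probably not so much, since no practical SHA-256 collision or second-preimage attack is known.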