When setting up jails, I commonly end up with structures like this in my /etc/fstab:
/dev /jail/test/dev auto bind 0 0
/dev/pts /jail/test/dev/pts auto bind 0 0
/dev/shm /jail/test/dev/shm auto bind 0 0
/proc /jail/test/proc auto bind 0 0
/sys /jail/test/sys auto bind 0 0
/dev /jail/test2/dev auto bind 0 0
/dev/pts /jail/test2/dev/pts auto bind 0 0
/dev/shm /jail/test2/dev/shm auto bind 0 0
/proc /jail/test2/proc auto bind 0 0
/sys /jail/test2/sys auto bind 0 0
Now, if you take a bind mount and attempt to mount it while it’s already mounted, your system will most likely let you…
# findmnt /jail/test/dev
TARGET SOURCE FSTYPE OPTIONS
/jail/test/dev udev devtmpfs rw,relatime,size=10240k,nr_inodes=1012462,mode=755
# mount /jail/test/dev
# findmnt /jail/test/dev
TARGET SOURCE FSTYPE OPTIONS
/jail/test/dev udev devtmpfs rw,relatime,size=10240k,nr_inodes=1012462,mode=755
/jail/test/dev udev devtmpfs rw,relatime,size=10240k,nr_inodes=1012462,mode=755
That’s just an effect of how bind mounts happen to work. However, I found that as soon as I had several of these jails, all bind mounting /dev and various paths in it into the jails, I began seeing duplicates. Lots of duplicates.
# findmnt /jail | wc -l
5172
…that’s bad
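A raw line count says how bad it is, but not which mount points are piling up. A minimal sketch of a helper that tallies repeated targets could look like the following; `dup_mounts` is a name I made up for illustration, and it assumes a `findmnt` that supports the `-r`, `-n` and `-o` options.

```shell
#!/bin/sh
# Hypothetical helper: reads mount targets (one per line) on stdin and
# prints "count target" for every target that appears more than once,
# most-duplicated first.
dup_mounts() {
    sort | uniq -c | awk '$1 > 1 { print $1, $2 }' | sort -rn
}

# Usage on a live system (-r: raw output, -n: no header, -o TARGET: targets only):
#   findmnt -rn -o TARGET | dup_mounts | head
```

Reading from stdin rather than calling `findmnt` inside the function makes it easy to try on a saved mount listing first.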
It turns out this is caused by the mount propagation feature, in which a bind mount is supposed to receive a copy of anything that later gets mounted under the mount it mirrors. From mount(8):
Since Linux 2.6.15 it is possible to mark a mount and its submounts as shared, private, slave or unbindable. A shared mount provides the ability to create mirrors of that mount such that mounts and unmounts within any of the mirrors propagate to the other mirror. A slave mount receives propagation from its master, but not vice versa. A private mount carries no propagation abilities. An unbindable mount is a private mount which cannot be cloned through a bind operation. The detailed semantics are documented in Documentation/filesystems/sharedsubtree.txt file in the kernel source tree.
What happens is that the “shared” propagation in effect on my system (the kernel default is private, but systemd marks mounts as shared at boot) causes any bind mounted below /jail/test/dev, such as /jail/test/dev/pts, to be propagated back to /dev. It gets even worse when there are many jails, as whatever propagates back to /dev is then propagated out again to every previously mounted jail:
# mount | grep "type dev"
udev on /dev type devtmpfs (rw,relatime,size=10240k,nr_inodes=61165,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
# mount /jail/test/dev && mount | grep "type dev" && echo "so far so good..."
udev on /dev type devtmpfs (rw,relatime,size=10240k,nr_inodes=61165,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
udev on /jail/test/dev type devtmpfs (rw,relatime,size=10240k,nr_inodes=61165,mode=755)
so far so good...
# mount /jail/test/dev/pts && mount | grep "type dev" && echo "wait, wtf?"
udev on /dev type devtmpfs (rw,relatime,size=10240k,nr_inodes=61165,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
udev on /jail/test/dev type devtmpfs (rw,relatime,size=10240k,nr_inodes=61165,mode=755)
devpts on /jail/test/dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
wait, wtf?
# mount /jail/test2/dev && mount /jail/test2/dev/pts && mount | grep "type dev"
udev on /dev type devtmpfs (rw,relatime,size=10240k,nr_inodes=61165,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
udev on /jail/test/dev type devtmpfs (rw,relatime,size=10240k,nr_inodes=61165,mode=755)
devpts on /jail/test/dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
udev on /jail/test2/dev type devtmpfs (rw,relatime,size=10240k,nr_inodes=61165,mode=755)
devpts on /jail/test2/dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
devpts on /jail/test/dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
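The output of mount doesn't say which of these mounts are shared, but /proc/self/mountinfo does: each line carries optional tags such as `shared:N` or `master:N` before a lone `-` separator, and a mount with no tag is private. Here is a rough sketch of a filter that turns that into something readable; `mount_propagation` is a hypothetical name, and the field positions are taken from the mountinfo format as I understand it.

```shell
#!/bin/sh
# Hypothetical helper: reads /proc/self/mountinfo-formatted lines on stdin
# and prints "mountpoint propagation" for each mount.
# Field 5 is the mount point; optional fields start at field 7 and run
# until a lone "-". "shared:N" marks a shared mount, "master:N" a slave,
# "unbindable" an unbindable one; no tag at all means private.
mount_propagation() {
    awk '{
        prop = ""
        for (i = 7; i <= NF && $i != "-"; i++) {
            tag = ""
            if ($i ~ /^shared:/)         tag = "shared"
            else if ($i ~ /^master:/)    tag = "slave"
            else if ($i == "unbindable") tag = "unbindable"
            if (tag != "") prop = (prop == "" ? tag : prop "," tag)
        }
        print $5, (prop == "" ? "private" : prop)
    }'
}

# Usage on a live system:
#   mount_propagation < /proc/self/mountinfo | grep /jail
```

Reasonably recent versions of findmnt can show the same information through a PROPAGATION output column, which is probably the easier route if it's available.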
As seen in the documentation, a private mount “carries no propagation abilities”. Thus, all I had to do was make all my jail bind mounts private, like so:
/dev /jail/test/dev auto bind,private 0 0
/dev/pts /jail/test/dev/pts auto bind,private 0 0
/dev/shm /jail/test/dev/shm auto bind,private 0 0
/proc /jail/test/proc auto bind,private 0 0
/sys /jail/test/sys auto bind,private 0 0
/dev /jail/test2/dev auto bind,private 0 0
/dev/pts /jail/test2/dev/pts auto bind,private 0 0
/dev/shm /jail/test2/dev/shm auto bind,private 0 0
/proc /jail/test2/proc auto bind,private 0 0
/sys /jail/test2/sys auto bind,private 0 0
And, after a while…
# findmnt /jail | wc -l
27
That’s better 🙂
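Since every jail needs the same five entries, typing them out by hand gets old fast. A small sketch of a generator, assuming the /jail/NAME layout used here; `jail_fstab` is a made-up name, and the optional second argument for a different jail root is my own addition.

```shell
#!/bin/sh
# Hypothetical helper: prints the five private bind-mount fstab lines for
# one jail. The jail root defaults to /jail/<name>; pass a second argument
# to use a different root.
jail_fstab() {
    name=$1
    root=${2:-/jail/$name}
    for src in /dev /dev/pts /dev/shm /proc /sys; do
        printf '%s %s%s auto bind,private 0 0\n' "$src" "$root" "$src"
    done
}

# Usage: review the output before appending it to /etc/fstab, e.g.
#   jail_fstab test3
```

It only prints to stdout, so nothing touches /etc/fstab until the output has been checked and pasted in.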
Now, I could probably make mount propagation work for me and not manually specify binds for pts and shm and such, but I didn’t feel like spending the time to work out the implications of that.