Duplicate bind mounts with chroots on systemd

When setting up jails, I commonly end up with structures like this in my /etc/fstab:

/dev /jail/test/dev auto bind 0 0
/dev/pts /jail/test/dev/pts auto bind 0 0
/dev/shm /jail/test/dev/shm auto bind 0 0
/proc /jail/test/proc auto bind 0 0
/sys /jail/test/sys auto bind 0 0

/dev /jail/test2/dev auto bind 0 0
/dev/pts /jail/test2/dev/pts auto bind 0 0
/dev/shm /jail/test2/dev/shm auto bind 0 0
/proc /jail/test2/proc auto bind 0 0
/sys /jail/test2/sys auto bind 0 0

Now, if you take a bind mount and attempt to mount it while it’s already mounted, your system will most likely let you…

# findmnt /jail/test/dev
TARGET            SOURCE FSTYPE   OPTIONS
/jail/test/dev udev   devtmpfs rw,relatime,size=10240k,nr_inodes=1012462,mode=755
# mount /jail/test/dev
# findmnt /jail/test/dev
TARGET            SOURCE FSTYPE   OPTIONS
/jail/test/dev udev   devtmpfs rw,relatime,size=10240k,nr_inodes=1012462,mode=755
/jail/test/dev udev   devtmpfs rw,relatime,size=10240k,nr_inodes=1012462,mode=755

That’s just an effect of how bind mounts happen to work. However, I found that as soon as I had several of these jails, all bind mounting /dev and various paths in it into the jails, I began seeing duplicates. Lots of duplicates.

# findmnt /jail | wc -l
5172

…that’s bad
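The raw line count doesn't say which targets are piling up. Here is a small sketch that groups the targets and shows only the ones mounted more than once. The function name dup_targets is my own, and it just reads one target path per line, so it can be fed from findmnt:

```shell
# Count how many times each mount target appears, most-duplicated first.
# Reads one target path per line on stdin; prints "COUNT TARGET" for
# every target that shows up more than once.
dup_targets() {
    sort | uniq -c | sort -rn | awk '$1 > 1 { print $1, $2 }'
}

# Usage (not run here):
#   findmnt -rn -o TARGET | grep '^/jail/' | dup_targets | head
```

The -r (raw) and -n (no headings) flags keep the findmnt output to one plain path per line, which is what the pipeline expects.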

Turns out it’s caused by the mount propagation feature: a bind mount and its source are copies of the same mount, and with propagation enabled, anything mounted below one copy is replicated below the others. From mount(8):

Since Linux 2.6.15 it is possible to mark a mount and its submounts
as shared, private, slave or unbindable. A shared mount provides the
ability to create mirrors of that mount such that mounts and unmounts
within any of the mirrors propagate to the other mirror. A slave
mount receives propagation from its master, but not vice versa. A
private mount carries no propagation abilities. An unbindable mount
is a private mount which cannot be cloned through a bind operation.
The detailed semantics are documented in
Documentation/filesystems/sharedsubtree.txt file in the kernel source
tree.
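To check which of these states a given mount is in, the kernel exposes it in the optional fields of /proc/self/mountinfo (shared:N, master:N, unbindable; none of them means private). The following is a rough sketch that turns those fields into the terms used above. The function name propagation is my own; where available, findmnt -o TARGET,PROPAGATION should report the same thing with less effort.

```shell
# Print "TARGET PROPAGATION" for each mountinfo line on stdin.
# Field 5 is the mount point; the optional fields sit between field 6
# and the "-" separator. No propagation field at all means private.
propagation() {
    awk '{
        flags = ""
        for (i = 7; i <= NF && $i != "-"; i++) {
            if ($i ~ /^shared:/)         f = "shared"
            else if ($i ~ /^master:/)    f = "slave"
            else if ($i == "unbindable") f = "unbindable"
            else continue                # e.g. propagate_from:N
            flags = flags (flags == "" ? "" : ",") f
        }
        print $5, (flags == "" ? "private" : flags)
    }'
}

# Usage (not run here):
#   propagation < /proc/self/mountinfo | grep '^/jail/'
```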

What happens is that the default “shared” propagation (systemd marks the root mount as shared at boot; the kernel’s own default is private) causes any bind below /jail/test/dev, such as /jail/test/dev/pts, to be propagated back to /dev. It gets even worse when there are many jails, because whatever is propagated to /dev is then propagated on to every jail that was mounted earlier:

# mount | grep "type dev"
udev on /dev type devtmpfs (rw,relatime,size=10240k,nr_inodes=61165,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
# mount /jail/test/dev && mount | grep "type dev" && echo "so far so good..."
udev on /dev type devtmpfs (rw,relatime,size=10240k,nr_inodes=61165,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
udev on /jail/test/dev type devtmpfs (rw,relatime,size=10240k,nr_inodes=61165,mode=755)
so far so good...

# mount /jail/test/dev/pts && mount | grep "type dev" && echo "wait, wtf?"
udev on /dev type devtmpfs (rw,relatime,size=10240k,nr_inodes=61165,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
udev on /jail/test/dev type devtmpfs (rw,relatime,size=10240k,nr_inodes=61165,mode=755)
devpts on /jail/test/dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
wait, wtf?

# mount /jail/test2/dev && mount /jail/test2/dev/pts && mount | grep "type dev"
udev on /dev type devtmpfs (rw,relatime,size=10240k,nr_inodes=61165,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
udev on /jail/test/dev type devtmpfs (rw,relatime,size=10240k,nr_inodes=61165,mode=755)
devpts on /jail/test/dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
udev on /jail/test2/dev type devtmpfs (rw,relatime,size=10240k,nr_inodes=61165,mode=755)
devpts on /jail/test2/dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
devpts on /jail/test/dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)

As seen in the documentation, a private mount “carries no propagation abilities”. Thus, all I had to do was make all my jail bind mounts private, like so:

/dev /jail/test/dev auto bind,private 0 0
/dev/pts /jail/test/dev/pts auto bind,private 0 0
/dev/shm /jail/test/dev/shm auto bind,private 0 0
/proc /jail/test/proc auto bind,private 0 0
/sys /jail/test/sys auto bind,private 0 0

/dev /jail/test2/dev auto bind,private 0 0
/dev/pts /jail/test2/dev/pts auto bind,private 0 0
/dev/shm /jail/test2/dev/shm auto bind,private 0 0
/proc /jail/test2/proc auto bind,private 0 0
/sys /jail/test2/sys auto bind,private 0 0
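Every jail needs the same five lines, so they can be generated instead of copy-pasted. This is a minimal sketch, assuming the /jail/NAME layout used above; jail_fstab is a made-up helper name, not an existing tool:

```shell
# Print private bind-mount fstab entries for one jail.
# $1 = jail name; the /jail/NAME layout follows the examples above.
jail_fstab() {
    for src in /dev /dev/pts /dev/shm /proc /sys; do
        printf '%s /jail/%s%s auto bind,private 0 0\n' "$src" "$1" "$src"
    done
}

# Prints the five entries for /jail/test2; review, then append to /etc/fstab.
jail_fstab test2
```

The order matters: /dev has to come before /dev/pts and /dev/shm so that the nested targets exist when they get mounted.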

And, after a while…

# findmnt /jail | wc -l
27

That’s better 🙂

Now, I could probably make mount propagation work for me and not manually specify binds for pts and shm and such, but I didn’t feel like spending the time to work out the implications of that.
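For reference, the approach I'd try first is the recursive-bind-plus-slave pattern commonly used for chroots: --rbind brings along submounts such as pts and shm, and --make-rslave lets host mounts propagate into the jail but not back out. I have not tested this on my jails, so treat it as a starting point. The helper name jail_rbind_cmds is made up, and it only prints the commands rather than running them:

```shell
# Print the mount commands for a recursive, one-way (slave) bind of
# /dev, /proc and /sys into one jail. Nothing is mounted here; pipe
# the output to "sh" as root once it looks right.
# $1 = jail name; the /jail/NAME layout follows the examples above.
jail_rbind_cmds() {
    for src in /dev /proc /sys; do
        printf 'mount --rbind %s /jail/%s%s\n' "$src" "$1" "$src"
        printf 'mount --make-rslave /jail/%s%s\n' "$1" "$src"
    done
}

jail_rbind_cmds test
```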
