Issues with inotify MOVE events and Kubernetes Volumes

TL;DR: inotify move events (IN_MOVE, IN_MOVED_FROM, IN_MOVED_TO) are triggered by rename(2), which does not work when the source and destination paths are on two different mount points.

The mountpoint of a path can be checked with findmnt -n -o SOURCE --target /path/to/FILE (ref).

At work we use Envoy to route HTTP requests between our services. Recently we needed to reload its configuration at runtime, which Envoy supports through its filesystem-based xDS configuration.

With the following configuration,

dynamic_resources:
  lds_config:
    path_config_source:
      path: /etc/envoy/lds.yml

      watched_directory:
        path: /etc/envoy

  cds_config:
    path_config_source:
      path: /etc/envoy/cds.yml

      watched_directory:
        path: /etc/envoy

Envoy should be reconfigurable at runtime simply by replacing the referenced xDS config files, e.g.:

$ cp /home/user/lds.new.yml /etc/envoy/lds.yml
$ cp /home/user/cds.new.yml /etc/envoy/cds.yml

However, the Envoy documentation states that “Envoy only updates when the configuration file is replaced by a file move, and not when the file is edited in place. It is implemented this way to ensure configuration consistency.” This means that, for Envoy to apply the new configs, we have to trigger an inotify MOVED_TO event within the watched directory, like this:

$ touch /tmp/aa.txt
$ mv /tmp/aa.txt /etc/envoy/aa.txt

As expected, this causes Envoy to reload the config files:

[2024-12-28 05:31:24.190][1][debug][file] [source/common/filesystem/inotify/watcher_impl.cc:75] notification: fd: 1 mask: 80 file: aa.txt
[2024-12-28 05:31:24.190][1][debug][file] [source/common/filesystem/inotify/watcher_impl.cc:91] matched callback: directory: aa.txt
[2024-12-28 05:31:24.191][1][info][upstream] [source/common/upstream/cds_api_helper.cc:32] cds: add 1 cluster(s), remove 0 cluster(s)
[2024-12-28 05:31:24.191][1][info][upstream] [source/common/upstream/cds_api_helper.cc:71] cds: added/updated 0 cluster(s), skipped 1 unmodified cluster(s)

The 0x80 mask indicates an IN_MOVED_TO event. We can also confirm that the configs have been reloaded by making sure that the (cds|lds)_update_success metrics have been incremented from the initial value of 1:

$ curl -s 172.18.0.125:29901/stats/prometheus \
  | grep -Pe '^(envoy_cluster_manager_cds_update_success|envoy_listener_manager_lds_update_success)'
envoy_cluster_manager_cds_update_success{} 2
envoy_listener_manager_lds_update_success{} 2
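
The mask printed by watcher_impl.cc in the log above is easier to read when decoded into event names. The following is a minimal sketch, assuming the mask is logged in hexadecimal without a 0x prefix (as the 80 above suggests); decode_inotify_mask is a hypothetical helper name, and the bit values are the ones defined in <sys/inotify.h>.

```shell
#!/bin/sh
# Hypothetical helper: print the inotify event names set in a hex mask.
# Assumes the argument is hexadecimal without a "0x" prefix, e.g. "80".
decode_inotify_mask() {
    mask=$(( 0x$1 ))
    out=""
    # bit:name pairs taken from <sys/inotify.h>
    for entry in \
        0x1:IN_ACCESS 0x2:IN_MODIFY 0x4:IN_ATTRIB \
        0x8:IN_CLOSE_WRITE 0x10:IN_CLOSE_NOWRITE 0x20:IN_OPEN \
        0x40:IN_MOVED_FROM 0x80:IN_MOVED_TO 0x100:IN_CREATE \
        0x200:IN_DELETE 0x400:IN_DELETE_SELF 0x800:IN_MOVE_SELF \
        0x40000000:IN_ISDIR
    do
        bit=${entry%%:*}
        name=${entry#*:}
        # Append the name when its bit is set, joining names with "|".
        if [ $(( $mask & $bit )) -ne 0 ]; then
            out="${out:+$out|}$name"
        fi
    done
    printf '%s\n' "$out"
}

decode_inotify_mask 80   # prints IN_MOVED_TO
```

A mask of c0 would decode to IN_MOVED_FROM|IN_MOVED_TO, which is the combination that IN_MOVE stands for.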

At least, that’s how I managed to make it work with Envoy running in Docker on my local machine. However, those exact steps did not work when Envoy was running on Kubernetes.

The Setup on Kubernetes

For one reason or another, we don’t update the config files by updating the contents of Envoy’s ConfigMap. Instead, we:

1. Have an init container that copies config files from the ConfigMap into an emptyDir volume,
2. Mount that emptyDir volume to Envoy’s container and configure Envoy to load config files from that volume,
3. Set up a sidecar container that will periodically update config files in the emptyDir volume. This sidecar is also responsible for triggering config reloads with the inotify MOVED_TO event.

This sounds unnecessarily complex, but I swear there’s a reason for it (whether the reason is a good one is out of the scope of this post).

The sidecar container is running a simple bash script like this:

while true; do
    # Update the config files. The touch commands are a placeholder for our
    # actual implementation.
    touch "${ENVOY_CONFIG_DIR}/cds.yml"
    touch "${ENVOY_CONFIG_DIR}/lds.yml"

    # Reload envoy by triggering a MOVED_TO inotify event
    touch "/tmp/envoy-reload.1.txt"
    mv "/tmp/envoy-reload.1.txt" "${ENVOY_CONFIG_DIR}/envoy-reload.2.txt"

    # Delay
    sleep 300
done

$ENVOY_CONFIG_DIR refers to the mount path of the emptyDir volume, which is also the directory watched by Envoy.

However, this setup did not work, despite being practically identical to the proof-of-concept running in Docker. Envoy failed to detect any MOVED_TO event (indicated by the lack of logs from watcher_impl.cc), and thus did not reload its configs.

`inotify` and Mounts

After spending an embarrassing amount of time googling, I found out that the issue was caused by /tmp and ${ENVOY_CONFIG_DIR} (which is backed by an emptyDir volume) being on different mount points (overlay on / and /dev/mapper/... on /etc/envoy, respectively):

$ df -h
Filesystem            Size  Used Avail Use% Mounted on
overlay                12G  7.8G  2.9G  74% /           <-- /tmp is here,
tmpfs                  64M     0   64M   0% /dev
/dev/mapper/...        12G  7.8G  2.9G  74% /etc/envoy  <-- but /etc/envoy is here
shm                    64M     0   64M   0% /dev/shm
tmpfs                 3.9G   12K  3.9G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs                 2.0G     0  2.0G   0% /proc/asound
tmpfs                 2.0G     0  2.0G   0% /proc/acpi
tmpfs                 2.0G     0  2.0G   0% /proc/scsi
tmpfs                 2.0G     0  2.0G   0% /sys/firmware

That is important because inotify move events (IN_MOVE, IN_MOVED_FROM, IN_MOVED_TO) are triggered by rename(2), which does not work when the source and destination paths are on two different mount points. In that case, rename(2) returns an error instead:

ERRORS

  EXDEV  oldpath and newpath are not on the same mounted
         filesystem.  (Linux permits a filesystem to be mounted at
         multiple points, but rename() does not work across
         different mount points, even if the same filesystem is
         mounted on both.)

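When rename(2) fails with EXDEV, mv falls back to copying the file to the destination and deleting the original. The command still succeeds, but the watched directory only sees the events of a newly written file and never a MOVED_TO event.

In the local Docker container, /tmp and /etc/envoy were on the same mount point, the overlay FS mounted on /, which allowed MOVED_TO events to be triggered properly during local testing.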

$ df -h
Filesystem            Size  Used Avail Use% Mounted on
overlay               126G   13G  107G  11% /      <-- both directories are here
tmpfs                  64M     0   64M   0% /dev
shm                    64M  4.0K   64M   1% /dev/shm
/dev/mapper/...       126G   13G  107G  11% /etc/hosts
tmpfs                  16G     0   16G   0% /proc/asound
tmpfs                  16G     0   16G   0% /proc/acpi
tmpfs                  16G     0   16G   0% /proc/scsi
tmpfs                  16G     0   16G   0% /sys/firmware
tmpfs                  16G     0   16G   0% /sys/devices/virtual/powercap
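
Comparing the two df listings by eye can be replaced with a small check. The following is a minimal sketch, assuming findmnt is available in the image; mount_point_of and same_mount are hypothetical helper names, and the df -P fallback is my own assumption for images that ship without findmnt.

```shell
#!/bin/sh
# Hypothetical helper: print the mount point that contains the given path.
mount_point_of() {
    if command -v findmnt >/dev/null 2>&1; then
        findmnt -n -o TARGET --target "$1"
    else
        # Assumed fallback: the last column of POSIX df output is the mount point.
        df -P "$1" | awk 'NR == 2 { print $NF }'
    fi
}

# Hypothetical helper: succeed only when both paths live on the same mount point.
# Returns 0 when they match, 1 when they differ, 2 when a lookup fails.
same_mount() {
    mp_a=$(mount_point_of "$1") || return 2
    mp_b=$(mount_point_of "$2") || return 2
    [ -n "$mp_a" ] && [ "$mp_a" = "$mp_b" ]
}
```

In the sidecar, a guard such as `same_mount /tmp "${ENVOY_CONFIG_DIR}" || echo "not a plain rename" >&2` would have flagged up front that a move between the two directories cannot be a plain rename(2).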

The workaround was therefore pretty simple. Create the trigger file inside ${ENVOY_CONFIG_DIR} instead:

 # Reload envoy by triggering a MOVED_TO inotify event
-touch "/tmp/envoy-reload.1.txt"
+touch "${ENVOY_CONFIG_DIR}/envoy-reload.1.txt"

-mv "/tmp/envoy-reload.1.txt" "${ENVOY_CONFIG_DIR}/envoy-reload.2.txt"
+mv "${ENVOY_CONFIG_DIR}/envoy-reload.1.txt" "${ENVOY_CONFIG_DIR}/envoy-reload.2.txt"
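
The same idea can be applied to the config files themselves, so that updating a file and triggering the reload become a single step. The following is a minimal sketch, not our actual implementation: replace_by_move is a hypothetical helper name, and it assumes that Envoy reacts to any move into the watched directory (as the aa.txt experiment suggests) and that world-readable config files are acceptable.

```shell
#!/bin/sh
# Hypothetical helper: replace DEST with the contents of SRC through a rename
# that stays inside DEST's directory, and therefore on one mount point.
replace_by_move() {
    src=$1
    dest=$2
    dest_dir=$(dirname "$dest")

    # Stage the new content next to the destination file.
    tmp=$(mktemp "${dest_dir}/.staged.XXXXXX") || return 1
    if ! cp "$src" "$tmp"; then
        rm -f "$tmp"
        return 1
    fi

    # mktemp creates the file with mode 0600; widen it so that another
    # container can read it (assumption: 0644 is acceptable here).
    chmod 0644 "$tmp"

    # Source and destination share a directory, so mv is a plain rename.
    mv -f "$tmp" "$dest"
}
```

For example, `replace_by_move /home/user/cds.new.yml "${ENVOY_CONFIG_DIR}/cds.yml"` would stage the new file inside ${ENVOY_CONFIG_DIR} and then rename it over cds.yml.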
