Use named Docker volumes instead of bind mounts for QLever index performance #297

@ddeboer

Description

The QLever Docker task runner currently bind-mounts the host dataDir into the container. On macOS, this introduces significant I/O overhead because bind mounts go through the virtualisation filesystem layer (VirtioFS/gRPC FUSE), adding latency to every read/write.

There's already a TODO in importer.js:

// TODO: write index to named volume instead of bind mount for better performance.

In benchmarks, native QLever runs ~3x faster than Docker for the same queries on the same data (e.g. 4.1s vs 13s for a stage with 51 batches). Most of this overhead comes from filesystem I/O on bind mounts — QLever's index files are accessed with random I/O patterns that are particularly slow through the virtualisation layer.

Audience

This affects macOS and Windows users, where Docker Desktop runs containers inside a Linux VM and bind mounts cross the virtualisation boundary (VirtioFS/gRPC FUSE on macOS, 9P/VirtioFS on WSL2 on Windows). On Linux with a native Docker Engine, containers access bind-mounted host directories directly (no VM layer), so there's no comparable I/O penalty.
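
As a rough sketch of how the task runner could pick a mount strategy per platform, the check below treats macOS and Windows as the cases where bind mounts cross a VM boundary. The function name `bindMountsCrossVm` is hypothetical, and the check assumes Docker Desktop on macOS/Windows and a native Docker Engine on Linux; it would not detect Docker Desktop for Linux, or a Node.js process running inside WSL2 (which reports `linux`).

```javascript
// Hypothetical helper, not part of the existing code base.
// Assumption: Docker Desktop on macOS/Windows, native Docker Engine on Linux.
function bindMountsCrossVm(platform = process.platform) {
  // Node.js reports 'darwin' for macOS and 'win32' for Windows.
  return platform === 'darwin' || platform === 'win32';
}

// Pick a mount strategy name based on the platform check.
function chooseIndexMount(platform = process.platform) {
  return bindMountsCrossVm(platform) ? 'volume' : 'bind';
}

console.log(chooseIndexMount('darwin'));
console.log(chooseIndexMount('linux'));
```

Always using a named volume, regardless of platform, would be simpler and is probably harmless on Linux; the check is only useful if bind mounts should remain the default there.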

Suggested approach

Use a named Docker volume for the QLever index and working directory instead of a bind mount. Named volumes are stored inside the Docker VM and bypass the host filesystem layer, offering near-native I/O performance.

The downloaded data files could still use a bind mount (sequential writes are less affected), or they could be copied into the volume before indexing.
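
A minimal sketch of what the `docker run` arguments might look like with this split: a named volume for the index and working directory, and a read-only bind mount for the downloaded data. The function names, the container paths `/index` and `/data`, the volume name, and the image and command in the usage example are illustrative assumptions, not taken from the current task runner. Only the `--mount`, `--workdir` and `--rm` flags are standard Docker CLI options.

```javascript
// Hypothetical helpers; names and container paths are assumptions for illustration.
function buildMountArgs({ indexVolume, hostDataDir }) {
  return [
    // Named volume: lives inside the Docker VM, so index I/O avoids the host filesystem layer.
    '--mount', `type=volume,source=${indexVolume},target=/index`,
    // Bind mount: downloaded data stays on the host and is only read by the container.
    '--mount', `type=bind,source=${hostDataDir},target=/data,readonly`,
  ];
}

function buildDockerRunArgs({ indexVolume, hostDataDir, image, command }) {
  return [
    'run', '--rm',
    ...buildMountArgs({ indexVolume, hostDataDir }),
    '--workdir', '/index',
    image,
    ...command,
  ];
}

// Usage example with placeholder values (image name and command are assumptions).
const args = buildDockerRunArgs({
  indexVolume: 'qlever-index',
  hostDataDir: '/tmp/data',
  image: 'adfreiburg/qlever',
  command: ['ls', '/data'],
});
console.log(['docker', ...args].join(' '));
```

The resulting array could be passed to `child_process.spawn('docker', args)`. Docker creates the named volume on first use if it does not exist yet, so no separate `docker volume create` step is strictly needed.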

Impact on output

This change only affects the QLever index files and imported data — pipeline output is unaffected. The query results flow over HTTP (localhost:7001/sparql), so they're already on the host. The data flow would be:

  1. Download data file to host (bind mount or copy into volume)
  2. QLever indexes inside the named volume (fast)
  3. QLever serves queries over HTTP (fast random I/O on the volume)
  4. Node.js receives results over HTTP and writes output to host filesystem (unchanged)
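
To illustrate why steps 3 and 4 are independent of where the index lives, here is a hedged sketch of the host side fetching results over HTTP. It assumes Node.js 18 or later (global `fetch`), a standard SPARQL protocol GET request with a `query` parameter, and the endpoint mentioned above; the function names are hypothetical and do not reflect the pipeline's actual client code.

```javascript
// Hypothetical helpers; the pipeline's real client code may differ.
function buildSparqlUrl(endpoint, query) {
  const url = new URL(endpoint);
  // SPARQL protocol: the query text is sent URL-encoded in the `query` parameter.
  url.searchParams.set('query', query);
  return url;
}

async function runQuery(endpoint, query) {
  const response = await fetch(buildSparqlUrl(endpoint, query), {
    headers: { Accept: 'application/sparql-results+json' },
  });
  if (!response.ok) {
    throw new Error(`SPARQL request failed with status ${response.status}`);
  }
  // Results arrive in the Node.js process on the host, whatever mount type the index uses.
  return response.json();
}

const exampleUrl = buildSparqlUrl(
  'http://localhost:7001/sparql',
  'SELECT * WHERE { ?s ?p ?o } LIMIT 1',
);
console.log(exampleUrl.pathname);
```

Because the container only exposes a port, switching the index from a bind mount to a named volume should not require any change on this side.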
