Skip to content

Bug: Concurrent add() creates nodes unreachable by search() #735

@DavIvek

Description

@DavIvek

Describe the bug

When multiple threads call add() in parallel, some nodes become unreachable via search() despite being correctly stored in the index. contains() and get() confirm the nodes exist with correct data, but graph traversal never visits them.

Steps to reproduce

#include <iostream>
#include <random>
#include <thread>
#include <vector>

#include <usearch/index_dense.hpp>

using namespace unum::usearch;


int main() {
    constexpr auto num_sessions = 10;
    constexpr auto vectors_per_session = 100;
    constexpr auto total_vectors = num_sessions * vectors_per_session;
    constexpr auto dimensions = 128;

    auto index_result = index_dense_gt<>::make(metric_punned_t(dimensions, metric_kind_t::cos_k, scalar_kind_t::f32_k));
    auto& index = index_result.index;
    index.try_reserve({total_vectors, num_sessions});

    // Generate vectors
    auto all_vectors = std::vector<std::vector<float>>(total_vectors);
    auto rng = std::mt19937(42);
    auto dist = std::uniform_real_distribution<float>(-1.0f, 1.0f);
    for (auto& v : all_vectors) {
        v.resize(dimensions);
        for (auto& x : v) x = dist(rng);
    }

    // Parallel insert — jthreads auto-join at scope exit
    {
        auto threads = std::vector<std::jthread>();
        for (auto s = 0; s < num_sessions; s++) {
            threads.emplace_back([&, s] {
                auto start = s * vectors_per_session;
                for (auto i = 0; i < vectors_per_session; i++)
                    index.add(start + i, all_vectors[start + i].data(), s);
            });
        }
    }

    // Verify: search for every vector requesting ALL results
    auto misses = 0;
    for (auto i = 0; i < total_vectors; i++) {
        auto sr = index.search(all_vectors[i].data(), total_vectors);
        auto keys = std::vector<std::uint64_t>(total_vectors);
        auto distances = std::vector<float>(total_vectors);
        auto found = sr.dump_to(keys.data(), distances.data());

        auto found_self = false;
        for (auto r = 0; r < found; r++)
            if (keys[r] == i) { found_self = true; break; }
        if (!found_self) misses++;
    }

    // Confirm all nodes are stored correctly
    auto contains_ok = 0;
    for (auto i = 0; i < total_vectors; i++)
        if (index.contains(i)) contains_ok++;

    std::cout << "Search misses: " << misses << " / " << total_vectors << "\n";
    std::cout << "contains() confirms " << contains_ok << " / " << total_vectors << " present\n";

    if (misses > 0)
        std::cout << "BUG: " << misses << " nodes stored but unreachable via HNSW graph traversal\n";
    else
        std::cout << "PASS: all nodes reachable\n";
}

Expected behavior

Nodes to be reachable.

USearch version

v2.24.0

Operating System

Ubuntu 24.04

Hardware architecture

Arm

Which interface are you using?

C++ implementation

Contact Details

david.ivekovic@memgraph.io

Are you open to being tagged as a contributor?

  • I am open to being mentioned in the project .git history as a contributor

Is there an existing issue for this?

  • I have searched the existing issues

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingv3Breaking changes planned for v3

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions