GitHub - habedi/multi-vector-hnsw: A Java implementation of HNSW with multi-vector search support

Multi-Vector HNSW

A Java implementation of HNSW with multi-vector search support

Multi-Vector HNSW is a Java library that implements the Hierarchical Navigable Small World (HNSW) algorithm with built-in support for multi-vector indexing. It lets you index and search objects represented by multiple high-dimensional vectors, using standard distance functions like Euclidean, cosine, and dot product.

Most vector search libraries assume every object has a single embedding. But in real-world use cases (like document search, multi-modal AI, or hybrid dense/sparse setups) you often have multiple embeddings per item. This library extends HNSW to support that: multi-vector indexing, custom distance aggregation, and a clean Java API.

Features

Simple and extendable API for multi-vector indexing and search
Fast, configurable, and thread-safe HNSW implementation
Built-in support for cosine, (squared) Euclidean, and dot product distances
Bulk inserts and soft delete support
Save and load support for persisting indexes to disk
Fast distance calculations using SIMD instructions via Java Vector API
Pure Java 17 implementation with no native dependencies

Getting Started

If you are using Maven, add this dependency to your pom.xml:

<dependency>
    <groupId>io.github.habedi</groupId>
    <artifactId>multi-vector-hnsw</artifactId>
    <version>0.2.0-beta</version>
</dependency>

If you are using Gradle, add this dependency to your build.gradle:

dependencies {
    implementation 'io.github.habedi:multi-vector-hnsw:0.2.0-beta'
}

Alternatively, you can download the latest JAR files directly from the releases page and add it to your classpath.

Basic Usage

Below is an example of how to create an index, add items with multiple vectors, and perform a search.

import io.github.habedi.mvhnsw.common.FloatVector;
import io.github.habedi.mvhnsw.distance.Cosine;
import io.github.habedi.mvhnsw.distance.SquaredEuclidean;
import io.github.habedi.mvhnsw.index.Index;
import io.github.habedi.mvhnsw.index.MultiVectorHNSW;
import io.github.habedi.mvhnsw.index.SearchResult;

import java.util.List;

public class SimpleExample {
    public static void main(String[] args) {

        // 1. Configure the index for items with two vectors.
        // We'll weight the first vector (squared Euclidean distance) as 70% important
        // and the second vector (Cosine distance) as 30% important.
        Index index = MultiVectorHNSW.builder()
            .withM(16)
            .withEfConstruction(200)
            .withWeightedAverageDistance()
            .addDistance(new SquaredEuclidean(), 0.7f)
            .addDistance(new Cosine(), 0.3f)
            .and()
            .build();

        // 2. Add items to the index
        index.add(1L, List.of(FloatVector.of(1.5f, 2.5f), FloatVector.of(0.9f, 0.1f)));
        index.add(2L, List.of(FloatVector.of(9.1f, 8.2f), FloatVector.of(0.2f, 0.8f)));
        index.add(3L, List.of(FloatVector.of(1.6f, 2.4f), FloatVector.of(0.8f, 0.3f)));

        // 3. Create a query and search for the top 2 nearest neighbors
        List<FloatVector> query = List.of(FloatVector.of(1.4f, 2.6f), FloatVector.of(0.7f, 0.2f));
        // The third parameter, `efSearch`, controls the accuracy/speed trade-off.
        List<SearchResult> results = index.search(query, 2, 20);

        // 4. Print the results
        System.out.println("Search results:");
        results.forEach(System.out::println);
    }
}

Output:

Search results:
SearchResult[id=1, score=0.018205724472503872] # Smaller score means closer match
SearchResult[id=3, score=0.05697077252055386]

Documentation

Project documentation is available at the docs directory.

Examples

Check out the examples directory for more usage examples.

Benchmarks

See the benches directory for information on how to run project benchmarks.

Sample Results

The table below shows benchmark results for different distance functions using the se_cs_768 dataset on a machine with 32GB RAM and an AMD Ryzen 5 7600X CPU, running on GraalVM JDK 21.

Each item is represented by three 768-dimensional vectors. The index was built with M=16 and efConstruction=200. Searches were performed with efSearch=100 to find the top 100 nearest neighbors.

For each distance function, we report:

Average Query Time: The average time in milliseconds to perform a single search.
Recall@100: How many of the top 100 true nearest neighbors were found, on average.

Distances are aggregated using a uniformly-weighted average across the three vectors.

In this setup, the average query time is ~1.17–1.37 ms, with recall around 89%.

Distance Function	Train Size	Test Size	Avg Query Time (ms)	Recall@100
Squared Euclidean	36,712	4,080	1.37	89.30%
Cosine	36,712	4,080	1.19	89.54%
Dot Product	36,712	4,080	1.17	89.19%

You can reproduce these results by running:

make bench-run BENCHMARK_DATASET=se_cs_768 ARGS="--ef-search=100"

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for details.

Logo

Talon logo is from SVG Repo.

License

This project is available under either of the following licenses:

MIT License (LICENSE-MIT)
Apache License, Version 2.0 (LICENSE-APACHE)

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
.github/workflows		.github/workflows
benches		benches
docs		docs
examples		examples
scripts		scripts
src		src
.editorconfig		.editorconfig
.gitattributes		.gitattributes
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE-APACHE		LICENSE-APACHE
LICENSE-MIT		LICENSE-MIT
Makefile		Makefile
README.md		README.md
codecov.yml		codecov.yml
logo.svg		logo.svg
pom.xml		pom.xml
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Licenses found

Uh oh!

Repository files navigation

Multi-Vector HNSW

Features

Getting Started

Basic Usage

Documentation

Examples

Benchmarks

Sample Results

Contributing

Logo

License

About

Licenses found

Uh oh!

Releases 3

Uh oh!

Languages

License

Licenses found

habedi/multi-vector-hnsw

Folders and files

Latest commit

History

Repository files navigation

Multi-Vector HNSW

Features

Getting Started

Basic Usage

Documentation

Examples

Benchmarks

Sample Results

Contributing

Logo

License

About

Topics

Resources

License

Licenses found

Code of conduct

Uh oh!

Stars

Watchers

Forks

Releases 3

Uh oh!

Languages