Skip to content

Embeddings

Every fact is embedded into a vector for semantic search, clustering, dedup, and discovery. Embeddings run locally and in-process (ONNX Runtime via onnxruntime_go + daulet/tokenizers) — there is no external embedding API.

PropertyValue
Default idembeddinggemma (KNOMIT_EMBED_MODEL / [embeddings] model)
Dimensions768
Max tokens2048
ONNX inputs / outputsinput_ids, attention_masksentence_embedding
Poolingnone — the export emits an already pooled + normalized sentence_embedding
Query templatetask: search result | query: {content}
Doc templatetitle: {title} | text: {content}

Source (Hugging Face onnx-community/embeddinggemma-300m-ONNX): model_fp16.onnx (+ .onnx_data weights) and tokenizer.json. A legacy nomic-v1.5 model remains in the registry for historical comparison.

Model files are cached under KNOMIT_HOME/models/<id>/ (e.g. ~/.knomit/models/embeddinggemma/). Pre-download without booting the server:

Terminal window
knomit warm-models # configured model
knomit warm-models --model embeddinggemma

The ONNX Runtime shared library is located at runtime via ONNXRUNTIME_SHARED_LIBRARY (or onnx_lib_path); it is fetched into dist/<platform>/lib/ at build time by fetchlibs.

Cosine-similarity distributions differ sharply between models — EmbeddingGemma runs much cooler than nomic. So all six retrieval thresholds are per-model fields on the model descriptor, not global constants:

ThresholdEmbeddingGemmaUsed for
Dedup0.82near-duplicate detection in review prune
ReflectNovelty0.69reject near-duplicate methodologies (KNOMIT_REFLECT_NOVELTY_THRESHOLD overrides)
SimilarTo0.18”related facts”
SearchFloor0.05recall floor for min_similarity=0
RerankHigh0.43rerank band (high)
RerankLow0.10rerank band (low)

These are derived empirically by the build-only calibrate tool, which measures a model’s geometry against a real corpus. When swapping embedding models, re-calibrate — do not reuse another model’s thresholds.