knomit

Concepts

How knomit works

knomit is a distributed, decentralized knowledge base built from facts on git. A handful of ideas explain the whole system — each one falls out of a single decision: make git the only source of truth, and represent knowledge as facts, not documents.

A fact is a markdown file — not a document

The unit of knowledge in knomit is a fact: one concise, atomic claim, not a chunk of an ingested document. Each fact is a plain markdown file, with YAML frontmatter for structured metadata and a markdown body for the claim. The file is the fact: no database rows, no binary blobs, no opaque embeddings.

Confidence, domains, entities, and provenance refs travel in the frontmatter. A confidence is a degree of belief that can rise and fall; refs relate the fact to others, forming a graph. Because it's just a file, humans edit it with any text editor and agents write it over MCP — both paths land at the same file, through the same operations.

kb/technology/ai/agents/124c68ec.md
---
type: synthesis
confidence: 0.72
domain: [ai, agents, governance]
entities: [OpenAI, KPMG, Anthropic]
refs:
  - knomit:/kb/technology/ai/trends/73f515b5.md
---

By mid-2026 a consistent pattern emerges across independent sources: agentic AI is displacing chatbots as the primary mode of enterprise AI use, simultaneously driving workforce and safety governance responses.

Epistemic vs pragmatic

Every fact is one of two kinds: epistemic facts describe what is — observations, concepts, syntheses, hypotheses — knowledge to weigh as evidence, with a confidence that can move; pragmatic facts prescribe what to do — policies and heuristics — rules to follow or lean on.

The leaf type refines this further (observation, principle, pattern, synthesis, hypothesis, methodology, and more), but the epistemic/pragmatic split is the load-bearing one: it is the difference between knowing and acting.

Ontology & inheritance

The ontology is a directory tree under kb/ that is not just organization — it carries meaning: a fact placed at a higher level applies to everything beneath it, so a fact at kb/invariants/ is inherited by kb/invariants/concurrency/branch-lock/.
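
The inheritance rule above can be sketched as a path check. This is an illustration under assumptions, not knomit's implementation: `applies_to` and `inherited_levels` are hypothetical helper names, and the sketch assumes that a fact applies to its own directory as well as to everything beneath it.

```python
from pathlib import PurePosixPath


def applies_to(fact_dir: str, target_dir: str) -> bool:
    """True if a fact filed in fact_dir is inherited by target_dir."""
    fact = PurePosixPath(fact_dir)
    target = PurePosixPath(target_dir)
    return target == fact or fact in target.parents


def inherited_levels(target_dir: str, root: str = "kb") -> list[str]:
    """List the directories whose facts target_dir inherits, nearest first."""
    target = PurePosixPath(target_dir)
    top = PurePosixPath(root)
    levels = [target, *target.parents]
    # Keep only levels at or below the ontology root.
    return [str(p) for p in levels if p == top or top in p.parents]


print(inherited_levels("kb/invariants/concurrency/branch-lock/"))
```

Walking from the target upward means the most specific facts come first, which is one reasonable ordering when a narrow fact should be weighed before a broad one.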

The ontology is configurable. knomit ships with two: General Knowledge — a broad taxonomy derived from Wikipedia's main topic classifications, covering technology, science, geography, history, and more — and Source Code Knowledge — a codebase taxonomy for AI agents, with topics like invariants, conventions, decisions, and gotchas.

How knowledge clusters

Clustering is emergent structure the author never declared: facts group by meaning — embedding similarity puts facts about the same thing together, wherever they were filed — and by classification — shared ontology paths, domains, and entities group facts by how they were declared.

The distill pass synthesizes each cluster into a synthesis fact, then re-clusters and distills again, up to depth 3. The result is a hierarchy of increasingly abstract knowledge: raw observations at the leaves, higher-order insight at the top.
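
A rough sketch of the cluster-then-distill loop follows. Only the depth limit of 3 comes from the text; everything else is an assumption made for illustration: the greedy clustering, the 0.8 similarity threshold, the use of a centroid as a stand-in for a synthesis fact, and the choice to re-cluster only the newly produced syntheses.

```python
import math

MAX_DEPTH = 3    # stated in the text
THRESHOLD = 0.8  # assumed similarity cut-off


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster(vectors, threshold=THRESHOLD):
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    clusters = []
    for v in vectors:
        for group in clusters:
            if cosine(group[0], v) >= threshold:
                group.append(v)
                break
        else:
            clusters.append([v])
    return clusters


def centroid(group):
    """Stand-in for synthesis: average the member embeddings."""
    return [sum(col) / len(group) for col in zip(*group)]


def distill(vectors, threshold=THRESHOLD):
    """Return one list of vectors per level, raw observations first."""
    levels = [vectors]
    for _ in range(MAX_DEPTH):
        groups = [g for g in cluster(levels[-1], threshold) if len(g) > 1]
        if not groups:
            break  # nothing left to synthesize
        levels.append([centroid(g) for g in groups])
    return levels
```

In a real pass the synthesis step would be a model writing a new claim, not an average of vectors; the loop structure is what the sketch is meant to show.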

Subsumption — never learned twice

When a fact is written via knomit_learn, knomit runs a near-duplicate check in the same category using embedding similarity before committing. If a match is found, the incoming fact subsumes the existing one — or is absorbed by it — producing a single fact that carries refs to both sources. Two chunks that say the same thing never pile up.
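
The near-duplicate check can be sketched as follows. This is a simplified illustration, not the behavior of knomit_learn itself: the `Fact` shape, the `learn` function, and the 0.9 threshold are assumptions, and the sketch covers only the case where the incoming fact subsumes the existing one.

```python
import math
from dataclasses import dataclass, field

DUPLICATE_THRESHOLD = 0.9  # assumed; the text does not state a cut-off


@dataclass
class Fact:
    claim: str
    category: str
    embedding: list[float]
    refs: list[str] = field(default_factory=list)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def learn(incoming: Fact, existing: list[Fact]) -> list[Fact]:
    """Add a fact, folding it into a same-category near-duplicate if one exists."""
    for i, fact in enumerate(existing):
        if fact.category != incoming.category:
            continue
        if cosine(fact.embedding, incoming.embedding) >= DUPLICATE_THRESHOLD:
            # One surviving fact that carries the refs of both, without repeats.
            merged_refs = list(dict.fromkeys(fact.refs + incoming.refs))
            survivor = Fact(incoming.claim, incoming.category,
                            incoming.embedding, merged_refs)
            return existing[:i] + [survivor] + existing[i + 1:]
    return existing + [incoming]
```

Restricting the comparison to one category keeps the check cheap and avoids merging facts that merely sound alike in unrelated parts of the ontology.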

There is a special case worth knowing: a hypothesis is subsumed by a newer observation when the world catches up to the prediction. The fact survives, its type transitions, and the git history records the exact moment evidence closed the loop.

Peers, branches & consensus

knomit is distributed and decentralized by construction. Every peer, human or agent, operates on a long-lived personal branch (agent/<id>, where the id is derived from the machine hostname plus a short hash of the peer's key). All learn, update, and retract operations land on that branch. No peer writes to main directly.
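
One way to derive such a branch name is sketched below. The text says only that the id combines the hostname with a short hash of the key; the hash function (SHA-256), the eight-character length, the hyphen separator, and the hostname clean-up are all assumptions, and `agent_id` and `branch_name` are hypothetical helpers.

```python
import hashlib
import re


def agent_id(hostname: str, public_key: bytes, hash_len: int = 8) -> str:
    """Combine a sanitized hostname with a short hash of the peer's key."""
    host = re.sub(r"[^a-z0-9-]+", "-", hostname.lower()).strip("-")
    digest = hashlib.sha256(public_key).hexdigest()[:hash_len]  # assumed scheme
    return f"{host}-{digest}"


def branch_name(hostname: str, public_key: bytes) -> str:
    """Personal branch for a peer, in the agent/<id> form."""
    return f"agent/{agent_id(hostname, public_key)}"
```

Deriving the name from the key means the same peer always lands on the same branch, and two peers on one machine with different keys do not collide.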

Consensus is reached deliberately: a peer's facts are reviewed, approved, and merged into main — by a Librarian agent, a CI policy, or a human merge. No peer merges another's branch directly; main is the single point of agreement. Each peer then pulls main and merges it locally, inheriting the agreed truth instead of re-deriving it. Learning effort is shared, not re-paid.

Provenance & signing

Every write — learn, update, retract, subsume, sync — is a commit authored under an identity-carrying address. Agent commits use <agent-id>+<operation>@agents.knomit.io as the author; humans use their own email with +operation subaddressing. Commits are signed with the agent's Ed25519 key.
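
The two author-address forms can be sketched as small helpers. The address patterns and the operation names come from the text; the function names, the validation, and the example addresses are illustrative assumptions.

```python
OPERATIONS = {"learn", "update", "retract", "subsume", "sync"}
AGENT_DOMAIN = "agents.knomit.io"


def check_operation(operation: str) -> None:
    """Reject operations that the text does not list as write types."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")


def agent_author(agent_id: str, operation: str) -> str:
    """Author address for an agent commit: <agent-id>+<operation>@agents.knomit.io."""
    check_operation(operation)
    return f"{agent_id}+{operation}@{AGENT_DOMAIN}"


def human_author(email: str, operation: str) -> str:
    """Author address for a human commit, using +operation subaddressing."""
    check_operation(operation)
    local, at, domain = email.rpartition("@")
    if not at or not local:
        raise ValueError(f"not an email address: {email}")
    return f"{local}+{operation}@{domain}"


print(agent_author("build-box-1a2b3c4d", "learn"))
print(human_author("ada@example.com", "retract"))
```

Putting the operation in the address means an ordinary git log filtered by author already answers who did what, with no extra metadata store.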

That makes the (who, what, when, why) tuple cryptographic rather than hand-waved. The same key that signs commits also names the branch and authenticates remote sync — one identity, end to end.

The temporal graph

knomit treats git history as a first-class temporal axis, not an implementation detail. Each commit is a moment of belief: the state of the fact graph as some agent understood it at that instant. The git log of a fact is its epistemic history — how the claim evolved, when confidence shifted, when sources were added.

So you can time-travel. Ask what was known as-of any commit, and the entire graph rewinds: every ref resolves to whatever its target was at that point in time, even if it has since been updated or retracted. The graph is a record of what was believed and when — never silently overwritten with "now".
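
Since every fact is a file in git, an as-of read can be approximated with plain git. The sketch below uses the standard `git show <commit>:<path>` form; it does not show knomit's own as-of interface, which the text does not describe, and `as_of_command` and `read_as_of` are hypothetical helper names.

```python
import subprocess


def as_of_command(repo: str, commit: str, path: str) -> list[str]:
    """Build the git invocation that prints a file as it stood at a commit."""
    return ["git", "-C", repo, "show", f"{commit}:{path}"]


def read_as_of(repo: str, commit: str, path: str) -> str:
    """Return a fact file's content at the given commit."""
    result = subprocess.run(
        as_of_command(repo, commit, path),
        capture_output=True,
        text=True,
        check=True,  # raise if the path did not exist at that commit
    )
    return result.stdout
```

Resolving a ref as-of a commit is then the same call applied to the ref's target path with the same commit, so the whole graph rewinds consistently.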

The knomit: URI scheme

Refs anchor a fact to its source material — the evidence behind the claim. A ref points to another fact in the same knowledge base, or to any web resource: an article, a paper, a specific commit.

Form            Meaning
knomit:/kb/…    Fact in the current knowledge base
https://…       Any web resource: article, paper, commit
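
The two ref forms can be told apart with the standard URL parser. This is a minimal sketch: `classify_ref` is a hypothetical helper, and it accepts only the two forms listed in the table.

```python
from urllib.parse import urlparse


def classify_ref(ref: str) -> tuple[str, str]:
    """Return ("fact", repo-relative path) or ("web", url) for a ref."""
    parsed = urlparse(ref)
    if parsed.scheme == "knomit":
        # knomit:/kb/... points at a fact file in the current knowledge base.
        return ("fact", parsed.path.lstrip("/"))
    if parsed.scheme == "https":
        return ("web", ref)
    raise ValueError(f"unsupported ref: {ref}")


print(classify_ref("knomit:/kb/technology/ai/trends/73f515b5.md"))
```

A fact ref resolves to a path inside the repository, so it can be followed without any network access.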

Synthesis & hypotheses

Review (knomit_review) is a session-based maintenance loop with three passes: prune — remove redundant or stale facts; distill — cluster related facts and derive a synthesis fact (a higher-order claim that no individual source fact makes on its own); reflect — record methodology facts from resolved hypothesis transitions. Synthesis produces new knowledge from what the corpus already holds.

Hypothesize (knomit_hypothesize) is a separate, explicit pass over synthesis facts. For each one it decides whether a hypothesis is warranted — a falsifiable, forward-looking prediction with a concrete settlement criterion. Skipping is the expected default; a hypothesis is only written when it fills a genuinely load-bearing gap that new evidence could close.

Origin & discovery

Every fact records how it came to exist, its origin: authored (written or asserted), distilled (synthesized from sources), or discovered (emergent). Origin is orthogonal to type and kind: a discovered fact is still a normal synthesis or hypothesis; origin only records that knomit inferred it.

Most systems retrieve what is similar to a query. knomit also discovers: an effort dial seeds from bridges (facts that share a domain or entity yet sit in different similarity clusters) and proposes the keystone those bridges imply. Load-bearing facts hide in exactly the cross-cluster links that similarity-only retrieval is structurally blind to.

[Figure: two similarity clusters, A and B. Fact A and fact B sit in different clusters but share a token (a domain or entity). The keystone E they imply is unstated, with origin: discovered.]
Embedding similarity only ever draws the dense within-cluster edges; it is structurally blind to the cross-cluster token. The bridge is that missed link. The keystone is the load-bearing fact it implies, the one nobody wrote down precisely because it underwrites things that look unrelated.
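
Finding bridges, as defined above, can be sketched as a pairwise scan. The `Fact` shape and `find_bridges` are hypothetical, cluster ids are assumed to come from an earlier similarity-clustering step, and proposing the keystone itself is left out because the text does not say how it is done.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Fact:
    id: str
    cluster: int            # similarity cluster assigned elsewhere
    tokens: frozenset[str]  # declared domains and entities


def find_bridges(facts: list[Fact]) -> list[tuple[str, str, list[str]]]:
    """Pairs that share a domain or entity yet sit in different clusters."""
    bridges = []
    for a, b in combinations(facts, 2):
        shared = a.tokens & b.tokens
        if shared and a.cluster != b.cluster:
            bridges.append((a.id, b.id, sorted(shared)))
    return bridges


facts = [
    Fact("fact-a", 0, frozenset({"governance", "OpenAI"})),
    Fact("fact-b", 1, frozenset({"governance", "KPMG"})),
    Fact("fact-c", 0, frozenset({"OpenAI"})),
]
print(find_bridges(facts))
```

Pairs inside one cluster are skipped on purpose: similarity retrieval already finds those, so only the cross-cluster links are new information.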

The MCP operational loop

MCP is not an optional add-on — it is the primary interface for humans (via Claude Code and editors) and agents alike. Seven tools cover the full fact lifecycle; their descriptions carry all the behavioral guidance a model needs, with no prompt scaffolding required.

Each agent connects to a branch-scoped endpoint, so reads and writes land on its own branch automatically — isolation is structural, not enforced by convention. Three profiles (code, chat, generic) tailor the tool descriptions to the kind of client connecting.
