Synthesis & hypotheses

Beyond storing facts, knomit actively maintains and grows the corpus. All of this runs through one engine, driven by either the MCP tools (knomit_review, knomit_hypothesize) or the HTTP synthesis-run endpoints. The engine is work-stealing: it borrows cycles from the calling model, one work item at a time, rather than running as a separate headless service.

A session presents one work item at a time: the model responds, then the next item is served, and so on until the phase completes. Sessions track three independent axes:

  • status — lifecycle: active · completed · abandoned
  • phase — workflow: work · reflect · done (advanced by atomic CAS transitions)
  • effort — the discovery dial: normal · medium · high
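
The three axes can be pictured as independent fields on a session record. The sketch below is illustrative only: the `Session` class, its `cas_phase` method, and the lock-based compare-and-swap are assumptions made for this example, not knomit's actual types or storage mechanism.

```python
import threading
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    COMPLETED = "completed"
    ABANDONED = "abandoned"


class Phase(Enum):
    WORK = "work"
    REFLECT = "reflect"
    DONE = "done"


class Effort(Enum):
    NORMAL = "normal"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class Session:
    # Hypothetical record: each axis is stored on its own, none is derived.
    status: Status = Status.ACTIVE
    phase: Phase = Phase.WORK
    effort: Effort = Effort.NORMAL
    _lock: threading.Lock = field(
        default_factory=threading.Lock, repr=False, compare=False
    )

    def cas_phase(self, expected: Phase, new: Phase) -> bool:
        """Move to `new` only if the phase still equals `expected`.

        Returns True when the swap happened and False when another caller
        had already advanced the phase.
        """
        with self._lock:
            if self.phase is not expected:
                return False
            self.phase = new
            return True


session = Session(effort=Effort.HIGH)
first = session.cas_phase(Phase.WORK, Phase.REFLECT)   # swap succeeds
second = session.cas_phase(Phase.WORK, Phase.REFLECT)  # stale expectation, rejected
```

The compare-and-swap shape means two callers racing to finish the same phase cannot both advance it; the loser sees `False` and can re-read the session state.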

knomit_review — prune · distill · reflect

| Stage | What it does |
| --- | --- |
| Prune (dedup) | Detects near-duplicate facts and merges them. Tiebreak: a non-hypothesis always wins; then higher confidence; then more sources. Domains and entities are unioned. |
| Distill (synthesis) | Clusters related facts and distills them into a higher-order synthesis fact. Evidence weight uses SumProductNorm = Σ(c·s) / (Σ(c·s)+1); hypothesis sources are excluded from the weight. |
| Reflect (methodology) | Reflects on hypothesis→outcome transitions to record reasoning lessons as methodology facts. Reinforcement appends the methodology’s path to the transition fact’s refs; git is the only source of truth, with no side-channel counter. New-methodology proposals are hard-capped (KNOMIT_REFLECT_PROPOSE_CAP, default 1) and gated by a novelty/cosine floor (KNOMIT_REFLECT_NOVELTY_THRESHOLD). |
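
The Prune tiebreak and the Distill evidence weight described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `Fact` dataclass and both function names are hypothetical, and the source does not define `c` and `s` beyond the formula, so the code treats them as two opaque per-source factors.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    # Hypothetical shape, just enough fields for the tiebreak.
    path: str
    confidence: float
    sources: int
    is_hypothesis: bool = False


def merge_winner(a: Fact, b: Fact) -> Fact:
    """Pick the surviving fact of a near-duplicate pair.

    Tiebreak order: non-hypothesis first, then higher confidence,
    then more sources. On a full tie the first argument is kept.
    """
    def rank(f: Fact):
        return (not f.is_hypothesis, f.confidence, f.sources)

    return max(a, b, key=rank)


def sum_product_norm(pairs) -> float:
    """Evidence weight: sum(c*s) / (sum(c*s) + 1).

    `pairs` is an iterable of (c, s) tuples. Hypothesis sources are
    expected to be filtered out by the caller before this point.
    """
    total = sum(c * s for c, s in pairs)
    return total / (total + 1.0)


observation = Fact("facts/obs", confidence=0.4, sources=1)
hypothesis = Fact("facts/hyp", confidence=0.9, sources=5, is_hypothesis=True)
survivor = merge_winner(observation, hypothesis)  # the observation survives

weight = sum_product_norm([(0.9, 1.0), (0.6, 0.5)])  # 1.2 / 2.2
```

For non-negative products the weight stays in [0, 1): it is 0 with no evidence and approaches 1 as evidence accumulates, so no single cluster can saturate it.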

knomit_review does not generate new hypotheses.

Distillation is recursive: synthesis facts can themselves be clustered and distilled again at greater depth, RAPTOR-style. This runs through the same work-item queue; the item’s priority orders the depth, so deeper summaries build on shallower ones without a separate scheduler.
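
One way to picture depth ordering without a separate scheduler is a single priority queue in which an item's depth is its priority. The sketch below is an assumption-laden illustration: `WorkQueue`, `run`, the "lower value is served first" convention, and the one-summary-per-item rule are all hypothetical simplifications, not knomit's queue.

```python
import heapq
from itertools import count


class WorkQueue:
    """Min-heap of work items; a lower priority value is served first."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tiebreaker: insertion order within one depth

    def push(self, depth: int, label: str) -> None:
        heapq.heappush(self._heap, (depth, next(self._seq), label))

    def pop(self):
        depth, _, label = heapq.heappop(self._heap)
        return depth, label

    def __len__(self) -> int:
        return len(self._heap)


def run(queue: WorkQueue, max_depth: int) -> list:
    """Drain the queue, enqueuing one deeper item per item served."""
    served = []
    while len(queue):
        depth, label = queue.pop()
        served.append((depth, label))
        if depth < max_depth:
            # The deeper item waits behind every remaining shallower item.
            queue.push(depth + 1, f"summary-of({label})")
    return served


queue = WorkQueue()
queue.push(0, "cluster-a")
queue.push(0, "cluster-b")
order = run(queue, max_depth=1)
```

Even though the depth-1 item for `cluster-a` is enqueued before `cluster-b` is served, the heap still serves `cluster-b` first, so every depth-1 item sees a complete depth-0 layer.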

knomit_hypothesize — generate predictions

knomit_hypothesize walks synthesis facts on the agent branch and, for each item, lets the model decide whether to write a falsifiable hypothesis fact (skipping is the expected outcome for most items). It is a distinct, user-initiated operation, never an automatic follow-up to review. On a later dedup collision with a confirmed observation, the hypothesis is retracted and the observation links to it via refs.
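
The retraction rule can be sketched as a small function. The `StoredFact` shape, the `retracted` flag, and `resolve_collision` are hypothetical names chosen for this illustration; only the behavior (retract the hypothesis, link from the observation via refs) comes from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class StoredFact:
    # Hypothetical record for this sketch only.
    path: str
    kind: str                      # "hypothesis" or "observation"
    retracted: bool = False
    refs: list = field(default_factory=list)


def resolve_collision(hypothesis: StoredFact, observation: StoredFact) -> None:
    """Retract the hypothesis and link the confirmed observation to it."""
    hypothesis.retracted = True
    # Avoid duplicate links if the same collision is processed twice.
    if hypothesis.path not in observation.refs:
        observation.refs.append(hypothesis.path)


hyp = StoredFact("agent/hypotheses/h1", kind="hypothesis")
obs = StoredFact("facts/o1", kind="observation")
resolve_collision(hyp, obs)
```

Keeping the link on the observation preserves the trail from the confirmed fact back to the prediction it displaced.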

When the model reasons, knomit can surface relevant past methodology. RelevantMethodology ranks methodology facts by a composite score (0.6·vector + 0.4·tag_overlap, floored at KNOMIT_METHODOLOGY_MIN_SCORE, default 0.15). Methodology facts are identified by type=methodology, never by path. The server never auto-cites — it injects candidate methodology into the prompt and the model decides.
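
The composite ranking above can be sketched as follows. Several details are assumptions made for the example: the dictionary keys, the function names, treating both inputs as precomputed scores in [0, 1], and reading "floored" as "candidates below the floor are dropped".

```python
import os


def composite_score(vector_sim: float, tag_overlap: float) -> float:
    """0.6 * vector + 0.4 * tag_overlap, as in the formula above."""
    return 0.6 * vector_sim + 0.4 * tag_overlap


def relevant_methodology(candidates, min_score=None):
    """Rank methodology facts by composite score, best first.

    `candidates` is an iterable of dicts with hypothetical keys
    "type", "path", "vector", and "tag_overlap". Facts are selected
    by type, never by path.
    """
    if min_score is None:
        # Floor read from the environment, falling back to the documented default.
        min_score = float(os.environ.get("KNOMIT_METHODOLOGY_MIN_SCORE", "0.15"))
    scored = [
        (composite_score(c["vector"], c["tag_overlap"]), c)
        for c in candidates
        if c["type"] == "methodology"
    ]
    kept = [(score, c) for score, c in scored if score >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept


candidates = [
    {"type": "methodology", "path": "m/a", "vector": 0.5, "tag_overlap": 0.5},
    {"type": "methodology", "path": "m/b", "vector": 0.1, "tag_overlap": 0.1},
    {"type": "observation", "path": "m/c", "vector": 0.9, "tag_overlap": 0.9},
    {"type": "methodology", "path": "x/d", "vector": 0.9, "tag_overlap": 0.0},
]
ranked = relevant_methodology(candidates, min_score=0.15)
```

Here `m/c` is skipped despite its path and high scores because its type is not methodology, and `m/b` falls under the floor. The surviving candidates are only offered to the model; whether to cite them remains the model's decision.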

Synthesis and distillation use an LLM, configured by [llm] (default provider gemini, model gemini-2.5-flash). Embeddings are a separate, local model — see Embeddings and Configuration.