Influence functions at scale: finally know which training tokens made your LLM do that.
Why This Matters Right Now
Every time an LLM hallucinates, regurgitates sensitive data, or behaves unexpectedly, the honest answer to "why?" is somewhere in the training corpus. The problem is that finding it has historically cost you N full retraining runs, one per data point you want to test. For any serious model, that's not a research budget, that's a fantasy.
Bergson, a quietly-shipping library from EleutherAI, is trying to make that question actually answerable. With 51 stars and a last push dated April 2026, this is not a viral moment. It's a tool that serious ML engineers are starting to reach for before the crowd notices it exists.
What It Actually Does
Bergson implements influence functions: mathematical estimates of how much each training example contributed to a model's behavior on a specific query. Remove a training point, retrain, measure the delta. Influence functions approximate that delta without the retraining.
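To make "remove, retrain, measure the delta" concrete, here is a toy ground-truth computation on a closed-form ridge regression, where retraining is cheap enough to do N times. This is an illustration of the quantity influence functions estimate, not Bergson's API; all names here are made up for the sketch.

```python
import numpy as np

def fit_ridge(X, y, lam=1e-3):
    """Closed-form ridge regression: w = (X^T X + lam*I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
x_query = rng.normal(size=3)

w_full = fit_ridge(X, y)
baseline = x_query @ w_full

# Ground-truth leave-one-out influence of training point i on the query
# prediction: retrain without i, measure the change. Influence functions
# approximate exactly this array without the 50 retrainings.
deltas = []
for i in range(len(X)):
    mask = np.arange(len(X)) != i
    w_loo = fit_ridge(X[mask], y[mask])
    deltas.append(baseline - x_query @ w_loo)
deltas = np.array(deltas)
```

For a real LLM, each entry of `deltas` would cost a full training run, which is why the approximations below exist.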
The library ships three tiers of methods:
- Gradient cosine similarity: the fast, cheap baseline. Useful for sanity checks.
- EK-FAC and TrackStar: roughly the cost of one training run. Correlates with leave-k-out retraining at around ρ=0.3. Honest enough to be useful for data filtering and LESS-style dataset curation.
- MAGIC (their 2025 paper): the serious option. 3-5 training runs of compute, but produces per-token and per-sequence scores that hit ρ=0.9 correlation with both leave-one-out and leave-k-out retraining. That's not a toy number.
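The cheapest tier is simple enough to sketch in a few lines: score each training example by the cosine between its per-example loss gradient and the query gradient. This is a generic illustration of the idea with synthetic gradients, not Bergson's implementation.

```python
import numpy as np

def cosine_influence(query_grad, train_grads):
    """Score each training example by the cosine similarity between its
    flattened per-example gradient and the query gradient. No Hessian,
    no retraining: just normalized dot products."""
    q = query_grad / np.linalg.norm(query_grad)
    G = train_grads / np.linalg.norm(train_grads, axis=1, keepdims=True)
    return G @ q

rng = np.random.default_rng(0)
train_grads = rng.normal(size=(1000, 64))   # stand-in per-example gradients
query_grad = train_grads[42] + 0.1 * rng.normal(size=64)  # near example 42

scores = cosine_influence(query_grad, train_grads)
# Example 42 should rank highest, since the query gradient was built from it.
```

The limitation is also visible here: cosine similarity ignores curvature, which is what the EK-FAC tier adds via a Kronecker-factored approximation of the Hessian.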
The ρ=0.9 figure for MAGIC is the headline claim worth stress-testing, but even the weaker EK-FAC/TrackStar tier at ρ=0.3 is genuinely useful for the practical task most developers actually need: "help me filter my 100GB dataset before I waste a training run on garbage."
Technical Deep-Dive
Bergson's architecture splits cleanly into two modes: train-time and post-hoc.
Train-time with MAGIC uses a custom bergson magic CLI command driven by a YAML config (see examples/magic/gpt2_wikitext_tiny.yaml in the repo). The MAGIC Trainer builds what they call a "metasmooth" model: one that's been trained in a way that makes it near-optimally attributable by backpropagating through the training process itself, computing gradients with respect to an implicit per-item weighting. This is the expensive path, but it's the one that gets you to ρ=0.9.
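The "gradients with respect to an implicit per-item weighting" idea can be made concrete with a toy finite-difference version: give each training example a weight (1 + ε_i), run training, and ask how the query loss moves as ε_i moves. MAGIC backpropagates this through the full training run; the sketch below does a single SGD step on a linear model, uses finite differences instead of backprop, and every name in it is hypothetical.

```python
import numpy as np

def train_one_step(w, X, y, eps, lr=0.1):
    """One SGD step on a weighted squared loss, where example i carries
    an implicit weight (1 + eps[i])."""
    resid = X @ w - y
    grad = X.T @ ((1.0 + eps) * resid) / len(y)
    return w - lr * grad

def query_loss(w, x_q, y_q):
    return 0.5 * (x_q @ w - y_q) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
y = rng.normal(size=20)
x_q, y_q = rng.normal(size=4), 0.0
w0 = np.zeros(4)

# Finite-difference stand-in for differentiating through training:
# perturb each per-item weight and re-run the (one-step) training.
h = 1e-5
scores = np.empty(len(X))
for i in range(len(X)):
    eps = np.zeros(len(X)); eps[i] = h
    up = query_loss(train_one_step(w0, X, y, eps), x_q, y_q)
    eps[i] = -h
    dn = query_loss(train_one_step(w0, X, y, eps), x_q, y_q)
    scores[i] = (up - dn) / (2 * h)  # d(query loss) / d(eps_i)
```

A negative score means upweighting example i would have lowered the query loss, i.e. it's "helpful" for that query. Finite differences cost 2N training runs; backpropagating through training, as MAGIC does, gets all N derivatives at once.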
Post-hoc attribution is more surgical. The bergson build command constructs an on-disk gradient store with FAISS integration for KNN search. Critically, collection-time gradient compression keeps this from eating your entire SSD. A typical workflow:
```bash
bergson build runs/index --model EleutherAI/pythia-14m \
  --dataset NeelNanda/pile-10k \
  --projection_dim 16
bergson query --index runs/index --unit_norm
```
The --projection_dim 16 flag is doing a lot of work there: random projection to compress gradients before storage. For in-memory queries
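The compression idea is worth seeing in numbers: one shared random matrix maps high-dimensional gradients down to projection_dim values at collection time, and queries are projected with the same matrix before a nearest-neighbor search. The sketch below uses a brute-force inner-product search as a stand-in for the FAISS index Bergson builds; the dimensions and names are illustrative, not the library's internals.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 2000, 512, 16   # examples, raw gradient dim, projection_dim

# Johnson-Lindenstrauss-style random projection: one shared matrix maps
# 512-dim gradients to 16 dims at collection time, so the on-disk store
# holds n x 16 floats instead of n x 512 (a 32x reduction here).
P = rng.normal(size=(d, k)) / np.sqrt(k)
grads = rng.normal(size=(n, d))
store = grads @ P                      # compressed gradient store

# Query: project with the SAME matrix, unit-normalize, then run an
# inner-product KNN over the normalized store (what an inner-product
# FAISS index over unit-normed vectors computes).
q = grads[7] @ P
q /= np.linalg.norm(q)
S = store / np.linalg.norm(store, axis=1, keepdims=True)
top5 = np.argsort(-(S @ q))[:5]
# The query's own example (index 7) should come back first.
```

Inner products survive the projection only approximately, which is the trade the flag expresses: a smaller projection_dim means a smaller index and faster queries, at the cost of noisier influence rankings.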
[Read full article on The Gap →](https://blog.teum.io/eleutherai-s-bergson-lets-you-interrogate-a-model-s-training-memory/)