strudel-fim-5m

A 5.5M-parameter fill-in-the-middle (FIM) model for Strudel live-coded music patterns. Given a prefix and suffix of a Strudel program plus a // LABEL: description comment, it generates the instrument block that belongs between them.

Model


Params	5,548,800 (~5.5M)
Architecture	Decoder-only transformer: RMSNorm, GeLU MLP, tied I/O embeddings, causal SDPA
`d_model`	320
`n_layers`	4
`n_heads`	5 (d_head = 64)
`d_ff`	1280
`max_seq_len`	1536
Vocab	435 (256 bytes + 4 control + 3 FIM sentinels + 172 Strudel function tokens)
FIM layout	PSM — `<FIM_PRE> prefix <FIM_SUF> suffix <FIM_MID> middle <EOT>`
Released weights	EMA shadow (decay 0.999)
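
The parameter total can be reproduced from the hyperparameters in the table. The sketch below rests on assumptions that are not stated in this card: bias-free linear layers, a learned absolute position embedding of shape `max_seq_len` x `d_model`, two RMSNorm gain vectors per layer, and one final RMSNorm. Under those assumptions the arithmetic lands on the figure given above.

```python
# Hyperparameters copied from the model table.
d_model, n_layers, d_ff = 320, 4, 1280
max_seq_len, vocab = 1536, 435

def count_params() -> int:
    tok_emb = vocab * d_model          # counted once: input and output embeddings are tied
    pos_emb = max_seq_len * d_model    # ASSUMPTION: learned absolute position embedding
    attn = 4 * d_model * d_model       # Q, K, V, O projections (ASSUMPTION: no biases)
    mlp = 2 * d_model * d_ff           # up and down projections (ASSUMPTION: no biases)
    norms = 2 * d_model                # ASSUMPTION: two RMSNorm gain vectors per layer
    final_norm = d_model               # ASSUMPTION: one RMSNorm before the output head
    return tok_emb + pos_emb + n_layers * (attn + mlp + norms) + final_norm

print(count_params())  # 5548800
```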

Training

Training follows a three-phase schedule. Every phase uses multimask data (5 prefix/suffix split variants per block) built from Claude-generated Strudel pieces. Total training time is about 45 minutes on one RTX 5080 (16 GB).

Phase	Dataset	Steps	LR peak	Val loss (EMA)
0 — pretrain	sonnet multimask (53K examples)	20,000	3e-4	0.2200
1 — continue	sonnet + haiku multimask (86K examples)	20,000	3e-4	0.1963
2 — sonnet fine-tune	sonnet multimask (53K examples)	5,000	1e-4	0.1651
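
The "Val loss (EMA)" column and the released weights refer to an exponential moving average of the parameters. Here is a minimal sketch of that update with the stated decay of 0.999, using plain Python floats in place of tensors. `ema_update` is a hypothetical helper for illustration, not a function taken from `train.py`.

```python
def ema_update(shadow: dict, params: dict, decay: float = 0.999) -> None:
    """Move each shadow value a small step toward the current parameter value."""
    for name, value in params.items():
        shadow[name] = decay * shadow[name] + (1.0 - decay) * value

params = {"w": 1.0}
shadow = dict(params)      # the shadow starts as a copy of the live weights
params["w"] = 2.0          # pretend an optimizer step changed the weight
ema_update(shadow, params)
print(shadow["w"])         # approximately 1.001
```

With a decay of 0.999 the shadow averages over roughly the last thousand steps, which smooths out step-to-step noise in the live weights.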

Loss is computed only on tokens after <FIM_MID> (inclusive of <EOT>) so the model learns when to stop.
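
The PSM layout and the loss mask can be sketched together as follows. The sentinel IDs are placeholders chosen for illustration (the real values live in `strudel_tokens.py`), `build_example` is a hypothetical helper, and text is encoded as raw bytes only, ignoring the Strudel function tokens.

```python
# Placeholder sentinel IDs, for illustration only; the real tokenizer defines its own.
FIM_PRE, FIM_SUF, FIM_MID, EOT = 256, 257, 258, 259

def build_example(prefix: str, middle: str, suffix: str):
    """Return (tokens, loss_mask) in PSM order: prefix, suffix, then middle."""
    pre, suf, mid = (list(s.encode("utf-8")) for s in (prefix, suffix, middle))
    context = [FIM_PRE, *pre, FIM_SUF, *suf, FIM_MID]
    target = [*mid, EOT]
    tokens = context + target
    # 1 marks tokens that contribute to the loss: the middle and the closing <EOT>.
    loss_mask = [0] * len(context) + [1] * len(target)
    return tokens, loss_mask

tokens, mask = build_example("stack(\n", "  s(\"bd*4\"),\n", ")")
print(sum(mask), "of", len(tokens), "tokens are scored")
```

Because `<EOT>` is inside the scored region, a model trained this way is penalized for running past the end of the block.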

Files

best.pt — PyTorch checkpoint ({model, model_cfg, step, val_loss, from_ema})
config.json — hyperparameters + training metadata
strudel_tokens.py — tokenizer (byte-level + Strudel function tokens)
train.py / train_infill.py — model + training loop
infill_generate.py — FIM decoding
serve_infill.py — HTTP server for the Strudel REPL
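
To inspect the checkpoint before serving it, something like the following may help. It assumes `best.pt` is a plain dict with the keys listed above, that `step` is an integer, and that `val_loss` is a float; `summarize_checkpoint` and `load_checkpoint` are hypothetical helpers, not part of the repo.

```python
import os

def summarize_checkpoint(ckpt: dict) -> str:
    """Describe a checkpoint dict that has the keys listed in the Files section."""
    source = "EMA shadow" if ckpt["from_ema"] else "raw weights"
    return f"step {ckpt['step']}, val_loss {ckpt['val_loss']:.4f}, {source}"

def load_checkpoint(path: str) -> dict:
    import torch  # imported lazily so the summary helper works without PyTorch
    # Recent PyTorch versions default to weights_only=True; if the checkpoint holds
    # custom objects, loading may need weights_only=False (only for trusted files).
    return torch.load(path, map_location="cpu")

if __name__ == "__main__" and os.path.exists("best.pt"):
    print(summarize_checkpoint(load_checkpoint("best.pt")))
```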

Usage

Download the repo and serve it locally:

hf download hidude562/strudel-fim-5m --local-dir ./strudel-fim-5m
cd ./strudel-fim-5m
python3 serve_infill.py --ckpt best.pt --port 8081

Then point your Strudel editor at http://localhost:8081.

Request shape:

POST /infill
{
  "text": "<full Strudel program containing a `// LABEL: description` line>",
  "line": <1-indexed line number of the label comment>,
  "temperature": 0.4,
  "top_k": 10,
  "max_new_tokens": 120
}

Response:

{"text": "<full program with the masked block filled in>", "ms": 1039, "request_id": "..."}

Streaming variant available at /infill-stream (Server-Sent Events).
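
A minimal Python client for the non-streaming endpoint might look like this. It uses only the standard library and the request fields shown above. The `Content-Type` header and the helper names are assumptions; the server may or may not require that header.

```python
import json
import urllib.request

def build_infill_request(text: str, line: int, base_url: str = "http://localhost:8081"):
    """Build a POST /infill request using the fields documented above."""
    payload = {
        "text": text,
        "line": line,              # 1-indexed line of the // LABEL: comment
        "temperature": 0.4,
        "top_k": 10,
        "max_new_tokens": 120,
    }
    return urllib.request.Request(
        f"{base_url}/infill",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # ASSUMPTION: JSON content type
        method="POST",
    )

def infill(text: str, line: int) -> str:
    """Send the request to a running server and return the completed program."""
    with urllib.request.urlopen(build_infill_request(text, line), timeout=30) as resp:
        return json.loads(resp.read())["text"]
```

Calling `infill(program, line=2)` requires `serve_infill.py` to be running on port 8081.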

Intended use

Drop-in FIM completion for the Strudel live-coding editor. The model is small enough to run on CPU for short prompts, but it is tuned for a single RTX-class GPU (about 1 s latency at max_new_tokens=120).

Limitations

Only trained on Claude-generated patterns — real human Strudel idiom may differ.
Context window is 1536 tokens, not bytes: most tokens are single bytes, but each Strudel function token covers several. Longer programs are prefix-truncated server-side.
Output quality degrades on prompts whose shape is far from the training distribution (e.g. single-block programs with no surrounding stack). Works best on typical multi-block stack(...) compositions.
Val loss of 0.1651 is on the held-out 10% of Claude-generated pieces; it is not a guarantee of musical quality.
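
For callers who want to anticipate the truncation mentioned above, here is one plausible way to trim a prefix to fit the context window. This is an illustration only and not the logic in `serve_infill.py`: it assumes the suffix is kept whole, that three sentinel tokens are added to the prompt, and that room is reserved for `max_new_tokens` of output.

```python
MAX_SEQ_LEN = 1536   # from the model table
N_SENTINELS = 3      # <FIM_PRE>, <FIM_SUF>, <FIM_MID>

def truncate_prefix(prefix: list, suffix: list, max_new_tokens: int = 120) -> list:
    """Drop the oldest prefix tokens so prompt plus generation fits the context."""
    budget = MAX_SEQ_LEN - N_SENTINELS - max_new_tokens - len(suffix)
    if budget <= 0:
        return []              # the suffix alone fills the window (not handled here)
    return prefix[-budget:]    # keep the tokens closest to the insertion point

kept = truncate_prefix(list(range(2000)), [0] * 100)
print(len(kept))  # 1313
```

Keeping the end of the prefix rather than the start preserves the code immediately before the block being filled.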

Citation

Internal project — no paper. If you use it, link back to this repo.
