strudel-fim-5m
A 5.5M-parameter fill-in-the-middle (FIM) model for Strudel live-coded music patterns. Given a prefix and suffix of a Strudel program plus a // LABEL: description comment, it generates the instrument block that belongs between them.
Model
| Params | 5,548,800 (~5.5M) |
| Architecture | Decoder-only transformer: RMSNorm, GeLU MLP, tied I/O embeddings, causal SDPA |
| d_model | 320 |
| n_layers | 4 |
| n_heads | 5 (d_head = 64) |
| d_ff | 1280 |
| max_seq_len | 1536 |
| Vocab | 435 (256 bytes + 4 control + 3 FIM sentinels + 172 Strudel function tokens) |
| FIM layout | PSM: <FIM_PRE> prefix <FIM_SUF> suffix <FIM_MID> middle <EOT> |
| Released weights | EMA shadow (decay 0.999) |
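As a sanity check on the table, the parameter count can be reproduced from the listed hyperparameters. The sketch below is not taken from train.py. It assumes learned absolute positional embeddings of size max_seq_len x d_model and bias-free linear layers, neither of which the card states; under those assumptions the total matches the reported figure exactly.

```python
def count_params(vocab=435, d_model=320, n_layers=4, d_ff=1280, max_seq_len=1536):
    """Estimate the parameter count from the hyperparameters in the table."""
    tok_emb = vocab * d_model          # tied with the output head, so counted once
    pos_emb = max_seq_len * d_model    # ASSUMPTION: learned absolute position table
    attn = 4 * d_model * d_model       # Q, K, V and output projections (ASSUMPTION: no biases)
    mlp = 2 * d_model * d_ff           # up and down projections (ASSUMPTION: no biases)
    norms = 2 * d_model                # two RMSNorm gain vectors per layer
    per_layer = attn + mlp + norms
    final_norm = d_model               # RMSNorm before the output head
    return tok_emb + pos_emb + n_layers * per_layer + final_norm


print(count_params())  # 5548800
```

The vocabulary size follows the same arithmetic: 256 + 4 + 3 + 172 = 435.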
Training
Three-phase schedule, all multimask (5 prefix/suffix split variants per block) on Claude-generated Strudel pieces. Total ~45 min on one RTX 5080 (16 GB).
| Phase | Dataset | Steps | LR peak | Val loss (EMA) |
|---|---|---|---|---|
| 0: pretrain | sonnet multimask (53K examples) | 20,000 | 3e-4 | 0.2200 |
| 1: continue | sonnet + haiku multimask (86K examples) | 20,000 | 3e-4 | 0.1963 |
| 2: sonnet fine-tune | sonnet multimask (53K examples) | 5,000 | 1e-4 | 0.1651 |
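The "Val loss (EMA)" column and the released weights refer to an exponential moving average of the trained parameters. A minimal sketch of the standard EMA update with decay 0.999 follows. It uses plain floats in a dict for illustration; the names `shadow` and `params` are hypothetical, and the actual training code may apply the update differently (for example on tensors, or with warm-up).

```python
def ema_update(shadow, params, decay=0.999):
    """Move each shadow value a small step toward the current parameter value."""
    for name, value in params.items():
        shadow[name] = decay * shadow[name] + (1.0 - decay) * value
    return shadow


# Toy example: the shadow copy lags behind the live parameter.
shadow = {"w": 1.0}
ema_update(shadow, {"w": 0.0})
```

With decay 0.999 the shadow averages over roughly the last thousand steps, which smooths out step-to-step noise in the weights.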
Loss is computed only on tokens after <FIM_MID> (inclusive of <EOT>) so the model learns when to stop.
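The PSM layout and the loss masking can be sketched as below. This is an illustration, not the code in train_infill.py: the sentinel IDs are hypothetical placeholders, and text is encoded as raw UTF-8 bytes only, ignoring the Strudel function tokens that the real tokenizer adds.

```python
# Hypothetical sentinel IDs; the real values live in strudel_tokens.py.
FIM_PRE, FIM_SUF, FIM_MID, EOT = 260, 261, 262, 259


def encode_bytes(text):
    """Byte-level encoding only (the real tokenizer also merges function names)."""
    return list(text.encode("utf-8"))


def build_psm_example(prefix, suffix, middle):
    """Return (tokens, loss_mask) laid out as <FIM_PRE> P <FIM_SUF> S <FIM_MID> M <EOT>."""
    prompt = [FIM_PRE] + encode_bytes(prefix) + [FIM_SUF] + encode_bytes(suffix) + [FIM_MID]
    target = encode_bytes(middle) + [EOT]
    tokens = prompt + target
    # 1 marks tokens that contribute to the loss: the middle and the closing <EOT>.
    loss_mask = [0] * len(prompt) + [1] * len(target)
    return tokens, loss_mask


tokens, loss_mask = build_psm_example("a", "b", "cd")
```

Including `<EOT>` in the masked-in region is what teaches the model to terminate the block rather than run on into the suffix.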
Files
- best.pt: PyTorch checkpoint ({model, model_cfg, step, val_loss, from_ema})
- config.json: hyperparameters + training metadata
- strudel_tokens.py: tokenizer (byte-level + Strudel function tokens)
- train.py / train_infill.py: model + training loop
- infill_generate.py: FIM decoding
- serve_infill.py: HTTP server for the Strudel REPL
Usage
Download the repo and serve locally:
hf download hidude562/strudel-fim-5m --local-dir ./strudel-fim-5m
cd ./strudel-fim-5m
python3 serve_infill.py --ckpt best.pt --port 8081
Then point your Strudel editor at http://localhost:8081.
Request shape:
POST /infill
{
"text": "<full Strudel program containing a `// LABEL: description` line>",
"line": <1-indexed line number of the label comment>,
"temperature": 0.4,
"top_k": 10,
"max_new_tokens": 120
}
Response:
{"text": "<full program with the masked block filled in>", "ms": 1039, "request_id": "..."}
Streaming variant available at /infill-stream (Server-Sent Events).
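A minimal client for the non-streaming endpoint might look like the sketch below, using only the Python standard library. The helper names (`find_comment_line`, `build_infill_request`, `post_infill`) are hypothetical and not part of the repo; the payload fields and the default URL are taken from the request shape above. The marker used to locate the label comment is supplied by the caller.

```python
import json
import urllib.request


def find_comment_line(program, marker):
    """Return the 1-indexed number of the first line that starts with `marker`."""
    for number, line in enumerate(program.splitlines(), start=1):
        if line.lstrip().startswith(marker):
            return number
    raise ValueError(f"no line starting with {marker!r}")


def build_infill_request(program, line, temperature=0.4, top_k=10, max_new_tokens=120):
    """Assemble the JSON body documented for POST /infill."""
    return {
        "text": program,
        "line": line,
        "temperature": temperature,
        "top_k": top_k,
        "max_new_tokens": max_new_tokens,
    }


def post_infill(payload, url="http://localhost:8081/infill"):
    """Send the request to a locally running serve_infill.py and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `post_infill(build_infill_request(program, find_comment_line(program, "// ")))["text"]` would return the filled-in program, assuming the label comment is the first comment line in the file.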
Intended use
Drop-in FIM completion for the Strudel live-coding editor. The model is tiny and CPU-inferable for toy prompts but is tuned for a single RTX-class GPU (~1 s latency at max_new_tokens=120).
Limitations
- Only trained on Claude-generated patterns; real human Strudel idiom may differ.
- Context window 1536 bytes; longer programs are prefix-truncated server-side.
- Output quality degrades on prompts whose shape is far from the training distribution (e.g. single-block programs with no surrounding stack). Works best on typical multi-block stack(...) compositions.
- Val loss of 0.1651 is on the held-out 10% of Claude-generated pieces; it is not a guarantee of musical quality.
Citation
Internal project; no paper. If you use it, link back to this repo.