strudel-fim-5m

A 5.5M-parameter fill-in-the-middle (FIM) model for Strudel live-coded music patterns. Given a prefix and suffix of a Strudel program plus a // LABEL: description comment, it generates the instrument block that belongs between them.

Model

Params 5,548,800 (~5.5M)
Architecture Decoder-only transformer: RMSNorm, GeLU MLP, tied I/O embeddings, causal SDPA
d_model 320
n_layers 4
n_heads 5 (d_head = 64)
d_ff 1280
max_seq_len 1536
Vocab 435 (256 bytes + 4 control + 3 FIM sentinels + 172 Strudel function tokens)
FIM layout PSM: <FIM_PRE> prefix <FIM_SUF> suffix <FIM_MID> middle <EOT>
Released weights EMA shadow (decay 0.999)
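The stated parameter total can be reproduced from the figures above under two assumptions that this card does not state explicitly: learned absolute position embeddings of size max_seq_len x d_model, and bias-free linear layers with two RMSNorm gain vectors per layer plus one final norm. A minimal sketch of that arithmetic (the function name is illustrative):

```python
def count_params(vocab=435, d_model=320, n_layers=4, d_ff=1280, max_seq_len=1536):
    tok_emb = vocab * d_model          # tied with the output head, so counted once
    pos_emb = max_seq_len * d_model    # ASSUMPTION: learned absolute positions
    attn = 4 * d_model * d_model       # Q, K, V, O projections, no biases (assumed)
    mlp = 2 * d_model * d_ff           # up and down projections, no biases (assumed)
    norms = 2 * d_model                # two RMSNorm gains per layer (assumed)
    per_layer = attn + mlp + norms
    final_norm = d_model               # one RMSNorm gain before the output head (assumed)
    return tok_emb + pos_emb + n_layers * per_layer + final_norm

print(count_params())  # 5548800
```

Under these assumptions the four transformer layers account for roughly 4.9M of the parameters, and the position table is larger than the token embedding because the vocabulary is so small.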

Training

Training runs in three phases. Every phase uses multimask data (5 prefix/suffix split variants per block) built from Claude-generated Strudel pieces. Total training time is about 45 min on one RTX 5080 (16 GB).

| Phase | Dataset | Steps | LR peak | Val loss (EMA) |
|---|---|---|---|---|
| 0 (pretrain) | sonnet multimask (53K examples) | 20,000 | 3e-4 | 0.2200 |
| 1 (continue) | sonnet + haiku multimask (86K examples) | 20,000 | 3e-4 | 0.1963 |
| 2 (sonnet fine-tune) | sonnet multimask (53K examples) | 5,000 | 1e-4 | 0.1651 |
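The validation losses above are measured on the EMA shadow weights rather than on the raw weights. A minimal sketch of an EMA update with decay 0.999, using plain Python floats in place of tensors. The function name and the dict-of-floats layout are illustrative and not taken from train.py:

```python
def ema_update(shadow, params, decay=0.999):
    """Move each shadow value a fraction (1 - decay) toward the live parameter."""
    for name, value in params.items():
        shadow[name] = decay * shadow[name] + (1.0 - decay) * value
    return shadow

# Toy example: one scalar "weight" that jumps from 0.0 to 1.0 and stays there.
shadow = {"w": 0.0}
for _ in range(1000):
    ema_update(shadow, {"w": 1.0})
# After n steps the shadow equals 1 - decay**n, about 0.632 for n = 1000.
```

With decay 0.999 the shadow averages over roughly the last thousand steps, which smooths out step-to-step noise in the raw weights.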

Loss is computed only on tokens after <FIM_MID> (inclusive of <EOT>) so the model learns when to stop.
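To illustrate the PSM layout and the loss rule together, here is a minimal sketch that assembles one training sequence and marks the tokens that count toward the loss. The token IDs and helper names are hypothetical. The real IDs come from strudel_tokens.py, and this sketch encodes text as raw bytes only, ignoring the Strudel function tokens:

```python
# Hypothetical IDs for illustration; raw bytes occupy 0..255.
FIM_PRE, FIM_SUF, FIM_MID, EOT = 256, 257, 258, 259

def build_psm(prefix: str, suffix: str, middle: str) -> list[int]:
    """<FIM_PRE> prefix <FIM_SUF> suffix <FIM_MID> middle <EOT>, byte-level."""
    enc = lambda s: list(s.encode("utf-8"))
    return [FIM_PRE, *enc(prefix), FIM_SUF, *enc(suffix), FIM_MID, *enc(middle), EOT]

def loss_mask(tokens: list[int]) -> list[int]:
    """1 for every target token after <FIM_MID>, including <EOT>; 0 elsewhere."""
    mid = tokens.index(FIM_MID)
    return [1 if i > mid else 0 for i in range(len(tokens))]

tokens = build_psm('stack(\n', '\n)', 's("bd sd")')
mask = loss_mask(tokens)
```

Because <EOT> is inside the masked region, the model is penalized for continuing past the end of the block. The prefix and suffix tokens still condition the prediction but contribute no loss.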

Files

  • best.pt: PyTorch checkpoint ({model, model_cfg, step, val_loss, from_ema})
  • config.json: hyperparameters + training metadata
  • strudel_tokens.py: tokenizer (byte-level + Strudel function tokens)
  • train.py / train_infill.py: model + training loop
  • infill_generate.py: FIM decoding
  • serve_infill.py: HTTP server for the Strudel REPL

Usage

Download the repo and serve locally:

hf download hidude562/strudel-fim-5m --local-dir ./strudel-fim-5m
cd ./strudel-fim-5m
python3 serve_infill.py --ckpt best.pt --port 8081

Then point your Strudel editor at http://localhost:8081.

Request shape:

POST /infill
{
  "text": "<full Strudel program containing a `// LABEL: description` line>",
  "line": <1-indexed line number of the label comment>,
  "temperature": 0.4,
  "top_k": 10,
  "max_new_tokens": 120
}

Response:

{"text": "<full program with the masked block filled in>", "ms": 1039, "request_id": "..."}
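The temperature and top_k request fields suggest standard temperature-scaled top-k sampling. A minimal pure-Python sketch of that technique follows. It is not taken from infill_generate.py, whose actual decoding code may differ, and the function name is illustrative:

```python
import math
import random

def sample_top_k(logits, temperature=0.4, top_k=10, rng=random):
    """Sample an index from the top_k highest logits after temperature scaling."""
    # Keep only the top_k candidates as (index, logit) pairs, highest first.
    top = sorted(enumerate(logits), key=lambda p: p[1], reverse=True)[:top_k]
    # Softmax numerators over the survivors, shifted by the max for stability.
    scaled = [logit / temperature for _, logit in top]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    # random.choices normalizes the weights internally.
    return rng.choices([i for i, _ in top], weights=weights, k=1)[0]
```

Lowering the temperature sharpens the distribution toward the most likely token, and top_k=1 reduces to greedy decoding.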

Streaming variant available at /infill-stream (Server-Sent Events).
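A minimal client sketch for the non-streaming endpoint, using only the Python standard library. The helper names are hypothetical. The sketch assumes the server accepts a JSON body sent with Content-Type: application/json, and that the label comment is the first line containing // LABEL:.

```python
import json
import urllib.request

def find_label_line(program: str, marker: str = "// LABEL:") -> int:
    """Return the 1-indexed line number of the first line containing the marker."""
    for number, line in enumerate(program.splitlines(), start=1):
        if marker in line:
            return number
    raise ValueError(f"no {marker!r} comment found")

def build_payload(program: str, temperature=0.4, top_k=10, max_new_tokens=120) -> dict:
    """Assemble the request body in the shape documented above."""
    return {
        "text": program,
        "line": find_label_line(program),
        "temperature": temperature,
        "top_k": top_k,
        "max_new_tokens": max_new_tokens,
    }

def infill(program: str, url: str = "http://localhost:8081/infill") -> str:
    """POST the program and return the filled-in text (needs a running server)."""
    body = json.dumps(build_payload(program)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]
```

With serve_infill.py running, calling infill(program) on a program that contains a // LABEL: comment would return the full program text from the response's "text" field.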

Intended use

Drop-in FIM completion for the Strudel live-coding editor. The model is small enough to run on CPU for toy prompts, but it is tuned for a single RTX-class GPU (~1 s latency at max_new_tokens=120).

Limitations

  • Only trained on Claude-generated patterns; real human Strudel idiom may differ.
  • Context window is 1536 tokens (roughly bytes, since the tokenizer is byte-level apart from the Strudel function tokens); longer programs are prefix-truncated server-side.
  • Output quality degrades on prompts whose shape is far from the training distribution (e.g. single-block programs with no surrounding stack). Works best on typical multi-block stack(...) compositions.
  • Val loss of 0.1651 is on the held-out 10% of Claude-generated pieces; it is not a guarantee of musical quality.

Citation

Internal project; no paper. If you use it, link back to this repo.
