# LateOn-Code-edge (GGUF Q8_0 + Projection)

Quantized GGUF conversion of lightonai/LateOn-Code-edge for use with litembeddings.

## Model Details

| Property | Value |
|---|---|
| Base model | lightonai/LateOn-Code-edge |
| Architecture | ModernBERT (17M params) |
| Output dimensions | 48 (after projection) |
| Context length | 8,192 tokens |
| Quantization | Q8_0 |
| GGUF size | 19 MB |
| Projection | 256 → 48 (composed from two PyLate Dense layers: 256 → 512 → 48) |
| Use case | Fast, CPU-friendly code search with late interaction (ColBERT-style) |
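The single shipped 48×256 projection can be reproduced by multiplying the two Dense layer weight matrices. This is a minimal sketch with random stand-in weights, assuming both layers are linear with no bias or activation (which is what shipping one composed matrix implies); the actual PyLate weight names and storage layout are not shown here.

```python
import numpy as np

# Hypothetical stand-ins for the two Dense layer weights, stored out x in:
W1 = np.random.randn(512, 256).astype(np.float32)  # 256 -> 512
W2 = np.random.randn(48, 512).astype(np.float32)   # 512 -> 48

# Composing y = W2 @ (W1 @ x) collapses to a single 48x256 matrix:
W = W2 @ W1
assert W.shape == (48, 256)

# The composed matrix gives the same result as applying both layers in turn:
x = np.random.randn(256).astype(np.float32)
assert np.allclose(W @ x, W2 @ (W1 @ x), rtol=1e-3, atol=1e-2)
```

Composing ahead of time means inference pays for one 48×256 matmul per token instead of two larger ones.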

## Variants

| Variant | Size | Quality |
|---|---|---|
| f16 | 34 MB | Lossless: 100% top-1 agreement, 240/300 weighted |
| f32 | 66 MB | Original precision (lossless) |
| Q8_0 (this repo) | 19 MB | 79% weighted score, 96–100% top-1 agreement, 3.5× smaller |
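The sizes above follow from the per-weight storage cost of each format. A back-of-envelope check, taking the parameter count as roughly 16.8M (GGUF metadata and per-tensor storage differences account for the gap between this estimate and the measured file sizes):

```python
params = 16.8e6

# Q8_0 stores blocks of 32 weights as 32 int8 values plus one fp16 scale,
# i.e. 34 bytes per 32 weights (8.5 bits per weight):
bytes_per_weight_q8_0 = (32 * 1 + 2) / 32  # 1.0625

q8_size_mb = params * bytes_per_weight_q8_0 / 1e6
f16_size_mb = params * 2 / 1e6
f32_size_mb = params * 4 / 1e6

print(f"Q8_0 ~{q8_size_mb:.1f} MB, f16 ~{f16_size_mb:.1f} MB, f32 ~{f32_size_mb:.1f} MB")
```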

## Files

| File | Size | Description |
|---|---|---|
| lightonai-lateon-code-edge-Q8_0.gguf | 19 MB | ModernBERT encoder in GGUF Q8_0 format |
| lightonai-lateon-code-edge-Q8_0.projection | 49 KB | Composed projection matrix (48×256, float32) |
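The projection file's size is consistent with a raw float32 dump of the 48×256 matrix. The loader below is a sketch under that assumption; the actual litembeddings on-disk layout may include a header, and `load_projection` is a hypothetical helper, not part of the library.

```python
import numpy as np

# 48 x 256 float32 values = 49,152 bytes, i.e. 48 KiB (~49 KB as listed):
assert 48 * 256 * 4 == 49152

def load_projection(path: str) -> np.ndarray:
    """Read the projection as a raw row-major float32 matrix (assumption)."""
    data = np.fromfile(path, dtype=np.float32)
    return data.reshape(48, 256)
```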

## Usage with litembeddings

```sql
.load ./build/litembeddings

-- Load model with projection
SELECT lembed_model('lightonai-lateon-code-edge-Q8_0.gguf',
    '{"colbert_projection": "lightonai-lateon-code-edge-Q8_0.projection"}');

-- Generate token embeddings for code
SELECT lembed_tokens('async fn get_connection(pool: &Pool) -> Result<Connection>');

-- Code search with MaxSim
SELECT
    id, code,
    lembed_maxsim(lembed_tokens('database connection pool'), token_emb) AS score
FROM code_embeddings
ORDER BY score DESC
LIMIT 10;
```
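The `lembed_maxsim` call above implements ColBERT-style late interaction: for each query token embedding, take the maximum similarity over all document token embeddings, then sum over query tokens. A minimal sketch of that scoring rule (the `maxsim` function here is illustrative, not the library's internal code):

```python
import numpy as np

def maxsim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """ColBERT MaxSim: per query token, max similarity over doc tokens, summed.
    Assumes rows are L2-normalized so the dot product is cosine similarity."""
    sims = query_tokens @ doc_tokens.T  # (n_query, n_doc) similarity matrix
    return float(sims.max(axis=1).sum())

# Toy example with 48-dim embeddings (this model's output dimension):
rng = np.random.default_rng(0)
q = rng.standard_normal((4, 48)).astype(np.float32)
d = rng.standard_normal((20, 48)).astype(np.float32)
q /= np.linalg.norm(q, axis=1, keepdims=True)
d /= np.linalg.norm(d, axis=1, keepdims=True)

score = maxsim(q, d)  # bounded above by n_query since each cosine <= 1
```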

## Quantization Quality Benchmark

Tested across 3 codebases (jq/C, Rails/Ruby, FastAPI/Python) with 150 questions total (15 easy + 20 medium + 15 hard per codebase). Weighted scoring: easy ×1, medium ×2, hard ×3, giving 100 points per codebase and 300 total.
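The weighted scoring scheme works out as follows:

```python
# Questions per difficulty tier, per codebase:
easy, medium, hard = 15, 20, 15
weights = {"easy": 1, "medium": 2, "hard": 3}

per_codebase_max = (easy * weights["easy"]
                    + medium * weights["medium"]
                    + hard * weights["hard"])
assert per_codebase_max == 100  # 15 + 40 + 45

total_max = per_codebase_max * 3  # jq, Rails, FastAPI
assert total_max == 300
```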

### Aggregate Weighted Scores

| Variant | Weighted Score | Percentage |
|---|---|---|
| f32 | 240 / 300 | 80.0% |
| f16 | 240 / 300 | 80.0% |
| Q8_0 | 237 / 300 | 79.0% |

### Per-Corpus Scores

| Corpus | f32 | f16 | Q8_0 |
|---|---|---|---|
| jq (C) | 66/100 | 66/100 | 63/100 |
| Rails (Ruby) | 79/100 | 79/100 | 79/100 |
| FastAPI (Python) | 95/100 | 95/100 | 95/100 |

### Quantization Quality (Top-1 Agreement vs f32)

| Corpus | f16 | Q8_0 |
|---|---|---|
| jq | 100.0% | 96.0% |
| Rails | 100.0% | 100.0% |
| FastAPI | 100.0% | 98.0% |
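Top-1 agreement here means the fraction of queries for which a quantized variant returns the same best-scoring document as the f32 baseline. A sketch of the metric with hypothetical per-query top-1 result IDs:

```python
# Hypothetical top-1 document IDs per query for baseline and quantized runs:
f32_top1 = ["a", "b", "c", "d", "e"]
q8_top1  = ["a", "b", "x", "d", "e"]

# Fraction of queries where both runs agree on the best document:
agreement = sum(x == y for x, y in zip(f32_top1, q8_top1)) / len(f32_top1)
print(f"{agreement:.1%}")  # 80.0% for this toy data
```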

## Key Findings

- **f16 is lossless**: identical weighted score (240/300) and 100% top-1 agreement across all codebases
- **Q8_0 loses only 1%**: 237/300 vs 240/300, dropping only on hard queries in the jq corpus
- **Q8_0 is fastest**: 2.5 s average query vs 3.4 s (f32) and 13.4 s (f16) on a CPU without FP16 hardware support
- **Easy and medium questions** show zero quality difference across all variants

## Conversion

Converted using litembeddings' ColBERT converter with PyLate projection support:

```shell
python scripts/convert_colbert_to_gguf.py lightonai/LateOn-Code-edge ./models \
    --name lightonai-lateon-code-edge-Q8_0 --quantize q8_0
```