Qwen3-0.6B-T5-xxl-split

Model Description

This repository provides the Qwen3-0.6B-T5-xxl model split into its two components: the Qwen3-0.6B model body and the projection head. It is intended for advanced users who need to perform custom operations, such as GGUF conversion or other modifications to the model architecture.

Both components are stored in float32 to preserve full precision for downstream tasks such as quantization.

Repository Contents

/qwen_body/: Contains the fine-tuned Qwen3-0.6B model body. This is a standard Hugging Face model directory. The model weights are in float32.
/projection_head/: Contains the fine-tuned projection head as a single projection_head.pth file. This is a PyTorch state dictionary.
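As a rough sanity check on the projection head file, the sketch below estimates how many parameters the head holds and how much space its tensors take in float32 compared with float16. It assumes the layer sizes used in the How to Use code (1024 -> 2048 -> 4096, two linear layers with biases). The helper name `linear_params` is made up for illustration, and the real file will be slightly larger because of serialization overhead.

```python
# Layer sizes taken from the "How to Use" code: 1024 -> 2048 -> 4096.
# These are assumptions about the saved head, not values read from the file.
input_dim, hidden_dim, output_dim = 1024, 2048, 4096


def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


# GELU and Dropout have no trainable parameters, so only the two
# linear layers contribute.
head_params = linear_params(input_dim, hidden_dim) + linear_params(hidden_dim, output_dim)

bytes_fp32 = head_params * 4  # 4 bytes per float32 value
bytes_fp16 = head_params * 2  # 2 bytes per float16 value

print(f"parameters: {head_params:,}")
print(f"float32: {bytes_fp32 / 2**20:.1f} MiB")
print(f"float16: {bytes_fp16 / 2**20:.1f} MiB")
```

Under these assumptions the head has about 10.5 million parameters, so keeping it in float32 costs roughly 40 MiB, which is small next to the model body.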

How to Use

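To use these components, load them separately and then combine them in a two-step inference process: the body model produces a mean-pooled embedding, and the projection head maps it to the final 4096-dimensional output.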

import torch
from torch import nn
from transformers import AutoTokenizer, AutoModel

# --- 1. Load Components ---
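# Use the GPU when available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"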

# Load the model body
body_model = AutoModel.from_pretrained("./qwen_body").to(device)
tokenizer = AutoTokenizer.from_pretrained("./qwen_body")

# Load the projection head
# First, re-create the architecture
input_dim = body_model.config.hidden_size # 1024
hidden_dim = 2048
output_dim = 4096
head_model = nn.Sequential(
    nn.Linear(input_dim, hidden_dim), 
    nn.GELU(),
    nn.Dropout(0.1), 
    nn.Linear(hidden_dim, output_dim)
).to(device)
# Then, load the saved weights (map_location keeps this working on CPU-only machines)
head_model.load_state_dict(
    torch.load("./projection_head/projection_head.pth", map_location=device, weights_only=True)
)

body_model.eval()
head_model.eval()

# --- 2. Create a unified inference function ---
def get_final_embedding(text: str):
    # a) Tokenize the input text
    inputs = tokenizer(text, return_tensors="pt").to(device)

    # b) Get the base embedding from the body model
    with torch.no_grad():
        outputs_body = body_model(**inputs)
        last_hidden_state = outputs_body.last_hidden_state
    
    # c) Perform mean pooling
    attention_mask = inputs['attention_mask']
    mask_expanded = attention_mask.unsqueeze(-1).expand(last_hidden_state.size()).float()
    sum_embeddings = torch.sum(last_hidden_state * mask_expanded, 1)
    sum_mask = torch.clamp(mask_expanded.sum(1), min=1e-9)
    pooled_embedding = sum_embeddings / sum_mask
    
    # d) Pass the pooled embedding through the projection head
    with torch.no_grad():
        final_embedding = head_model(pooled_embedding)
        
    return final_embedding

# --- 3. Test the pipeline ---
prompt = "A high-tech laboratory with glowing vials and holographic displays."
embedding = get_final_embedding(prompt)

print("Inference successful!")
print(f"Output shape: {embedding.shape}")
# Expected: Output shape: torch.Size([1, 4096])
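The mean pooling in step c) matters most when several prompts are processed together, because shorter prompts are padded and the padding must not contribute to the average. The sketch below isolates that step on toy tensors so the effect of the attention mask is visible. The function name `masked_mean_pool` and the toy values are illustrative assumptions, not part of this repository.

```python
import torch


def masked_mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors over the sequence, ignoring padded positions."""
    # (batch, seq_len) -> (batch, seq_len, 1) so it broadcasts over the hidden dim
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)         # (batch, 1), avoids division by zero
    return summed / counts


# Toy batch: 2 sequences, 3 positions, hidden size 2.
# The third position of the first sequence is padding with arbitrary values.
hidden = torch.tensor([
    [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],
    [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],
])
mask = torch.tensor([[1, 1, 0], [1, 1, 1]])

pooled = masked_mean_pool(hidden, mask)
print(pooled)  # rows are [2., 3.] and [2., 2.]
```

With the real model, a batch could be built by passing a list of strings to the tokenizer with `padding=True` and `return_tensors="pt"`. The returned `attention_mask` then plays the role of `mask` above.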

License

This repository is licensed under the Apache License 2.0.
