Qwen3-0.6B-T5-xxl-split
Model Description
This repository provides the components of the Qwen3-0.6B-T5-xxl model, split into two parts: the model body and the projection head. It is intended for advanced users who wish to perform custom operations, such as GGUF conversion or other model architecture modifications.
Both components are provided in float32 format to ensure maximum precision for downstream tasks like quantization.
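To illustrate why the weights are kept in float32 before quantization, the sketch below round-trips a single number through 32-bit and 16-bit IEEE 754 formats using only the Python standard library. The value `weight` and the helper `round_trip` are hypothetical and not taken from the model; the point is only that a narrower format introduces a larger rounding error.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Pack a Python float into the given IEEE 754 format and unpack it again."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


# Hypothetical weight value, chosen for illustration only
weight = 0.0123456789

as_float32 = round_trip(weight, "<f")  # 32-bit single precision
as_float16 = round_trip(weight, "<e")  # 16-bit half precision

err32 = abs(as_float32 - weight)
err16 = abs(as_float16 - weight)

print(f"float32 rounding error: {err32:.2e}")
print(f"float16 rounding error: {err16:.2e}")
```

Starting from float32 means any later quantization step works from the least-rounded values available in this repository.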
Repository Contents
- /qwen_body/: Contains the fine-tuned Qwen3-0.6B model body. This is a standard Hugging Face model directory. The model weights are in float32.
- /projection_head/: Contains the fine-tuned projection head as a single projection_head.pth file. This is a PyTorch state dictionary.
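Before loading anything, it can help to confirm that both components are where the loading code expects them. The following is a minimal sketch, assuming the repository has been downloaded into the current directory. The helper `find_missing_files` is hypothetical, and `config.json` is checked only because it is part of a standard Hugging Face model directory.

```python
from pathlib import Path

# Paths relative to the repository root. config.json is an assumption based on
# the standard Hugging Face model directory layout.
EXPECTED_FILES = [
    "qwen_body/config.json",
    "projection_head/projection_head.pth",
]


def find_missing_files(repo_root="."):
    """Return the expected files that are absent under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


missing = find_missing_files(".")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("Both components found.")
```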
How to Use
To use these components, you need to load them separately and then combine them in a two-step inference process.
import torch
from torch import nn
from transformers import AutoTokenizer, AutoModel

# --- 1. Load Components ---
# Use the GPU when available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
# Load the model body
body_model = AutoModel.from_pretrained("./qwen_body").to(device)
tokenizer = AutoTokenizer.from_pretrained("./qwen_body")
# Load the projection head
# First, re-create the architecture
input_dim = body_model.config.hidden_size # 1024
hidden_dim = 2048
output_dim = 4096
head_model = nn.Sequential(
    nn.Linear(input_dim, hidden_dim),
    nn.GELU(),
    nn.Dropout(0.1),
    nn.Linear(hidden_dim, output_dim)
).to(device)
# Then, load the saved weights
# map_location places the tensors on the same device as head_model
head_model.load_state_dict(torch.load("./projection_head/projection_head.pth", map_location=device))
body_model.eval()
head_model.eval()
# --- 2. Create a unified inference function ---
def get_final_embedding(text: str):
    # a) Tokenize the input text
    inputs = tokenizer(text, return_tensors="pt").to(device)

    # b) Get the base embedding from the body model
    with torch.no_grad():
        outputs_body = body_model(**inputs)
        last_hidden_state = outputs_body.last_hidden_state

    # c) Perform mean pooling over non-padding tokens
    attention_mask = inputs['attention_mask']
    mask_expanded = attention_mask.unsqueeze(-1).expand(last_hidden_state.size()).float()
    sum_embeddings = torch.sum(last_hidden_state * mask_expanded, 1)
    sum_mask = torch.clamp(mask_expanded.sum(1), min=1e-9)
    pooled_embedding = sum_embeddings / sum_mask

    # d) Pass the pooled embedding through the projection head
    with torch.no_grad():
        final_embedding = head_model(pooled_embedding)

    return final_embedding
# --- 3. Test the pipeline ---
prompt = "A high-tech laboratory with glowing vials and holographic displays."
embedding = get_final_embedding(prompt)
print("Inference successful!")
print(f"Output shape: {embedding.shape}")
# Expected output: Output shape: torch.Size([1, 4096])
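The mean pooling in step (c) averages the token vectors while ignoring positions where the attention mask is 0. The sketch below reproduces the same arithmetic on toy values with plain Python lists, so it runs without PyTorch. The function `masked_mean_pool` and the numbers are illustrative assumptions, not real model outputs.

```python
def masked_mean_pool(hidden_states, attention_mask):
    """Average token vectors, counting only positions where the mask is 1.

    hidden_states: list of token vectors (one list of floats per token)
    attention_mask: list of 0/1 flags, one per token
    """
    dim = len(hidden_states[0])
    summed = [0.0] * dim
    for vector, keep in zip(hidden_states, attention_mask):
        for i, value in enumerate(vector):
            summed[i] += value * keep
    # Same guard as torch.clamp(..., min=1e-9): avoids division by zero
    count = max(sum(attention_mask), 1e-9)
    return [s / count for s in summed]


# Toy values: two real tokens followed by one padding token
hidden = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = masked_mean_pool(hidden, mask)
print(pooled)  # [2.0, 3.0]
```

The padding vector has no effect on the result, which is why the mask matters once inputs of different lengths are padded into one batch.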
License
This repository is licensed under the Apache License 2.0.
Model tree for JusteLeo/Qwen3-0.6B-T5-xxl-split
- Base model: Qwen/Qwen3-0.6B-Base
- Finetuned: Qwen/Qwen3-Embedding-0.6B
- Finetuned: JusteLeo/Qwen3-0.6B-T5-xxl