# WAN LightX2V T2V LoRA Adapters (720p - All Ranks)

Complete collection of LoRA (Low-Rank Adaptation) adapters for the LightX2V 14B text-to-video generation model at 720p resolution. This repository contains all 7 rank variants (4, 8, 16, 32, 64, 128, 256), enabling flexible quality/performance trade-offs, with CFG (Classifier-Free Guidance) step distillation for fast inference.

## Model Description

These LoRA adapters enable efficient text-to-video generation at 720p resolution (1280x720) with the LightX2V T2V 14B base model. Through CFG step distillation, they achieve 2-3x faster generation while maintaining high-quality output. The complete rank collection (4-256) provides the flexibility to optimize for speed, quality, or VRAM constraints.
Key Features:
- 7 complete rank variants for flexible deployment
- CFG step distillation v2 for faster inference (15-25 steps vs 50-100)
- BF16 precision for stability and hardware optimization
- 720p native resolution (1280x720)
- Compatible with Diffusers and ComfyUI workflows
## Repository Contents
This repository contains 7 LoRA adapter models totaling ~4.7GB:
```
wan21-lightx2v-t2v-14b-720p/
└── loras/
    └── wan/
        ├── wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank4-bf16.safetensors   (45MB)
        ├── wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank8-bf16.safetensors   (82MB)
        ├── wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank16-bf16.safetensors  (156MB)
        ├── wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank32-bf16.safetensors  (305MB)
        ├── wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank64-bf16.safetensors  (602MB)
        ├── wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank128-bf16.safetensors (1.2GB)
        └── wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank256-bf16.safetensors (2.4GB)
```
File Sizes:
- Total repository size: ~4.7GB
- Individual adapters: 45MB to 2.4GB
- Recommended adapter (rank-32): 305MB
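If the adapters are hosted on the Hugging Face Hub rather than a local mirror, `huggingface_hub` can fetch a single rank on demand. A minimal sketch, assuming the repository ID matches the citation URL later in this card:

```python
# Download one adapter from the Hub; the repo_id below is an assumption
# based on the citation URL and may differ for your mirror.
from huggingface_hub import hf_hub_download

lora_path = hf_hub_download(
    repo_id="wan21-lightx2v-t2v-14b-720p",  # assumed repository ID
    filename="loras/wan/wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank32-bf16.safetensors",
)
print(lora_path)  # local cache path, usable with pipe.load_lora_weights(...)
```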
## Hardware Requirements

### Minimum Requirements (Rank 4-16)
- GPU: NVIDIA RTX 3060 (12GB VRAM) or equivalent AMD
- System RAM: 16GB DDR4
- Storage: 500MB free space (individual adapter) + base model
- OS: Windows 10/11, Linux (Ubuntu 20.04+), macOS 12+
- Architecture: NVIDIA Ampere or newer (BF16 support)
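The BF16 requirement can be verified before downloading anything; PyTorch exposes a direct check:

```python
import torch

# True on Ampere (RTX 30xx) and newer GPUs; if False, fall back to FP16
# as shown in the Troubleshooting section below.
print(torch.cuda.is_bf16_supported())
```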
### Recommended (Rank 32-64) ⭐
- GPU: NVIDIA RTX 4070 Ti (16GB VRAM) or RTX 3090 (24GB VRAM)
- System RAM: 32GB DDR4/DDR5
- Storage: 1GB free space + base model (~30GB)
- CUDA: 11.8+ or 12.1+
- OS: Windows 11 or Linux (Ubuntu 22.04+)
### High-End (Rank 128-256)
- GPU: NVIDIA RTX 4090 (24GB VRAM) or A100 (40GB VRAM)
- System RAM: 64GB DDR5
- Storage: 5GB free space (all adapters) + base model
- Use Case: Maximum quality research/production work
### VRAM Usage by Rank (720p, 24 frames)
- Rank 4-8: ~14-15GB VRAM
- Rank 16-32: ~15-16GB VRAM (recommended)
- Rank 64: ~18GB VRAM
- Rank 128: ~20GB VRAM
- Rank 256: ~24GB VRAM (requires RTX 4090 or better)
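These estimates can drive a simple rank picker. A hypothetical sketch, assuming a single CUDA device and the rough per-rank figures listed above:

```python
# Pick the largest rank whose estimated VRAM need (plus headroom) fits in
# currently free GPU memory. The numbers mirror the list above and are
# estimates, not guarantees.
import torch

VRAM_BY_RANK_GB = {4: 14, 8: 15, 16: 15, 32: 16, 64: 18, 128: 20, 256: 24}

def pick_rank(headroom_gb: float = 1.0) -> int:
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    free_gb = free_bytes / 1024**3
    fitting = [r for r, need in VRAM_BY_RANK_GB.items() if need + headroom_gb <= free_gb]
    return max(fitting) if fitting else min(VRAM_BY_RANK_GB)

print(f"Suggested rank: {pick_rank()}")
```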
## Usage Examples

### Basic Text-to-Video Generation (Diffusers)

```python
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video
import torch

# Load the base LightX2V T2V 14B model
pipe = DiffusionPipeline.from_pretrained(
    "lightx2v/lightx2v-t2v-14b",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# Load a LoRA adapter (rank-32 recommended for balanced quality/speed)
pipe.load_lora_weights(
    "E:/huggingface/wan21-lightx2v-t2v-14b-720p/loras/wan/wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank32-bf16.safetensors"
)

# Generate a 720p video from a text prompt
prompt = "A serene mountain landscape at sunset with golden light, cinematic camera movement, 720p HD quality"
video = pipe(
    prompt=prompt,
    num_inference_steps=20,  # reduced step count enabled by distillation
    guidance_scale=7.5,
    num_frames=24,           # ~3 seconds at 8 fps
    height=720,
    width=1280,
).frames[0]  # first (and only) video in the batch

# Export the video file
export_to_video(video, "output_720p.mp4", fps=8)
```
### Rank Selection and Comparison

```python
# Base path to the LoRA adapters
LORA_PATH = "E:/huggingface/wan21-lightx2v-t2v-14b-720p/loras/wan"

# Select a rank based on your hardware and quality needs
# Options: 4, 8, 16, 32, 64, 128, 256
rank = 32  # recommended starting point
lora_file = f"{LORA_PATH}/wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank{rank}-bf16.safetensors"
pipe.load_lora_weights(lora_file)

# Generate a video
video = pipe(
    prompt="Aerial drone shot rising above misty forest at sunrise, cinematic 720p quality",
    num_inference_steps=20,
    num_frames=24,
).frames[0]
export_to_video(video, f"output_rank{rank}.mp4", fps=8)
```
### Testing Multiple Ranks

```python
# Compare different ranks to find the optimal balance for your use case
ranks_to_test = [16, 32, 64, 128]
for rank in ranks_to_test:
    print(f"Testing rank {rank}...")
    pipe.unload_lora_weights()  # drop the previous adapter before loading the next
    lora_file = f"{LORA_PATH}/wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank{rank}-bf16.safetensors"
    pipe.load_lora_weights(lora_file)
    video = pipe(
        prompt="Lightning storm over desert landscape, dramatic clouds, cinematic 720p",
        num_inference_steps=20,
        num_frames=24,
    ).frames[0]
    export_to_video(video, f"comparison_rank{rank}.mp4", fps=8)
```
### Memory-Efficient Loading

```python
# For systems with limited VRAM
pipe = DiffusionPipeline.from_pretrained(
    "lightx2v/lightx2v-t2v-14b",
    torch_dtype=torch.bfloat16,
)

# Enable CPU offloading to reduce VRAM usage
pipe.enable_model_cpu_offload()

# Use a lower rank for minimal VRAM
pipe.load_lora_weights(
    "E:/huggingface/wan21-lightx2v-t2v-14b-720p/loras/wan/wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank16-bf16.safetensors"
)

# Generate with a reduced frame count if needed
video = pipe(
    prompt="City street at night with neon lights, 720p quality",
    num_frames=16,  # reduced from 24
    height=720,
    width=1280,
).frames[0]
```
### ComfyUI Integration

Copy the LoRA into ComfyUI:

```
ComfyUI/models/loras/wan/
└── wan21-lightx2v-t2v-rank32-bf16.safetensors
```

Workflow Setup:
- Add a "Load LoRA" node
- Select the adapter: `wan21-lightx2v-t2v-rank32-bf16.safetensors`
- Set LoRA strength: 0.8-1.0
- Connect to the LightX2V T2V model nodes
- Set resolution: 1280x720 (720p)
Recommended Parameters:
- Steps: 15-25 (distilled model)
- CFG Scale: 6.0-8.0
- LoRA Strength: 0.8-1.0
- Resolution: 1280x720 (native)
## Model Specifications
| Specification | Details |
|---|---|
| Model Type | LoRA Adapters for Video Diffusion |
| Architecture | Low-Rank Adaptation (LoRA) |
| Base Model | LightX2V T2V 14B |
| Training Method | CFG Step Distillation v2 |
| Precision | BF16 (Brain Floating Point 16) |
| Resolution | 720p (1280x720) native |
| Rank Variants | 4, 8, 16, 32, 64, 128, 256 (complete set) |
| Parameter Count | 4M to 256M (varies by rank) |
| File Format | .safetensors (secure tensor storage) |
| Total Size | ~4.7GB (all 7 adapters) |
| Pipeline | Text-to-Video (T2V) |
| Framework | Diffusers, ComfyUI compatible |
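To confirm an adapter's rank without loading the full pipeline, the `safetensors` library can read tensor shapes directly from the file. A hedged sketch, since LoRA key names (`lora_down` vs `lora_A`) vary by exporter:

```python
# Inspect the first LoRA down-projection weight; its smaller dimension is the rank.
from safetensors import safe_open

path = "E:/huggingface/wan21-lightx2v-t2v-14b-720p/loras/wan/wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank32-bf16.safetensors"
with safe_open(path, framework="pt") as f:
    for key in f.keys():
        if "lora_down" in key or "lora_A" in key:
            print(key, f.get_slice(key).get_shape())
            break
```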
### Rank Selection Guide
| Rank | Size | Quality | Speed | VRAM | Best For |
|---|---|---|---|---|---|
| 4 | 45MB | Basic | Fastest | 14GB | Prototyping, minimal hardware |
| 8 | 82MB | Good | Very Fast | 14GB | Quick testing, low VRAM |
| 16 | 156MB | Better | Fast | 15GB | Balanced efficiency |
| 32 ⭐ | 305MB | High | Moderate | 16GB | Production (recommended) |
| 64 | 602MB | Very High | Slower | 18GB | Quality-focused work |
| 128 | 1.2GB | Excellent | Slow | 20GB | High-fidelity output |
| 256 | 2.4GB | Maximum | Slowest | 24GB | Research, maximum quality |
Recommendation: Start with rank-32 for optimal quality/performance balance. Scale up (64/128/256) for maximum quality or down (16/8/4) for speed and resource constraints.
## Performance Tips and Optimization

### Speed Optimization

```python
# 1. Use lower ranks for faster generation
pipe.load_lora_weights("...rank16-bf16.safetensors")

# 2. Reduce inference steps (the distilled model enables this)
video = pipe(prompt, num_inference_steps=15)  # instead of 20-25

# 3. Enable torch.compile() on PyTorch 2.0+
#    (if the pipeline uses a DiT backbone, compile pipe.transformer instead)
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

# 4. Reduce frame count for faster iteration
video = pipe(prompt, num_frames=16)  # ~2 seconds instead of ~3

# 5. Allow TF32 matmuls for extra throughput on Ampere+ GPUs
torch.set_float32_matmul_precision("high")
```
### Quality Optimization

```python
# 1. Use higher ranks for maximum quality
pipe.load_lora_weights("...rank128-bf16.safetensors")

# 2. Increase inference steps
video = pipe(prompt, num_inference_steps=25)

# 3. Tune the CFG scale for your prompt
video = pipe(prompt, guidance_scale=7.5)  # 6.5-8.0 range

# 4. Add quality keywords to the prompt
prompt = "A majestic eagle soaring, cinematic camera movement, 720p HD quality, professional cinematography"

# 5. Generate multiple candidates and select the best
for i in range(3):
    video = pipe(prompt).frames[0]
    export_to_video(video, f"candidate_{i}.mp4", fps=8)
```
### Memory Optimization

```python
# 1. Enable CPU offloading
pipe.enable_model_cpu_offload()

# 2. Or, for extreme constraints, use sequential CPU offload
#    (enable one of the two offload modes, not both)
pipe.enable_sequential_cpu_offload()

# 3. Select a lower rank
pipe.load_lora_weights("...rank8-bf16.safetensors")

# 4. Clear the CUDA cache between generations
torch.cuda.empty_cache()

# 5. Use attention slicing
pipe.enable_attention_slicing()
```
### CFG Step Distillation Benefits
- Faster inference: 15-25 steps vs 50-100 (2-3x speedup)
- Maintained quality: Distillation preserves output fidelity
- Better guidance: Optimized CFG behavior for prompt adherence
- Consistency: More stable across different CFG scale values
- Lower cost: Reduced compute requirements per generation
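The speedup is straightforward to sanity-check locally. An illustrative timing sketch, assuming `pipe` is loaded as in the usage examples; running the distilled model at 50 steps is only a rough proxy for a non-distilled baseline:

```python
import time
import torch

def timed_generation(steps: int) -> float:
    """Wall-clock time for one 720p generation at the given step count."""
    torch.cuda.synchronize()
    start = time.perf_counter()
    pipe(
        prompt="A calm ocean at dawn, cinematic 720p",
        num_inference_steps=steps,
        num_frames=24,
    )
    torch.cuda.synchronize()
    return time.perf_counter() - start

t_fast = timed_generation(20)  # distilled step count
t_slow = timed_generation(50)  # rough non-distilled proxy
print(f"20 steps: {t_fast:.1f}s | 50 steps: {t_slow:.1f}s | speedup: {t_slow / t_fast:.2f}x")
```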
## Prompting Best Practices

### Text-to-Video (T2V) Prompting
Essential Elements:
- Subject: Clear description of main content
- Camera movement: Specify motion style and direction
- Lighting/atmosphere: Time of day, mood, lighting quality
- Quality modifiers: Include "720p", "HD", "cinematic"
- Temporal dynamics: Motion speed, transitions
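These elements compose mechanically, so a small helper can assemble them. A hypothetical sketch (the function and its defaults are illustrative, not part of any API):

```python
# Join the essential prompt elements above into a single T2V prompt string.
def build_prompt(subject: str, camera: str, lighting: str,
                 quality: str = "720p HD quality, cinematic") -> str:
    return ", ".join([subject, camera, lighting, quality])

prompt = build_prompt(
    subject="A majestic eagle soaring through mountain valleys",
    camera="cinematic camera movement following the bird",
    lighting="golden hour light",
)
print(prompt)
```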
### Example Prompts for 720p

- "A majestic eagle soaring through mountain valleys at golden hour, cinematic camera movement following the bird, 720p HD quality, professional wildlife cinematography"
- "City street time-lapse with traffic flowing, neon lights reflecting on wet pavement, camera slowly panning right, high detail, 720p resolution, urban cinematography"
- "Underwater coral reef with tropical fish swimming, gentle camera movement, clear blue water, sunlight filtering from above, smooth motion, cinematic 720p quality"
- "Drone shot rising above a misty forest at sunrise, rays of light breaking through trees, smooth camera ascent, aerial cinematography, HD quality 720p"
- "Lightning storm over desert landscape, dramatic clouds, time-lapse motion, cinematic wide shot, 720p quality, epic natural phenomenon"
- "Cherry blossom petals falling in slow motion, gentle breeze, soft pink lighting, camera tracking downward, beautiful spring scene, 720p HD quality"
### Camera Movement Keywords
- Basic: "camera pans left/right", "camera tilts up/down"
- Dynamic: "dolly zoom", "tracking shot", "crane shot", "steadicam"
- Aerial: "drone shot", "aerial view", "bird's eye view", "flyover"
- Complex: "orbit around subject", "slow push-in", "reveal shot"
### Temporal Keywords
- Speed: "slow motion", "time-lapse", "real-time", "gradual"
- Transitions: "smooth transition", "gradual change", "progressive"
- Motion: "gentle movement", "dynamic action", "flowing motion"
### Quality Modifiers
- "720p HD quality", "high detail", "cinematic", "professional"
- "crisp", "clear", "sharp focus", "high fidelity"
- "broadcast quality", "production grade"
## Troubleshooting

### Out of Memory (OOM) Errors

Solutions:

```python
# 1. Use a lower-rank adapter
pipe.load_lora_weights("...rank16-bf16.safetensors")  # or rank8, rank4

# 2. Enable CPU offloading
pipe.enable_model_cpu_offload()

# 3. Reduce frame count
video = pipe(prompt, num_frames=16)  # instead of 24

# 4. Enable attention slicing
pipe.enable_attention_slicing()

# 5. Use sequential CPU offload (extreme cases)
pipe.enable_sequential_cpu_offload()

# 6. Clear the CUDA cache between generations
import torch
torch.cuda.empty_cache()
```
### Poor Quality Results
Diagnose and Fix:
Issue: Blurry or low-detail output
- Solution: Increase rank (try 64, 128, or 256)
- Solution: Add "720p HD quality, high detail" to prompt
Issue: Inconsistent motion or artifacts
- Solution: Adjust CFG scale (try 6.5-8.0 range)
- Solution: Increase inference steps to 25
Issue: Poor prompt adherence
- Solution: Increase guidance_scale to 8.0
- Solution: Make prompt more specific and descriptive
Issue: Wrong resolution output
- Solution: Explicitly set height=720, width=1280
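These fixes map directly onto pipeline arguments; a combined sketch, assuming `pipe` is loaded as in the usage examples:

```python
# Apply the quality fixes in one call: more steps, stronger guidance,
# explicit 720p dimensions, and quality keywords in the prompt.
video = pipe(
    prompt="A majestic eagle soaring, cinematic camera movement, 720p HD quality, high detail",
    num_inference_steps=25,  # cleaner motion and detail
    guidance_scale=8.0,      # stronger prompt adherence
    height=720,
    width=1280,
).frames[0]
```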
### Slow Generation Speed

Optimize performance:

```python
# Use lower ranks
pipe.load_lora_weights("...rank4-bf16.safetensors")  # fastest

# Reduce steps (distillation enables this)
video = pipe(prompt, num_inference_steps=15)

# Fewer frames
video = pipe(prompt, num_frames=16)

# Enable torch.compile (PyTorch 2.0+)
pipe.unet = torch.compile(pipe.unet)

# Use xFormers memory-efficient attention
pipe.enable_xformers_memory_efficient_attention()
```
### Model Loading Errors

Common issues:

```python
# Issue: "File not found"
# Solution: use absolute paths with forward slashes or raw strings
lora_path = r"E:\huggingface\wan21-lightx2v-t2v-14b-720p\loras\wan\..."
# or
lora_path = "E:/huggingface/wan21-lightx2v-t2v-14b-720p/loras/wan/..."

# Issue: "BF16 not supported"
# Solution: check the GPU architecture (BF16 requires Ampere or newer)
# Fall back to FP16 if needed:
pipe = DiffusionPipeline.from_pretrained(
    "lightx2v/lightx2v-t2v-14b",
    torch_dtype=torch.float16,  # instead of bfloat16
)

# Issue: "CUDA out of memory on load"
# Solution: enable CPU offloading before generating
pipe.enable_model_cpu_offload()
```
## License
These LoRA adapters follow the license terms of the LightX2V base model. Please review the base model license for usage restrictions:
- Base Model: LightX2V T2V 14B
- License: See https://huggingface.co/lightx2v for complete terms
Important: Verify license compliance for your intended use case (commercial, research, etc.) with the base model license.
## Citation
If you use these LoRA adapters in your research or projects, please cite:
```bibtex
@software{wan21_lightx2v_t2v_lora_720p,
  title        = {WAN LightX2V T2V LoRA Adapters for 720p Video Generation},
  author       = {WAN Team},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/wan21-lightx2v-t2v-14b-720p}},
  note         = {CFG Step Distillation LoRA adapters (ranks 4-256) for LightX2V T2V 14B}
}

@software{lightx2v_base_model,
  title        = {LightX2V: Text-to-Video Generation Model},
  author       = {LightX2V Team},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/lightx2v}}
}
```
## Related Resources
- Base Model: LightX2V T2V 14B
- 480p I2V LoRAs: wan21-lightx2v-i2v-14b-480p (image-to-video)
- WAN Models: WAN 2.1 and WAN 2.2 video generation models
- Diffusers Documentation: https://huggingface.co/docs/diffusers
- Model Cards Guide: https://huggingface.co/docs/hub/model-cards
## Acknowledgments
- LightX2V Team for the exceptional T2V 14B base model
- WAN Team for LoRA adapter development and CFG distillation
- Hugging Face for hosting infrastructure and diffusers library
- Community contributors for testing, feedback, and improvements
## Support and Contact
For issues or questions:
- Model-specific issues: Open an issue in this repository
- Base model questions: See LightX2V documentation
- Technical support: Diffusers GitHub issues
## Summary
Complete 720p T2V LoRA Collection:
- ✅ 7 rank variants: 4, 8, 16, 32, 64, 128, 256 (complete set)
- ✅ Total size: ~4.7GB (all adapters included)
- ✅ Resolution: 720p (1280x720) native
- ✅ Precision: BF16 for stability and performance
- ✅ Speed: 2-3x faster than non-distilled (15-25 steps)
- ✅ Flexibility: Choose rank for quality/speed/VRAM optimization
- ✅ Recommended: Rank-32 (305MB) for balanced production use
- ✅ Framework: Compatible with Diffusers and ComfyUI
Key Advantages:
- Complete rank collection from minimal (45MB) to maximum (2.4GB)
- CFG step distillation for efficient generation
- Native 720p resolution for HD video output
- Flexible deployment across different hardware configurations
- Production-ready with comprehensive documentation
- Last Updated: October 2024
- Repository Version: v1.1
- Model Version: CFG Step Distillation v2
- Total Repository Size: ~4.7GB (7 adapters)
- Recommended Rank: 32 (305MB, 16GB VRAM)
- Primary Use Case: Text-to-video generation at 720p with flexible quality/performance trade-offs