WAN LightX2V T2V LoRA Adapters (720p - All Ranks)

Complete collection of LoRA (Low-Rank Adaptation) adapters for the LightX2V 14B text-to-video generation model at 720p resolution. This repository contains all seven rank variants (4, 8, 16, 32, 64, 128, 256), each trained with CFG (Classifier-Free Guidance) step distillation, enabling flexible quality/performance trade-offs.

📋 Model Description

These LoRA adapters enable efficient text-to-video generation at 720p resolution (1280x720) with the LightX2V T2V 14B base model. Through CFG step distillation, they achieve 2-3x faster generation than the non-distilled base model while maintaining high-quality output. The complete rank collection (4-256) provides the flexibility to optimize for speed, quality, or VRAM constraints.

Key Features:

  • 7 complete rank variants for flexible deployment
  • CFG step distillation v2 for faster inference (15-25 steps vs 50-100)
  • BF16 precision for stability and hardware optimization
  • 720p native resolution (1280x720)
  • Compatible with Diffusers and ComfyUI workflows

πŸ“ Repository Contents

This repository contains 7 LoRA adapter models totaling ~4.7GB:

wan21-lightx2v-t2v-14b-720p/
└── loras/
    └── wan/
        ├── wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank4-bf16.safetensors   (45MB)
        ├── wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank8-bf16.safetensors   (82MB)
        ├── wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank16-bf16.safetensors  (156MB)
        ├── wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank32-bf16.safetensors  (305MB)
        ├── wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank64-bf16.safetensors  (602MB)
        ├── wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank128-bf16.safetensors (1.2GB)
        └── wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank256-bf16.safetensors (2.4GB)

File Sizes:

  • Total repository size: ~4.7GB
  • Individual adapters: 45MB to 2.4GB
  • Recommended adapter (rank-32): 305MB

💻 Hardware Requirements

Minimum Requirements (Rank 4-16)

  • GPU: NVIDIA RTX 3060 (12GB VRAM) or an equivalent AMD GPU
  • System RAM: 16GB DDR4
  • Storage: 500MB free space (individual adapter) + base model
  • OS: Windows 10/11, Linux (Ubuntu 20.04+), macOS 12+
  • Architecture: NVIDIA Ampere or newer (BF16 support)

Recommended (Rank 32-64) ⭐

  • GPU: NVIDIA RTX 4070 Ti (16GB VRAM) or RTX 3090 (24GB VRAM)
  • System RAM: 32GB DDR4/DDR5
  • Storage: 1GB free space + base model (~30GB)
  • CUDA: 11.8+ or 12.1+
  • OS: Windows 11 or Linux (Ubuntu 22.04+)

High-End (Rank 128-256)

  • GPU: NVIDIA RTX 4090 (24GB VRAM) or A100 (40GB VRAM)
  • System RAM: 64GB DDR5
  • Storage: 5GB free space (all adapters) + base model
  • Use Case: Maximum quality research/production work

VRAM Usage by Rank (720p, 24 frames)

  • Rank 4-8: ~14-15GB VRAM
  • Rank 16-32: ~15-16GB VRAM (recommended)
  • Rank 64: ~18GB VRAM
  • Rank 128: ~20GB VRAM
  • Rank 256: ~24GB VRAM (requires RTX 4090 or better)
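
To check your card against these numbers before choosing a rank, a quick query via PyTorch (assumes a CUDA-capable GPU):

import torch

# Print the GPU name and total VRAM to compare against the table above
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")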

🚀 Usage Examples

Basic Text-to-Video Generation (Diffusers)

from diffusers import DiffusionPipeline
import torch

# Load base LightX2V T2V 14B model
pipe = DiffusionPipeline.from_pretrained(
    "lightx2v/lightx2v-t2v-14b",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Load LoRA adapter (rank-32 recommended for balanced quality/speed)
pipe.load_lora_weights(
    "E:/huggingface/wan21-lightx2v-t2v-14b-720p/loras/wan/wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank32-bf16.safetensors"
)

# Generate 720p video from text prompt
prompt = "A serene mountain landscape at sunset with golden light, cinematic camera movement, 720p HD quality"

video = pipe(
    prompt=prompt,
    num_inference_steps=20,      # Reduced steps thanks to distillation
    guidance_scale=7.5,
    num_frames=24,               # ~3 seconds at 8 fps
    height=720,
    width=1280
).frames  # some diffusers versions return a batch; use .frames[0] if so

# Export video file
from diffusers.utils import export_to_video
export_to_video(video, "output_720p.mp4", fps=8)

Rank Selection and Comparison

# Base path to LoRA adapters (assumes `pipe` and `export_to_video` from the previous example)
LORA_PATH = "E:/huggingface/wan21-lightx2v-t2v-14b-720p/loras/wan"

# Select rank based on your hardware and quality needs
# Options: 4, 8, 16, 32, 64, 128, 256
rank = 32  # Recommended starting point

lora_file = f"{LORA_PATH}/wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank{rank}-bf16.safetensors"
pipe.load_lora_weights(lora_file)

# Generate video
video = pipe(
    prompt="Aerial drone shot rising above misty forest at sunrise, cinematic 720p quality",
    num_inference_steps=20,
    num_frames=24
).frames

export_to_video(video, f"output_rank{rank}.mp4", fps=8)

Testing Multiple Ranks

# Compare different ranks to find optimal balance for your use case
ranks_to_test = [16, 32, 64, 128]

for rank in ranks_to_test:
    print(f"Testing rank {rank}...")

    lora_file = f"{LORA_PATH}/wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank{rank}-bf16.safetensors"
    pipe.load_lora_weights(lora_file)

    video = pipe(
        prompt="Lightning storm over desert landscape, dramatic clouds, cinematic 720p",
        num_inference_steps=20,
        num_frames=24
    ).frames

    export_to_video(video, f"comparison_rank{rank}.mp4", fps=8)

Memory-Efficient Loading

# For systems with limited VRAM
pipe = DiffusionPipeline.from_pretrained(
    "lightx2v/lightx2v-t2v-14b",
    torch_dtype=torch.bfloat16,
)

# Enable CPU offloading to reduce VRAM usage
pipe.enable_model_cpu_offload()

# Use lower rank for minimal VRAM
pipe.load_lora_weights(
    "E:/huggingface/wan21-lightx2v-t2v-14b-720p/loras/wan/wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank16-bf16.safetensors"
)

# Generate with reduced frames/resolution if needed
video = pipe(
    prompt="City street at night with neon lights, 720p quality",
    num_frames=16,  # Reduced from 24
    height=720,
    width=1280
).frames

ComfyUI Integration

  1. Copy LoRA to ComfyUI (the adapter may be renamed for convenience; a copy sketch follows this list):

    ComfyUI/models/loras/wan/
    └── wan21-lightx2v-t2v-rank32-bf16.safetensors

  2. Workflow Setup:

    • Add "Load LoRA" node
    • Select adapter: wan21-lightx2v-t2v-rank32-bf16.safetensors
    • Set LoRA strength: 0.8-1.0
    • Connect to LightX2V T2V model nodes
    • Set resolution: 1280x720 (720p)
  3. Recommended Parameters:

    • Steps: 15-25 (distilled model)
    • CFG Scale: 6.0-8.0
    • LoRA Strength: 0.8-1.0
    • Resolution: 1280x720 (native)
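
To automate step 1, a minimal copy sketch; the source and ComfyUI paths below are examples and must be adjusted to your installation:

import shutil
from pathlib import Path

# Example paths -- adjust for your setup
SRC = Path("E:/huggingface/wan21-lightx2v-t2v-14b-720p/loras/wan/"
           "wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank32-bf16.safetensors")
DEST_DIR = Path("ComfyUI/models/loras/wan")

DEST_DIR.mkdir(parents=True, exist_ok=True)
# Copy under the shorter name shown in the tree above
shutil.copy2(SRC, DEST_DIR / "wan21-lightx2v-t2v-rank32-bf16.safetensors")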

📊 Model Specifications

| Specification   | Details                                    |
|-----------------|--------------------------------------------|
| Model Type      | LoRA Adapters for Video Diffusion          |
| Architecture    | Low-Rank Adaptation (LoRA)                 |
| Base Model      | LightX2V T2V 14B                           |
| Training Method | CFG Step Distillation v2                   |
| Precision       | BF16 (Brain Floating Point 16)             |
| Resolution      | 720p (1280x720) native                     |
| Rank Variants   | 4, 8, 16, 32, 64, 128, 256 (complete set)  |
| Parameter Count | 4M to 256M (varies by rank)                |
| File Format     | .safetensors (secure tensor storage)       |
| Total Size      | ~4.7GB (all 7 adapters)                    |
| Pipeline        | Text-to-Video (T2V)                        |
| Framework       | Diffusers, ComfyUI compatible              |

Rank Selection Guide

| Rank  | Size  | Quality   | Speed     | VRAM | Best For                      |
|-------|-------|-----------|-----------|------|-------------------------------|
| 4     | 45MB  | Basic     | Fastest   | 14GB | Prototyping, minimal hardware |
| 8     | 82MB  | Good      | Very Fast | 14GB | Quick testing, low VRAM       |
| 16    | 156MB | Better    | Fast      | 15GB | Balanced efficiency           |
| 32 ⭐ | 305MB | High      | Moderate  | 16GB | Production (recommended)      |
| 64    | 602MB | Very High | Slower    | 18GB | Quality-focused work          |
| 128   | 1.2GB | Excellent | Slow      | 20GB | High-fidelity output          |
| 256   | 2.4GB | Maximum   | Slowest   | 24GB | Research, maximum quality     |

Recommendation: Start with rank-32 for optimal quality/performance balance. Scale up (64/128/256) for maximum quality or down (16/8/4) for speed and resource constraints.
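
As a starting point, this guide can be turned into a simple heuristic. A sketch that suggests a rank from total VRAM, using the approximate thresholds above (tune them to your own measurements):

import torch

def suggest_rank() -> int:
    """Suggest a LoRA rank from total VRAM, per the guide above (approximate)."""
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    if vram_gb >= 24:
        return 256   # maximum quality
    if vram_gb >= 20:
        return 128   # high-fidelity output
    if vram_gb >= 18:
        return 64    # quality-focused work
    if vram_gb >= 16:
        return 32    # recommended production default
    return 16        # low-VRAM fallback (or 8/4 for tighter budgets)

print(f"Suggested rank: {suggest_rank()}")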

⚡ Performance Tips and Optimization

Speed Optimization

# 1. Use lower ranks for faster generation
pipe.load_lora_weights("...rank16-bf16.safetensors")

# 2. Reduce inference steps (distilled model enables this)
video = pipe(prompt, num_inference_steps=15)  # Instead of 20-25

# 3. Enable torch.compile() for PyTorch 2.0+
#    (if the pipeline is DiT-based, compile pipe.transformer instead of pipe.unet)
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

# 4. Reduce frame count for faster iteration
video = pipe(prompt, num_frames=16)  # ~2 seconds instead of 3

# 5. Allow faster TF32 matmuls for any remaining FP32 ops (Ampere+)
torch.set_float32_matmul_precision('high')

Quality Optimization

# 1. Use higher ranks for maximum quality
pipe.load_lora_weights("...rank128-bf16.safetensors")

# 2. Increase inference steps
video = pipe(prompt, num_inference_steps=25)

# 3. Tune CFG scale for your prompt
video = pipe(prompt, guidance_scale=7.5)  # 6.5-8.0 range

# 4. Add quality keywords to prompt
prompt = "A majestic eagle soaring, cinematic camera movement, 720p HD quality, professional cinematography"

# 5. Generate multiple candidates and select best
for i in range(3):
    video = pipe(prompt).frames
    export_to_video(video, f"candidate_{i}.mp4", fps=8)

Memory Optimization

# 1. Enable CPU offloading
pipe.enable_model_cpu_offload()

# 2. Use sequential CPU offload for extreme constraints
pipe.enable_sequential_cpu_offload()

# 3. Lower rank selection
pipe.load_lora_weights("...rank8-bf16.safetensors")

# 4. Clear cache between generations
torch.cuda.empty_cache()

# 5. Use attention slicing
pipe.enable_attention_slicing()

CFG Step Distillation Benefits

  • Faster inference: 15-25 steps vs 50-100 (2-3x speedup)
  • Maintained quality: Distillation preserves output fidelity
  • Better guidance: Optimized CFG behavior for prompt adherence
  • Consistency: More stable across different CFG scale values
  • Lower cost: Reduced compute requirements per generation
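
To quantify the speedup on your own hardware, a rough timing sketch (assumes `pipe` from the usage examples; wall-clock only, no warm-up):

import time

# Compare distilled step counts against a conventional 50-step run
for steps in (15, 20, 50):
    start = time.perf_counter()
    _ = pipe(
        prompt="A serene mountain landscape at sunset, cinematic 720p",
        num_inference_steps=steps,
        num_frames=24,
    ).frames
    print(f"{steps} steps: {time.perf_counter() - start:.1f}s")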

🎨 Prompting Best Practices

Text-to-Video (T2V) Prompting

Essential Elements:

  1. Subject: Clear description of main content
  2. Camera movement: Specify motion style and direction
  3. Lighting/atmosphere: Time of day, mood, lighting quality
  4. Quality modifiers: Include "720p", "HD", "cinematic"
  5. Temporal dynamics: Motion speed, transitions
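
For consistency across generations, these five elements can be assembled programmatically. A hypothetical helper sketch (the function and its defaults are illustrative, not part of any API):

def build_prompt(subject: str, camera: str, lighting: str,
                 motion: str = "smooth motion",
                 quality: str = "720p HD quality, cinematic") -> str:
    """Join the five essential elements into a single T2V prompt."""
    return ", ".join([subject, camera, lighting, motion, quality])

prompt = build_prompt(
    subject="A majestic eagle soaring through mountain valleys",
    camera="cinematic camera movement following the bird",
    lighting="golden hour light",
)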

Example Prompts for 720p

"A majestic eagle soaring through mountain valleys at golden hour, cinematic camera movement following the bird, 720p HD quality, professional wildlife cinematography"

"City street time-lapse with traffic flowing, neon lights reflecting on wet pavement, camera slowly panning right, high detail, 720p resolution, urban cinematography"

"Underwater coral reef with tropical fish swimming, gentle camera movement, clear blue water, sunlight filtering from above, smooth motion, cinematic 720p quality"

"Drone shot rising above a misty forest at sunrise, rays of light breaking through trees, smooth camera ascent, aerial cinematography, HD quality 720p"

"Lightning storm over desert landscape, dramatic clouds, time-lapse motion, cinematic wide shot, 720p quality, epic natural phenomenon"

"Cherry blossom petals falling in slow motion, gentle breeze, soft pink lighting, camera tracking downward, beautiful spring scene, 720p HD quality"

Camera Movement Keywords

  • Basic: "camera pans left/right", "camera tilts up/down"
  • Dynamic: "dolly zoom", "tracking shot", "crane shot", "steadicam"
  • Aerial: "drone shot", "aerial view", "bird's eye view", "flyover"
  • Complex: "orbit around subject", "slow push-in", "reveal shot"

Temporal Keywords

  • Speed: "slow motion", "time-lapse", "real-time", "gradual"
  • Transitions: "smooth transition", "gradual change", "progressive"
  • Motion: "gentle movement", "dynamic action", "flowing motion"

Quality Modifiers

  • "720p HD quality", "high detail", "cinematic", "professional"
  • "crisp", "clear", "sharp focus", "high fidelity"
  • "broadcast quality", "production grade"

🔧 Troubleshooting

Out of Memory (OOM) Errors

Solutions:

# 1. Use lower rank adapter
pipe.load_lora_weights("...rank16-bf16.safetensors")  # or rank8, rank4

# 2. Enable CPU offloading
pipe.enable_model_cpu_offload()

# 3. Reduce frame count
video = pipe(prompt, num_frames=16)  # Instead of 24

# 4. Enable attention slicing
pipe.enable_attention_slicing()

# 5. Use sequential CPU offload (extreme cases)
pipe.enable_sequential_cpu_offload()

# 6. Clear CUDA cache between generations
import torch
torch.cuda.empty_cache()

Poor Quality Results

Diagnose and Fix:

  • Issue: Blurry or low-detail output

    • Solution: Increase rank (try 64, 128, or 256)
    • Solution: Add "720p HD quality, high detail" to prompt
  • Issue: Inconsistent motion or artifacts

    • Solution: Adjust CFG scale (try the 6.5-8.0 range; see the sweep sketch below)
    • Solution: Increase inference steps to 25
  • Issue: Poor prompt adherence

    • Solution: Increase guidance_scale to 8.0
    • Solution: Make prompt more specific and descriptive
  • Issue: Wrong resolution output

    • Solution: Explicitly set height=720, width=1280
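
A minimal sweep sketch for the CFG-scale fix above (assumes `pipe` and `export_to_video` from the usage examples):

# Render the same prompt at several guidance scales and compare outputs
for cfg in (6.5, 7.0, 7.5, 8.0):
    video = pipe(
        prompt="Lightning storm over desert landscape, cinematic 720p",
        num_inference_steps=25,
        guidance_scale=cfg,
        num_frames=24,
    ).frames
    export_to_video(video, f"cfg_{cfg}.mp4", fps=8)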

Slow Generation Speed

Optimize Performance:

# Use lower ranks
pipe.load_lora_weights("...rank4-bf16.safetensors")  # Fastest

# Reduce steps (distillation enables this)
video = pipe(prompt, num_inference_steps=15)

# Fewer frames
video = pipe(prompt, num_frames=16)

# Enable torch.compile (PyTorch 2.0+; use pipe.transformer for DiT-based pipelines)
pipe.unet = torch.compile(pipe.unet)

# Use xformers memory efficient attention
pipe.enable_xformers_memory_efficient_attention()

Model Loading Errors

Common Issues:

# Issue: "File not found"
# Solution: Use absolute paths with forward slashes or raw strings
lora_path = r"E:\huggingface\wan21-lightx2v-t2v-14b-720p\loras\wan\..."
# or
lora_path = "E:/huggingface/wan21-lightx2v-t2v-14b-720p/loras/wan/..."

# Issue: "BF16 not supported"
# Solution: Check GPU architecture (requires Ampere or newer)
# Fallback to FP16 if needed:
pipe = DiffusionPipeline.from_pretrained(
    "lightx2v/lightx2v-t2v-14b",
    torch_dtype=torch.float16  # Instead of bfloat16
)

# Issue: "CUDA out of memory on load"
# Solution: Use CPU offloading before loading
pipe.enable_model_cpu_offload()
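
To make the BF16-to-FP16 fallback automatic, a small sketch using PyTorch's capability check:

import torch
from diffusers import DiffusionPipeline

# Prefer BF16 on GPUs that support it (Ampere or newer), else fall back to FP16
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16

pipe = DiffusionPipeline.from_pretrained(
    "lightx2v/lightx2v-t2v-14b",
    torch_dtype=dtype,
)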

📄 License

These LoRA adapters follow the license terms of the LightX2V base model. Please review the base model license for usage restrictions.

Important: Verify license compliance for your intended use case (commercial, research, etc.) with the base model license.

📖 Citation

If you use these LoRA adapters in your research or projects, please cite:

@software{wan21_lightx2v_t2v_lora_720p,
  title={WAN LightX2V T2V LoRA Adapters for 720p Video Generation},
  author={WAN Team},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/wan21-lightx2v-t2v-14b-720p}},
  note={CFG Step Distillation LoRA adapters (ranks 4-256) for LightX2V T2V 14B}
}

@software{lightx2v_base_model,
  title={LightX2V: Text-to-Video Generation Model},
  author={LightX2V Team},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/lightx2v}}
}

πŸ™ Acknowledgments

  • LightX2V Team for the exceptional T2V 14B base model
  • WAN Team for LoRA adapter development and CFG distillation
  • Hugging Face for hosting infrastructure and diffusers library
  • Community contributors for testing, feedback, and improvements

📧 Support and Contact

For issues or questions:

  • Model-specific issues: Open an issue in this repository
  • Base model questions: See LightX2V documentation
  • Technical support: Diffusers GitHub issues

📋 Summary

Complete 720p T2V LoRA Collection:

  • ✅ 7 rank variants: 4, 8, 16, 32, 64, 128, 256 (complete set)
  • ✅ Total size: ~4.7GB (all adapters included)
  • ✅ Resolution: 720p (1280x720) native
  • ✅ Precision: BF16 for stability and performance
  • ✅ Speed: 2-3x faster than non-distilled (15-25 steps)
  • ✅ Flexibility: Choose rank for quality/speed/VRAM optimization
  • ✅ Recommended: Rank-32 (305MB) for balanced production use
  • ✅ Framework: Compatible with Diffusers and ComfyUI

Key Advantages:

  • Complete rank collection from minimal (45MB) to maximum (2.4GB)
  • CFG step distillation for efficient generation
  • Native 720p resolution for HD video output
  • Flexible deployment across different hardware configurations
  • Production-ready with comprehensive documentation

Last Updated: October 2024
Repository Version: v1.1
Model Version: CFG Step Distillation v2
Total Repository Size: ~4.7GB (7 adapters)
Recommended Rank: 32 (305MB, 16GB VRAM)
Primary Use Case: Text-to-video generation at 720p with flexible quality/performance trade-offs
