PVTv2

This is the Hugging Face PyTorch implementation of the PVTv2 model.

Model Description

The Pyramid Vision Transformer v2 (PVTv2) is a powerful, lightweight hierarchical transformer backbone for vision tasks. PVTv2 incorporates convolution operations into its transformer layers, infusing properties of CNNs that allow it to learn image data efficiently. This mixed transformer architecture requires no added positional embeddings and produces multi-scale feature maps, which are known to be beneficial for dense and fine-grained prediction tasks.
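
Below is a minimal sketch of extracting those multi-scale feature maps from this checkpoint, assuming a transformers version that ships PVTv2 support (`PvtV2Backbone`, `AutoImageProcessor`); the random input image and the `out_indices` choice are only illustrative.

```python
import numpy as np
import torch
from transformers import AutoImageProcessor, PvtV2Backbone

# Load the image processor and the backbone; out_indices requests the
# feature map from every stage rather than only the last one.
processor = AutoImageProcessor.from_pretrained("OpenGVLab/pvt_v2_b4")
model = PvtV2Backbone.from_pretrained("OpenGVLab/pvt_v2_b4", out_indices=[0, 1, 2, 3])

# Dummy RGB image in HWC uint8 layout (replace with a real image).
image = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One feature map per stage, each at a progressively coarser spatial resolution.
for i, feature_map in enumerate(outputs.feature_maps):
    print(f"stage {i}: {tuple(feature_map.shape)}")
```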

Vision models using PVTv2 for a backbone:

  1. Segformer for Semantic Segmentation.
  2. GLPN for Monocular Depth Estimation.
  3. Deformable DETR for 2D Object Detection.
  4. Panoptic Segformer for Panoptic Segmentation.
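
Beyond serving as the backbone in the models above, the checkpoint can also be used on its own. The sketch below runs ImageNet-style classification, assuming the exported weights include the classification head and that `PvtV2ForImageClassification` is available in your transformers version; the COCO image URL is just an illustrative input.

```python
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, PvtV2ForImageClassification

# Illustrative test image (two cats on a couch, from the COCO validation set).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoImageProcessor.from_pretrained("OpenGVLab/pvt_v2_b4")
model = PvtV2ForImageClassification.from_pretrained("OpenGVLab/pvt_v2_b4")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Print the predicted ImageNet class label.
predicted_class = logits.argmax(-1).item()
print(model.config.id2label[predicted_class])
```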