
	---
	license: other
	license_name: cogvlm2
	license_link: https://huggingface.co/THUDM/cogvlm2-llama3-chat-19B/blob/main/LICENS

	language:
	- en
	pipeline_tag: text-generation
	tags:
	- chat
	- cogvlm2

	inference: false
	---
	# VisionReward-Image

	## Introduction
	We present VisionReward, a general strategy for aligning visual generation models (both image and video generation) with human preferences through a fine-grained, multi-dimensional framework. We decompose human preferences in images and videos into multiple dimensions, each represented by a series of judgment questions whose answers are linearly weighted and summed to produce an interpretable and accurate score. To address the challenges of video quality assessment, we systematically analyze various dynamic features of videos, which helps VisionReward surpass VideoScore by 17.2% and achieve top performance in video preference prediction.
	This repository hosts the VisionReward-Image model.
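
Read literally, the scoring described above is a weighted sum over the answers to the judgment questions. A minimal sketch of that reading follows; the notation is ours and is not taken from the VisionReward code:

$$
s(x) = \sum_{i=1}^{n} w_i \, a_i(x)
$$

Here $x$ is the image being scored, $n$ is the total number of judgment questions across all dimensions, $a_i(x)$ is a numeric encoding of the model's answer to question $i$ (for instance $1$ for "yes" and $0$ for "no", which is an assumed encoding), and $w_i$ is the weight for that question. Because each term belongs to one named question, the contribution of any single judgment to the final score can be inspected directly, which is what makes the score interpretable.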

	## Merging and Extracting Checkpoint Files
	Use the following commands to merge the split files into a single `.tar` file and then extract it (as written, into the current working directory):

	```sh
	cat ckpts/split_part_* > ckpts/visionreward_image.tar
	tar -xvf ckpts/visionreward_image.tar
	```
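
If you want the checkpoint extracted somewhere other than the current working directory, or want a quick sanity check before extracting, the same steps can be wrapped in a small function. This is only a sketch: `merge_and_extract` and its two arguments are illustrative names, not part of the VisionReward repository, and it assumes the parts are named `split_part_*` as in the command above.

```sh
#!/bin/sh
# Sketch: merge split checkpoint parts and extract them into a chosen directory.
# merge_and_extract and its arguments are hypothetical helpers for illustration.
merge_and_extract() {
    parts_dir="$1"    # directory holding the split_part_* files
    out_dir="$2"      # directory the checkpoint should be extracted into
    archive="$parts_dir/visionreward_image.tar"

    # Fail early if no parts are present (the glob stays unexpanded in that case).
    set -- "$parts_dir"/split_part_*
    [ -e "$1" ] || { echo "no split_part_* files in $parts_dir" >&2; return 1; }

    # Globs expand in sorted order, so the parts are concatenated in sequence.
    cat "$@" > "$archive" || return 1

    # Listing the archive is a cheap sanity check before extracting.
    tar -tf "$archive" > /dev/null || { echo "archive looks incomplete" >&2; return 1; }

    # -C makes tar extract into out_dir instead of the current directory.
    mkdir -p "$out_dir" && tar -xf "$archive" -C "$out_dir"
}

# Example: merge_and_extract ckpts ckpts
```

The function returns a non-zero status if the parts are missing or the merged archive cannot be listed, so it can be chained with `&&` in a setup script.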

	## Using this model
	You can install the Python package dependencies and run model inference by following the instructions in our [GitHub repository](https://github.com/THUDM/VisionReward).
	> This model uses bf16 precision parameters and must be loaded with the sat (SwissArmyTransformer) library. For the fp32 version of the model, see [https://huggingface.co/THUDM/VisionReward-Image](https://huggingface.co/THUDM/VisionReward-Image).