metadata

license: cc-by-4.0
language:
  - en
pipeline_tag: any-to-any
tags:
  - multimodal
library_name: transformers

Align-Anything Chameleon 7B Plus

Introduction

This repository hosts Align-Anything Chameleon 7B Plus, a model for interleaved text-image input and output. It is based on the Chameleon model and was further trained and aligned with the Align-Anything framework to unlock its image-generation capability and improve its alignment with human preferences.

Usage

For details on using this model, including training, inference, and evaluation, refer to the Align-Anything repository:

git clone https://github.com/PKU-Alignment/align-anything.git
cd align-anything/projects/text_image_to_text_image

Then follow the instructions in the README.md file to set up the environment and run the scripts.

Currently, the official Transformers repository does not support the Chameleon model with image output (see this PR for more details), so we rely on a fork of the repository.

After installing Align-Anything and setting up the environment correctly, install the stable version of the fork by running:

pip install git+https://github.com/htlou/transformers.git@hantao_stable_cham
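To confirm that the fork, rather than an upstream release, is the copy of Transformers active in your environment, a check along these lines may help. It is a minimal sketch that relies on the `direct_url.json` record pip writes for direct-URL installs (PEP 610); the helper name `install_source` is hypothetical and not part of any of the repositories above.

```python
import json
from importlib import metadata


def install_source(package: str):
    """Return the URL pip recorded for a direct-URL install, or None.

    None means the package is not installed, or it was installed from an
    index such as PyPI (no direct_url.json is written in that case).
    """
    try:
        dist = metadata.distribution(package)
    except metadata.PackageNotFoundError:
        return None
    raw = dist.read_text("direct_url.json")  # None if the file is absent
    if raw is None:
        return None
    return json.loads(raw).get("url")


# If the fork is installed, this should print the fork's Git URL.
print(install_source("transformers"))
```

If the printed value is `None` or points somewhere other than the fork, rerunning the `pip install` command above in the active environment is the likely fix.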

To generate images (pure text generation works directly with Transformers), follow the instructions in the mmsg_chameleon repo to run inference:

git clone https://github.com/htlou/mmsg_chameleon.git
cd mmsg_chameleon

Then set up the environment using

pip install -e .

After setting up the environment, set the correct paths in scripts/interleaved_gen.sh and then run

bash scripts/interleaved_gen.sh
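For the text-only path mentioned above, which does not need mmsg_chameleon, a minimal sketch with Transformers might look like the following. It assumes the checkpoint loads with the upstream `ChameleonProcessor` and `ChameleonForConditionalGeneration` classes, that the fork keeps those names, and that `accelerate` is installed for `device_map="auto"`. The function name `generate_text` and its parameters are illustrative only, and `model_path` stands for wherever you have stored the weights.

```python
def generate_text(model_path: str, prompt: str, max_new_tokens: int = 128) -> str:
    """Greedy text-only generation; a sketch under the assumptions stated above."""
    # Imported lazily so that defining the function does not require torch.
    import torch
    from transformers import ChameleonForConditionalGeneration, ChameleonProcessor

    processor = ChameleonProcessor.from_pretrained(model_path)
    model = ChameleonForConditionalGeneration.from_pretrained(
        model_path,
        torch_dtype=torch.bfloat16,
        device_map="auto",  # assumption: accelerate is available
    )

    # Text-only input: no images are passed to the processor.
    inputs = processor(text=prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Drop the prompt tokens so that only the continuation is decoded.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return processor.decode(new_tokens, skip_special_tokens=True)


# Example call (loads the full model, so it is left commented out):
# print(generate_text("/path/to/your/checkpoint", "Describe a sunset over the sea."))
```

Greedy decoding is used here only to keep the sketch deterministic; sampling parameters can be passed to `generate` as usual.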