---
datasets:
- ffurfaro/PixelBytes-PokemonAll
language: en
library_name: pytorch
license: mit
pipeline_tag: text-to-image
tags:
- image-generation
- text-generation
- audio-generation
- multimodal
---
|
|
|
# PixelBytes: Unified Multimodal Generation

Welcome to the **PixelBytes** repository! This project features models designed to generate text, audio, and images simultaneously, pixel by pixel, using a unified embedding. (These weights are for testing only.)
|
|
|
## Overview

### Key Concepts

- **Image Transformer**: generates images pixel by pixel.
- **Bi-Mamba+**: a bidirectional model for time-series prediction.
- **MambaByte**: a token-free selective state-space model.
|
|
|
The PixelBytes model generates mixed sequences of text and images, handling modality transitions with line breaks and maintaining consistent image dimensions.
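To make the unified-embedding idea concrete, here is a minimal sketch. It assumes (this split is illustrative, not the released model's exact vocabulary) that text bytes and quantized pixel values share one token table, with the newline byte marking transitions:

```python
import torch
import torch.nn as nn

# Assumed layout: 256 text bytes plus a hypothetical 64-entry pixel
# palette, all mapped into one shared embedding table.
VOCAB_SIZE = 256 + 64
EMBED_DIM = 128

class UnifiedEmbedding(nn.Module):
    """One embedding table covering both text bytes and pixel tokens."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)

    def forward(self, tokens):
        return self.embed(tokens)

# Mixed sequence: "Hi" as raw bytes, a newline (byte 10) marking the
# transition, then two pixel tokens (palette indices offset by 256).
seq = torch.tensor([72, 105, 10, 256 + 3, 256 + 17])
emb = UnifiedEmbedding()(seq)
print(emb.shape)  # torch.Size([5, 128])
```

Because every modality lives in the same vocabulary, a single autoregressive model can emit text, pixel, and transition tokens from one output head.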
|
|
|
## Dataset

We use the **PixelBytes-PokemonAll** dataset, available on Hugging Face: [PixelBytes-PokemonAll](https://huggingface.co/datasets/ffurfaro/PixelBytes-PokemonAll). It contains text and image sequences of Pokémon for training our model.
|
|
|
## Models Trained

- **3 LSTM models**: two autoregressive and one purely predictive.
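The autoregressive variants can be sketched as follows. All sizes here are illustrative assumptions, not the released checkpoints' configuration: the model embeds unified tokens, runs them through an LSTM, and feeds each predicted token back in as the next input:

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions, not the trained models' hyperparameters).
VOCAB, DIM, HIDDEN = 320, 128, 256

class PixelBytesLSTM(nn.Module):
    """Next-token LSTM over a unified text/pixel vocabulary."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.lstm = nn.LSTM(DIM, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, VOCAB)

    def forward(self, tokens, state=None):
        out, state = self.lstm(self.embed(tokens), state)
        return self.head(out), state

@torch.no_grad()
def generate(model, prompt, n_steps=8):
    """Greedy autoregressive decoding: feed each prediction back in."""
    tokens = prompt.clone()
    state = None
    inp = tokens.unsqueeze(0)  # add batch dimension
    for _ in range(n_steps):
        logits, state = model(inp, state)
        nxt = logits[0, -1].argmax()
        tokens = torch.cat([tokens, nxt.unsqueeze(0)])
        inp = nxt.view(1, 1)  # only the new token; state carries history
    return tokens

model = PixelBytesLSTM()
out = generate(model, torch.tensor([72, 105, 10]))  # "Hi\n" as bytes
print(out.shape)  # torch.Size([11])
```

The purely predictive variant would instead score a full input sequence in one forward pass, without the feedback loop in `generate`.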
|
|
|
## Citation

Furfaro, F. (2024). PixelBytes: A Unified Multimodal Representation Learning Project. (https://github.com/fabienfrfr/PixelBytes)
|
|
|
---

Thank you for exploring **PixelBytes**! We hope this model aids your multimodal generation projects.