library_name: transformers
license: apache-2.0
pipeline_tag: image-text-to-text
datasets:
- markury/AndroAtlas
language:
- en
AndroGemma-alpha Model Card
Model page: AndroGemma-alpha
AndroGemma-alpha is a fine-tuned Vision-Language Model (VLM) based on Google's PaliGemma. It aims to improve how AI models represent and understand male anatomy, specifically the penis. Fine-tuning uses the AndroAtlas dataset, which consists of image and text pairs.
Resources and technical documentation:
Authors: Markury
Contributors: Members of The Bulge Discord server, for general support and for detailed contributions to the system prompts and image sourcing.
Model information
Model summary
Description
AndroGemma-alpha is a fine-tuned version of PaliGemma that focuses on male anatomy, an area that is underrepresented in existing models. The fine-tuning dataset consists of text and image pairs sourced from Reddit and from non-public sources, chosen to provide detailed and diverse examples.
Model architecture
AndroGemma-alpha builds on the PaliGemma architecture, which combines a Transformer decoder with a Vision Transformer image encoder, and is fine-tuned on AndroAtlas. It supports tasks such as image captioning and visual question answering, specialized for male anatomy.
Inputs and outputs
- Input: Image and text string, such as a prompt to caption the image, or a question.
- Output: Generated text in response to the input, such as a caption of the image or an answer to a question.
How to Use
AndroGemma-alpha is best used through the MPIC (Markury's Paligemma Image Captioner) application for practical inference and integration into projects. For Python inference code, refer to the MPIC source code and adapt it to fit your needs.
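For readers who want a starting point before reading the MPIC source, the following is a minimal sketch of Python inference using the generic PaliGemma classes from transformers. It is not taken from MPIC. The repository id `markury/AndroGemma-alpha` is an assumption inferred from the model name and dataset owner, and the helper names `build_prompt` and `generate_text` are illustrative only. The `caption en` prefix matches the example outputs in this card; the `answer en <question>` form is the standard PaliGemma question-answering prompt.

```python
from typing import Optional

# Assumption: the Hub repository id is not stated in this card; adjust as needed.
MODEL_ID = "markury/AndroGemma-alpha"


def build_prompt(task: str = "caption", lang: str = "en",
                 question: Optional[str] = None) -> str:
    """Build a PaliGemma-style task prompt, e.g. 'caption en' or 'answer en <question>'."""
    parts = [task, lang]
    if question:
        parts.append(question.strip())
    return " ".join(parts)


def generate_text(image_path: str, prompt: str, max_new_tokens: int = 128) -> str:
    """Run one image + prompt through the model and return only the generated text."""
    # Heavy dependencies are imported lazily so build_prompt stays usable without them.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = PaliGemmaForConditionalGeneration.from_pretrained(MODEL_ID).eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[-1]

    with torch.inference_mode():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                    do_sample=False)

    # generate() returns prompt tokens followed by new tokens; keep only the new ones.
    return processor.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

Under these assumptions, a caption would be requested with `generate_text("photo.jpg", build_prompt())`, and a question with `generate_text("photo.jpg", build_prompt("answer", question="..."))`, where `photo.jpg` is a placeholder path.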
Using MPIC CLI
The MPIC CLI is the preferred way to run the AndroGemma-alpha model. For installation and usage details, see the MPIC repository.
Training Details
Training Data
Training used the AndroAtlas dataset, which includes:
- Text and Image Pairs: Curated from Reddit, ensuring diverse and representative samples.
- Annotations: Detailed labels to enhance model training and understanding.
- Focus: Male anatomy, with an emphasis on the penis.
Training Procedure
Fine-tuning used the first 5 batches of AndroAtlas (243 text/image pairs), supplemented with approximately 150 additional image/text pairs carrying detailed human-written annotations on circumcision and erection status, for a total of roughly 390 pairs. The captions were generated by GPT-4o using a specialized system prompt and later refined with Llama3-70B for consistency.
For full details on the training process, refer to the training script provided in the repository.
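As a rough illustration of how caption pairs like these are typically fed to PaliGemma during fine-tuning, the sketch below turns (image path, caption) pairs into prefix/suffix records and batches them with the processor's `suffix` argument, which produces training labels. This is not the author's training script. The record layout and the helper names `to_training_records` and `collate` are hypothetical, and the `caption en` prefix is assumed from the example outputs in this card.

```python
from typing import Dict, Iterable, List, Tuple

# Assumption: the task prefix matches the "caption en" seen in the example outputs.
PROMPT_PREFIX = "caption en"


def to_training_records(pairs: Iterable[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Convert (image_path, caption) pairs into prefix/suffix records.

    Whitespace in captions is collapsed, and pairs with empty captions are skipped.
    """
    records = []
    for image_path, caption in pairs:
        caption = " ".join(caption.split())
        if not caption:
            continue
        records.append({"image": image_path,
                        "prefix": PROMPT_PREFIX,
                        "suffix": caption})
    return records


def collate(records: List[Dict[str, str]], processor):
    """Batch records with a PaliGemma processor; `suffix` becomes the label text."""
    # Imported lazily so the record helper works without Pillow installed.
    from PIL import Image

    images = [Image.open(r["image"]).convert("RGB") for r in records]
    return processor(
        text=[r["prefix"] for r in records],
        images=images,
        suffix=[r["suffix"] for r in records],
        return_tensors="pt",
        padding="longest",
    )
```

The returned batch can be passed to a `PaliGemmaForConditionalGeneration` model in a standard training loop; the actual hyperparameters and loop used for AndroGemma-alpha are those in the repository's training script.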
Example Outputs
Below are some examples of images and their corresponding captions generated by AndroGemma-alpha.
caption en: "a young man with a lean physique and short dark hair, sitting comfortably with his legs slightly apart, wearing grey shorts and a visible tattoo on his arm, taking a mirror selfie with a relaxed expression, holding a smartphone in his right hand, with a light skin tone and light body hair visible on his chest and abdomen, set against a neutral indoor background with subtle lighting."
caption en: "a headless torso of a nude man standing in a kitchen, with hands resting on his thighs, exposing his genitals, including the penis and testicles, and a visible abdominal hair line, the man has short light-colored hair on his head, and his body is covered with light body hair, the background features a white tiled wall and a white appliance."
caption en: "a naked man standing outdoors, holding a yellow saw with a black handle, looking up at the camera with a slight smile, his short dark hair and beard visible, his muscular physique and defined abs prominent, his penis and testicles visible, his right hand resting on a wooden fence, natural light coming from behind, background featuring dense forest and trees with sparse foliage."
Ethical Considerations
The use of AndroGemma-alpha adheres to the Gemma Prohibited Use Policy set forth by Google. Specifically, the content generated by AndroGemma-alpha is intended for academic and research purposes, focusing on enhancing AI models' understanding of male anatomy. This dataset and model are used within the bounds of scientific, educational, and documentary contexts, which are permitted under the policy.
Compliance with Gemma Prohibited Use Policy
- Non-Infringement: The dataset does not infringe on any individual's or entity's rights, including copyrighted content.
- Safe and Legal Use: The model and dataset are not used to facilitate illegal, dangerous, or malicious activities. The focus remains strictly academic.
- Non-Sexual Content: AndroGemma-alpha is not used to generate sexually explicit content for pornography or sexual gratification. Instead, the content is utilized for scientific and educational research to improve AI capabilities.
- Respect for Privacy and Rights: All data used and generated complies with privacy regulations and respects the rights and dignity of individuals.
By ensuring these guidelines are followed, AndroGemma-alpha contributes to the academic understanding of AI's interaction with underrepresented anatomical data, aligning with the ethical standards outlined in the Gemma Prohibited Use Policy.
This model card provides an overview of the AndroGemma-alpha model, including its purpose, training details, and example outputs. By using this model, you contribute to the development of more inclusive and representative AI systems.