Instructions for using CowCorpus/Cluster3-Takeover-Llava with libraries and local apps.

Libraries

How to use CowCorpus/Cluster3-Takeover-Llava with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

# The messages below include an image, so use the multimodal task rather than "text-generation"
pipe = pipeline("image-text-to-text", model="CowCorpus/Cluster3-Takeover-Llava")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("CowCorpus/Cluster3-Takeover-Llava")
model = AutoModelForImageTextToText.from_pretrained("CowCorpus/Cluster3-Takeover-Llava")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use CowCorpus/Cluster3-Takeover-Llava with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "CowCorpus/Cluster3-Takeover-Llava"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CowCorpus/Cluster3-Takeover-Llava",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
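
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `ask` are hypothetical, and the `image_url` content part assumes the server has loaded the model with multimodal support, which this page does not confirm.

```python
import json
import urllib.request

MODEL_ID = "CowCorpus/Cluster3-Takeover-Llava"


def build_chat_request(question, image_url=None, base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    content = [{"type": "text", "text": question}]
    if image_url is not None:
        # Assumption: the server accepts OpenAI-style image_url content parts.
        content.insert(0, {"type": "image_url", "image_url": {"url": image_url}})
    payload = {"model": MODEL_ID, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, image_url=None, base_url="http://localhost:8000"):
    """Send the request and return the text of the first choice."""
    request = build_chat_request(question, image_url, base_url)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server from the previous step running, `ask("What is the capital of France?")` issues the same call as the curl command. For the SGLang server, pass `base_url="http://localhost:30000"`.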

Use Docker

docker model run hf.co/CowCorpus/Cluster3-Takeover-Llava

SGLang

How to use CowCorpus/Cluster3-Takeover-Llava with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "CowCorpus/Cluster3-Takeover-Llava" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CowCorpus/Cluster3-Takeover-Llava",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "CowCorpus/Cluster3-Takeover-Llava" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CowCorpus/Cluster3-Takeover-Llava",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'


Cluster3-Takeover-Llava / README.md

oaishi

Update README.md

57de309 verified 3 months ago


	---
	license: llama3
	language:
	- en
	base_model:
	- lmms-lab/llama3-llava-next-8b
	- CowCorpus/CowCorpus-llama3-llava-next-8b
	pipeline_tag: text-generation
	tags:
	- text-generation
	- agent
	- cowcorpus
	- llava
	- personalization
	- user-adaptation
	metrics:
	- accuracy
	- f1
	- perfect-timing-score
	library_name: transformers
	---

	# Model Card for CowCorpus/Cluster3-Takeover-Llava

	This model is a specialized fine-tune of the general [CowCorpus-Llava](https://huggingface.co/CowCorpus/CowCorpus-llama3-llava-next-8b) model.

	It was further fine-tuned on the Cluster 3 (Takeover user) subset of the CowCorpus dataset to adapt to this user group's intervention preferences and behavioral patterns.

	This model is designed for Human Intervention Prediction in collaborative web navigation. Unlike standard autonomous agents,
	it predicts when a Takeover user (Cluster 3) needs to take control from an AI agent. It uses multimodal inputs (screenshots, DOM trees, and action history)
	to distinguish safe autonomous execution from moments that require human error correction, preference alignment, or assistance.

	## Model Details

	### Model Description

	- Developed by: CowCorpus Team (Huq et al.)
	- Model type: Multimodal Causal Language Model
	- Parent Model: [CowCorpus/CowCorpus-llama3-llava-next-8b](https://huggingface.co/CowCorpus/CowCorpus-llama3-llava-next-8b)
	- Base model: [lmms-lab/llama3-llava-next-8b](https://huggingface.co/lmms-lab/llama3-llava-next-8b)
	- Language: English
	- License: [Llama 3 Community License Agreement](https://www.llama.com/llama3/license/)
	- Paper: Modeling Distinct Human Interaction in Web Agents
	- Repository: [GitHub: oaishi/CowCorpus](https://github.com/oaishi/CowCorpus)

	### Input Data
	The model is trained on a rich, multimodal state representation:
	1. Visual Screenshot: The pixel-level view of the current webpage.
	2. UI Structure (AX Tree): The accessibility tree (textual representation of DOM).
	3. Past Trajectory: The history of actions taken by the agent/human so far.
	4. Proposed Next Action: The action that the autonomous agent intends to take. The model evaluates if this intent is erroneous.
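
The four inputs above can be pictured as one record per decision point. The sketch below is illustrative only: the class name, field names, and prompt layout are hypothetical and are not the prompt template from the CowCorpus repository.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InterventionState:
    """One decision point in a trajectory (hypothetical field names)."""
    screenshot_path: str                                    # 1. visual screenshot
    ax_tree: str                                            # 2. accessibility tree as text
    past_actions: List[str] = field(default_factory=list)   # 3. trajectory so far
    proposed_action: str = ""                               # 4. agent's intended next action


def build_text_prompt(state: InterventionState) -> str:
    """Flatten the textual inputs into one prompt string (illustrative layout)."""
    history = "\n".join(
        f"{i + 1}. {action}" for i, action in enumerate(state.past_actions)
    ) or "(none)"
    return (
        f"Accessibility tree:\n{state.ax_tree}\n\n"
        f"Past actions:\n{history}\n\n"
        f"Proposed next action:\n{state.proposed_action}\n\n"
        "Should the user take over at this step?"
    )
```

The screenshot is kept as a path because it would be passed to the model as an image alongside the text rather than embedded in the string.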

	## How to Get Started

	For inference code, prompt templates, and setup instructions, please refer to our [GitHub Repository](https://github.com/oaishi/CowCorpus).

	### Training Data

	The model underwent a two-stage training process:
	1. Stage 1 (General Adaptation): Fine-tuned on the complete CowCorpus dataset.
	2. Stage 2 (User Personalization): Further fine-tuned on the User Cluster 3 subset of CowCorpus, which consists of 26 trajectories and 131 steps.

	User Cluster 3 Characteristics:
	* Data Source: A subset of the collaborative trajectories specific to User Group 3.
	* Behavioral Profile: Takeover users intervene only occasionally, almost exclusively at the very end of the task; once they step in, they do not hand control back to the agent.
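
As a rough illustration of this profile, the check below tests whether a trajectory's sequence of actors shows a late intervention with no hand-back. It is a hypothetical heuristic, not the clustering procedure used to define the user groups; the `"agent"`/`"human"` labels and the `late_fraction` threshold are assumptions.

```python
from typing import Sequence


def matches_takeover_profile(actors: Sequence[str], late_fraction: float = 0.5) -> bool:
    """Return True if the human steps in late and keeps control until the end.

    actors: per-step labels, each "agent" or "human" (assumed encoding).
    late_fraction: assumed cutoff; the first human step must fall at or after
    this fraction of the trajectory.
    """
    steps = list(actors)
    if "human" not in steps:
        return False  # no intervention at all
    first_human = steps.index("human")
    no_hand_back = all(actor == "human" for actor in steps[first_human:])
    starts_late = first_human >= late_fraction * len(steps)
    return no_hand_back and starts_late
```

For example, `["agent", "agent", "agent", "human", "human"]` fits the profile, while a trajectory in which the agent resumes after a human step does not.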

	### Training Configuration

	- Hyperparameters:
	  - Learning Rate: Linear decay from 1e-5 to ~2e-9
	  - Epochs: 6
	  - Global Steps: 120
	  - Batch Size: 1
	  - Precision: bfloat16
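
To make the learning-rate entry concrete, here is a minimal sketch of a linear schedule between the two reported values. It assumes the endpoints fall on the first and last of the 120 global steps; the actual scheduler, including any warmup, is not stated here, and the function name `linear_lr` is hypothetical.

```python
def linear_lr(step: int, total_steps: int = 120,
              lr_start: float = 1e-5, lr_end: float = 2e-9) -> float:
    """Linearly interpolate the learning rate from lr_start to lr_end.

    Assumption: step 0 uses lr_start and step total_steps - 1 uses lr_end.
    Steps outside that range are clamped to the nearest endpoint.
    """
    if total_steps < 2:
        return lr_start
    clamped = min(max(step, 0), total_steps - 1)
    fraction = clamped / (total_steps - 1)
    return lr_start + (lr_end - lr_start) * fraction
```

Calling `linear_lr(0)` gives the starting rate, and `linear_lr(119)` gives approximately the final rate.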

	## Evaluation: Cross-Cluster Personalization

	We evaluate the model using the Perfect Timing Score (PTS), a metric designed to measure the temporal accuracy of intervention predictions.

	Because this is a personalized model, we report Cross-Cluster PTS. This measures how well the model (trained on Cluster 3) performs on its own test data versus test data from other user clusters.
	High performance on the diagonal (matching train/test groups) indicates successful personalization.

	### Cross-Cluster PTS Heatmap

	The table below displays the PTS values. Rows represent the User Cluster the model was trained on, and Columns represent the User Cluster data it was tested on.

	| Trained On (Model) | Tested On: Collaborative (User 0) | Tested On: Hands-on (User 2) | Tested On: Takeover (User 3) |
	| :--- | :---: | :---: | :---: |
	| Collaborative | 0.187 | 0.130 | 0.058 |
	| Hands-on | 0.417 | 0.583 | 0.468 |
	| Takeover | 0.000 | 0.027 | 0.009 |

	Note: All models are evaluated in a zero-shot setting without reasoning.
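
The table can be read programmatically. The sketch below copies the PTS values into a nested dictionary and compares each model's in-cluster score with its scores on the other clusters; the helper names `diagonal_gap` and `best_model_for` are hypothetical.

```python
# PTS[trained_on][tested_on], values copied from the table above.
PTS = {
    "Collaborative": {"Collaborative": 0.187, "Hands-on": 0.130, "Takeover": 0.058},
    "Hands-on": {"Collaborative": 0.417, "Hands-on": 0.583, "Takeover": 0.468},
    "Takeover": {"Collaborative": 0.000, "Hands-on": 0.027, "Takeover": 0.009},
}


def diagonal_gap(pts: dict, cluster: str) -> float:
    """In-cluster PTS minus the mean PTS on the other clusters, for one model."""
    row = pts[cluster]
    off_diagonal = [score for tested_on, score in row.items() if tested_on != cluster]
    return row[cluster] - sum(off_diagonal) / len(off_diagonal)


def best_model_for(pts: dict, test_cluster: str) -> str:
    """Name of the model (row) with the highest PTS on the given test cluster."""
    return max(pts, key=lambda trained_on: pts[trained_on][test_cluster])


if __name__ == "__main__":
    for cluster in PTS:
        print(cluster, round(diagonal_gap(PTS, cluster), 4), best_model_for(PTS, cluster))
```

A positive `diagonal_gap` means the model scores higher on its own cluster than on the others on average, which is the pattern the diagonal criterion looks for.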

	## Citation

	If you use this model or dataset, please cite our work. The paper is forthcoming.