Instructions for using CowCorpus/Cluster3-Takeover-Llava with libraries and local apps.

Libraries

How to use CowCorpus/Cluster3-Takeover-Llava with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

# The prompt contains an image, so use the multimodal "image-text-to-text" task
pipe = pipeline("image-text-to-text", model="CowCorpus/Cluster3-Takeover-Llava")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
print(pipe(text=messages))
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("CowCorpus/Cluster3-Takeover-Llava")
model = AutoModelForImageTextToText.from_pretrained("CowCorpus/Cluster3-Takeover-Llava")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
# Apply the chat template and tokenize the text and image in one call
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt and special tokens
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

Local Apps

vLLM

How to use CowCorpus/Cluster3-Takeover-Llava with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "CowCorpus/Cluster3-Takeover-Llava"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "CowCorpus/Cluster3-Takeover-Llava",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
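The curl call above sends a text-only prompt. A request to this vision-language model will usually also carry an image, so the sketch below builds an OpenAI-compatible chat request with an `image_url` content part using only Python's standard library. It assumes the vLLM server started above can load this checkpoint, is listening on `localhost:8000`, and accepts image parts for it; the helper names `build_chat_request` and `send` are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint: vLLM's default port 8000, as in the curl example
VLLM_URL = "http://localhost:8000/v1/chat/completions"
MODEL_ID = "CowCorpus/Cluster3-Takeover-Llava"


def build_chat_request(question: str, image_url: str, max_tokens: int = 40) -> dict:
    """Build an OpenAI-style chat payload with one image part and one text part."""
    return {
        "model": MODEL_ID,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": question},
                ],
            }
        ],
        "max_tokens": max_tokens,
    }


def send(payload: dict, url: str = VLLM_URL) -> dict:
    """POST the payload as JSON and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Once the server is running, `send(build_chat_request("What animal is on the candy?", "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"))` returns the parsed response, with the generated answer at `["choices"][0]["message"]["content"]`. The same payload should also work against the SGLang server below by changing the port to 30000, assuming it serves this model with image support.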

Use Docker

```bash
docker model run hf.co/CowCorpus/Cluster3-Takeover-Llava
```

SGLang

How to use CowCorpus/Cluster3-Takeover-Llava with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "CowCorpus/Cluster3-Takeover-Llava" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "CowCorpus/Cluster3-Takeover-Llava",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "CowCorpus/Cluster3-Takeover-Llava" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "CowCorpus/Cluster3-Takeover-Llava",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use CowCorpus/Cluster3-Takeover-Llava with Docker Model Runner:
```
docker model run hf.co/CowCorpus/Cluster3-Takeover-Llava
```

ZhanqiuG committed on Feb 7 · Commit 6542e01 (verified) · Parent: 1619bf1

Create README.md

Files changed (1): README.md +104 -0 (added)

@@ -0,0 +1,104 @@

+---
+license: llama3
+language:
+- en
+base_model:
+- lmms-lab/llama3-llava-next-8b
+- CowCorpus/CowCorpus-llama3-llava-next-8b
+pipeline_tag: image-text-to-text
+tags:
+- text-generation
+- agent
+- cowcorpus
+- llava
+- personalization
+- user-adaptation
+metrics:
+- accuracy
+- f1
+- perfect-timing-score
+library_name: transformers
+---
+# Model Card for CowCorpus/Cluster3-Takeover-Llava
+This model is a **specialized fine-tune** of the general [CowCorpus-Llava](https://huggingface.co/CowCorpus/CowCorpus-llama3-llava-next-8b) model.
+It was further fine-tuned on **Cluster 3 (Takeover user)** data from the **CowCorpus** dataset to adapt to the intervention preferences and behavioral patterns of this user group.
+The model targets **Human Intervention Prediction** in collaborative web navigation. Rather than acting as an autonomous agent,
+it predicts *when* a **Takeover** user (Cluster 3) will take control from an AI agent. It uses multimodal inputs (screenshots, accessibility trees, and action history)
+to distinguish safe autonomous execution from moments that require human error correction, preference alignment, or assistance.
+## Model Details
+### Model Description
+- **Developed by:** CowCorpus Team (Huq et al.)
+- **Model type:** Multimodal Causal Language Model
+- **Parent Model:** [CowCorpus/CowCorpus-llama3-llava-next-8b](https://huggingface.co/CowCorpus/CowCorpus-llama3-llava-next-8b)
+- **Base model:** [lmms-lab/llama3-llava-next-8b](https://huggingface.co/lmms-lab/llama3-llava-next-8b)
+- **Language:** English
+- **License:** [Llama 3 Community License Agreement](https://www.llama.com/llama3/license/)
+- **Paper:** *Modeling Distinct Human Interaction in Web Agents*
+- **Repository:** [GitHub: oaishi/CowCorpus](https://github.com/oaishi/CowCorpus)
+### Input Data
+The model takes a multimodal state representation as input:
+1.  **Visual Screenshot:** The pixel-level view of the current webpage.
+2.  **UI Structure (AX Tree):** The accessibility tree (a textual representation of the DOM).
+3.  **Past Trajectory:** The history of actions taken so far by the agent and the human.
+4.  **Proposed Next Action:** The action that the autonomous agent *intends* to take. The model evaluates whether this intended action is erroneous.
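The actual prompt templates are published in the GitHub repository rather than in this card. Purely as an illustration, the sketch below packs the four inputs into one multimodal chat turn using the `{"type": "image", ...}` / `{"type": "text", ...}` message format accepted by Transformers chat templates. The `WebState` class, the section labels, and the yes/no question are hypothetical and are not the CowCorpus prompt.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WebState:
    """Hypothetical container for the four model inputs listed above."""
    screenshot_url: str                                    # 1. visual screenshot
    ax_tree: str                                           # 2. accessibility tree as text
    past_actions: List[str] = field(default_factory=list)  # 3. trajectory so far
    proposed_action: str = ""                              # 4. agent's intended next action


def build_messages(state: WebState) -> list:
    """Pack a WebState into a single user turn: the image first, then one text part."""
    # Number the past actions; fall back to a placeholder for an empty history
    history = "\n".join(
        f"{i + 1}. {action}" for i, action in enumerate(state.past_actions)
    ) or "(none)"
    # Hypothetical layout and question, not the official template
    text = (
        f"Accessibility tree:\n{state.ax_tree}\n\n"
        f"Past actions:\n{history}\n\n"
        f"Proposed next action: {state.proposed_action}\n\n"
        "Will the user take over control at this step? Answer yes or no."
    )
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": state.screenshot_url},
                {"type": "text", "text": text},
            ],
        }
    ]
```

The returned list has the same shape as `messages` in the Transformers snippet, so under these assumptions it could be passed to `processor.apply_chat_template`.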
+## How to Get Started
+For inference code, prompt templates, and setup instructions, please refer to our [GitHub Repository](https://github.com/oaishi/CowCorpus).
+## Training Details
+### Training Data
+The model underwent a two-stage training process:
+1.  **Stage 1 (General Adaptation):** Fine-tuned on the complete CowCorpus dataset.
+2.  **Stage 2 (User Personalization):** Further fine-tuned on the **User Cluster 3 subset** of CowCorpus, which consists of 26 trajectories and 131 steps (P10, P13, P18).
+**User Cluster 3 Characteristics:**
+*   **Data Source:** A subset of the collaborative trajectories specific to User Cluster 3.
+*   **Behavioral Profile:** **Takeover** users intervene only occasionally and almost exclusively at the very end of a task; once they step in, they do not hand control back to the agent.
+### Training Configuration
+- **Hyperparameters:**
+  - Learning Rate: Linear decay from 1e-5 to ~2e-9
+  - Epochs: 6
+  - Global Steps: 120
+  - Batch Size: 1
+  - Precision: bfloat16
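The card gives only the endpoints of the learning-rate schedule. The sketch below assumes a plain linear interpolation from 1e-5 to 2e-9 over the 120 global steps with no warmup; the scheduler actually used in training is not specified here and may differ (for instance, Transformers' `get_linear_schedule_with_warmup` decays toward 0 rather than a fixed floor).

```python
def linear_lr(step: int, total_steps: int = 120,
              lr_start: float = 1e-5, lr_end: float = 2e-9) -> float:
    """Learning rate at a 0-based `step` under linear decay from lr_start to lr_end.

    Assumes no warmup: step 0 uses lr_start and the last step uses lr_end.
    """
    if not 0 <= step < total_steps:
        raise ValueError(f"step must be in [0, {total_steps - 1}]")
    frac = step / (total_steps - 1)  # 0.0 at the first step, 1.0 at the last
    return lr_start + (lr_end - lr_start) * frac


# Print the rate at the start, middle and end of training
for s in (0, 59, 119):
    print(s, f"{linear_lr(s):.3e}")
```

Under this assumption the rate roughly halves by the midpoint of training and reaches the stated floor only on the final step.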
+## Evaluation: Cross-Cluster Personalization
+We evaluate the model using the **Perfect Timing Score (PTS)**, a metric designed to measure the temporal accuracy of intervention predictions.
+Because this is a personalized model, we report **Cross-Cluster PTS**. This measures how well the model (trained on Cluster 3) performs on its own test data versus test data from other user clusters.
+High performance on the diagonal (matching train/test groups) indicates successful personalization.
+### Cross-Cluster PTS Heatmap
+*The table below displays the PTS values. Rows represent the User Cluster the model was trained on, and Columns represent the User Cluster data it was tested on.*
+| Trained On (Model) | Tested On: Collaborative (User 0) | Tested On: Hands-on (User 2) | Tested On: **Takeover (User 3)** |
+| :--- | :---: | :---: | :---: |
+| Collaborative | **0.187** | 0.130 | 0.058 |
+| Hands-on | 0.417 | **0.583** | 0.468 |
+| Takeover | 0.000 | **0.027** | 0.009 |
+*Note: All models are evaluated in a zero-shot setting without reasoning.*
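The sketch below stores the PTS values from the table and checks, for each trained model, whether its own-cluster (diagonal) score is also its best score across test clusters, which is the personalization signal described above. The values are copied from the table; the function name `diagonal_report` is hypothetical.

```python
# PTS values from the table: PTS[trained_on][tested_on]
PTS = {
    "Collaborative": {"Collaborative": 0.187, "Hands-on": 0.130, "Takeover": 0.058},
    "Hands-on":      {"Collaborative": 0.417, "Hands-on": 0.583, "Takeover": 0.468},
    "Takeover":      {"Collaborative": 0.000, "Hands-on": 0.027, "Takeover": 0.009},
}


def diagonal_report(pts: dict) -> dict:
    """For each trained model, return its diagonal score, its best test cluster,
    and whether the diagonal is the row maximum."""
    report = {}
    for trained, row in pts.items():
        best = max(row, key=row.get)  # test cluster with the highest PTS in this row
        report[trained] = {
            "diagonal": row[trained],
            "best_test_cluster": best,
            "diagonal_is_best": best == trained,
        }
    return report


for name, r in diagonal_report(PTS).items():
    print(f"{name:>13}: diag={r['diagonal']:.3f}, best on {r['best_test_cluster']}, "
          f"diagonal best: {r['diagonal_is_best']}")
```

For the Takeover model, the row maximum falls on the Hands-on test set (0.027) rather than on its own cluster (0.009), which is the cell bolded in that row.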
+## Citation
+If you use this model or dataset, please cite our work. The paper is forthcoming.