ZhanqiuG committed
Commit 6542e01 · verified · 1 Parent(s): 1619bf1

Create README.md

  1. README.md +104 -0
README.md ADDED
@@ -0,0 +1,104 @@
+ ---
+ license: llama3
+ language:
+ - en
+ base_model:
+ - lmms-lab/llama3-llava-next-8b
+ - CowCorpus/CowCorpus-llama3-llava-next-8b
+ pipeline_tag: text-generation
+ tags:
+ - text-generation
+ - agent
+ - cowcorpus
+ - llava
+ - personalization
+ - user-adaptation
+ metrics:
+ - accuracy
+ - f1
+ - perfect-timing-score
+ library_name: transformers
+ ---
+
+ # Model Card for CowCorpus/UserGroup3_final_fixed_llava
+
+ <!-- Provide a quick summary of what the model is/does. -->
+ This model is a **specialized fine-tune** of the general [CowCorpus-Llava](https://huggingface.co/CowCorpus/CowCorpus-llama3-llava-next-8b) model.
+
+ It was further fine-tuned on **Cluster 3 - Takeover User** data from the **CowCorpus** dataset to adapt to the intervention preferences and behavioral patterns of this user group.
+
+ This model is designed for the task of **Human Intervention Prediction** in collaborative web navigation. Unlike standard autonomous agents,
+ this model predicts *when* a **Takeover** user (Cluster 3) needs to take control from an AI agent. It uses multimodal inputs (screenshots, accessibility trees, and action history)
+ to distinguish between safe autonomous execution and moments requiring human error correction, preference alignment, or assistance.
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+ - **Developed by:** CowCorpus Team (Huq et al.)
+ - **Model type:** Multimodal Causal Language Model
+ - **Parent Model:** [CowCorpus/CowCorpus-llama3-llava-next-8b](https://huggingface.co/CowCorpus/CowCorpus-llama3-llava-next-8b)
+ - **Base model:** [lmms-lab/llama3-llava-next-8b](https://huggingface.co/lmms-lab/llama3-llava-next-8b)
+ - **Language:** English
+ - **License:** [Llama 3 Community License Agreement](https://www.llama.com/llama3/license/)
+ - **Paper:** *Modeling Distinct Human Interaction in Web Agents*
+ - **Repository:** [GitHub: oaishi/CowCorpus](https://github.com/oaishi/CowCorpus)
+
+ ### Input Data
+ The model is trained on a multimodal state representation with four components:
+ 1. **Visual Screenshot:** The pixel-level view of the current webpage.
+ 2. **UI Structure (AX Tree):** The accessibility tree (a textual representation of the DOM).
+ 3. **Past Trajectory:** The history of actions taken by the agent/human so far.
+ 4. **Proposed Next Action:** The action that the autonomous agent *intends* to take. The model evaluates whether this intended action is erroneous.
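
As a rough illustration of how the four inputs listed above might be bundled for one decision step, here is a minimal Python sketch. The class name `InterventionInput`, its field names, and the text layout in `to_prompt_text` are hypothetical and chosen only for illustration; the actual prompt template and preprocessing live in the GitHub repository and may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InterventionInput:
    """Hypothetical container for one decision step (names are illustrative)."""

    screenshot_path: str   # 1. visual screenshot of the current webpage
    ax_tree: str           # 2. accessibility tree serialized as text
    past_actions: List[str] = field(default_factory=list)  # 3. trajectory so far
    proposed_action: str = ""  # 4. action the agent intends to take next

    def to_prompt_text(self) -> str:
        """Join the textual fields into one string; the image is handled separately."""
        history = "\n".join(
            f"{i + 1}. {action}" for i, action in enumerate(self.past_actions)
        ) or "(none)"
        return (
            f"AX tree:\n{self.ax_tree}\n\n"
            f"Past actions:\n{history}\n\n"
            f"Proposed next action:\n{self.proposed_action}"
        )
```

Keeping the screenshot as a separate field reflects that the image and the text typically reach a multimodal model through different channels; that split is an assumption of this sketch, not a statement about the released inference code.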
+
+ ## How to Get Started
+
+ For inference code, prompt templates, and setup instructions, please refer to our [GitHub Repository](https://github.com/oaishi/CowCorpus).
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+ The model underwent a two-stage training process:
+ 1. **Stage 1 (General Adaptation):** Fine-tuned on the complete CowCorpus dataset.
+ 2. **Stage 2 (User Personalization):** Further fine-tuned on the **User Cluster 3 subset** of CowCorpus, which consists of 26 trajectories and 131 steps (P10, P13, P18).
+
+ **User Cluster 3 Characteristics:**
+ * **Data Source:** A subset of the collaborative trajectories specific to User Group 3.
+ * **Behavioral Profile:** **Takeover** users intervene occasionally, but almost exclusively at the very end of the task; once they step in, they do not hand control back to the agent.
+
+ ### Training Configuration
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+ - **Hyperparameters:**
+   - Learning Rate: Linear decay from 1e-5 to ~2e-9
+   - Epochs: 6
+   - Global Steps: 120
+   - Batch Size: 1
+   - Precision: bfloat16
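
To make the learning-rate entry above concrete, the following sketch linearly interpolates between the two reported endpoints over the 120 global steps. Only the endpoints (1e-5 and roughly 2e-9) and the step count come from this card; the function name `linear_decay_lr`, the absence of warmup, and the assumption that the final step lands exactly on the lower endpoint are illustrative guesses, not the actual scheduler configuration.

```python
def linear_decay_lr(step: int,
                    total_steps: int = 120,
                    lr_start: float = 1e-5,
                    lr_end: float = 2e-9) -> float:
    """Learning rate at a 0-indexed step under a straight-line decay.

    Assumption: step 0 uses lr_start and the last step uses lr_end.
    """
    if not 0 <= step < total_steps:
        raise ValueError(f"step must be in [0, {total_steps}), got {step}")
    fraction_done = step / (total_steps - 1)
    return lr_start + (lr_end - lr_start) * fraction_done


if __name__ == "__main__":
    # Print the rate at the start, middle, and end of training.
    for s in (0, 60, 119):
        print(s, linear_decay_lr(s))
```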
+
+ ## Evaluation: Cross-Cluster Personalization
+
+ We evaluate the model using the **Perfect Timing Score (PTS)**, a metric designed to measure the temporal accuracy of intervention predictions.
+
+ Because this is a personalized model, we report **Cross-Cluster PTS**. This measures how well the model (trained on Cluster 3) performs on its own test data versus test data from other user clusters.
+ High performance on the diagonal (matching train/test groups) indicates successful personalization.
+
+ ### Cross-Cluster PTS Heatmap
+
+ *The table below displays the PTS values. Rows represent the User Cluster the model was trained on, and columns represent the User Cluster data it was tested on.*
+
+ | Trained On (Model) | Tested On: Collaborative (User 0) | Tested On: Hands-on (User 2) | Tested On: **Takeover (User 3)** |
+ | :--- | :---: | :---: | :---: |
+ | Collaborative | **0.187** | 0.130 | 0.058 |
+ | Hands-on | 0.417 | **0.583** | 0.468 |
+ | Takeover | 0.000 | **0.027** | 0.009 |
+
+ *Note: All models are evaluated in a zero-shot setting without reasoning.*
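
The diagonal criterion described above can be checked directly against the table. The sketch below copies the nine PTS values from the table and reports, for each trained model, whether its matching test cluster is also its best-scoring one; the helper name `diagonal_report` and the dictionary layout are illustrative only.

```python
# PTS values copied from the table: pts[trained_on] is ordered by `clusters`.
clusters = ["Collaborative", "Hands-on", "Takeover"]
pts = {
    "Collaborative": [0.187, 0.130, 0.058],
    "Hands-on": [0.417, 0.583, 0.468],
    "Takeover": [0.000, 0.027, 0.009],
}


def diagonal_report(pts, clusters):
    """For each trained model, compare its diagonal PTS with its row maximum."""
    report = {}
    for i, name in enumerate(clusters):
        row = pts[name]
        best = max(range(len(row)), key=row.__getitem__)
        report[name] = {
            "diagonal": row[i],
            "best_test_cluster": clusters[best],
            "diagonal_is_best": best == i,
        }
    return report


if __name__ == "__main__":
    for name, info in diagonal_report(pts, clusters).items():
        print(name, info)
```

On these numbers the Collaborative and Hands-on rows peak on the diagonal, while the Takeover row scores 0.009 on its own cluster and 0.027 on the Hands-on test data.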
+
+ ## Citation
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+ If you use this model or dataset, please cite our work. The paper is forthcoming.