Yongxin-Guo/trace-ft-charades

Tags: videollama2_mistral · video temporal grounding · dense video caption · video highlight detection

Yongxin-Guo committed on Oct 10 · commit ee74ab3 · parent 9f1b404 · "Create README.md"


---
license: apache-2.0
language:
- en
base_model:
- mistralai/Mistral-7B-Instruct-v0.2
tags:
- video temporal grounding
- dense video caption
- video highlight detection
---
<h2 align="center"> <a href="https://arxiv.org/abs/2410.05643">TRACE: Temporal Grounding Video LLM via Causal Event Modeling</a></h2>
<h5 align="center"> If our project helps you, please give us a star ⭐ on <a href="https://github.com/gyxxyg/TRACE">GitHub</a> and cite our paper!</h5>

## 📰 News

- **[2024.10.10]** 🔥 Our [code](https://github.com/gyxxyg/TRACE) and [paper](https://arxiv.org/abs/2410.05643) have been released!
- **[2024.10.10]** 🔥 Our **checkpoints** are now available!

## Overview

In this work:

- We model a video as a sequence of events and propose a causal event modeling framework that captures this inherent structure.
- We present TRACE, a novel task-interleaved video LLM that implements the causal event modeling framework through sequential encoding and decoding of timestamps, saliency scores, and textual captions.
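
Roughly speaking, causal event modeling treats a video's output as an ordered list of events, each carrying a time span, a saliency score, and a caption, where each event is produced conditioned on the events before it. The minimal Python sketch below only illustrates that task-interleaved ordering, under stated assumptions: the `Event` class, the `interleave` helper, and the `time`/`score`/`text` labels are hypothetical. TRACE's actual timestamp and score encoders, decoding heads, and vocabularies are defined in the paper and the GitHub repository.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Event:
    """One video event (hypothetical fields): a time span, a saliency score, a caption."""
    start: float  # seconds
    end: float    # seconds
    score: float  # saliency score of the event
    caption: str


def interleave(events: List[Event]) -> List[Tuple[str, str]]:
    """Flatten events into a task-interleaved sequence of (task, value) pairs.

    Events are sorted by start time, so event k is emitted after events 1..k-1.
    Within an event the order is timestamps -> score -> caption. The task labels
    are illustrative stand-ins, not TRACE's real token vocabulary.
    """
    seq: List[Tuple[str, str]] = []
    for ev in sorted(events, key=lambda e: e.start):
        seq.append(("time", f"{ev.start:.1f}-{ev.end:.1f}"))
        seq.append(("score", f"{ev.score:.1f}"))
        seq.append(("text", ev.caption))
    return seq


if __name__ == "__main__":
    events = [
        Event(12.0, 20.5, 3.5, "a person closes the door"),
        Event(0.0, 8.0, 2.0, "a person opens a cabinet"),
    ]
    for task, value in interleave(events):
        print(f"{task:>5}: {value}")
```

Sorting by start time is what makes the sequence "causal" in this sketch: a decoder reading it left to right never sees a later event before an earlier one.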

## Model Zoo

| Checkpoint | Description | URL |
| ----------- | ----------- | ----------- |
| Initialization | Weights initialized from VideoLLaMA2 | [trace-init](https://huggingface.co/Yongxin-Guo/trace-init) |
| Stage-1 | Checkpoint after stage-1 training | [trace-stage1](https://huggingface.co/Yongxin-Guo/trace-stage1) |
| Stage-2 | Checkpoint after stage-2 training | [trace](https://huggingface.co/Yongxin-Guo/trace) |
| FT-Charades | Fine-tuned on the Charades-STA dataset | [trace-ft-charades](https://huggingface.co/Yongxin-Guo/trace-ft-charades) |
| FT-Youcook2 | Fine-tuned on the YouCook2 dataset | [trace-ft-youcook2](https://huggingface.co/Yongxin-Guo/trace-ft-youcook2) |
| FT-QVHighlights | Fine-tuned on the QVHighlights dataset | [trace-ft-qvhighlights](https://huggingface.co/Yongxin-Guo/trace-ft-qvhighlights) |
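
To fetch one of the checkpoints in the Model Zoo table, the `snapshot_download` function from the `huggingface_hub` library downloads a full Hub repository and returns its local path. This is a minimal sketch assuming `huggingface_hub` is installed and network access is available; the `CHECKPOINTS` mapping and the `repo_id`/`download` helpers are hypothetical conveniences built from the table.

```python
# Hypothetical mapping from Model Zoo rows to Hugging Face Hub repository IDs.
CHECKPOINTS = {
    "init": "Yongxin-Guo/trace-init",
    "stage1": "Yongxin-Guo/trace-stage1",
    "stage2": "Yongxin-Guo/trace",
    "ft-charades": "Yongxin-Guo/trace-ft-charades",
    "ft-youcook2": "Yongxin-Guo/trace-ft-youcook2",
    "ft-qvhighlights": "Yongxin-Guo/trace-ft-qvhighlights",
}


def repo_id(name: str) -> str:
    """Return the Hub repo ID for a checkpoint name such as 'ft-charades'."""
    try:
        return CHECKPOINTS[name]
    except KeyError:
        raise ValueError(f"unknown checkpoint {name!r}; choose from {sorted(CHECKPOINTS)}") from None


def download(name: str) -> str:
    """Download every file of a checkpoint repo and return the local directory.

    Needs `pip install huggingface_hub` and network access. The import is
    deferred so repo_id() works without the dependency installed.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id(name))


if __name__ == "__main__":
    print(repo_id("ft-charades"))
```

Downloading only retrieves the files. Running inference is assumed to require the model code from the TRACE GitHub repository (these checkpoints are tagged `videollama2_mistral`) rather than a stock `transformers` class, so follow that repository's loading instructions.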

## Results

| YouCook2 (Zero-Shot) | CIDEr | METEOR | SODA_c | F1 |
| --- | --- | --- | --- | --- |
| TRACE | 8.1 | 2.8 | 2.2 | 22.4 |

| Charades-STA (Zero-Shot) | R@1 (IoU=0.3) | R@1 (IoU=0.5) | R@1 (IoU=0.7) | mIoU |
| --- | --- | --- | --- | --- |
| TRACE | 58.6 | 40.3 | 19.4 | 38.7 |

| QVHighlights (Zero-Shot) | mAP | Hit@1 |
| --- | --- | --- |
| TRACE | 26.8 | 42.7 |

| ActivityNet-DVC | CIDEr | METEOR | SODA_c | F1 |
| --- | --- | --- | --- | --- |
| TRACE | 25.9 | 6.0 | 6.4 | 39.3 |

| ActivityNet-MR | R@1 (IoU=0.3) | R@1 (IoU=0.5) | R@1 (IoU=0.7) | mIoU |
| --- | --- | --- | --- | --- |
| TRACE | 53.0 | 37.7 | 24.0 | 39.0 |
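
The moment-retrieval results (Charades-STA, ActivityNet-MR) are reported at temporal IoU thresholds 0.3, 0.5, and 0.7 plus mean IoU, and the QVHighlights results include Hit@1. As a reading aid, here is a minimal Python sketch of the usual definitions, assuming one top-ranked predicted span per query and, for Hit@1, a given binary "positive clip" label per clip. The function names are hypothetical and this is not the official evaluation code; QVHighlights in particular derives positive clips from annotator saliency ratings, which the sketch simply takes as input.

```python
from typing import Dict, List, Sequence, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection-over-union of two time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def moment_retrieval_scores(
    preds: List[Span], gts: List[Span], thresholds: Sequence[float] = (0.3, 0.5, 0.7)
) -> Dict[str, float]:
    """R@1 at each IoU threshold and mean IoU, in percent.

    preds[i] is the single top-ranked predicted span for query i; gts[i] is its ground truth.
    """
    if not preds or len(preds) != len(gts):
        raise ValueError("need the same, non-zero number of predictions and ground truths")
    ious = [temporal_iou(p, g) for p, g in zip(preds, gts)]
    n = len(ious)
    # A query counts as a hit at threshold t if its top-1 span reaches IoU >= t.
    scores = {f"R@1,IoU={t}": 100.0 * sum(iou >= t for iou in ious) / n for t in thresholds}
    scores["mIoU"] = 100.0 * sum(ious) / n
    return scores


def hit_at_1(saliency: Sequence[Sequence[float]], positive: Sequence[Sequence[bool]]) -> float:
    """Percent of queries whose highest-scored clip is labeled positive."""
    if not saliency or len(saliency) != len(positive):
        raise ValueError("need the same, non-zero number of score lists and label lists")
    hits = 0
    for scores, labels in zip(saliency, positive):
        top = max(range(len(scores)), key=lambda i: scores[i])  # index of the top-1 clip
        hits += int(labels[top])
    return 100.0 * hits / len(saliency)


if __name__ == "__main__":
    preds = [(0.0, 5.0), (10.0, 20.0)]
    gts = [(1.0, 5.0), (15.0, 30.0)]  # IoUs: 0.8 and 0.25
    print(moment_retrieval_scores(preds, gts))
    print(hit_at_1([[0.1, 0.9, 0.3], [0.7, 0.2]], [[False, True, False], [False, True]]))
```

Because R@1 at a stricter threshold can only count fewer queries, the three R@1 columns are non-increasing from left to right, as in the tables above.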