xiuqhou
/

relation-detr-resnet50

@@ -8,176 +8,108 @@ language:
 pipeline_tag: object-detection
 ---
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
 ## Model Details
 ### Model Description
-<!-- Provide a longer summary of what this model is. -->
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
 <!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
 ## How to Get Started with the Model
 Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
 ## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
 ```
 @misc{hou2024relationdetrexploringexplicit,
@@ -191,24 +123,6 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
 }
 ```
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]

 pipeline_tag: object-detection
 ---
+# Relation DETR model with ResNet-50 backbone
 ## Model Details
 ### Model Description
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/66939171e3a813f3bb10e804/kNzBZZ2SFq6Wgk2ki_c5t.png)
+> This paper presents a general scheme for enhancing the convergence and performance of DETR (DEtection TRansformer).
+> We investigate the slow convergence problem in transformers from a new perspective, suggesting that it arises from
+> the self-attention that introduces no structural bias over inputs. To address this issue, we explore incorporating
+> position relation prior as attention bias to augment object detection, following the verification of its statistical
+> significance using a proposed quantitative macroscopic correlation (MC) metric. Our approach, termed Relation-DETR,
+> introduces an encoder to construct position relation embeddings for progressive attention refinement, which further
+> extends the traditional streaming pipeline of DETR into a contrastive relation pipeline to address the conflicts
+> between non-duplicate predictions and positive supervision. Extensive experiments on both generic and task-specific
+> datasets demonstrate the effectiveness of our approach. Under the same configurations, Relation-DETR achieves a
+> significant improvement (+2.0% AP compared to DINO), state-of-the-art performance (51.7% AP for 1x and 52.1% AP
+> for 2x settings), and a remarkably faster convergence speed (over 40% AP with only 2 training epochs) than existing
+> DETR detectors on COCO val2017. Moreover, the proposed relation encoder serves as a universal plug-in-and-play component,
+> bringing clear improvements for theoretically any DETR-like methods. Furthermore, we introduce a class-agnostic detection
+> dataset, SA-Det-100k. The experimental results on the dataset illustrate that the proposed explicit position relation
+> achieves a clear improvement of 1.3% AP, highlighting its potential towards universal object detection.
+> The code and dataset are available at [https://github.com/xiuqhou/Relation-DETR](https://github.com/xiuqhou/Relation-DETR).
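The abstract's core idea is to use pairwise position relations between boxes as an attention bias. The minimal pure-Python sketch below illustrates only that mechanism and is not the Relation-DETR implementation. The four log-ratio relation features, the hypothetical helpers `relation_features`, `relation_bias` and `biased_attention`, and the fixed `weights` standing in for the model's learned relation embedding and projection are all assumptions made for illustration.

```python
import math

# Illustrative assumption: boxes are (cx, cy, w, h) and the relation between
# box i and box j is described by four log-ratio features.
def relation_features(box_i, box_j):
    """Relation features of box_j as seen from box_i (hypothetical helper)."""
    cx_i, cy_i, w_i, h_i = box_i
    cx_j, cy_j, w_j, h_j = box_j
    return [
        math.log(abs(cx_i - cx_j) / w_i + 1.0),  # horizontal offset, scaled by box_i width
        math.log(abs(cy_i - cy_j) / h_i + 1.0),  # vertical offset, scaled by box_i height
        math.log(w_i / w_j),                     # log width ratio
        math.log(h_i / h_j),                     # log height ratio
    ]


def relation_bias(boxes, weights):
    """Turn pairwise relation features into an N x N attention bias.

    `weights` is a hypothetical stand-in for a learned embedding + projection:
    the bias is the negative weighted magnitude of the features, so boxes that
    are far apart or differently sized get a more negative bias.
    """
    return [
        [-sum(w * abs(f) for w, f in zip(weights, relation_features(bi, bj))) for bj in boxes]
        for bi in boxes
    ]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(logits, bias):
    """Add the relation bias to query-key logits, then normalize each row."""
    return [
        softmax([logit + b for logit, b in zip(logit_row, bias_row)])
        for logit_row, bias_row in zip(logits, bias)
    ]


boxes = [(50.0, 50.0, 20.0, 20.0), (55.0, 52.0, 20.0, 22.0), (300.0, 200.0, 40.0, 40.0)]
content_logits = [[0.0] * len(boxes) for _ in boxes]  # no content preference at all
bias = relation_bias(boxes, weights=[1.0, 1.0, 0.5, 0.5])
attn = biased_attention(content_logits, bias)
for row in attn:
    print([round(p, 3) for p in row])
```

Even with uniform content logits, the two nearby boxes attend mostly to themselves and each other, and almost not at all to the distant box. Plain self-attention lacks this kind of structural bias, according to the abstract.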
+- **Developed by:** Xiuquan Hou
+- **Shared by:** Xiuquan Hou
+- **Model type:** Relation DETR
+- **License:** Apache-2.0
+### Model Sources
 <!-- Provide the basic links for the model. -->
+- **Repository:** [https://github.com/xiuqhou/Relation-DETR](https://github.com/xiuqhou/Relation-DETR)
+- **Paper:** [Relation DETR: Exploring Explicit Position Relation Prior for Object Detection](https://arxiv.org/abs/2407.11699)
 ## How to Get Started with the Model
 Use the code below to get started with the model.
+```python
+import torch
+import requests
+from PIL import Image
+from transformers import RelationDetrForObjectDetection, RelationDetrImageProcessor
+
+# Load a sample image from COCO val2017.
+url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
+image = Image.open(requests.get(url, stream=True).raw)
+
+# Load the preprocessor and the ResNet-50 Relation DETR checkpoint from this repository.
+image_processor = RelationDetrImageProcessor.from_pretrained("xiuqhou/relation-detr-resnet50")
+model = RelationDetrForObjectDetection.from_pretrained("xiuqhou/relation-detr-resnet50")
+
+# Preprocess the image and run inference without tracking gradients.
+inputs = image_processor(images=image, return_tensors="pt")
+with torch.no_grad():
+    outputs = model(**inputs)
+
+# PIL reports (width, height); post-processing expects (height, width) per image.
+results = image_processor.post_process_object_detection(outputs, target_sizes=torch.tensor([image.size[::-1]]), threshold=0.3)
+
+# Print each detection as "label: score [box]".
+for result in results:
+    for score, label_id, box in zip(result["scores"], result["labels"], result["boxes"]):
+        score, label = score.item(), label_id.item()
+        box = [round(i, 2) for i in box.tolist()]
+        print(f"{model.config.id2label[label]}: {score:.2f} {box}")
+```
+This should print detections similar to the following:
+```text
+cat: 0.96 [343.8, 24.9, 639.52, 371.71]
+cat: 0.95 [12.6, 54.34, 316.37, 471.86]
+remote: 0.95 [40.09, 73.49, 175.52, 118.06]
+remote: 0.90 [333.09, 76.71, 369.77, 187.4]
+couch: 0.90 [0.44, 0.53, 640.44, 475.54]
+```
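The printed boxes are in pixel coordinates of the input image. The hypothetical helpers below convert detections into COCO-style result records with `[x, y, width, height]` boxes. They assume the boxes use the `(x_min, y_min, x_max, y_max)` layout that `post_process_object_detection` returns for other DETR-family processors in 🤗 Transformers; this is not confirmed here for `RelationDetrImageProcessor`. Mapping the model's label ids to dataset category ids is left to the caller.

```python
def xyxy_to_xywh(box):
    """Convert (x_min, y_min, x_max, y_max) to COCO-style [x, y, width, height]."""
    x_min, y_min, x_max, y_max = box
    return [x_min, y_min, x_max - x_min, y_max - y_min]


def to_coco_results(image_id, scores, category_ids, boxes):
    """Build COCO-style detection records (hypothetical helper).

    `category_ids` must already be dataset category ids; model label ids may
    need remapping first.
    """
    return [
        {
            "image_id": image_id,
            "category_id": int(cat_id),
            "bbox": [round(v, 2) for v in xyxy_to_xywh(box)],
            "score": float(score),
        }
        for score, cat_id, box in zip(scores, category_ids, boxes)
    ]


# One box copied from the example output above; image id taken from the file name,
# category id is a placeholder.
records = to_coco_results(39769, [0.96], [1], [[343.8, 24.9, 639.52, 371.71]])
print(records[0]["bbox"])
```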
+## Training Details
+This Relation DEtection TRansformer (Relation DETR) model with a ResNet-50 backbone is trained on [COCO 2017 object detection](https://cocodataset.org/#download) (118k annotated training images) for 12 epochs, i.e. the 1x schedule.
 ## Evaluation
+| Model                     | Backbone                   | Epochs |  mAP  | AP<sub>50</sub> | AP<sub>75</sub> | AP<sub>S</sub> | AP<sub>M</sub> | AP<sub>L</sub> |
+| ------------------------- | -------------------------- | :----: | :---: | :-------------: | :-------------: | :------------: | :------------: | :------------: |
+| Relation DETR             | ResNet50                   |   12   | 51.7  |      69.1       |      56.3       |      36.1      |      55.6      |      66.1      |
+| Relation DETR             | Swin-L<sub>(IN-22K)</sub>  |   12   | 57.8  |      76.1       |      62.9       |      41.2      |      62.1      |      74.4      |
+| Relation DETR             | ResNet50                   |   24   | 52.1  |      69.7       |      56.6       |      36.1      |      56.0      |      66.5      |
+| Relation DETR             | Swin-L<sub>(IN-22K)</sub>  |   24   | 58.1  |      76.4       |      63.5       |      41.8      |      63.0      |      73.5      |
+| Relation DETR<sup>†</sup> | Focal-L<sub>(IN-22K)</sub> |  4+24  | 63.5  |      80.8       |      69.1       |      47.2      |      66.9      |      77.0      |
+<sup>†</sup> Fine-tuned on COCO after pretraining on Objects365.
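Under the COCO protocol, a detection counts as a true positive only when its intersection-over-union (IoU) with a matched ground-truth box reaches the threshold. AP<sub>50</sub> and AP<sub>75</sub> therefore use IoU ≥ 0.50 and IoU ≥ 0.75. The headline mAP averages AP over the ten thresholds 0.50:0.05:0.95. AP<sub>S</sub>, AP<sub>M</sub> and AP<sub>L</sub> restrict evaluation to small (area < 32² pixels), medium, and large (area > 96² pixels) objects.

The sketch below only illustrates the IoU test behind these thresholds. It is not a COCO evaluator: matching, precision-recall integration, and area ranges are omitted, and the helper names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten COCO IoU thresholds: 0.50, 0.55, ..., 0.95.
COCO_IOU_THRESHOLDS = [round(0.50 + 0.05 * k, 2) for k in range(10)]


def passing_thresholds(pred, gt):
    """IoU thresholds at which `pred` would be a true positive for `gt` (hypothetical helper)."""
    overlap = iou(pred, gt)
    return [t for t in COCO_IOU_THRESHOLDS if overlap >= t]


pred, gt = (0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 8.0)
print(iou(pred, gt), passing_thresholds(pred, gt))
```

Here the IoU is 0.8. The prediction would therefore count toward AP<sub>50</sub> and AP<sub>75</sub>, but it fails the stricter 0.85 to 0.95 thresholds that also enter mAP.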
+## Model Architecture and Objective
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/66939171e3a813f3bb10e804/UMtLjkxrwoDikUBlgj-Fc.png)
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/66939171e3a813f3bb10e804/MBbCM-zQGgUjKUmwB0yje.png)
+## Citation and BibTeX
 ```
 @misc{hou2024relationdetrexploringexplicit,
 }
 ```
+## Model Card Authors
+[xiuqhou](https://huggingface.co/xiuqhou)