TITLE = """<h1 align="center" id="space-title">TravelPlanner Leaderboard</h1>"""

INTRODUCTION_TEXT = """
TravelPlanner is a benchmark designed to evaluate language agents on tool use and complex planning under multiple constraints. (See our [paper](https://arxiv.org/pdf/2402.01622.pdf) for more details.)

## Data
In TravelPlanner, given a query, a language agent is expected to produce a comprehensive plan that covers transportation, daily meals, attractions, and accommodation for each day.
To reflect real-world applications, we design three types of constraints: Environment Constraints, Commonsense Constraints, and Hard Constraints.
TravelPlanner comprises 1,225 queries in total. The queries vary in the number of days and the number of hard constraints, which tests agents' abilities across both the breadth and depth of complex planning.

TravelPlanner data can be found in [this dataset](https://huggingface.co/datasets/osunlp/TravelPlanner). 

## Submission Guidelines for TravelPlanner
Participants are invited to submit results for both the validation and test phases. Submissions are evaluated on the following metrics: delivery rate, commonsense constraint pass rate (micro/macro), hard constraint pass rate (micro/macro), and final pass rate.
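
The micro and macro pass rates differ in what they count. Below is a minimal sketch of the metric arithmetic, assuming the definitions given in the paper as we read them: micro is the share of individual constraints passed across all plans, macro is the share of plans that pass every constraint of a category, and the final pass rate is the share of plans that pass all constraints. `PlanResult` and the function names are hypothetical; this is not the official evaluation script.

```python
from dataclasses import dataclass


@dataclass
class PlanResult:
    # Hypothetical per-plan record: one boolean per evaluated constraint.
    # For an undelivered plan, the lists are assumed to hold False for every constraint.
    delivered: bool
    commonsense: list[bool]
    hard: list[bool]


def _ratio(num: int, den: int) -> float:
    return num / den if den else 0.0


def delivery_rate(results: list[PlanResult]) -> float:
    return _ratio(sum(r.delivered for r in results), len(results))


def micro_pass_rate(results: list[PlanResult], kind: str) -> float:
    # Passed constraints / all constraints, pooled across plans.
    # kind is 'commonsense' or 'hard'.
    checks = [ok for r in results for ok in getattr(r, kind)]
    return _ratio(sum(checks), len(checks))


def macro_pass_rate(results: list[PlanResult], kind: str) -> float:
    # Delivered plans passing every constraint of this kind / all plans.
    return _ratio(sum(r.delivered and all(getattr(r, kind)) for r in results), len(results))


def final_pass_rate(results: list[PlanResult]) -> float:
    # Delivered plans passing all commonsense AND all hard constraints / all plans.
    return _ratio(
        sum(r.delivered and all(r.commonsense) and all(r.hard) for r in results),
        len(results),
    )
```

For example, with two plans where the first passes everything and the second fails one commonsense and one hard constraint, the micro commonsense rate is 3/4 while the macro commonsense rate is 1/2.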

### Submission Format
Submissions must be a JSON Lines (`.jsonl`) file, with one JSON object per line. Each line should follow this structure:
```
{"idx":0,"query":"Natural Language Query","plan":[{"day": 1, "current_city": "from [City A] to [City B]", "transportation": "Flight Number: XXX, from A to B", "breakfast": "Name, City", "attraction": "Name, City;Name, City;...;Name, City;", "lunch": "Name, City", "dinner": "Name, City", "accommodation": "Name, City"}, {"day": 2, "current_city": "City B", "transportation": "-", "breakfast": "Name, City", "attraction": "Name, City;Name, City;", "lunch": "Name, City", "dinner": "Name, City", "accommodation": "Name, City"}, ...]}
```
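
Since each line is a standalone JSON object, a file in this format can be produced with Python's standard `json` module. A minimal sketch; the `write_submission` helper and the placeholder values are illustrative only:

```python
import io
import json


def write_submission(records, fp):
    # JSON Lines: one complete JSON object per line, no enclosing array.
    for rec in records:
        print(json.dumps(rec, ensure_ascii=False), file=fp)


record = {
    'idx': 0,
    'query': 'Natural Language Query',
    'plan': [
        {'day': 1, 'current_city': 'from City A to City B',
         'transportation': 'Flight Number: XXX, from City A to City B',
         'breakfast': '-', 'attraction': 'Name, City B;',
         'lunch': 'Name, City B', 'dinner': 'Name, City B',
         'accommodation': 'Name, City B'},
    ],
}

# Write to an in-memory buffer to show the result.
buf = io.StringIO()
write_submission([record], buf)
lines = buf.getvalue().splitlines()
```

To write to disk, pass an open text file instead, e.g. `open('submission.jsonl', 'w', encoding='utf-8')` (the file name is arbitrary).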
Explanation of Fields:
#### day:
Description: Indicates the specific day in the itinerary.
Format: An integer giving the position of the day within the travel plan, e.g., 1 for the first day, 2 for the second day, and so on.

#### current_city:
Description: Indicates the city where the traveler is currently located.
Format: When there is a change in location, use "from [City A] to [City B]" to denote the transition. If remaining in the same city, simply use the city's name (e.g., "City A").
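
A minimal sketch of reading this field back, assuming city names appear without the square brackets (which only mark placeholders above) and never contain the substring ` to `; `parse_current_city` is a hypothetical helper:

```python
def parse_current_city(value):
    # 'from A to B' marks a travel day and returns (origin, destination);
    # any other value is a single city, returned as (city, city).
    if value.startswith('from ') and ' to ' in value:
        origin, _, destination = value[len('from '):].partition(' to ')
        return origin.strip(), destination.strip()
    return value.strip(), value.strip()
```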

#### transportation:
Description: Specifies the mode of transportation used.
Format: For flights, include the details in the format "Flight Number: XXX, from [City A] to [City B]". For self-driven or taxi travel, use "self-driving/taxi, from [City A] to [City B]". If there is no travel between cities on that day, use "-".
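
A hypothetical helper that produces these strings. It assumes that for ground travel the single mode name (`self-driving` or `taxi`) is written in place of `self-driving/taxi`; check the example submission file for the exact spelling expected.

```python
def format_transportation(mode, origin=None, destination=None, flight_number=None):
    # mode: 'flight', 'self-driving', 'taxi', or None for a day without intercity travel.
    if mode is None:
        return '-'
    if mode == 'flight':
        return f'Flight Number: {flight_number}, from {origin} to {destination}'
    if mode in ('self-driving', 'taxi'):
        # Assumption: the single mode name replaces 'self-driving/taxi'.
        return f'{mode}, from {origin} to {destination}'
    raise ValueError(f'unknown transportation mode: {mode!r}')
```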

#### breakfast, lunch, and dinner:
Description: Details about dining arrangements.
Format: Use "Name, City" to specify the chosen restaurant and its location. If a meal is not planned, use "-".
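
Because a restaurant name could itself contain a comma, a parser may want to split on the last comma only. A minimal sketch under that assumption (`parse_meal` is hypothetical):

```python
def parse_meal(value):
    # '-' means no meal is planned.
    if value.strip() == '-':
        return None
    # Split on the LAST comma so commas inside the name are kept.
    name, sep, city = value.rpartition(',')
    if not sep:
        raise ValueError(f'expected "Name, City", got {value!r}')
    return name.strip(), city.strip()
```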

#### attraction:
Description: Information about attractions visited.
Format: List attractions as "Name, City". If visiting multiple attractions, separate them with a semicolon ";". If no attraction is planned, use "-".
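
A small sketch for splitting and joining this field. It assumes empty segments (such as the one after a trailing `;` in the format example) can be ignored; both helpers are hypothetical:

```python
def parse_attractions(value):
    # '-' means no attraction; entries are ';'-separated, with an optional trailing ';'.
    if value.strip() == '-':
        return []
    return [item.strip() for item in value.split(';') if item.strip()]


def format_attractions(items):
    # Mirrors the example format, which ends each entry with ';'.
    return ';'.join(items) + ';' if items else '-'
```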

Please refer to [this example submission file](https://huggingface.co/datasets/osunlp/TravelPlanner/resolve/main/example_submission.jsonl?download=true).
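
Before uploading, it may help to sanity-check each line. The sketch below only checks that every line is a JSON object with the keys shown in the format example; treating every day key as required is our assumption, and passing this check does not mean the plan will pass any constraint.

```python
import json

# Keys taken from the format example; requiring all of them on every day is an assumption.
DAY_KEYS = {'day', 'current_city', 'transportation', 'breakfast',
            'attraction', 'lunch', 'dinner', 'accommodation'}


def check_line(line):
    # Returns a list of problems for one JSONL line; an empty list means none found.
    try:
        rec = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f'invalid JSON: {exc}']
    if not isinstance(rec, dict):
        return ['line is not a JSON object']
    problems = [f'missing key {k!r}' for k in ('idx', 'query', 'plan') if k not in rec]
    plan = rec.get('plan', [])
    if not isinstance(plan, list):
        return problems + ['plan is not a list']
    for i, day in enumerate(plan):
        if not isinstance(day, dict):
            problems.append(f'plan[{i}] is not an object')
            continue
        missing = DAY_KEYS - set(day)
        if missing:
            problems.append(f'plan[{i}] missing {sorted(missing)}')
    return problems
```

For a whole file, call `check_line` on each line and report the line number alongside any problems it returns.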

Submissions made by our team are labeled "TravelPlanner Team". Each submission is automatically evaluated and scored on the metrics listed above. After evaluation, you can view the scores and download the detailed constraint pass rates.

## Show Your Results on the Leaderboard
If you would like your results to appear on our leaderboard, please send an email to [us](mailto:jianx0321@gmail.com) with the following details: evaluation mode, foundation model, tool-use strategy, planning strategy, organization, and paper link (if available), along with your submission files.
"""

CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
CITATION_BUTTON_TEXT = r"""@article{Xie2024TravelPlanner,
  author    = {Jian Xie and Kai Zhang and Jiangjie Chen and Tinghui Zhu and Renze Lou and Yuandong Tian and Yanghua Xiao and Yu Su},
  title     = {TravelPlanner: A Benchmark for Real-World Planning with Language Agents},
  journal   = {arXiv preprint arXiv:2402.01622},
  year      = {2024}
}"""


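def format_error(msg):
    """Wrap msg in a centered red HTML paragraph for error messages."""
    return f"<p style='color: red; font-size: 20px; text-align: center;'>{msg}</p>"

def format_warning(msg):
    """Wrap msg in a centered orange HTML paragraph for warnings."""
    return f"<p style='color: orange; font-size: 20px; text-align: center;'>{msg}</p>"

def format_log(msg):
    """Wrap msg in a centered green HTML paragraph for success/log messages."""
    return f"<p style='color: green; font-size: 20px; text-align: center;'>{msg}</p>"

def model_hyperlink(link, model_name):
    """Return an HTML link to `link` labeled with `model_name`, opening in a new tab."""
    return f'<a target="_blank" href="{link}" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">{model_name}</a>'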