natolambert committed
Commit 9545133 • Parent(s): 2057cd9
Update src/md.py
src/md.py
CHANGED
@@ -1,4 +1,5 @@
 from datetime import datetime
+import pytz
 
 ABOUT_TEXT = """
 We compute the win percentage for a reward model on hand curated chosen-rejected pairs for each prompt.
@@ -94,12 +95,12 @@ Lengths (mean, std. dev.) include the prompt
 For more details, see the [dataset](https://huggingface.co/datasets/allenai/reward-bench).
 """
 
-# Get
-
+# Get Pacific time zone (handles PST/PDT automatically)
+pacific_tz = pytz.timezone('America/Los_Angeles')
+current_time = datetime.now(pacific_tz).strftime("%H:%M %Z, %d %b %Y")
 
 TOP_TEXT = f"""# RewardBench: Evaluating Reward Models
 ### Evaluating the capabilities, safety, and pitfalls of reward models
-Last restart: {current_time}
-[Code](https://github.com/allenai/reward-bench) | [Eval. Dataset](https://huggingface.co/datasets/allenai/reward-bench) | [Prior Test Sets](https://huggingface.co/datasets/allenai/pref-test-sets) | [Results](https://huggingface.co/datasets/allenai/reward-bench-results) | [Paper](https://arxiv.org/abs/2403.13787) | Total models: {{}} | * Unverified models | ⚠️ Dataset Contamination
+[Code](https://github.com/allenai/reward-bench) | [Eval. Dataset](https://huggingface.co/datasets/allenai/reward-bench) | [Prior Test Sets](https://huggingface.co/datasets/allenai/pref-test-sets) | [Results](https://huggingface.co/datasets/allenai/reward-bench-results) | [Paper](https://arxiv.org/abs/2403.13787) | Total models: {{}} | * Unverified models | ⚠️ Dataset Contamination | Last restart (PST): {current_time}
 
 ⚠️ Many of the top models were trained on unintentionally contaminated, AI-generated data, for more information, see this [gist](https://gist.github.com/natolambert/1aed306000c13e0e8c5bc17c1a5dd300)."""
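The pattern the commit introduces can be sketched end-to-end. This is a minimal sketch, not the repo's exact code: it substitutes the stdlib `zoneinfo` module (Python 3.9+) for `pytz`, and trims `TOP_TEXT` down to the two interpolated fields. Note the `{{}}` inside the f-string: it escapes to a literal `{}` so the total model count can be filled in later with `.format()`, while `{current_time}` is interpolated immediately.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # stdlib equivalent of pytz.timezone (Python 3.9+)

# 'America/Los_Angeles' tracks the PST/PDT switch automatically,
# unlike a hard-coded UTC-8 offset.
pacific_tz = ZoneInfo("America/Los_Angeles")
current_time = datetime.now(pacific_tz).strftime("%H:%M %Z, %d %b %Y")

# {{}} escapes to a literal {} placeholder; {current_time} is filled now.
TOP_TEXT = f"""# RewardBench: Evaluating Reward Models
Total models: {{}} | Last restart (PST): {current_time}"""

# The placeholder is resolved later, once the leaderboard is loaded.
print(TOP_TEXT.format(123))
```

Using an IANA zone name rather than a fixed offset is what makes the "handles PST/PDT automatically" comment in the diff true: the abbreviation produced by `%Z` flips between `PST` and `PDT` with daylight saving.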