Spaces: MCP-1st-Birthday / Agentic-Codenames-Arena

lucadipalma committed 5 days ago

Commit

2d2d677

1 Parent(s): 983e23f

update readme, add video

Files changed (4)

README.md +102 -1
graph.png +0 -0
pages/home.py +2 -10
support/game_settings.py +43 -24

README.md CHANGED

@@ -13,4 +13,105 @@ tags:
   - mcp-in-action-track-creative
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

   - mcp-in-action-track-creative
 ---
+# 🧠 Agentic Codenames Arena
+![Meme](assets/meme.png)
+**Watch, or join, LLMs battling it out in Codenames.**
+[Demo on YouTube](https://youtu.be/DKIfJ-j-GEg?si=sRXrr5XtP0MOvq1T)
+[My post on LinkedIn](https://www.linkedin.com/posts/luca-di-palma-99024a1b7_most-of-us-use-llms-to-create-reports-write-activity-7400225424770932736-OTPU?utm_source=share&utm_medium=member_desktop&rcm=ACoAADJnVPwBh-8LoV25AQVeclIBTKNuOP6rr08)
+---
+## 🧩 What This App Does
+**Agentic Codenames Arena** is an interactive dashboard where teams of LLMs compete in the game of *Codenames*.
+Two teams, **Red** and **Blue**, face off in a **4v4 setup**, with each team composed of:
+* **1 Boss**: Provides the clue and clue number for each turn.
+* **1 Captain**: Coordinates the team’s reasoning, synthesizes the agents’ suggestions, and ultimately selects the final words to “touch”.
+* **2 Players**: Collaborate with the Captain, proposing interpretations, evaluating associations, and contributing to the team’s final decisions.
+The internal **communication and coordination architecture is built using LangGraph**, enabling structured multi-agent reasoning and transparent agent-to-agent interactions.
+Below is the LangGraph diagram illustrating how the different roles communicate during each turn:
+![LangGraph Architecture](graph.png)
+You can either **sit back and watch fully autonomous LLM teams play**, or **step in as a human Boss** to lead your AI teammates with your own clues.
+---
+## 🤖 How It Works
+### **LLM Teams**
+Build teams from several providers: OpenAI, Google, Anthropic, HuggingFace...
+Each model plays autonomously using its own reasoning chain and game strategy.
+### **Two Gameplay Modalities**
+#### **1️⃣ Observation Mode — Watch AIs Battle**
+Sit back and spectate.
+See how different models reason about clues, decide associations, and occasionally produce *hilariously misaligned* guesses.
+You'll see:
+* Model-to-model conversations
+* Reasoning traces
+* Turn-by-turn decisions
+* How each team coordinates across multiple rounds
+Perfect for AI benchmarking, research, or just entertainment.
+#### **2️⃣ Human Boss Mode — Enter the Fight**
+Become the Boss for either team and give your own clue + number.
+Your AI teammates will interpret your hint and take their guesses.
+---
+## 🧠 Why It’s Interesting
+* **Compare LLM reasoning styles:**
+  Watch how different models interpret associations, analogies, and subtle semantic cues.
+* **Analyze team dynamics:**
+  Some models coordinate beautifully. Others… not so much.
+  Observe emergent cooperation, miscommunication, or unexpected strategies.
+* **Experiment with human–AI collaboration:**
+  Test how effective your clues are with LLM teammates.
+  Try pushing the limits with creative, cryptic, or minimalist hints.
+---
+## 🕹️ Main Features
+* **Create & customize teams** using any mix of LLMs
+* **Switch between AI vs AI** and **Human vs AI** modes
+* **Detailed per-turn logs** for all model decisions
+* **Transparent reasoning chains**
+* **Interactive UI** for watching matches play out
+* **Match history & analytics dashboard**
+---
+## 📊 Stats & Analytics
+All games played in the Arena are stored in a database.
+The Stats section of the app includes:
+* **Model win/loss rates** across all recorded matches
+* **Performance comparisons** between model families (OpenAI vs Google vs …)
+* **Historical match logs** for replay & analysis
+* **Leaderboards** highlighting the best-performing models
+This turns the Arena into a dynamic benchmarking tool for evaluating LLM semantic reasoning, coordination abilities, and reliability under pressure.
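
The Stats section above lists win/loss rates and leaderboards. As a rough sketch of how such figures could be derived from stored games, the snippet below uses a hypothetical `MatchRecord` shape; the actual database schema is not part of this commit, so every field name here is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class MatchRecord:
    """Hypothetical shape of one stored game (not the app's real schema)."""
    red_models: Tuple[str, ...]
    blue_models: Tuple[str, ...]
    winner: str  # "red" or "blue"


def win_rates(matches: Iterable[MatchRecord]) -> Dict[str, float]:
    """Return the fraction of team appearances each model won."""
    wins: Dict[str, int] = defaultdict(int)
    games: Dict[str, int] = defaultdict(int)
    for match in matches:
        for team, models in (("red", match.red_models), ("blue", match.blue_models)):
            # Count a model once per team per game, even if it fills several seats.
            for model in set(models):
                games[model] += 1
                if match.winner == team:
                    wins[model] += 1
    return {model: wins[model] / games[model] for model in games}


def leaderboard(matches: Iterable[MatchRecord]) -> List[Tuple[str, float]]:
    """Sort models by win rate (descending), then by name for stable ties."""
    rates = win_rates(matches)
    return sorted(rates.items(), key=lambda item: (-item[1], item[0]))
```

Counting each model once per team keeps a team that uses the same model in all four seats from inflating that model's totals; a per-seat count would be an equally defensible choice.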

graph.png DELETED

Binary file (36.2 kB)
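
Since the diagram is a binary file and cannot be read in this view, the per-turn flow that the README describes (Boss gives a clue and number, the two Players propose words, the Captain picks the final words) can be sketched in plain Python. This is a stand-in for illustration only: it does not use LangGraph, and `Clue`, `TurnState`, `run_turn`, and `majority_captain` are hypothetical names, not identifiers from the repository.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Clue:
    word: str
    number: int


@dataclass
class TurnState:
    """Shared state passed through one turn (hypothetical structure)."""
    board: List[str]
    clue: Optional[Clue] = None
    proposals: Dict[str, List[str]] = field(default_factory=dict)
    final_guesses: List[str] = field(default_factory=list)


BossFn = Callable[[List[str]], Clue]
PlayerFn = Callable[[List[str], Clue], List[str]]
CaptainFn = Callable[[Clue, Dict[str, List[str]]], List[str]]


def run_turn(
    board: List[str],
    boss: BossFn,
    players: Dict[str, PlayerFn],
    captain: CaptainFn,
) -> TurnState:
    """Run one turn: Boss gives the clue, Players propose, Captain decides."""
    state = TurnState(board=list(board))
    state.clue = boss(state.board)
    for name, propose in players.items():
        state.proposals[name] = propose(state.board, state.clue)
    state.final_guesses = captain(state.clue, state.proposals)
    return state


def majority_captain(clue: Clue, proposals: Dict[str, List[str]]) -> List[str]:
    """Toy Captain: keep the most frequently proposed words, up to clue.number."""
    votes = Counter(word for words in proposals.values() for word in words)
    return [word for word, _ in votes.most_common(clue.number)]
```

Because `boss` is just a callable, the same loop would cover both modes the README mentions: an LLM-backed function for autonomous play, or a function that returns a human's clue in Human Boss mode.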

pages/home.py CHANGED

@@ -1,19 +1,11 @@
 import gradio as gr
-from support.game_settings import APP_DESCRIPTION, GAME_RULES_HTML
 with gr.Blocks(fill_width=True) as demo:
     # Rules section with HTML
     with gr.Row(elem_id="row_description", equal_height=True):
-        gr.Markdown(APP_DESCRIPTION, elem_id="app_description")
         gr.HTML(GAME_RULES_HTML, elem_id="rules_accordion")
-    # with gr.Row():
-    #     gr.HTML("""
-    #         <div style="text-align: center; margin: 2rem 0;">
-    #             <p style="font-size: 1.2rem;">Ready to start playing?</p>
-    #             <p style="color: #666;">Navigate to the <strong>Play</strong> page to begin your game!</p>
-    #         </div>
-    #     """)
 if __name__ == "__main__":
     demo.launch()

 import gradio as gr
+from support.game_settings import APP_DESCRIPTION_HTML, GAME_RULES_HTML
 with gr.Blocks(fill_width=True) as demo:
     # Rules section with HTML
     with gr.Row(elem_id="row_description", equal_height=True):
+        gr.HTML(APP_DESCRIPTION_HTML, elem_id="app_description")
         gr.HTML(GAME_RULES_HTML, elem_id="rules_accordion")
 if __name__ == "__main__":
     demo.launch()

support/game_settings.py CHANGED

@@ -1,26 +1,45 @@
-APP_DESCRIPTION = """
-### 🧩 What This App Does
-This dashboard lets you watch (or join!) teams of Large Language Models (LLMs) play **Codenames** against each other.
-Two teams — **Red** and **Blue** — face off in a 4v4 format. Each team has a **Boss** and **three Agents** working together to identify their team’s words before the other side does.
-### 🤖 How It Works
-* **LLM Teams:** You can assemble teams using different LLMs (e.g., GPT, Claude, Gemini, or OpenSource models...).
-* **Human Mode:** You can also jump in as a **Boss** yourself, giving clues to your AI teammates and seeing how well they interpret your hints.
-* **Observation Mode:** Prefer to just watch? Sit back and enjoy the game unfold, analyzing how different models reason, cooperate, and sometimes hilariously misfire.
-### 🧠 Why It’s Interesting
-* **Compare LLM reasoning styles:** See how different models interpret subtle associations and language cues.
-* **Team Dynamics:** Watch how collaboration (or confusion) emerges between AIs when they have to coordinate across multiple turns.
-* **Human-AI Interaction:** Experiment with leading a team of LLMs and discover how clearly (or creatively) you need to communicate to win.
-### 🕹️ Main Features
-* Create and customize teams with any available LLMs.
-* Switch between **AI vs AI** and **Human vs AI** modes.
-* View reasoning and chat logs for each model’s decisions.
 """
 GAME_RULES_HTML = """
@@ -509,7 +528,7 @@ ALL_MODELS = sorted({
 # Custom header HTML
 custom_header = """
 <div class="custom-navbar">
-    <div class="navbar-title">🕵️ Agentic Codenames</div>
     <div class="navbar-links">
         <a href="#" class="nav-link active" data-tab-id="home_id">Home</a>
         <a href="#" class="nav-link" data-tab-id="play_id">Play</a>

+APP_DESCRIPTION_HTML = """
+<div style="display: flex; flex-direction: column; gap: 20px;">
+    <div><h3>🎥 Watch the Demo</h3></div>
+    <div style="display: flex; justify-content: center; margin-top: 20px;">
+        <iframe
+            width="560"
+            height="315"
+            src="https://www.youtube.com/embed/DKIfJ-j-GEg"
+            frameborder="0"
+            style="border-radius:12px; width:100%; max-width:560px;"
+            allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
+            allowfullscreen>
+        </iframe>
+    </div>
+    <div>
+        <h3>🧩 What This App Does</h3>
+        <p>This dashboard lets you watch (or join!) teams of Large Language Models (LLMs) play <strong>Codenames</strong> against each other.
+        Two teams — <strong>Red</strong> and <strong>Blue</strong> — face off in a 4v4 format. Each team has a <strong>Boss</strong> and <strong>three Agents</strong> working together to identify their team's words before the other side does.</p>
+        <h3>🤖 How It Works</h3>
+        <ul>
+            <li><strong>LLM Teams:</strong> You can assemble teams using different LLMs (e.g., GPT, Claude, Gemini, or OpenSource models...).</li>
+            <li><strong>Human Mode:</strong> You can also jump in as a <strong>Boss</strong> yourself, giving clues to your AI teammates and seeing how well they interpret your hints.</li>
+            <li><strong>Observation Mode:</strong> Prefer to just watch? Sit back and enjoy the game unfold, analyzing how different models reason, cooperate, and sometimes hilariously misfire.</li>
+        </ul>
+        <h3>🧠 Why It's Interesting</h3>
+        <ul>
+            <li><strong>Compare LLM reasoning styles:</strong> See how different models interpret subtle associations and language cues.</li>
+            <li><strong>Team Dynamics:</strong> Watch how collaboration (or confusion) emerges between AIs when they have to coordinate across multiple turns.</li>
+            <li><strong>Human-AI Interaction:</strong> Experiment with leading a team of LLMs and discover how clearly (or creatively) you need to communicate to win.</li>
+            <li><strong>Benchmarking & Analytics:</strong> All games are stored in a database. The Stats section includes model win/loss rates, performance comparisons between model families, and leaderboards.</li>
+        </ul>
+        <h3>🕹️ Main Features</h3>
+        <ul>
+            <li>Create and customize teams with any available LLMs.</li>
+            <li>Switch between <strong>AI vs AI</strong> and <strong>Human&AI vs AI</strong> modes.</li>
+            <li>View reasoning and chat logs for each model's decisions.</li>
+        </ul>
+    </div>
+</div>
 """
 GAME_RULES_HTML = """
 # Custom header HTML
 custom_header = """
 <div class="custom-navbar">
+    <div class="navbar-title">🕵️ Agentic Codenames Arena</div>
     <div class="navbar-links">
         <a href="#" class="nav-link active" data-tab-id="home_id">Home</a>
         <a href="#" class="nav-link" data-tab-id="play_id">Play</a>
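
The README in this commit links the demo through a `youtu.be` short URL with a share parameter, while the `<iframe>` in `APP_DESCRIPTION_HTML` uses the `/embed/` form of the same video ID. A small helper could keep the two in sync; the sketch below is an assumption about how one might do that with the standard library, and `youtube_embed_url` is a hypothetical name, not something defined in the repository.

```python
from urllib.parse import parse_qs, urlparse


def youtube_embed_url(url: str) -> str:
    """Convert a youtu.be, /watch, or /embed/ URL to the /embed/ form.

    Query parameters such as the `si` share token are dropped.
    """
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]

    if host == "youtu.be":
        video_id = parsed.path.lstrip("/")
    elif host == "youtube.com" and parsed.path == "/watch":
        video_id = parse_qs(parsed.query).get("v", [""])[0]
    elif host == "youtube.com" and parsed.path.startswith("/embed/"):
        video_id = parsed.path[len("/embed/"):]
    else:
        raise ValueError(f"Not a recognized YouTube URL: {url!r}")

    if not video_id:
        raise ValueError(f"No video ID found in: {url!r}")
    return f"https://www.youtube.com/embed/{video_id}"
```

Applied to the README's demo link, this yields the same `src` value that appears in the iframe above.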