Commit 2d2d677 · lucadipalma
Parent(s): 983e23f
update readme, add video
Files changed:
- README.md +102 -1
- graph.png +0 -0
- pages/home.py +2 -10
- support/game_settings.py +43 -24
README.md
CHANGED
|
@@ -13,4 +13,105 @@ tags:
|
|
| 13 |
- mcp-in-action-track-creative
|
| 14 |
---
|
| 15 |
|
| 16 |
-
|
| 13 |
- mcp-in-action-track-creative
|
| 14 |
---
|
| 15 |
|
| 16 |
+
|
| 17 |
+
# 🧠 Agentic Codenames Arena
|
| 18 |
+
|
| 19 |
+

|
| 20 |
+
|
| 21 |
+
**Watch, or join, LLMs battling it out in Codenames.**
|
| 22 |
+
|
| 23 |
+
[Demo on YouTube](https://youtu.be/DKIfJ-j-GEg?si=sRXrr5XtP0MOvq1T)
|
| 24 |
+
|
| 25 |
+
[My post on LinkedIn](https://www.linkedin.com/posts/luca-di-palma-99024a1b7_most-of-us-use-llms-to-create-reports-write-activity-7400225424770932736-OTPU?utm_source=share&utm_medium=member_desktop&rcm=ACoAADJnVPwBh-8LoV25AQVeclIBTKNuOP6rr08)
|
| 26 |
+
|
| 27 |
+
---
|
| 28 |
+
|
| 29 |
+
## 🧩 What This App Does
|
| 30 |
+
|
| 31 |
+
**Agentic Codenames Arena** is an interactive dashboard where teams of LLMs compete in the game of *Codenames*.
|
| 32 |
+
Two teams, **Red** and **Blue**, face off in a **4v4 setup**, with each team composed of:
|
| 33 |
+
|
| 34 |
+
* **1 Boss**: Provides the clue and clue number for each turn.
|
| 35 |
+
* **1 Captain**: Coordinates the team’s reasoning, synthesizes the agents’ suggestions, and ultimately selects the final words to “touch”.
|
| 36 |
+
* **2 Players**: Collaborate with the Captain, proposing interpretations, evaluating associations, and contributing to the team’s final decisions.
|
| 37 |
+
|
| 38 |
+
|
| 39 |
+
The internal **communication and coordination architecture is built using LangGraph**, enabling structured multi-agent reasoning and transparent agent-to-agent interactions.
|
| 40 |
+
|
| 41 |
+
Below is the LangGraph diagram illustrating how the different roles communicate during each turn:
|
| 42 |
+
|
| 43 |
+

|
| 44 |
+
|
| 45 |
+
You can either **sit back and watch fully autonomous LLM teams play**, or **step in as a human Boss** to lead your AI teammates with your own clues.
|
| 46 |
+
|
| 47 |
+
---
|
| 48 |
+
|
| 49 |
+
## 🤖 How It Works
|
| 50 |
+
|
| 51 |
+
### **LLM Teams**
|
| 52 |
+
|
| 53 |
+
Build teams from several providers, including OpenAI, Google, Anthropic, and Hugging Face.
|
| 54 |
+
Each model plays autonomously using its own reasoning chain and game strategy.
|
| 55 |
+
|
| 56 |
+
### **Two Gameplay Modalities**
|
| 57 |
+
|
| 58 |
+
#### **1️⃣ Observation Mode — Watch AIs Battle**
|
| 59 |
+
|
| 60 |
+
Sit back and spectate.
|
| 61 |
+
See how different models reason about clues, decide associations, and occasionally produce *hilariously misaligned* guesses.
|
| 62 |
+
|
| 63 |
+
You'll see:
|
| 64 |
+
|
| 65 |
+
* Model-to-model conversations
|
| 66 |
+
* Reasoning traces
|
| 67 |
+
* Turn-by-turn decisions
|
| 68 |
+
* How each team coordinates across multiple rounds
|
| 69 |
+
|
| 70 |
+
Perfect for AI benchmarking, research, or just entertainment.
|
| 71 |
+
|
| 72 |
+
#### **2️⃣ Human Boss Mode — Enter the Fight**
|
| 73 |
+
|
| 74 |
+
Become the Boss for either team and give your own clue + number.
|
| 75 |
+
Your AI teammates will interpret your hint and take their guesses.
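A human Boss's input (one clue plus a number) could be checked before it is handed to the AI teammates. The sketch below assumes the usual Codenames conventions (a single-word clue that is not a word on the board) and, as a further assumption, a clue number of at least 1. `validate_clue` is a hypothetical helper, and the app's own checks may differ.

```python
def validate_clue(clue: str, number: int, board_words: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the clue is acceptable."""
    problems = []
    word = clue.strip()
    if len(word.split()) != 1:
        # Covers both empty input and multi-word clues.
        problems.append("clue must be a single word")
    elif word.upper() in {w.upper() for w in board_words}:
        # Case-insensitive comparison against the words on the board.
        problems.append("clue must not be a word on the board")
    if number < 1:
        # Assumption for this sketch: the Boss targets at least one word.
        problems.append("clue number must be at least 1")
    return problems


board = ["WAVE", "SHIP", "DESERT", "PIANO"]
print(validate_clue("ocean", 2, board))
print(validate_clue("wave", 1, board))
print(validate_clue("deep sea", 0, board))
```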
|
| 76 |
+
|
| 77 |
+
---
|
| 78 |
+
|
| 79 |
+
## 🧠 Why It’s Interesting
|
| 80 |
+
|
| 81 |
+
* **Compare LLM reasoning styles:**
|
| 82 |
+
Watch how different models interpret associations, analogies, and subtle semantic cues.
|
| 83 |
+
|
| 84 |
+
* **Analyze team dynamics:**
|
| 85 |
+
Some models coordinate beautifully. Others… not so much.
|
| 86 |
+
Observe emergent cooperation, miscommunication, or unexpected strategies.
|
| 87 |
+
|
| 88 |
+
* **Experiment with human–AI collaboration:**
|
| 89 |
+
Test how effective your clues are with LLM teammates.
|
| 90 |
+
Try pushing the limits with creative, cryptic, or minimalist hints.
|
| 91 |
+
|
| 92 |
+
---
|
| 93 |
+
|
| 94 |
+
## 🕹️ Main Features
|
| 95 |
+
|
| 96 |
+
* **Create & customize teams** using any mix of LLMs
|
| 97 |
+
* **Switch between AI vs AI** and **Human vs AI** modes
|
| 98 |
+
* **Detailed per-turn logs** for all model decisions
|
| 99 |
+
* **Transparent reasoning chains**
|
| 100 |
+
* **Interactive UI** for watching matches play out
|
| 101 |
+
* **Match history & analytics dashboard**
|
| 102 |
+
|
| 103 |
+
---
|
| 104 |
+
|
| 105 |
+
## 📊 Stats & Analytics
|
| 106 |
+
|
| 107 |
+
All games played in the Arena are stored in a database.
|
| 108 |
+
The Stats section of the app includes:
|
| 109 |
+
|
| 110 |
+
* **Model win/loss rates** across all recorded matches
|
| 111 |
+
* **Performance comparisons** between model families (OpenAI vs Google vs …)
|
| 112 |
+
* **Historical match logs** for replay & analysis
|
| 113 |
+
* **Leaderboards** highlighting the best-performing models
|
| 114 |
+
|
| 115 |
+
This turns the Arena into a dynamic benchmarking tool for evaluating LLM semantic reasoning, coordination abilities, and reliability under pressure.
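As a rough illustration of how win/loss rates and a leaderboard could be derived from stored matches, here is a minimal sketch. The record layout (`red`, `blue`, `winner`) and the model names are hypothetical, since the app's actual database schema is not shown, and each team is reduced to a single model for simplicity.

```python
from collections import defaultdict

# Hypothetical match records; the real schema and model names are not shown here.
matches = [
    {"red": "model-a", "blue": "model-b", "winner": "red"},
    {"red": "model-b", "blue": "model-a", "winner": "red"},
    {"red": "model-a", "blue": "model-c", "winner": "red"},
    {"red": "model-c", "blue": "model-b", "winner": "blue"},
]


def win_rates(records):
    """Return {model: wins / games_played} for every model seen in the records."""
    wins = defaultdict(int)
    games = defaultdict(int)
    for match in records:
        for side in ("red", "blue"):
            model = match[side]
            games[model] += 1
            if match["winner"] == side:
                wins[model] += 1
    return {model: wins[model] / games[model] for model in games}


def leaderboard(records):
    """Sort models by win rate, best first; ties are broken by model name."""
    rates = win_rates(records)
    return sorted(rates.items(), key=lambda item: (-item[1], item[0]))


for model, rate in leaderboard(matches):
    print(f"{model}: {rate:.0%}")
```

Comparing model families would follow the same pattern, grouping by a provider key instead of the model name.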
|
| 116 |
+
|
| 117 |
+
|
graph.png
DELETED
|
Binary file (36.2 kB)
|
|
|
pages/home.py
CHANGED
|
@@ -1,19 +1,11 @@
|
|
| 1 |
import gradio as gr
|
| 2 |
-
from support.game_settings import
|
| 3 |
|
| 4 |
with gr.Blocks(fill_width=True) as demo:
|
| 5 |
# Rules section with HTML
|
| 6 |
with gr.Row(elem_id="row_description", equal_height=True):
|
| 7 |
-
gr.
|
| 8 |
gr.HTML(GAME_RULES_HTML, elem_id="rules_accordion")
|
| 9 |
|
| 10 |
-
# with gr.Row():
|
| 11 |
-
# gr.HTML("""
|
| 12 |
-
# <div style="text-align: center; margin: 2rem 0;">
|
| 13 |
-
# <p style="font-size: 1.2rem;">Ready to start playing?</p>
|
| 14 |
-
# <p style="color: #666;">Navigate to the <strong>Play</strong> page to begin your game!</p>
|
| 15 |
-
# </div>
|
| 16 |
-
# """)
|
| 17 |
-
|
| 18 |
if __name__ == "__main__":
|
| 19 |
demo.launch()
|
|
|
|
| 1 |
import gradio as gr
|
| 2 |
+
from support.game_settings import APP_DESCRIPTION_HTML, GAME_RULES_HTML
|
| 3 |
|
| 4 |
with gr.Blocks(fill_width=True) as demo:
|
| 5 |
# Rules section with HTML
|
| 6 |
with gr.Row(elem_id="row_description", equal_height=True):
|
| 7 |
+
gr.HTML(APP_DESCRIPTION_HTML, elem_id="app_description")
|
| 8 |
gr.HTML(GAME_RULES_HTML, elem_id="rules_accordion")
|
| 9 |
|
| 10 |
if __name__ == "__main__":
|
| 11 |
demo.launch()
|
support/game_settings.py
CHANGED
|
@@ -1,26 +1,45 @@
|
|
| 1 |
-
|
| 2 |
-
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
|
| 23 |
-
|
| 24 |
"""
|
| 25 |
|
| 26 |
GAME_RULES_HTML = """
|
|
@@ -509,7 +528,7 @@ ALL_MODELS = sorted({
|
|
| 509 |
# Custom header HTML
|
| 510 |
custom_header = """
|
| 511 |
<div class="custom-navbar">
|
| 512 |
-
<div class="navbar-title">🕵️ Agentic Codenames</div>
|
| 513 |
<div class="navbar-links">
|
| 514 |
<a href="#" class="nav-link active" data-tab-id="home_id">Home</a>
|
| 515 |
<a href="#" class="nav-link" data-tab-id="play_id">Play</a>
|
|
|
|
| 1 |
+
APP_DESCRIPTION_HTML = """
|
| 2 |
+
<div style="display: flex; flex-direction: column; gap: 20px;">
|
| 3 |
+
<div><h3>🎥 Watch the Demo</h3></div>
|
| 4 |
+
<div style="display: flex; justify-content: center; margin-top: 20px;">
|
| 5 |
+
<iframe
|
| 6 |
+
width="560"
|
| 7 |
+
height="315"
|
| 8 |
+
src="https://www.youtube.com/embed/DKIfJ-j-GEg"
|
| 9 |
+
frameborder="0"
|
| 10 |
+
style="border-radius:12px; width:100%; max-width:560px;"
|
| 11 |
+
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
|
| 12 |
+
allowfullscreen>
|
| 13 |
+
</iframe>
|
| 14 |
+
</div>
|
| 15 |
+
<div>
|
| 16 |
+
<h3>🧩 What This App Does</h3>
|
| 17 |
+
<p>This dashboard lets you watch (or join!) teams of Large Language Models (LLMs) play <strong>Codenames</strong> against each other.
|
| 18 |
+
Two teams — <strong>Red</strong> and <strong>Blue</strong> — face off in a 4v4 format. Each team has a <strong>Boss</strong> and <strong>three Agents</strong> working together to identify their team's words before the other side does.</p>
|
| 19 |
+
|
| 20 |
+
<h3>🤖 How It Works</h3>
|
| 21 |
+
<ul>
|
| 22 |
+
<li><strong>LLM Teams:</strong> You can assemble teams using different LLMs (e.g., GPT, Claude, Gemini, or open-source models).</li>
|
| 23 |
+
<li><strong>Human Mode:</strong> You can also jump in as a <strong>Boss</strong> yourself, giving clues to your AI teammates and seeing how well they interpret your hints.</li>
|
| 24 |
+
<li><strong>Observation Mode:</strong> Prefer to just watch? Sit back and enjoy the game unfold, analyzing how different models reason, cooperate, and sometimes hilariously misfire.</li>
|
| 25 |
+
</ul>
|
| 26 |
+
|
| 27 |
+
<h3>🧠 Why It's Interesting</h3>
|
| 28 |
+
<ul>
|
| 29 |
+
<li><strong>Compare LLM reasoning styles:</strong> See how different models interpret subtle associations and language cues.</li>
|
| 30 |
+
<li><strong>Team Dynamics:</strong> Watch how collaboration (or confusion) emerges between AIs when they have to coordinate across multiple turns.</li>
|
| 31 |
+
<li><strong>Human-AI Interaction:</strong> Experiment with leading a team of LLMs and discover how clearly (or creatively) you need to communicate to win.</li>
|
| 32 |
+
<li><strong>Benchmarking &amp; Analytics:</strong> All games are stored in a database. The Stats section includes model win/loss rates, performance comparisons between model families, and leaderboards.</li>
|
| 33 |
+
</ul>
|
| 34 |
+
|
| 35 |
+
<h3>🕹️ Main Features</h3>
|
| 36 |
+
<ul>
|
| 37 |
+
<li>Create and customize teams with any available LLMs.</li>
|
| 38 |
+
<li>Switch between <strong>AI vs AI</strong> and <strong>Human&amp;AI vs AI</strong> modes.</li>
|
| 39 |
+
<li>View reasoning and chat logs for each model's decisions.</li>
|
| 40 |
+
</ul>
|
| 41 |
+
</div>
|
| 42 |
+
</div>
|
| 43 |
"""
|
| 44 |
|
| 45 |
GAME_RULES_HTML = """
|
|
|
|
| 528 |
# Custom header HTML
|
| 529 |
custom_header = """
|
| 530 |
<div class="custom-navbar">
|
| 531 |
+
<div class="navbar-title">🕵️ Agentic Codenames Arena</div>
|
| 532 |
<div class="navbar-links">
|
| 533 |
<a href="#" class="nav-link active" data-tab-id="home_id">Home</a>
|
| 534 |
<a href="#" class="nav-link" data-tab-id="play_id">Play</a>
|