Update README.md
README.md
CHANGED
@@ -116,6 +116,7 @@ The `AutonomousWebAgent` is a sophisticated, multi-component search and retrieva
- `ToTNode` and `ToTSearch` classes enable the agent to generate thoughts, evaluate them, and navigate through them as a tree, considering various potential paths to best answer the query.
- It combines MCTS and RAG to synthesize responses based on the generated thought paths.

### Training Process

The training process for the agent involves episodic learning, where it interacts with various queries from a predefined list. Each query initiates an episode, and the agent performs actions based on its learned policy:
@@ -292,6 +293,183 @@ After each epoch, the model is evaluated on the validation set, computing the av
### Checkpoints
At the end of each epoch, the model saves checkpoints of all components, enabling easy resumption or further fine-tuning as needed.

## Inference Details

1. Input Processing:
   - The function takes a query (text input), world model components, a root thought node, and a tokenizer.
   - The query is tokenized and encoded using the provided tokenizer.

2. Inference Modes:
   The function supports three inference modes:

   a. 'without_world_model':
      - This mode directly uses the transformer model to generate text.
      - It doesn't utilize the world model components or the Tree of Thought.
      - The transformer generates text autoregressively up to the specified max length.

   b. 'world_model':
      - This mode uses the world model components but doesn't use the Tree of Thought.
      - It generates actions based on the prediction network's output.

   c. 'world_model_tree_of_thought':
      - This is the most comprehensive mode, using both the world model and the Tree of Thought.

3. World Model Inference Process:
   For the 'world_model' and 'world_model_tree_of_thought' modes:

   a. Initial State:
      - The query is passed through the transformer model.
      - The representation network creates an initial state representation from the transformer output.

   b. Action Selection:
      - For 'world_model':
        - The prediction network generates policy logits from the state representation.
        - Actions are selected based on the highest probabilities in the policy.

      - For 'world_model_tree_of_thought':
        - It uses Monte Carlo Tree Search (MCTS) to explore the Tree of Thought.
        - For each MCTS iteration:
          * Selection: Traverse the tree to find a leaf node.
          * Expansion: Add child nodes to the leaf.
          * Evaluation: Use the prediction network to estimate the value of the node.
          * Backpropagation: Update the values and visit counts of nodes.
        - The best action is chosen based on visit counts after MCTS.

   c. State Transition:
      - The selected action is applied to the current state using the dynamics network.
      - This creates a new state representation for the next step.

   d. Sequence Generation:
      - The process repeats for the specified number of steps or until a termination condition is met.
      - For the Tree of Thought approach, it continues until reaching a leaf node in the thought tree.

4. Output:
   - For 'without_world_model', it returns the generated text.
   - For 'world_model' and 'world_model_tree_of_thought', it returns a sequence of selected actions (thoughts).
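
The 'world_model' path described in step 3 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's implementation: `world_model_inference`, `toy_representation`, `toy_prediction`, and `toy_dynamics` are hypothetical stand-ins for the learned networks, the state is a short list of floats, and the termination condition is assumed to be a designated `stop_action` index.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

State = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def world_model_inference(
    transformer_output: State,
    representation: Callable[[State], State],
    prediction: Callable[[State], Tuple[List[float], float]],
    dynamics: Callable[[State, int], State],
    max_steps: int,
    stop_action: Optional[int] = None,
) -> List[int]:
    """Greedy rollout: encode once, then alternate prediction and dynamics."""
    state = representation(transformer_output)      # a. initial state
    actions: List[int] = []
    for _ in range(max_steps):                      # d. repeat for a fixed number of steps
        policy_logits, _value = prediction(state)   # b. policy logits from the state
        probs = softmax(policy_logits)
        action = max(range(len(probs)), key=probs.__getitem__)  # most probable action
        actions.append(action)
        if action == stop_action:                   # assumed termination rule
            break
        state = dynamics(state, action)             # c. state transition
    return actions


# Toy stand-ins for the learned networks (assumptions for illustration only).
def toy_representation(x: State) -> State:
    return [v * 0.5 for v in x]


def toy_prediction(state: State) -> Tuple[List[float], float]:
    # Three actions; the logits depend on the state so the choice changes over time.
    s = sum(state)
    return [s, 1.0, 2.0 - s], s


def toy_dynamics(state: State, action: int) -> State:
    return [v + action for v in state]


actions = world_model_inference(
    [0.2, 0.4], toy_representation, toy_prediction, toy_dynamics,
    max_steps=4, stop_action=0,
)
print(actions)
```

The loop returns action indices, which matches the "sequence of selected actions (thoughts)" output in step 4; mapping indices back to thought names is left out of the sketch.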

World model inference uses the learned representations and dynamics to navigate the problem-solving process. The Tree of Thought constrains that navigation to a predefined hierarchy of problem-solving steps, which gives the search more structure and may make it more effective on complex tasks.
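
The four MCTS phases listed in step 3b can be sketched over a small predefined thought tree. The names `ThoughtNode`, `ucb1`, and `mcts_choose` are hypothetical, UCB1 is an assumed selection rule (the README does not say which rule the agent uses), and `toy_value` stands in for the prediction network's value estimate.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ThoughtNode:
    name: str
    children: List["ThoughtNode"] = field(default_factory=list)  # predefined hierarchy
    parent: Optional["ThoughtNode"] = None
    expanded: bool = False
    visits: int = 0
    value_sum: float = 0.0

    def add(self, child: "ThoughtNode") -> "ThoughtNode":
        child.parent = self
        self.children.append(child)
        return child


def ucb1(node: ThoughtNode, c: float = 1.4) -> float:
    """UCB1 score of a child node; unvisited nodes are tried first."""
    if node.visits == 0:
        return math.inf
    mean = node.value_sum / node.visits
    return mean + c * math.sqrt(math.log(node.parent.visits) / node.visits)


def mcts_choose(
    root: ThoughtNode,
    value_fn: Callable[[ThoughtNode], float],
    iterations: int = 50,
) -> ThoughtNode:
    for _ in range(iterations):
        node = root
        # Selection: descend through already-expanded nodes by UCB1.
        while node.expanded and node.children:
            node = max(node.children, key=ucb1)
        # Expansion: make the leaf's predefined children available for selection.
        node.expanded = True
        # Evaluation: estimate the value of the reached node.
        value = value_fn(node)
        # Backpropagation: update values and visit counts up to the root.
        current: Optional[ThoughtNode] = node
        while current is not None:
            current.visits += 1
            current.value_sum += value
            current = current.parent
    # The best action is the most-visited child of the root.
    return max(root.children, key=lambda n: n.visits)


# Toy tree and value function (assumptions for illustration only).
root = ThoughtNode("Problem-Solving Process")
for name in ("Problem Identification", "Problem Analysis", "Solution Generation"):
    root.add(ThoughtNode(name))

TOY_SCORES = {
    "Problem Identification": 0.2,
    "Problem Analysis": 0.5,
    "Solution Generation": 0.9,
}


def toy_value(node: ThoughtNode) -> float:
    return TOY_SCORES.get(node.name, 0.0)


best = mcts_choose(root, toy_value, iterations=50)
print(best.name, [(c.name, c.visits) for c in root.children])
```

Choosing by visit count rather than by mean value follows the README's description and tends to be less sensitive to a single lucky evaluation.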

Here I use Trees of Thought to organise sets of policies and sequences of actions. These trees give the world model a general thought pattern, much as humans develop thought patterns for solving certain kinds of problems (e.g. understand, describe, analyse).
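
To make the idea of a tree organising action sequences concrete, the sketch below stores a thought tree as a nested dictionary and enumerates its root-to-leaf paths, each of which is one candidate sequence of thoughts. The node names are a small subset taken from the example trees in this README; `THOUGHT_TREE` and `thought_paths` are hypothetical names, and the repository's `ToTNode` class may represent the tree differently.

```python
from typing import Dict, Iterator, List, Optional

# Nested-dict tree: each key is a thought, each value holds its sub-thoughts.
THOUGHT_TREE: Dict[str, dict] = {
    "Problem-Solving Process": {
        "Problem Identification": {
            "Define the Problem": {},
            "Determine Constraints": {},
        },
        "Solution Generation": {
            "Analytical Approach": {
                "Logical Reasoning": {},
            },
        },
    }
}


def thought_paths(
    tree: Dict[str, dict], prefix: Optional[List[str]] = None
) -> Iterator[List[str]]:
    """Yield every root-to-leaf path; each path is one candidate action sequence."""
    prefix = prefix or []
    for thought, subtree in tree.items():
        path = prefix + [thought]
        if subtree:
            yield from thought_paths(subtree, path)
        else:
            yield path


paths = list(thought_paths(THOUGHT_TREE))
for p in paths:
    print(" -> ".join(p))
```

In this framing a policy only has to choose among the children of the current node, so the tree bounds the action space at every step.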

Here are some example Trees of Thought:

graph TD
A[Problem-Solving Process] --> B[Problem Identification]
A --> C[Problem Analysis]
A --> D[Solution Generation]
A --> E[Implementation]
A --> F[Evaluation and Adjustment]
B --> B1[Define the Problem]
B --> B2[Identify Stakeholders]
B --> B3[Determine Constraints]
B --> B4[Recognize Problem Type]
B --> B5[Historical Context]
C --> C1[Root Cause Analysis]
C --> C2[System Mapping]
C --> C3[Data Collection]
C --> C4[Impact Assessment]
C --> C5[Theoretical Framework]
D --> D1[Creative Problem Solving]
D --> D2[Analytical Approach]
D --> D3[Mathematical Computation]
D --> D4[Decision Making]
E --> E1[Action Planning]
E --> E2[Resource Allocation]
E --> E3[Change Management]
F --> F1[Verification]
F --> F2[Performance Metrics]
F --> F3[Feedback Loops]
F --> F4[Continuous Improvement]
C3 --> C3a[Quantitative Data]
C3 --> C3b[Qualitative Data]
C3 --> C3c[Data Validation]
D1 --> D1a[Divergent Thinking]
D1 --> D1b[Convergent Thinking]
D1 --> D1c[Lateral Thinking]
D2 --> D2a[Logical Reasoning]
D2 --> D2b[Critical Analysis]
D2 --> D2c[Systems Thinking]
D3 --> D3a[Basic Operations]
D3 --> D3b[Advanced Operations]
D3 --> D3c[Computational Methods]
D4 --> D4a[Decision Trees]
D4 --> D4b[Multi-Criteria Analysis]
D4 --> D4c[Probabilistic Reasoning]
G[Cross-Cutting Considerations] --> G1[Ethical Framework]
G --> G2[Stakeholder Management]
G --> G3[Interdisciplinary Connections]
G --> G4[Technological Integration]
G --> G5[Emotional Intelligence]
G --> G6[Collaborative Problem Solving]
G1 --> G1a[Value-based Decision Making]
G1 --> G1b[Long-term Consequences]
G2 --> G2a[Direct Stakeholders]
G2 --> G2b[Indirect Stakeholders]
G2 --> G2c[Conflicting Interests]
G3 --> G3a[Related Fields]
G3 --> G3b[Cross-disciplinary Impact]
G4 --> G4a[AI-assisted Problem Solving]
G4 --> G4b[Data-driven Insights]
G4 --> G4c[Digital Collaboration Tools]
G5 --> G5a[Self-Awareness]
G5 --> G5b[Empathy]
G5 --> G5c[Stress Management]
G6 --> G6a[Team Dynamics]
G6 --> G6b[Communication Strategies]
G6 --> G6c[Conflict Resolution]
H[Computational Considerations] --> H1[CPU Operations]
H --> H2[GPU Parallelization]
H --> H3[Floating-Point Precision]
I[Order of Operations] --> I1[Parentheses]
I --> I2[Exponents]
I --> I3[Multiplication and Division]
I --> I4[Addition and Subtraction]
J[Critical Thinking] --> J1[Assumptions Questioning]
J --> J2[Bias Recognition]
K[Future Perspective] --> K1[Short-term Projections]
K --> K2[Long-term Scenarios]
K --> K3[Potential Impacts]
L[Learning and Adaptation] --> L1[Reflective Practice]
L --> L2[Knowledge Transfer]
L --> L3[Adaptive Problem Solving]

graph TD
A[Meta-Cognitive Strategies] --> B[Creative Problem Solving]
A --> C[Systems Thinking]
A --> D[Decision Making]
A --> E[Emotional Intelligence]
A --> F[Collaborative Problem Solving]
B --> B1[Divergent Thinking]
B --> B2[Convergent Thinking]
B --> B3[Lateral Thinking]
C --> C1[Holistic Perspective]
C --> C2[Feedback Loops]
C --> C3[Emergent Properties]
D --> D1[Decision Trees]
D --> D2[Multi-Criteria Decision Analysis]
D --> D3[Probabilistic Reasoning]
E --> E1[Self-Awareness]
E --> E2[Empathy]
E --> E3[Stress Management]
F --> F1[Team Dynamics]
F --> F2[Communication Strategies]
F --> F3[Conflict Resolution]
G[Learning and Adaptation]
A --> G
G --> G1[Reflective Practice]
G --> G2[Knowledge Transfer]
G --> G3[Adaptive Problem Solving]
H[Ethical Framework]
A --> H
H --> H1[Value-based Decision Making]
H --> H2[Stakeholder Analysis]
H --> H3[Long-term Consequences]
I[Technological Integration]
A --> I
I --> I1[AI-assisted Problem Solving]
I --> I2[Data-driven Insights]
I --> I3[Digital Collaboration Tools]
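
One way such `graph TD` listings could be loaded into a structure the agent can traverse is to parse the edges into a label map and an adjacency map. The sketch below uses only the standard library; `parse_mermaid_tree` is a hypothetical helper that handles just the `A[Label] --> B[Label]` and bare `G[Label]` forms that appear in this README, not the full Mermaid syntax.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# A node token is an id optionally followed by a bracketed label, e.g. `B1[Divergent Thinking]`.
NODE = re.compile(r"^(\w+)(?:\[(.+?)\])?$")


def parse_mermaid_tree(text: str) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Return (labels by node id, child ids by parent id) for a simple `graph TD` listing."""
    labels: Dict[str, str] = {}
    children: Dict[str, List[str]] = defaultdict(list)

    def register(token: str) -> str:
        match = NODE.match(token.strip())
        if match is None:
            raise ValueError(f"unsupported node syntax: {token!r}")
        node_id, label = match.groups()
        if label is not None:
            labels[node_id] = label
        labels.setdefault(node_id, node_id)  # fall back to the id until a label is seen
        return node_id

    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("graph"):
            continue  # skip blank lines and the `graph TD` header
        if "-->" in line:
            left, right = line.split("-->", 1)
            parent, child = register(left), register(right)
            children[parent].append(child)
        else:
            register(line)  # bare declaration such as `G[Learning and Adaptation]`
    return labels, dict(children)


SAMPLE = """
graph TD
A[Meta-Cognitive Strategies] --> B[Creative Problem Solving]
A --> C[Systems Thinking]
B --> B1[Divergent Thinking]
G[Learning and Adaptation]
A --> G
"""

labels, children = parse_mermaid_tree(SAMPLE)
print([labels[c] for c in children["A"]])
```

Keeping ids and labels separate mirrors the listings themselves, where an id such as `G` is declared once with its label and then reused in later edges.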

## Requirements