RobbiePasquale committed on
Commit 7221607 · verified · 1 Parent(s): 17d845e

Update README.md

Files changed (1)
  1. README.md +178 -0
README.md CHANGED
@@ -116,6 +116,7 @@ The `AutonomousWebAgent` is a sophisticated, multi-component search and retrieva
  - `ToTNode` and `ToTSearch` classes enable the agent to generate thoughts, evaluate them, and navigate through them as a tree, considering various potential paths to best answer the query.
  - It combines MCTS and RAG to synthesize responses based on the generated thought paths.

  ### Training Process

  The training process for the agent involves episodic learning, where it interacts with various queries from a predefined list. Each query initiates an episode, and the agent performs actions based on its learned policy:
@@ -292,6 +293,183 @@ After each epoch, the model is evaluated on the validation set, computing the av
  ### Checkpoints
  At the end of each epoch, the model saves checkpoints of all components, enabling easy resumption or further fine-tuning as needed.

  ## Requirements
  - `ToTNode` and `ToTSearch` classes enable the agent to generate thoughts, evaluate them, and navigate through them as a tree, considering various potential paths to best answer the query.
  - It combines MCTS and RAG to synthesize responses based on the generated thought paths.

+
  ### Training Process

  The training process for the agent involves episodic learning, where it interacts with various queries from a predefined list. Each query initiates an episode, and the agent performs actions based on its learned policy:

  ### Checkpoints
  At the end of each epoch, the model saves checkpoints of all components, enabling easy resumption or further fine-tuning as needed.

+ ## Inference Details
+
+ 1. Input Processing:
+    - The inference function takes a query (text input), the world model components, a root thought node, and a tokenizer.
+    - The query is tokenized and encoded using the provided tokenizer.
+
+ 2. Inference Modes:
+    The function supports three inference modes:
+
+    a. 'without_world_model':
+       - This mode uses the transformer model directly to generate text.
+       - It does not use the world model components or the Tree of Thought.
+       - The transformer generates text autoregressively up to the specified maximum length.
+
+    b. 'world_model':
+       - This mode uses the world model components but not the Tree of Thought.
+       - It generates actions based on the prediction network's output.
+
+    c. 'world_model_tree_of_thought':
+       - This is the most comprehensive mode, using both the world model and the Tree of Thought.
+
+ 3. World Model Inference Process:
+    For the 'world_model' and 'world_model_tree_of_thought' modes:
+
+    a. Initial State:
+       - The query is passed through the transformer model.
+       - The representation network creates an initial state representation from the transformer output.
+
+    b. Action Selection:
+       - For 'world_model':
+         - The prediction network generates policy logits from the state representation.
+         - The actions with the highest policy probabilities are selected.
+
+       - For 'world_model_tree_of_thought':
+         - Monte Carlo Tree Search (MCTS) is used to explore the Tree of Thought.
+         - For each MCTS iteration:
+           * Selection: Traverse the tree to find a leaf node.
+           * Expansion: Add child nodes to the leaf.
+           * Evaluation: Use the prediction network to estimate the value of the node.
+           * Backpropagation: Update the values and visit counts of the nodes along the path.
+         - After MCTS, the best action is chosen based on visit counts.
+
+    c. State Transition:
+       - The dynamics network applies the selected action to the current state.
+       - This produces a new state representation for the next step.
+
+    d. Sequence Generation:
+       - The process repeats for the specified number of steps or until a termination condition is met.
+       - With the Tree of Thought approach, generation continues until a leaf node of the thought tree is reached.
+
+ 4. Output:
+    - For 'without_world_model', the function returns the generated text.
+    - For 'world_model' and 'world_model_tree_of_thought', it returns the sequence of selected actions (thoughts).
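The control flow in steps 2 to 4 can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's code: `run_inference`, `greedy_action`, and the callables `generate_text`, `represent`, `predict`, `dynamics`, and `tree_search` are hypothetical stand-ins for the transformer, the representation, prediction, and dynamics networks, and the MCTS step. Only the three mode strings come from the description above. The sketch assumes argmax action selection and a fixed number of steps, whereas the Tree of Thought mode described above stops at a leaf node.

```python
from typing import Callable, List, Optional, Sequence, Tuple

State = List[float]  # assumption: a state is a plain vector of floats


def greedy_action(policy_logits: Sequence[float]) -> int:
    """Return the index of the largest logit (argmax of the policy)."""
    return max(range(len(policy_logits)), key=lambda i: policy_logits[i])


def run_inference(
    query: str,
    mode: str,
    generate_text: Callable[[str], str],
    represent: Callable[[str], State],
    predict: Callable[[State], Tuple[Sequence[float], float]],
    dynamics: Callable[[State, int], State],
    tree_search: Optional[Callable[[State], int]] = None,
    num_steps: int = 3,
):
    # Mode a: plain text generation, no world model involved.
    if mode == "without_world_model":
        return generate_text(query)
    if mode not in ("world_model", "world_model_tree_of_thought"):
        raise ValueError(f"unknown inference mode: {mode!r}")
    if mode == "world_model_tree_of_thought" and tree_search is None:
        raise ValueError("tree_search is required for the Tree of Thought mode")

    state = represent(query)  # initial state from the encoded query
    actions: List[int] = []
    for _ in range(num_steps):
        if mode == "world_model":
            logits, _value = predict(state)  # policy logits and value estimate
            action = greedy_action(logits)
        else:
            action = tree_search(state)  # e.g. MCTS over the thought tree
        actions.append(action)
        state = dynamics(state, action)  # state transition
    return actions


# Toy stand-ins so the sketch can be run end to end.
def toy_generate(query: str) -> str:
    return query.upper()


def toy_represent(query: str) -> State:
    return [float(len(query)), 0.0]


def toy_predict(state: State) -> Tuple[Sequence[float], float]:
    return [state[0] % 3, state[1], 1.0], 0.0


def toy_dynamics(state: State, action: int) -> State:
    return [state[0] + 1.0, float(action)]
```

Passing the networks in as callables keeps the mode dispatch separate from the models, so the same loop serves both world-model modes and only the action-selection step differs.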
+
+ The world model inference uses the learned representations and dynamics to navigate the problem-solving process. The Tree of Thought adds structure to this process by guiding the model through a predefined hierarchy of problem-solving steps, which gives a more structured and potentially more effective approach to complex problem-solving tasks.
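The four MCTS phases listed under Action Selection can be sketched as below. This is a minimal, generic MCTS with UCB1 selection, not the repository's `ToTSearch`: the names `SearchNode`, `ucb_score`, and `mcts_best_child` are hypothetical, the thought tree is assumed to be a plain parent-to-children dictionary, and `value_fn` stands in for the prediction network's value estimate.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass(eq=False)
class SearchNode:
    name: str
    parent: Optional["SearchNode"] = field(default=None, repr=False)
    children: List["SearchNode"] = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0


def ucb_score(child: SearchNode, parent_visits: int, c: float = 1.4) -> float:
    """UCB1: mean value plus an exploration bonus for rarely visited children."""
    if child.visits == 0:
        return math.inf  # unvisited children are tried first
    mean = child.value_sum / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def mcts_best_child(
    root_name: str,
    thought_tree: Dict[str, List[str]],
    value_fn: Callable[[str], float],
    iterations: int = 50,
) -> str:
    if not thought_tree.get(root_name):
        raise ValueError("the root thought has no children to choose from")
    root = SearchNode(root_name)
    for _ in range(iterations):
        # Selection: follow the best UCB score down to a node without children.
        node = root
        while node.children:
            parent_visits = node.visits
            node = max(node.children, key=lambda ch: ucb_score(ch, parent_visits))
        # Expansion: attach the children listed in the thought tree, if any.
        for child_name in thought_tree.get(node.name, []):
            node.children.append(SearchNode(child_name, parent=node))
        # Evaluation: estimate the value of the selected node.
        value = value_fn(node.name)
        # Backpropagation: update visit counts and values up to the root.
        current: Optional[SearchNode] = node
        while current is not None:
            current.visits += 1
            current.value_sum += value
            current = current.parent
    # The final choice is the most visited child of the root.
    return max(root.children, key=lambda ch: ch.visits).name


# Toy usage with two thoughts from the example tree and made-up values.
tree = {"Problem-Solving Process": ["Problem Identification", "Solution Generation"]}
toy_values = {"Problem Identification": 0.2, "Solution Generation": 0.8}
best = mcts_best_child(
    "Problem-Solving Process", tree, lambda name: toy_values.get(name, 0.5)
)
```

Choosing by visit count rather than by mean value is the usual MCTS convention, since visit counts are less sensitive to a single noisy value estimate.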
+
+ Here I use Trees of Thought to organise sets of policies and sequences of actions. These tree structures give the world model a general thought structure and pattern, similar to how humans form thought patterns for solving certain problems (e.g. understand, describe, analyse).
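One way to picture a thought tree as a set of action sequences is to enumerate its root-to-leaf paths. The sketch below assumes a nested-dictionary description of the tree. `ThoughtNode`, `build_tree`, and `thought_paths` are hypothetical names for illustration and are not the repository's `ToTNode` API.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class ThoughtNode:
    name: str
    children: List["ThoughtNode"] = field(default_factory=list)


def build_tree(name: str, spec: Dict[str, dict]) -> ThoughtNode:
    """Build a tree from a nested dict mapping each child name to its own children."""
    return ThoughtNode(name, [build_tree(child, sub) for child, sub in spec.items()])


def thought_paths(node: ThoughtNode, prefix: Tuple[str, ...] = ()) -> Iterator[List[str]]:
    """Yield every root-to-leaf path; each path is one candidate sequence of thoughts."""
    path = prefix + (node.name,)
    if not node.children:
        yield list(path)
        return
    for child in node.children:
        yield from thought_paths(child, path)


# A small excerpt of the first example tree.
spec = {
    "Problem Identification": {"Define the Problem": {}, "Identify Stakeholders": {}},
    "Solution Generation": {"Analytical Approach": {"Logical Reasoning": {}}},
}
root = build_tree("Problem-Solving Process", spec)
paths = list(thought_paths(root))
```

For this excerpt, `paths` holds three sequences, one per leaf, each starting at "Problem-Solving Process".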
+
+ Here are some example Trees of Thought:
+ graph TD
+ A[Problem-Solving Process] --> B[Problem Identification]
+ A --> C[Problem Analysis]
+ A --> D[Solution Generation]
+ A --> E[Implementation]
+ A --> F[Evaluation and Adjustment]
+ B --> B1[Define the Problem]
+ B --> B2[Identify Stakeholders]
+ B --> B3[Determine Constraints]
+ B --> B4[Recognize Problem Type]
+ B --> B5[Historical Context]
+ C --> C1[Root Cause Analysis]
+ C --> C2[System Mapping]
+ C --> C3[Data Collection]
+ C --> C4[Impact Assessment]
+ C --> C5[Theoretical Framework]
+ D --> D1[Creative Problem Solving]
+ D --> D2[Analytical Approach]
+ D --> D3[Mathematical Computation]
+ D --> D4[Decision Making]
+ E --> E1[Action Planning]
+ E --> E2[Resource Allocation]
+ E --> E3[Change Management]
+ F --> F1[Verification]
+ F --> F2[Performance Metrics]
+ F --> F3[Feedback Loops]
+ F --> F4[Continuous Improvement]
+ C3 --> C3a[Quantitative Data]
+ C3 --> C3b[Qualitative Data]
+ C3 --> C3c[Data Validation]
+ D1 --> D1a[Divergent Thinking]
+ D1 --> D1b[Convergent Thinking]
+ D1 --> D1c[Lateral Thinking]
+ D2 --> D2a[Logical Reasoning]
+ D2 --> D2b[Critical Analysis]
+ D2 --> D2c[Systems Thinking]
+ D3 --> D3a[Basic Operations]
+ D3 --> D3b[Advanced Operations]
+ D3 --> D3c[Computational Methods]
+ D4 --> D4a[Decision Trees]
+ D4 --> D4b[Multi-Criteria Analysis]
+ D4 --> D4c[Probabilistic Reasoning]
+ G[Cross-Cutting Considerations] --> G1[Ethical Framework]
+ G --> G2[Stakeholder Management]
+ G --> G3[Interdisciplinary Connections]
+ G --> G4[Technological Integration]
+ G --> G5[Emotional Intelligence]
+ G --> G6[Collaborative Problem Solving]
+ G1 --> G1a[Value-based Decision Making]
+ G1 --> G1b[Long-term Consequences]
+ G2 --> G2a[Direct Stakeholders]
+ G2 --> G2b[Indirect Stakeholders]
+ G2 --> G2c[Conflicting Interests]
+ G3 --> G3a[Related Fields]
+ G3 --> G3b[Cross-disciplinary Impact]
+ G4 --> G4a[AI-assisted Problem Solving]
+ G4 --> G4b[Data-driven Insights]
+ G4 --> G4c[Digital Collaboration Tools]
+ G5 --> G5a[Self-Awareness]
+ G5 --> G5b[Empathy]
+ G5 --> G5c[Stress Management]
+ G6 --> G6a[Team Dynamics]
+ G6 --> G6b[Communication Strategies]
+ G6 --> G6c[Conflict Resolution]
+ H[Computational Considerations] --> H1[CPU Operations]
+ H --> H2[GPU Parallelization]
+ H --> H3[Floating-Point Precision]
+ I[Order of Operations] --> I1[Parentheses]
+ I --> I2[Exponents]
+ I --> I3[Multiplication and Division]
+ I --> I4[Addition and Subtraction]
+ J[Critical Thinking] --> J1[Assumptions Questioning]
+ J --> J2[Bias Recognition]
+ K[Future Perspective] --> K1[Short-term Projections]
+ K --> K2[Long-term Scenarios]
+ K --> K3[Potential Impacts]
+ L[Learning and Adaptation] --> L1[Reflective Practice]
+ L --> L2[Knowledge Transfer]
+ L --> L3[Adaptive Problem Solving]
+
+
+ graph TD
+ A[Meta-Cognitive Strategies] --> B[Creative Problem Solving]
+ A --> C[Systems Thinking]
+ A --> D[Decision Making]
+ A --> E[Emotional Intelligence]
+ A --> F[Collaborative Problem Solving]
+ B --> B1[Divergent Thinking]
+ B --> B2[Convergent Thinking]
+ B --> B3[Lateral Thinking]
+ C --> C1[Holistic Perspective]
+ C --> C2[Feedback Loops]
+ C --> C3[Emergent Properties]
+ D --> D1[Decision Trees]
+ D --> D2[Multi-Criteria Decision Analysis]
+ D --> D3[Probabilistic Reasoning]
+ E --> E1[Self-Awareness]
+ E --> E2[Empathy]
+ E --> E3[Stress Management]
+ F --> F1[Team Dynamics]
+ F --> F2[Communication Strategies]
+ F --> F3[Conflict Resolution]
+ G[Learning and Adaptation]
+ A --> G
+ G --> G1[Reflective Practice]
+ G --> G2[Knowledge Transfer]
+ G --> G3[Adaptive Problem Solving]
+ H[Ethical Framework]
+ A --> H
+ H --> H1[Value-based Decision Making]
+ H --> H2[Stakeholder Analysis]
+ H --> H3[Long-term Consequences]
+ I[Technological Integration]
+ A --> I
+ I --> I1[AI-assisted Problem Solving]
+ I --> I2[Data-driven Insights]
+ I --> I3[Digital Collaboration Tools]
+
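If the example trees are kept as Mermaid text, a small parser can turn the edge lines into a parent-to-children map. This sketch is an assumption about one possible workflow, not something the repository is known to do. `parse_mermaid_tree` is a hypothetical helper and only handles the line forms that appear in the graphs above: `graph TD`, standalone `X[Label]` declarations, and `X --> Y` edges with optional labels.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# A node reference: an identifier optionally followed by a [Label].
NODE_PATTERN = re.compile(r"^\s*(\w+)(?:\[(.+?)\])?\s*$")


def parse_mermaid_tree(text: str) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Return (labels by node id, children ids by parent id)."""
    labels: Dict[str, str] = {}
    children: Dict[str, List[str]] = defaultdict(list)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("graph"):
            continue  # skip blank lines and the graph header
        node_ids = []
        for part in line.split("-->"):
            match = NODE_PATTERN.match(part)
            if match is None:
                raise ValueError(f"unsupported Mermaid line: {line!r}")
            node_id, label = match.groups()
            if label:
                labels[node_id] = label
            labels.setdefault(node_id, node_id)  # fall back to the id itself
            node_ids.append(node_id)
        if len(node_ids) == 2:
            children[node_ids[0]].append(node_ids[1])
    return labels, dict(children)


# A short excerpt of the second example graph.
example = """
graph TD
A[Meta-Cognitive Strategies] --> B[Creative Problem Solving]
A --> C[Systems Thinking]
G[Learning and Adaptation]
A --> G
G --> G1[Reflective Practice]
"""
labels, children = parse_mermaid_tree(example)
```

The resulting `children` map has the same shape as a parent-to-children dictionary, so it could feed a tree search directly.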

  ## Requirements