RobbiePasquale committed
Commit d7db658 · verified · 1 Parent(s): 2204b0b

Update README.md

Files changed (1):
  1. README.md +156 -0
README.md CHANGED
@@ -3,6 +3,7 @@ license: apache-2.0
  ---

+
  # Use in Colab

  from huggingface_hub import snapshot_download

@@ -143,6 +144,160 @@ python main_menu.py --task advanced_inference --query "Analyze the economic effe
  ```


+ # Explanation:
+
+ World Model Optimisation:
+ -------------------------------------------------------------------
+ Input: I_i
+ -------------------------------------------------------------------
+ Rotary Positional Encoding:
+
+ emb_i = RoPE(Input)
+ -------------------------------------------------------------------
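A minimal sketch of the rotary positional encoding step, assuming PyTorch; the dimensions, base frequency, and the helper name apply_rope are illustrative rather than the repository's actual implementation.

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate pairs of channels by position-dependent angles (RoPE).

    x: (batch, seq_len, dim) token embeddings, dim even."""
    _, seq_len, dim = x.shape
    half = dim // 2
    # Per-pair rotation frequency and per-position angle.
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()             # (seq_len, half)
    x1, x2 = x[..., :half], x[..., half:]             # channel pairs to rotate
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# emb_i = RoPE(Input): embeddings in, position-aware embeddings out.
emb_i = apply_rope(torch.randn(2, 16, 64))
```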
+ Token_i = t_i = transformer(emb_i, k_beams = k, n_tokens = j)
+
+ CE_Loss = CE_loss(token_i, true_tokens)
+
+ -------------------------------------------------------------------
+ Variance of the next token + Entropy of the sequence = State Score
+ -------------------------------------------------------------------
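A sketch of the State Score defined above, assuming PyTorch: variance of the next-token distribution plus the (mean per-step) entropy of the sequence. The equal weighting and the function name state_score are assumptions.

```python
import torch
import torch.nn.functional as F

def state_score(logits: torch.Tensor) -> torch.Tensor:
    """logits: (seq_len, vocab_size) for one candidate beam."""
    probs = F.softmax(logits, dim=-1)
    # Variance of the next-token distribution (last step of the beam).
    next_token_variance = probs[-1].var()
    # Entropy of the sequence, averaged over time steps.
    sequence_entropy = -(probs * torch.log(probs + 1e-9)).sum(dim=-1).mean()
    return next_token_variance + sequence_entropy

score = state_score(torch.randn(10, 50))  # 10 generated steps, toy vocab of 50
```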
+ Representation Network:
+ GAN/VAE/SAE (o_t -> s_t)
+
+ If the final hidden layer of the transformer outputs o_t of size S:
+
+ h_t = GELU(W · o_t + b)
+
+ Reconstruction Loss(o_t, h_t)
+
+ -------------------------------------------------------------------
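A sketch of the representation step h_t = GELU(W · o_t + b) with a reconstruction loss, assuming PyTorch; a simple autoencoder-style pair of linear maps stands in for the GAN/VAE/SAE options above, and all sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepresentationNetwork(nn.Module):
    """Maps the transformer's final hidden output o_t (size S) to a state s_t."""
    def __init__(self, hidden_size: int = 512, state_size: int = 128):
        super().__init__()
        self.encode = nn.Linear(hidden_size, state_size)   # W, b
        self.decode = nn.Linear(state_size, hidden_size)   # used by the reconstruction loss

    def forward(self, o_t: torch.Tensor) -> torch.Tensor:
        return F.gelu(self.encode(o_t))                    # h_t = GELU(W · o_t + b)

rep = RepresentationNetwork()
o_t = torch.randn(4, 512)                                  # final hidden layer outputs
h_t = rep(o_t)                                             # s_t
reconstruction_loss = F.mse_loss(rep.decode(h_t), o_t)     # Reconstruction Loss(o_t, h_t)
```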
+ Dynamics Network (s_t -> s_t+1)
+
+ ... -> LSTM(s_t) -> LSTM(s_t+1) -> ...
+
+ min MSE(s_t+1, z_t+1)
+
+ State mapping
+ -------------------------------------------------------------------
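A sketch of the dynamics network (s_t -> s_t+1) as an LSTM trained with the MSE objective above, assuming PyTorch; sizes and names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicsNetwork(nn.Module):
    """Predicts the next latent state from a history of states."""
    def __init__(self, state_size: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(state_size, state_size, batch_first=True)

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(states)          # (batch, time, state_size)
        return out[:, -1]                   # z_t+1, the predicted next state

dyn = DynamicsNetwork()
states = torch.randn(4, 8, 128)              # s_1 ... s_8 from the representation network
z_next = dyn(states)                         # z_t+1
s_next = torch.randn(4, 128)                 # observed s_t+1
loss = F.mse_loss(z_next, s_next)            # min MSE(s_t+1, z_t+1)
```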
+ Utilise the dynamics influence:
+
+ Action_i = a_i = t_1, ..., t_n
+
+ Prediction Network: mcts(Q(s,a), gamma * LSTM(s_t), delta * StateScore(s_t), tree_depth = m, num_simulations) -> Q(s_t+1)
+
+ Action search / selection
+ -------------------------------------------------------------------
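A rough sketch of how a node value could combine Q(s, a), the dynamics prediction, and the State Score with the gamma and delta weights above. This greedy, noisy evaluation is a simplified stand-in for the full mcts(...) search; every name and constant here is an assumption.

```python
import random

def node_value(q_sa: float, dynamics_value: float, state_score: float,
               gamma: float = 0.9, delta: float = 0.1) -> float:
    """Q(s, a) + gamma * dynamics (LSTM) value + delta * State Score(s_t)."""
    return q_sa + gamma * dynamics_value + delta * state_score

def select_action(candidates: list, num_simulations: int = 32, exploration: float = 0.1):
    """Average noisy evaluations of each candidate action, then pick the best."""
    totals = [0.0] * len(candidates)
    for _ in range(num_simulations):
        for i, a in enumerate(candidates):
            totals[i] += node_value(a["q"], a["dyn"], a["score"]) + random.gauss(0.0, exploration)
    best = max(range(len(candidates)), key=lambda i: totals[i])
    return candidates[best], totals[best] / num_simulations

actions = [{"q": 0.4, "dyn": 0.2, "score": 0.7}, {"q": 0.5, "dyn": 0.1, "score": 0.3}]
chosen, value = select_action(actions)
```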
+ Optimise the KL divergence between the policy over actions (and the tokens selected within those actions) and the actual sequences in the training data.
+
+ Policy_i = p_i = a_1, ..., a_n
+
+ min KL(p_i || true_sequences)
+
+ -------------------------------------------------------------------
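A sketch of the KL term above, assuming PyTorch and treating both the policy output and the true training sequences as categorical distributions over the vocabulary (that framing is an assumption).

```python
import torch
import torch.nn.functional as F

def policy_kl(policy_logits: torch.Tensor, true_tokens: torch.Tensor, vocab_size: int) -> torch.Tensor:
    """KL(p_i || true_sequences), both viewed as distributions over tokens."""
    p = F.softmax(policy_logits, dim=-1).mean(dim=0)                   # policy token distribution
    counts = torch.bincount(true_tokens, minlength=vocab_size).float()
    q = (counts + 1.0) / (counts.sum() + vocab_size)                   # smoothed empirical distribution
    return (p * (p / q).log()).sum()                                   # the quantity to minimise

loss = policy_kl(torch.randn(20, 100), torch.randint(0, 100, (200,)), vocab_size=100)
```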
+ Inference:
+
+ Thought_i = p_1, ..., p_n
+
+ Tree of Thought:
+ Example:
+
+ -----------------
+ 1
+ -----------------
+ 121
+ 122
+ 123
+ -----------------
+ 12131
+ 12132
+ 12133
+
+ 12231
+ 12232
+ 12233
+
+ 12331
+ 12332
+ 12333
+ -----------------
+
+ = Graph(system prompt, children = 3, depth = 4, min KL(p_i || true_sequences))
+
+ Graph(Thought_i -> Thought_i+1)
+
+ min ThoughtLoss()
+
+ -------------------------------------------------------------------
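A sketch of the Graph(system prompt, children = 3, depth = 4, ...) construction above: a prompt tree with three children per node, labelled like the 1 / 121 / 12131 example. Plain Python; the node structure and labelling scheme are inferred from the example.

```python
from dataclasses import dataclass, field

@dataclass
class ThoughtNode:
    label: str                      # e.g. "1", "121", "12131"
    prompt: str
    children: list = field(default_factory=list)

def build_graph(system_prompt: str, children: int = 3, depth: int = 4) -> ThoughtNode:
    """Expand a tree of thought `depth` levels deep with `children` branches per node."""
    root = ThoughtNode(label="1", prompt=system_prompt)
    frontier = [root]
    for level in range(2, depth + 1):
        next_frontier = []
        for node in frontier:
            for c in range(1, children + 1):
                child = ThoughtNode(label=f"{node.label}{level}{c}",
                                    prompt=f"{node.prompt} -> branch {c}")
                node.children.append(child)
                next_frontier.append(child)
        frontier = next_frontier
    return root

tree = build_graph("Solve the user query step by step.")   # root "1", children "121", "122", "123", ...
```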
+
+ Backpropagate back through each section and get gradients for the following (a sketch of these per-section updates is given after this list):
+ -------------------------------------------------------------------
+ for thought batch size = b_t:
+
+ d ThoughtLoss / d Graph(Thought_i -> Thought_i+1)
+
+ -------------------------------------------------------------------
+ for policy batch size = b_p:
+
+ d KL(p_i || true_sequences) / d Prediction_Network
+
+ -------------------------------------------------------------------
+ for state batch size = b_s:
+
+ d MSE(s_t+1, z_t+1) / d Dynamics Network
+
+ -------------------------------------------------------------------
+ for state batch size = b_s:
+
+ d Contrastive Loss / d Representation Network
+
+ -------------------------------------------------------------------
+ for token batch size = b_to:
+
+ d Multi-token beam search Transformer CE Loss / d transformer
+
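A sketch of this per-section update scheme, assuming PyTorch: one optimizer per module, so each loss only steps its own parameters. The modules here are tiny stand-ins for the sections listed above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny stand-ins for the real sections (transformer, representation, dynamics, ...).
dynamics_network = nn.Linear(8, 8)
representation_network = nn.Linear(8, 8)

optimizers = {
    "dynamics": torch.optim.Adam(dynamics_network.parameters(), lr=1e-4),
    "representation": torch.optim.Adam(representation_network.parameters(), lr=1e-4),
}

def update(section: str, loss: torch.Tensor) -> None:
    """Backpropagate one section's loss and step that section's optimizer only."""
    opt = optimizers[section]
    opt.zero_grad()
    loss.backward()
    opt.step()

# e.g. for a state batch of size b_s: d MSE(s_t+1, z_t+1) / d Dynamics Network
s_t, s_next = torch.randn(16, 8), torch.randn(16, 8)
update("dynamics", F.mse_loss(dynamics_network(s_t), s_next))
```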
+ +++++++++++++++++++++++++++++++++++++++++
+ +++++++++++++++++++++++++++++++++++++++++
+ +++++++++++++++++++++++++++++++++++++++++
+ +++++++++++++++++++++++++++++++++++++++++
+
+ Inference:
+
+ 1. Input the user query.
+ 2. The model's goal is to generate a thought, which contains a set of policies; each policy contains a sequence of actions, and an action is a sequence of tokens.
+ 3. The sequence of tokens is chosen using multi-token prediction.
+ 4. The thought size is defined based on the user prompt: if the user prompt is in depth, then a larger output tree of thought is generated for the text in the input query.
+ 5. Perform the multi-token beam search depending on the action size; each action contains a multi-token beam search (so an action contains the state scores of k beams of n tokens at each time step, for a batch size of b_to).
+ 6. A PPO agent selects the actions via an MCTS over actions, using their state scores.
+ 7. Based on the tree-of-thought prompt tree, and given the sequence of actions selected for the policy, feed the chosen policy into the tree of thought and have the Transformer language model output token sequences based on the tree-of-thought prompts. An actor-critic RL agent selects the next child node of the tree to use, thereby learning to control how the model responds to different user queries. The tree of thought should contain logic for decision making or for solving problems in different ways.
+ 8. Update the world model given external evaluation datasets (a sketch of this loop is given below).
+
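A high-level sketch of inference steps 1 to 8 in plain Python, with placeholder callables standing in for the beam search, the PPO/MCTS action selection, the tree of thought, and the language model; none of these names come from the repository.

```python
def run_inference(user_query, beam_search, ppo_select, tree_of_thought, language_model,
                  update_world_model=None):
    """Steps 1-8: user query -> actions -> policy -> tree-of-thought response."""
    thought_depth = 4 if len(user_query.split()) > 50 else 2      # step 4: size from prompt depth
    candidate_actions = beam_search(user_query)                   # step 5: multi-token beam search
    policy = ppo_select(candidate_actions)                        # step 6: PPO + MCTS over actions
    tree = tree_of_thought(user_query, depth=thought_depth)       # step 7: prompt tree
    response = language_model(tree, policy)                       # step 7: generate from the chosen policy
    if update_world_model is not None:                            # step 8: external evaluation update
        update_world_model(response)
    return response

# Toy usage with trivial stand-ins:
answer = run_inference(
    "Analyze the economic effects of renewable energy adoption.",
    beam_search=lambda q: ["action_a", "action_b"],
    ppo_select=lambda actions: actions[0],
    tree_of_thought=lambda q, depth: {"root": q, "depth": depth},
    language_model=lambda tree, policy: f"response via {policy} at depth {tree['depth']}",
)
```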
+ +++++++++++++++++++++++++++++++++++++++++
+ +++++++++++++++++++++++++++++++++++++++++
+ +++++++++++++++++++++++++++++++++++++++++
+ +++++++++++++++++++++++++++++++++++++++++
+
+
+ Web Search Agent:
+
+ 1. Given a user prompt, search N websites using the input search query.
+ 2. Given meta-characteristics of the webpages, use a FFN to rank the web pages.
+ 3. Utilise RAG to retrieve and summarize the content from the k highest-ranking web pages given the user search query.
+ 4. Extract and formulate the retrieved information into a custom dataset.
+ 5. Feed the LLM and World Model the custom search dataset (a sketch of this pipeline is given below).
+
+
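A sketch of the web search agent's rank-and-retrieve flow in plain Python; the meta-characteristics, scoring weights, and function names are illustrative stand-ins for the FFN ranker and the RAG summariser described above.

```python
def rank_pages(pages: list, top_k: int = 3) -> list:
    """Stand-in for the FFN ranker: score pages by simple meta-characteristics."""
    def score(page):
        return 0.5 * page["relevance"] + 0.3 * page["freshness"] + 0.2 * page["authority"]
    return sorted(pages, key=score, reverse=True)[:top_k]

def build_search_dataset(query: str, pages: list, summarize) -> list:
    """Retrieve and summarise the k best pages, then package them as a custom dataset."""
    return [{"query": query, "url": p["url"], "summary": summarize(p["text"])}
            for p in rank_pages(pages)]

# Toy usage with a trivial summariser; a real pipeline would call a RAG model here.
pages = [
    {"url": "https://example.org/a", "text": "solar adoption trends ...",
     "relevance": 0.9, "freshness": 0.4, "authority": 0.6},
    {"url": "https://example.org/b", "text": "wind power economics ...",
     "relevance": 0.7, "freshness": 0.8, "authority": 0.5},
]
dataset = build_search_dataset("renewable energy", pages, summarize=lambda t: t[:40])
```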
  ## Citation

  If you use LightBulb in your research, please cite the author:

@@ -158,6 +313,7 @@ If you use LightBulb in your research, please cite the author:
  ```


+
  ## License

  This project is licensed under the Apache 2.0 License.