@startuml
skinparam NoteBackgroundColor PapayaWhip

autonumber

participant Coordinator
participant Learner
participant Collector
participant Middleware
participant Operator

group start
Coordinator->Coordinator: start communication module
Coordinator->Coordinator: start commander
Coordinator->Coordinator: start replay buffer
Coordinator->Operator: connect operator
Operator->Coordinator: send collector/learner info
Coordinator->Learner: create connection
Coordinator->Collector: create connection
end

loop
autonumber
group learn(async)
Coordinator->Learner: request learner start task
note right
policy config
learner config
end note
Learner->Coordinator: return learner start info
group learner loop
Coordinator->Learner: request data demand task
Learner->Coordinator: return data demand
Coordinator->Learner: request learn task and send data (metadata)
note right
data path
data priority
end note
Middleware->Learner: load data (step data)
Learner->Learner: run one training iteration
Learner->Middleware: send policy info
note left
model state_dict
model hyper-parameter
end note
Learner->Coordinator: return learn info
note right
policy meta
train stat
data priority
end note
end
Coordinator->Learner: request learner close task
Learner->Coordinator: return learner close info
note right
save final policy
end note
end

autonumber
group data collection/evaluation(async)
Coordinator->Collector: request collector start task
note right
policy meta
env config
collector config
end note
Collector->Coordinator: return collector start info
Middleware->Collector: load policy info for init
group collector loop
Coordinator->Collector: request get data task
Collector->Collector: policy interact with env
Collector->Middleware: send data (step data)
Collector->Coordinator: return data (metadata)
note right
data path
data length (rollout length)
end note
Middleware->Collector: load policy info for update
end group
Coordinator->Collector: request collector close task
Collector->Coordinator: return collector close info
note right
episode result (cumulative reward)
collector performance
end note
end group
end

autonumber
group close
Coordinator->Learner: destroy connection
Coordinator->Collector: destroy connection
Coordinator->Operator: disconnect operator
Coordinator->Coordinator: close
end group
@enduml
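The message flow above can be sketched as a single-process Python simulation. This is a hypothetical illustration, not the implementation behind the diagram. Every class, method and key in it (`Middleware`, `Learner`, `Collector`, `run`, `POLICY_PATH`, `data/<step>`) is invented for the sketch. The two asynchronous groups (learn and data collection) are serialized into one deterministic loop, and training and environment interaction are placeholders. It is also an assumption that the learner publishes an initial policy in `start`. What the sketch does try to show is the split the diagram draws: step data and policy weights travel through the middleware, while the coordinator and its replay buffer only handle metadata (data paths, lengths, priorities, policy meta).

```python
from typing import Any, Dict, List

# Everything below is hypothetical: names, keys, and payload shapes are illustrative only.
POLICY_PATH = "policy/latest"  # middleware key for the newest policy


class Middleware:
    """Shared store: bulky payloads live here; only their paths pass through the coordinator."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def put(self, path: str, obj: Any) -> None:
        self._store[path] = obj

    def get(self, path: str) -> Any:
        return self._store[path]


class Learner:
    def __init__(self, mw: Middleware, batch_size: int = 2) -> None:
        self.mw, self.batch_size, self.iteration = mw, batch_size, 0

    def _save_policy(self) -> Dict[str, Any]:
        # "send policy info": state_dict and hyper-parameters go to the middleware
        self.mw.put(POLICY_PATH, {"version": self.iteration, "state_dict": {}, "hyper_params": {}})
        return {"path": POLICY_PATH, "version": self.iteration}  # policy meta

    def start(self) -> Dict[str, Any]:
        # Assumption: the learner publishes an initial policy so collectors can initialize.
        return {"policy_meta": self._save_policy()}

    def data_demand(self) -> Dict[str, int]:
        return {"batch_size": self.batch_size}

    def learn(self, metadata: List[Dict[str, Any]]) -> Dict[str, Any]:
        # "load data (step data)": resolve the coordinator's paths against the middleware
        batch = [self.mw.get(m["data_path"]) for m in metadata]
        self.iteration += 1  # placeholder for one real training iteration
        return {
            "policy_meta": self._save_policy(),
            "train_stat": {"samples": sum(len(rollout) for rollout in batch)},
            "priority": [1.0] * len(batch),  # placeholder data priorities
        }

    def close(self) -> Dict[str, str]:
        return {"final_policy": POLICY_PATH}


class Collector:
    def __init__(self, mw: Middleware, rollout_length: int = 4) -> None:
        self.mw, self.rollout_length = mw, rollout_length
        self.policy_version, self.rollouts = -1, 0

    def start(self, policy_meta: Dict[str, Any]) -> None:
        self.update_policy(policy_meta)  # "load policy info for init"

    def get_data(self, step: int) -> Dict[str, Any]:
        rollout = [(step, t) for t in range(self.rollout_length)]  # stand-in for env interaction
        path = f"data/{step}"
        self.mw.put(path, rollout)  # "send data (step data)" to the middleware
        self.rollouts += 1
        return {"data_path": path, "data_length": len(rollout)}  # metadata only

    def update_policy(self, policy_meta: Dict[str, Any]) -> None:
        self.policy_version = self.mw.get(policy_meta["path"])["version"]

    def close(self) -> Dict[str, int]:
        return {"rollouts": self.rollouts}


def run(num_rounds: int = 4) -> Dict[str, Any]:
    """Coordinator: serializes the two async groups into one deterministic loop."""
    mw = Middleware()
    learner, collector = Learner(mw), Collector(mw)
    replay_buffer: List[Dict[str, Any]] = []  # holds metadata, never step data
    policy_meta = learner.start()["policy_meta"]
    collector.start(policy_meta)
    for step in range(num_rounds):
        replay_buffer.append(collector.get_data(step))
        need = learner.data_demand()["batch_size"]
        if len(replay_buffer) >= need:  # learn only once the data demand can be met
            policy_meta = learner.learn(replay_buffer[-need:])["policy_meta"]
        collector.update_policy(policy_meta)  # "load policy info for update"
    return {
        "learn_iterations": learner.iteration,
        "collector_policy_version": collector.policy_version,
        **collector.close(),
        **learner.close(),
    }


print(run())
```

With the defaults (4 rounds, batch size 2), the first round cannot meet the learner's data demand. The learner therefore trains three times, and the collector ends up on policy version 3. In a real deployment the learn and collect loops would run concurrently, as the `(async)` groups indicate, so the interleaving would not be this regular.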