@startuml
skinparam NoteBackgroundColor PapayaWhip
autonumber
participant Coordinator
participant Learner
participant Collector
participant Middleware
participant Operator
group start
Coordinator->Coordinator: start communication module
Coordinator->Coordinator: start commander
Coordinator->Coordinator: start replay buffer
Coordinator->Operator: connect operator
Operator->Coordinator: send collector/learner info
Coordinator->Learner: create connection
Coordinator->Collector: create connection
end
loop
autonumber
group learn(async)
Coordinator->Learner: request learner start task
note right
policy config
learner config
end note
Learner->Coordinator: return learner start info
group learner loop
Coordinator->Learner: request data demand task
Learner->Coordinator: return data demand
Coordinator->Learner: request learn task and send data(metadata)
note right
data path
data priority
end note
Middleware->Learner: load data(stepdata)
Learner->Learner: learn one iteration
Learner->Middleware: send policy info
note left
model state_dict
model hyper-parameter
end note
Learner->Coordinator: return learn info
note right
policy meta
train stat
data priority
end note
end
Coordinator->Learner: request learner close task
Learner->Coordinator: return learner close info
note right
save final policy
end note
end
autonumber
group data collection/evaluation(async)
Coordinator->Collector: request collector start task
note right
policy meta
env config
collector config
end note
Collector->Coordinator: return collector start info
Middleware->Collector: load policy info for init
group collector loop
Coordinator->Collector: request get data task
Collector->Collector: policy interact with env
Collector->Middleware: send data(stepdata)
Collector->Coordinator: return data(metadata)
note right
data path
data length(rollout length)
end note
Middleware->Collector: load policy info for update
end group
Coordinator->Collector: request collector close task
Collector->Coordinator: return collector close info
note right
episode result(cumulative reward)
collector performance
end note
end group
end
autonumber
group close
Coordinator->Learner: destroy connection
Coordinator->Collector: destroy connection
Coordinator->Operator: disconnect operator
Coordinator->Coordinator: close
end group
@enduml