runtime error
urn F.layer_norm(
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/", line 2515, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
2023-10-29T08:03:54.468454Z ERROR warmup{max_input_length=1024 max_prefill_tokens=4096 max_total_tokens=2048}:warmup: text_generation_client: router/client/src/:33: Server error: "LayerNormKernelImpl" not implemented for 'Half'
Error: Warmup(Generation("\"LayerNormKernelImpl\" not implemented for 'Half'"))
2023-10-29T08:03:54.553953Z ERROR text_generation_launcher: Webserver Crashed
2023-10-29T08:03:54.553979Z  INFO text_generation_launcher: Shutting down shards
2023-10-29T08:03:55.452679Z  INFO shard-manager: text_generation_launcher: Shard terminated rank=0
Error: WebserverFailed
curl: (7) Failed to connect to port 8080: Connection refused
Warning: Transient problem: connection refused Will retry in 10 seconds. 2
Warning: retries left.
curl: (7) Failed to connect to port 8080: Connection refused
Warning: Transient problem: connection refused Will retry in 10 seconds. 1
Warning: retries left.
curl: (7) Failed to connect to port 8080: Connection refused