Dolphin: Efficient Audio-Visual Speech Separation with Discrete Lip Semantics and Multi-Scale Global-Local Attention
Separate speakers in videos