Audio-visual representation learning for lip-sync estimation through ranking augmented contrastive training

