Microsoft Research · 335K
Published September 17, 2021, 16:29
In this talk, I will cover three of our recent explorations in pre-training. The first is an analysis of object/attribute detection pre-training, which produces the bottom-up attention features used extensively in vision-and-language research. The main finding is that plain grid features work equally well without object proposals, while being significantly faster. The second is an approach to self-supervised visual representation learning. The main message is that a simple Siamese network can learn competitive representations without components commonly believed to be essential, such as contrastive pairs or momentum encoders. The last is an architectural extension of major self-supervised learning frameworks from convolutional networks to transformers. We find that vision transformers can work out of the box, subject to instability issues that we call out for awareness.
Speaker: Xinlei Chen, Facebook AI Research
Microsoft Research Deep Learning team: microsoft.com/en-us/research/g...