Learning to Retrieve Passages without Supervision: finally unsupervised Neural IR?
In this third episode of the Neural Information Retrieval Talks podcast, Andrew Yates and Sergi Castella discuss the paper "Learning to Retrieve Passages without Supervision" by Ori Ram et al.
Despite the massive advances in Neural Information Retrieval in the past few years, statistical models still outperform neural models when no annotations are available at all. This paper proposes a new self-supervised pretraining task for Dense Information Retrieval that manages to beat BM25 on some benchmarks without using any labels.
Paper: https://arxiv.org/abs/2112.07708
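The core recipe behind dense retrievers like the one discussed here is contrastive training of a bi-encoder, and several of the chapters below (contrastive learning, bi-encoders, negative samples) revolve around it. As a rough illustration, here is a minimal PyTorch sketch of an InfoNCE-style loss with in-batch negatives; the embedding dimension, batch size, and temperature are illustrative assumptions, not the paper's exact configuration:

```python
# Minimal sketch of contrastive bi-encoder training with in-batch negatives.
# Embedding dim, batch size, and temperature below are illustrative
# assumptions, not the paper's configuration.
import torch
import torch.nn.functional as F

def contrastive_loss(query_emb: torch.Tensor, passage_emb: torch.Tensor,
                     temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE-style loss: the positive passage for query i sits at batch
    index i; every other passage in the batch acts as a negative."""
    # Similarity matrix of shape (batch, batch): [i, j] scores query i vs passage j.
    scores = query_emb @ passage_emb.T / temperature
    # The correct passages lie on the diagonal.
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)

# Toy usage: random unit vectors standing in for the two encoders' outputs.
queries = F.normalize(torch.randn(8, 768), dim=-1)
passages = F.normalize(torch.randn(8, 768), dim=-1)
print(contrastive_loss(queries, passages))
```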
Timestamps:
00:00 Introduction
00:36 "Learning to Retrieve Passages Without Supervision"
02:20 Open Domain Question Answering
05:05 Related work: Families of Retrieval Models
08:30 Contrastive Learning
11:18 Siamese Networks, Bi-Encoders and Dual-Encoders
13:33 Choosing Negative Samples
17:46 Self-supervision: how to train IR models without labels
21:31 The modern recipe for SOTA Retrieval Models
23:50 Methodology: a new proposed self-supervision task (see the code sketch after the timestamps)
26:40 Datasets, metrics and baselines
33:50 Results: Zero-shot performance
43:07 Results: Few-shot performance
47:15 In practice, is not using labels relevant after all?
51:37 How would you "break" the Spider model?
53:23 How long until Neural IR models outperform BM25 out-of-the-box robustly?
54:50 Models as a service: OpenAI's text embeddings API
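For the methodology chapter above, here is a rough, hypothetical sketch of how "recurring span" query/positive pairs could be mined from raw text, in the spirit of the self-supervision task the paper proposes: two passages that share an n-gram span are treated as a query/positive pair, with the rest of the batch serving as negatives. The span length and whitespace tokenization are simplifying assumptions, not the authors' exact procedure.

```python
# Hypothetical sketch of mining "recurring span" query/positive pairs from
# raw passages. Span length and whitespace tokenization are simplifying
# assumptions, not the authors' exact procedure.
from collections import defaultdict

def recurring_span_pairs(passages: list[str], span_len: int = 4) -> list[tuple[int, int]]:
    """Return index pairs of passages that share an n-gram span; one side can
    serve as a pseudo-query, the other as its positive, with the rest of a
    batch acting as negatives."""
    index = defaultdict(set)
    for pid, text in enumerate(passages):
        tokens = text.lower().split()
        for i in range(len(tokens) - span_len + 1):
            index[tuple(tokens[i:i + span_len])].add(pid)
    pairs = set()
    for pids in index.values():
        if len(pids) >= 2:
            a, b = sorted(pids)[:2]  # keep one pair per recurring span
            pairs.add((a, b))
    return sorted(pairs)

docs = ["the quick brown fox jumps over the lazy dog",
        "a quick brown fox jumps into the river"]
print(recurring_span_pairs(docs))  # [(0, 1)], via the shared span "quick brown fox jumps"
```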
Contact: castella@zeta-alpha.com