Subbarao Kambhampati - Do O1 Models Search? Machine Learning Street Talk (MLST) podcast



Duration: 1:32:13


Join Prof. Subbarao Kambhampati and host Tim Scarfe for a deep dive into OpenAI's O1 model and the future of AI reasoning systems.

* How O1 likely uses reinforcement learning in a manner similar to AlphaGo, with hidden reasoning tokens that users pay for but never see

* The evolution from traditional large language models (LLMs) to more sophisticated reasoning systems

* The concept of "fractal intelligence" in AI, where models sometimes work brilliantly but fail unpredictably

* Why O1's improved performance comes with substantial computational costs

* The ongoing debate between single-model approaches (OpenAI) vs hybrid systems (Google)

* The critical distinction between AI as an intelligence amplifier vs autonomous decision-maker
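The point about paying for hidden reasoning tokens can be made concrete with simple arithmetic. The sketch below assumes, as OpenAI's API documentation describes for its reasoning models, that hidden reasoning tokens are billed at the output-token rate; the per-million-token prices and token counts are illustrative figures, not quoted prices, and `Pricing` and `request_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    # Hypothetical per-million-token prices in USD; real prices vary by model and date.
    input_per_m: float
    output_per_m: float


def request_cost(pricing: Pricing, input_tokens: int,
                 visible_output_tokens: int, reasoning_tokens: int) -> float:
    """Estimate the cost of one request, assuming hidden reasoning tokens
    are billed at the output-token rate even though they are never shown."""
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * pricing.input_per_m
            + billed_output * pricing.output_per_m) / 1_000_000


if __name__ == "__main__":
    prices = Pricing(input_per_m=15.0, output_per_m=60.0)  # illustrative figures only
    visible_only = request_cost(prices, 1_000, 500, 0)
    with_reasoning = request_cost(prices, 1_000, 500, 5_000)
    print(f"without reasoning tokens: ${visible_only:.4f}")
    print(f"with hidden reasoning:    ${with_reasoning:.4f}")
```

With these made-up numbers, 5,000 hidden reasoning tokens make the same visible answer more than seven times as expensive, which is the kind of trade-off discussed at [01:06:48].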

SPONSOR MESSAGES:

***

CentML offers competitive pricing for GenAI model deployment, with flexible options suited to a wide range of models and deployment sizes, from small to large scale.

https://centml.ai/pricing/

Tufa AI Labs is a brand-new research lab in Zurich, started by Benjamin Crouzier and focused on o-series-style reasoning and AGI. Are you interested in working on reasoning or getting involved in their events?

Go to https://tufalabs.ai/

***

TOC:

1. **O1 Architecture and Reasoning Foundations**

[00:00:00] 1.1 Fractal Intelligence and Reasoning Model Limitations

[00:04:28] 1.2 LLM Evolution: From Simple Prompting to Advanced Reasoning

[00:14:28] 1.3 O1's Architecture and AlphaGo-like Reasoning Approach

[00:23:18] 1.4 Empirical Evaluation of O1's Planning Capabilities

2. **Monte Carlo Methods and Model Deep-Dive**

[00:29:30] 2.1 Monte Carlo Methods and MARCO-O1 Implementation

[00:31:30] 2.2 Reasoning vs. Retrieval in LLM Systems

[00:40:40] 2.3 Fractal Intelligence Capabilities and Limitations

[00:45:59] 2.4 Mechanistic Interpretability of Model Behavior

[00:51:41] 2.5 O1 Response Patterns and Performance Analysis

3. **System Design and Real-World Applications**

[00:59:30] 3.1 Evolution from LLMs to Language Reasoning Models

[01:06:48] 3.2 Cost-Efficiency Analysis: LLMs vs O1

[01:11:28] 3.3 Autonomous vs Human-in-the-Loop Systems

[01:16:01] 3.4 Program Generation and Fine-Tuning Approaches

[01:26:08] 3.5 Hybrid Architecture Implementation Strategies

Transcript: https://www.dropbox.com/scl/fi/d0ef4ovnfxi0lknirkvft/Subbarao.pdf?rlkey=l3rp29gs4hkut7he8u04mm1df&dl=0

REFS:

[00:02:00] Monty Python (1975)

Witch trial scene: flawed logical reasoning.

https://www.youtube.com/watch?v=zrzMhU_4m-g

[00:04:00] Cade Metz (2024)

Microsoft–OpenAI partnership evolution and control dynamics.

https://www.nytimes.com/2024/10/17/technology/microsoft-openai-partnership-deal.html

[00:07:25] Kojima et al. (2022)

Zero-shot chain-of-thought prompting ('Let's think step by step').

https://arxiv.org/pdf/2205.11916

[00:12:50] DeepMind Research Team (2023)

Multi-bot game solving with external and internal planning.

https://deepmind.google/research/publications/139455/

[00:15:10] Silver et al. (2016)

AlphaGo: Monte Carlo Tree Search combined with policy and value networks trained by supervised and reinforcement learning.

https://www.nature.com/articles/nature16961

[00:16:30] Kambhampati, S. et al. (2024)

Evaluates O1's planning capabilities ("Planning in Strawberry Fields").

https://arxiv.org/pdf/2410.02162

[00:29:30] Alibaba AIDC-AI Team (2024)

MARCO-O1: Chain-of-Thought + MCTS for improved reasoning.

https://arxiv.org/html/2411.14405

[00:31:30] Kambhampati, S. (2024)

Explores LLM "reasoning vs retrieval" debate.

https://arxiv.org/html/2403.04121v2

[00:37:35] Wei, J. et al. (2022)

Chain-of-thought prompting (introduces last-letter concatenation).

https://arxiv.org/pdf/2201.11903

[00:42:35] Barbero, F. et al. (2024)

Transformer attention and "information over-squashing."

https://arxiv.org/html/2406.04267v2

[00:46:05] Ruis, L. et al. (2024)

Influence functions to understand procedural knowledge in LLMs.

https://arxiv.org/html/2411.12580v1

(Truncated; the full list continues in the show notes/transcript document.)
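The last-letter concatenation task from Wei et al. (2022), cited at [00:37:35], has an exact ground-truth answer, which makes it easy to generate instances of increasing size and check whether a model's chain-of-thought performance holds up as the number of words grows. A minimal sketch in Python; the question template only approximates the phrasing used in that paper, and `last_letter_concat` and `make_instances` are hypothetical helper names.

```python
import random


def last_letter_concat(name: str) -> str:
    """Ground-truth answer: concatenate the last letter of each word."""
    return "".join(word[-1] for word in name.split())


def make_instances(vocab: list[str], n_words: int, k: int,
                   seed: int = 0) -> list[tuple[str, str]]:
    """Build k (question, answer) pairs whose names have n_words words each,
    so problem size can be varied while the prompt format stays fixed."""
    rng = random.Random(seed)  # seeded for reproducible instances
    pairs = []
    for _ in range(k):
        name = " ".join(rng.choice(vocab) for _ in range(n_words))
        # Approximate template; not necessarily the paper's exact wording.
        question = f'Take the last letters of the words in "{name}" and concatenate them.'
        pairs.append((question, last_letter_concat(name)))
    return pairs


if __name__ == "__main__":
    print(last_letter_concat("Elon Musk"))  # nk
    words = ["Elon", "Musk", "Larry", "Page", "Sergey", "Brin"]
    for n in (2, 4, 8):
        question, answer = make_instances(words, n, k=1)[0]
        print(n, answer, question)
```

Fixing the seed keeps the generated problems reproducible, so the same instances can be reused when comparing models or prompting strategies.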
