OpenAI’s o1 Model — ‘Strawberry’

Veer Jain
3 min read · Sep 24, 2024


OpenAI has officially introduced its much-anticipated “Strawberry” AI model, formally named OpenAI o1. This model aims to improve upon its predecessors by enhancing reasoning and problem-solving abilities, a move that has generated both excitement and skepticism in the AI community. The o1 model comes in two versions: o1-preview and o1-mini, with both now available to ChatGPT Plus users and select API users.

OpenAI claims that o1-preview surpasses the earlier GPT-4o across various benchmarks, particularly in competitive programming, mathematics, and scientific reasoning. However, early users note that while it excels in those areas, it does not outperform GPT-4o across the board. One notable drawback is slower response times, a consequence of the model’s multi-step problem-solving approach.

Joanne Jang, an OpenAI product manager, cautioned users about overhyping the o1 model, explaining that while it performs well on challenging tasks, it is not a universal improvement over previous models. Her comments emphasize that o1 is still a work in progress, with significant potential but not yet a “miracle model.”

One standout achievement of o1-preview is its performance in competitive programming, where it ranks in the 89th percentile on Codeforces. It also shows a dramatic improvement in mathematics, scoring 83% on a qualifying exam for the International Mathematics Olympiad, compared to GPT-4o’s 13%. OpenAI also claims that o1 performs on par with PhD students in fields like physics, chemistry, and biology. Meanwhile, o1-mini is optimized for coding tasks and is priced significantly lower than o1-preview.

One of the key advancements of o1 lies in its new reinforcement learning approach. This method encourages the model to “think through” problems before delivering an answer, a strategy that mirrors the “step-by-step” prompting used in other large language models (LLMs). This allows o1 to recognize its mistakes and adopt different problem-solving strategies. However, AI benchmarks have been known to be unreliable, and it will take independent testing from users to fully validate OpenAI’s claims.
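
To see the contrast in practice, here is a minimal sketch using the openai Python package, assuming an API key is configured and that both gpt-4o and o1-preview are available to the account. With an earlier model, the step-by-step behavior is usually requested explicitly in the prompt; with o1-preview, the same question is sent plainly and the deliberation happens inside the model. The example question is just an illustrative puzzle, not one from OpenAI’s announcement.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

question = (
    "A bat and a ball cost $1.10 in total. The bat costs $1.00 more "
    "than the ball. How much does the ball cost?"
)

# Earlier models: the step-by-step reasoning is coaxed out explicitly in the prompt.
gpt4o_reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": f"Think step by step, then answer: {question}"}],
)

# o1-preview: the question is sent as-is; the model spends extra time
# "thinking through" the problem internally before returning a final answer,
# which is also why its responses arrive more slowly.
o1_reply = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": question}],
)

print(gpt4o_reply.choices[0].message.content)
print(o1_reply.choices[0].message.content)
```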

An amusing yet telling example of o1’s improved capabilities is its ability to count the number of R’s in the word “strawberry” — a task that has stumped previous models due to how LLMs tokenize words. While trivial, this demo highlights the model’s ability to handle character-level tasks more effectively.
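
Why this trips up language models is easy to show in code. Counting characters is a one-liner in Python, but a model never sees individual letters; it sees token IDs produced by a tokenizer that groups characters into multi-character chunks. The snippet below is a rough illustration, assuming the tiktoken package and its cl100k_base encoding (the exact split varies by tokenizer).

```python
import tiktoken

word = "strawberry"

# Character-level counting is trivial for ordinary code.
print(word.count("r"))  # 3

# An LLM, however, operates on tokens rather than letters. Decoding each token
# back to text shows multi-character chunks, not individual characters,
# which is why "how many R's?" is harder than it looks for these models.
enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode(word)
print([enc.decode([t]) for t in tokens])
```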

Despite these advancements, some experts remain cautious about the “reasoning” claims surrounding o1. The term is often seen as anthropomorphizing AI, giving users the impression that the system can “think” in a human-like way. Clement Delangue, CEO of Hugging Face, criticized this language, pointing out that these models are still just processing data and running predictions.

Ultimately, while the o1 model shows promise, it is not without its limitations. It still lacks features like web browsing, image generation, and file uploads, though OpenAI plans to integrate these in future updates. For now, the release of o1-preview and o1-mini represents a step forward, but it is clear that OpenAI’s journey toward creating a truly reasoning AI is still ongoing.
