A new AI model from OpenAI that uses a trial-and-error approach is starting to produce surprising results on complex challenges. This method, previously used in game-playing AIs such as AlphaGo, is now being adapted to a wider range of tasks, including language models.
DeepMind’s AlphaGo was the first AI to master Go at a world-class level. Rather than following hand-crafted rules or expert strategies, it used reinforcement learning (RL) to develop its own understanding of the game through self-play (its successor, AlphaGo Zero, went further and learned without any human game data at all).
This approach allowed AlphaGo to beat the European champion Fan Hui 5-0 and later defeat Lee Sedol and Ke Jie, two of the strongest human players in the world.
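To make the trial-and-error idea concrete, here is a minimal sketch of reinforcement learning, using tabular Q-learning on a toy “walk to the goal” task. It illustrates the general technique only; AlphaGo’s actual system is far more sophisticated, combining deep neural networks with Monte Carlo tree search.

```python
# Toy illustration of trial-and-error learning (tabular Q-learning).
# Not AlphaGo's algorithm; just the core RL loop in miniature.
import random

N_STATES = 5          # positions 0..4; reaching position 4 is the goal
ACTIONS = [-1, +1]    # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

# Q[state][action]: learned estimate of each move's long-term value
Q = [[0.0, 0.0] for _ in range(N_STATES)]

for episode in range(500):
    state = 0
    while state != N_STATES - 1:
        # Explore occasionally; otherwise exploit the best-known move
        if random.random() < EPSILON:
            a = random.randrange(2)
        else:
            a = 0 if Q[state][0] >= Q[state][1] else 1
        next_state = max(0, min(N_STATES - 1, state + ACTIONS[a]))
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        # Learn from the outcome of this attempt: the "trial and error" step
        Q[state][a] += ALPHA * (reward + GAMMA * max(Q[next_state]) - Q[state][a])
        state = next_state

# After training, stepping right is valued higher in every non-goal state
print(Q)
```

No one tells the agent how to reach the goal; it discovers the winning behaviour purely from the rewards its own attempts produce.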
OpenAI’s latest model, o1, produces remarkable results on similarly complex problems. Like AlphaGo, o1 builds its own understanding of a problem space through trial and error, rather than by imitating human-supplied solutions.
This makes it the first major large language model (LLM) to develop its own highly effective, AlphaGo-like approach to problem solving.
This method allows the AI to tackle challenges by learning from feedback on its own attempts, rather than being limited to the patterns in its language training data. As a result, AI should be able to solve increasingly complex problems that were previously out of reach.
While o1 has much in common with previous models, the main difference lies in the addition of ‘think time’ before it responds to a question.
During this phase, o1 generates a ‘chain of thought’, working through the problem step by step and checking its reasoning before arriving at a solution.
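OpenAI has not published how o1’s deliberation works internally, but one way to picture ‘think time’ is as extra compute spent sampling and scoring candidate reasoning chains before committing to an answer. The sketch below is purely hypothetical: generate_chain and score_chain are stand-in stubs, not o1’s real components.

```python
# Hypothetical sketch of "think time" as best-of-n reasoning.
# Both functions below are illustrative stubs, not o1's actual internals.
import random

def generate_chain(question: str) -> list[str]:
    # Stand-in for a model sampling one step-by-step reasoning chain
    steps = [f"step {i}: weigh option {random.randint(1, 9)}" for i in range(1, 4)]
    return steps + [f"answer: {random.randint(0, 100)}"]

def score_chain(chain: list[str]) -> float:
    # Stand-in for a learned verifier rating the quality of the reasoning
    return random.random()

def answer_with_think_time(question: str, n_chains: int = 8) -> str:
    # More "think time" means sampling more chains before committing to one
    chains = [generate_chain(question) for _ in range(n_chains)]
    best = max(chains, key=score_chain)
    return best[-1]

print(answer_with_think_time("Which move wins the endgame?"))
```

The key design idea this captures is that answer quality can improve with inference-time compute: the longer the model deliberates, the more candidate lines of reasoning it can explore and discard.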
Experts suggest that this learning approach will lead to AI exhibiting behaviour that appears unusual or unpredictable, driven by its own internal logic, much as AlphaGo’s famous ‘move 37’ against Lee Sedol baffled human commentators before proving decisive. In the process, AI could uncover knowledge and methods that lie beyond current human understanding. The future is already beginning to unfold.