OpenAI just dropped its latest AI model, GPT-4o, and it's already the talk of the tech town. The "o" stands for "omni," showcasing its ability to see, talk, and listen—it's multimodal. This new model will roll out gradually across OpenAI's platforms in the next few weeks.
Multimodal magic
GPT-4o takes what GPT-4 could do and makes it even fancier. It can understand and respond in text, audio, and visuals.
OpenAI CTO Mira Murati highlighted how GPT-4o combines GPT-4-level intelligence with enhanced capabilities across multiple formats. “GPT-4o reasons across voice, text, and vision,” Murati said. This includes more complicated nuances like tone, background noises, or emotions from the audio all together. This takes us to a whole new level of human-machine interaction where it almost feels like talking to a real person.
ChatGPT gets a major upgrade
ChatGPT with GPT-4o is now more conversational. You can chat with it in real-time, interrupt it, ask it follow-up questions and it all seems just too real. It picks up on the tone of your voice and responds with different emotions. In the demo video it even did a great job singing.
Vision capabilities enhanced
The vision upgrade means ChatGPT can now analyze photos or screenshots. Snap a pic of a menu in a foreign language, and it’ll translate it. Or show it a photo, and it’ll tell you what’s in it, like the brand of a shirt someone is wearing. These features are set to evolve even further, potentially allowing ChatGPT to "watch" live events and provide commentary.
Faster and more accessible
GPT-4o’s impressive with its speed. It’s twice as fast as the previous model, GPT-4 Turbo, and costs half as much to use via OpenAI's API and Microsoft’s Azure OpenAI Service. Both free and paid users get access, with paid users getting a 5x higher message limit.
Free users will also get access to the GPT Store, which was previously exclusive to paid users—a step toward democratizing AI.
Performance
OpenAI says GPT-4o achieves GPT-4 Turbo-level performance according to traditional benchmarks – text, reasoning and coding. Due to its multimodal capabilities, it achieves remarkable results on multilingual, audio, and vision benchmarks.
Examples
Users on X, LinkedIn, and Reddit are impressed by GPT-4o. Here's how it takes data and performs deep technical and statistical analysis in just a matter of a few seconds.
It’s shockingly well with 18th-century handwriting.
Min Choi on X asked GPT-4o to create an STL file for a 3D model chair, and it's mind-blowing. The file is generated in less than 30 seconds.
Below is @tldraw in a notebook, connected to GPT-4o's vision API. This video is at original speed, and it shows how @matplotlib plots are generated from scribbles.
There’s a viral video on X of GPT-4o’s singing to each other. Matt Shumer says it’s the craziest thing he’s ever seen.
Community feedback
Reddit users are buzzing with thoughts of GPT-4o. Here's what stands out in the community feedback:
- Practical use: Many users are thrilled about GPT-4o's potential in real-world applications. It's not just a neat tech demo anymore; people see it as a game changer that could make interacting with AI a daily norm and necessity.
- Cost and performance: The fact that GPT-4o offers more for less money has caught everyone's attention. Users are excited that such a powerful tool is becoming more accessible and affordable, reflecting just how far AI has come in a short time.
- Feels more human: There's a lot of chat about how “human” GPT-4o feels. Users who have tried the model note that conversations are smoother and more intuitive, making it feel like you're chatting with a human rather than a machine.
- AGI debate: The question of whether GPT-4o is the step towards Artificial general intelligence (AGI) is hot on Reddit. While some think we're on the brink, others feel the goalposts for AGI are still far off. It’s a lively debate that shows just how provocative and exciting AI developments are.
Overall, the mood on Reddit is a mix of optimism and curiosity, with a healthy dose of tech enthusiasm.
Challenges and future prospects
Even though GPT-4o is loaded with cool features, it’s still early days for unified multimodal interaction. Some things, like audio outputs, are only available in a limited form right now. OpenAI is committed to continuous development to unlock the model's potential fully.
As AI tech keeps evolving, GPT-4o is setting a new benchmark for generative AI. Its cutting-edge abilities and wider accessibility make it a game-changer for both individuals and businesses. With the AI race heating up, GPT-4o is ready to take charge of the future of AI.
Closing remarks
As OpenAI rolls out GPT-4o, it's clear we're entering a new era of artificial intelligence. This model isn't just faster and more efficient—it's a glimpse into a future where AI can understand and interact in ways that feel incredibly human. From enhancing conversations in ChatGPT to understanding complex visual inputs and singing, GPT-4o is setting the stage for more dynamic and integrated AI applications.