On December 6, Google officially launched Gemini, its groundbreaking large language model (LLM). CEO Sundar Pichai announced that Gemini is the "most capable and general model yet." It demonstrates impressive results, already competing with powerful existing models from OpenAI, Meta, and Microsoft.
Here's a breakdown of Gemini:
It's multimodal
Gemini can accept almost any kind of input, including text, code, audio, images, and video. In more technical terms, any data that can be encoded as embedding vectors can be used as input: image (patch) embeddings, audio embeddings, video embeddings, and much more.
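To make that idea concrete, here is a minimal, illustrative sketch in plain NumPy. It is not Gemini's actual architecture or any Google API; the patch size, embedding dimension, and random projections are arbitrary placeholders. It only shows the general pattern: each modality is turned into fixed-size vectors, and those vectors are interleaved into one sequence the model can attend over.

```python
# Illustrative sketch only -- NOT Gemini's architecture or API.
# Each modality is encoded into embedding vectors, then all vectors are
# concatenated into a single input sequence.
import numpy as np

EMBED_DIM = 8  # toy embedding size; real models use thousands of dimensions

def embed_text(tokens: list[str]) -> np.ndarray:
    """Map each token to a vector (random here, standing in for a learned table)."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(tokens), EMBED_DIM))

def embed_image(image: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split an image into square patches and project each patch to EMBED_DIM."""
    h, w, c = image.shape
    patches = [
        image[i:i + patch, j:j + patch].reshape(-1)
        for i in range(0, h, patch)
        for j in range(0, w, patch)
    ]
    projection = np.random.default_rng(1).normal(size=(patch * patch * c, EMBED_DIM))
    return np.stack(patches) @ projection

# One interleaved "text + image" sequence.
text_part = embed_text(["describe", "this", "chart", ":"])
image_part = embed_image(np.zeros((32, 32, 3)))  # dummy 32x32 RGB image
sequence = np.concatenate([text_part, image_part], axis=0)
print(sequence.shape)  # -> (8, 8): 4 text vectors + 4 image-patch vectors
```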
It comes in three sizes
Gemini is incorporated into Google's AI-powered chatbot Bard and the Pixel 8 Pro smartphone. It comes in three versions: Ultra, Pro, and Nano. Ultra is the largest and most capable model, intended for complex tasks in data centers and enterprise applications and set to launch next year. Pro is a lighter version that powers several Google AI services and now underpins Bard; Google describes it as its best model for scaling across a wide range of tasks. Nano is the smallest variant, designed for on-device and offline use on Android phones.
With Gemini, Google says Bard will become much better at planning tasks, and the Pixel 8 Pro will be able to quickly summarize recordings and suggest automatic replies in messaging apps.
Performance
In a direct comparison with OpenAI's GPT-4, Google asserts that Gemini outperforms GPT-4 in 30 of 32 established benchmarks. This includes better understanding and interaction with video and audio, a key distinction from OpenAI's models. Demis Hassabis, CEO of Google DeepMind, emphasizes Gemini's multimodal capabilities, integrating text, images, video, and audio data for more comprehensive understanding and responses.
On one benchmark, Gemini Ultra has a 74.4% success rate in Python coding tasks, while GPT-4 has 67%. On another benchmark, Gemini Ultra has a reading comprehension score of 82.4 compared with GPT-4's 80.9.
Pichai notes that on text-only questions, Gemini Ultra scores 90%, human experts score approximately 89%, and GPT-4 scores 86%. On multimodal questions, Gemini scores 59%, while GPT-4 scores 57%.
Gemini Ultra is the first model to outperform human experts on Massive Multitask Language Understanding (MMLU), a set of text-based tests designed to measure model performance on tasks such as reading comprehension, college math, and multiple-choice quizzes in domains like physics, economics, and the social sciences.
Another highlight is Gemini's efficiency: Google says it is faster and more cost-effective to run than its previous models. It was trained on Google's Tensor Processing Units (TPUs), and the new TPU v5p will support large-scale training and serving.
However, it is still too early to jump to conclusions. Many more experiments are needed to determine which model is better. A Reddit user compared the multimodal capabilities of Gemini and GPT-4 with a small example, and the Reddit community seems to be having a lot of fun with the results.
Experts’ opinions
Melanie Mitchell, an artificial intelligence researcher at the Santa Fe Institute in New Mexico, observes that Gemini excels in language and coding benchmarks but has room for improvement in image and video processing. Google DeepMind has updated Gemini to be more accurate and responsible, reducing false information generation. However, a fundamental change in the underlying technology might be necessary for complete accuracy.
Experts question the effectiveness of Google's benchmarks for Gemini, citing a lack of transparency that makes it challenging to validate Google's claims. Emily Bender, a professor of computational linguistics at the University of Washington, notes that while Gemini is marketed as versatile, its evaluation benchmarks are limited, potentially affecting its thorough assessment.
Some researchers suggest that for the average user, the minor improvements Gemini offers over other models may not be a decisive factor. User preferences may lean more toward convenience, brand familiarity, and integration with existing products rather than technical advancements.
Applications
Gemini's multimodality opens up a huge space for creativity: the ability to transform almost any type of input into almost any type of output invites a wide range of applications.
Gemini's ecosystem begins with powering Bard and new features on the Pixel 8 Pro, with plans to integrate it into Google's search engine, the Chrome browser, and other products. Its multimodality could reshape tasks like coding (where the Gemini-powered AlphaCode 2 system demonstrates strong performance), generating combined text and images, and reasoning visually. A hedged example of multimodal prompting follows below.
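As an illustration of what multimodal prompting can look like in practice, the sketch below assumes access to a Gemini model through Google's google-generativeai Python SDK. The model name, the sample image file, and the API key are placeholders and assumptions, not details from the launch announcement.

```python
# Hedged sketch, assuming access to Gemini via Google's google-generativeai SDK
# (pip install google-generativeai). Model availability and names may differ.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder API key

# Text + image in a single prompt: the multimodal case described above.
model = genai.GenerativeModel("gemini-pro-vision")
chart = Image.open("sales_chart.png")  # hypothetical local image file
response = model.generate_content(
    ["Summarize the trend in this chart and suggest one follow-up analysis.", chart]
)
print(response.text)
```

Passing a string and an image together in one generate_content call is what distinguishes this from a text-only prompt; the same pattern extends to audio or video inputs where the API supports them.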
Ethical considerations
Google, acknowledging its initial delay in responding to ChatGPT's launch, maintains its commitment to a cautious yet bold approach to AI development. Hassabis and Pichai emphasize the importance of not rushing, particularly with the advancement towards artificial general intelligence (AGI), which promises to surpass human intelligence. They advocate for a careful yet optimistic approach to this transformative technology.
To ensure the safety and reliability of Gemini, Google has conducted extensive internal and external testing. Pichai highlights the criticality of data security, especially for enterprise-focused AI products. Hassabis acknowledges the inherent risks in launching advanced AI systems, including unforeseen issues and vulnerabilities. Thus, the release of Gemini Ultra is being approached methodically, akin to a controlled beta test to identify and address potential problems.
Wrapping up
In summary, Google's Gemini, a multifaceted and advanced large language model, marks a significant stride in AI. It stands out for its multimodal capabilities, handling diverse inputs such as text, code, and multimedia. Available in three sizes, it targets applications ranging from data centers to smartphones. While it excels on many benchmarks, surpassing competitors like GPT-4 in several areas, experts have called for more transparency in its evaluation. Google's cautious yet forward-looking approach to developing Gemini reflects its commitment to ethical AI and positions the company as a key player in the evolving AI landscape.