Databricks, together with its Mosaic AI research team, just dropped DBRX, and it's pretty neat. It's a 132B-parameter model that can chat in multiple languages, write code, and solve complex math problems, and it outperforms open models like Mixtral, LLaMA2-70B, and Grok-1 on standard benchmarks.
DBRX was developed in just two months on a roughly $10M budget, setting a new bar for open LLMs in both efficiency and capability. Databricks is letting anyone download DBRX from GitHub and Hugging Face for free. This matters because more people can experiment with, improve, and build on the model. The decision lowers the barrier to entry, letting enterprises develop gen AI use cases on their own data without emptying their pockets.
DBRX architecture
A standout feature of DBRX is its fine-grained mixture-of-experts (MoE) architecture with 132 billion total parameters. The model has 16 experts, of which 4 are activated for any given token, giving 36 billion active parameters. That contrasts with Mixtral, which runs with around 13 billion active parameters, so DBRX may need somewhat more GPU power to run smoothly. And unlike Mixtral and Grok-1, which use fewer but larger experts (both use 8 experts with 2 active), DBRX appears to gain a quality edge from engaging more, smaller experts per token, offering a fresh take on balancing efficiency and output.
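To make the "more experts per token" idea concrete, here is a minimal top-k routing sketch in PyTorch. It is illustrative only, not DBRX's actual implementation: the layer sizes and module names are toy values I've picked, and only the 16-experts / 4-active layout mirrors DBRX. The point is that each token runs through just k of the n expert feed-forward blocks, which is why only ~36B of the 132B parameters are active per token.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    """Toy mixture-of-experts layer: each token is routed to k of n experts."""

    def __init__(self, d_model=512, d_ff=2048, n_experts=16, k=4):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                          # x: (n_tokens, d_model)
        scores = self.router(x)                    # (n_tokens, n_experts)
        weights, indices = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # mixing weights over the chosen k
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            hit = indices == e                     # did any routing slot pick expert e?
            token_mask = hit.any(dim=-1)
            if token_mask.any():                   # run expert e only on its own tokens
                w = (weights * hit).sum(dim=-1, keepdim=True)[token_mask]
                out[token_mask] += w * expert(x[token_mask])
        return out


moe = TopKMoE()                  # 16 experts, 4 active per token, like DBRX's layout
tokens = torch.randn(8, 512)     # 8 toy "tokens"
print(moe(tokens).shape)         # torch.Size([8, 512])
```

Because the router picks 4 of 16 experts rather than 2 of 8, there are many more possible expert combinations per token, which is the design choice Databricks credits for the quality edge.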
This architecture lets DBRX run inference up to twice as fast as models like LLaMA2-70B, while being notably more compact: roughly 40% of Grok-1's size in both total and active parameters. When served on Databricks' Mosaic AI Model Serving, DBRX generates text at up to 150 tokens per second.
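A quick back-of-the-envelope check of that size claim, using the parameter counts above plus Grok-1's publicly reported figures (roughly 314B total and ~86B active; those two numbers are assumptions brought in here, not figures from this post):

```python
# Rough size comparison. The Grok-1 numbers are commonly cited estimates,
# not values from the DBRX announcement itself.
dbrx_total, dbrx_active = 132e9, 36e9
grok1_total, grok1_active = 314e9, 86e9   # assumed, publicly reported estimates

print(f"active fraction of DBRX: {dbrx_active / dbrx_total:.0%}")    # ~27%
print(f"DBRX vs Grok-1, total:   {dbrx_total / grok1_total:.0%}")    # ~42%
print(f"DBRX vs Grok-1, active:  {dbrx_active / grok1_active:.0%}")  # ~42%
```

Both ratios land near 40%, which lines up with the "approximately 40% of Grok-1" claim.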
Quality and performance
DBRX's strong MMLU score and its exceptional math and code results (evidenced by the GSM-8K and HumanEval benchmarks, respectively) are worth noting. There is a bit of confusion in the comparisons made in the Databricks blog post, which mainly pits the Instruct version against Mixtral's various forms, but it's clear that DBRX Instruct stands out, especially on code and math benchmarks. DBRX outperforms GPT-3.5 and rivals Gemini 1.0 Pro. It's particularly adept at programming, surpassing specialized models like CodeLLaMA-70B while also holding up as a versatile general-purpose large language model (LLM).
Data
The model's strength is underpinned by its training on an impressive 12 trillion tokens of text and code data, with a maximum context length of 32k tokens. Yet, specifics beyond this are scarce, leaving us curious about the finer details of its data diet.
Technical implementation
From a technical standpoint, DBRX uses the GPT-4 tokenizer alongside gated linear units (GLU), grouped query attention (GQA), and rotary position encodings (RoPE). Those interested in a deeper dive into its mechanics should check out Daniel Hanchen's insightful breakdown. For practical use, it works with Hugging Face transformers, which keeps it reasonably accessible and easy to try.
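As a rough idea of what that looks like in practice, here is a minimal loading sketch with Hugging Face transformers. The model ID databricks/dbrx-instruct matches the public Hugging Face repo, but the flags (trust_remote_code, dtype, device_map) and the prompt are assumptions that may need adjusting for your transformers version and hardware; this is a sketch, not a recipe.

```python
# Minimal sketch: generate with DBRX Instruct via Hugging Face transformers.
# Note the memory footprint: 132B params in bfloat16 is roughly 264 GB of
# GPU memory, so this assumes a large multi-GPU machine.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "databricks/dbrx-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",            # shard the weights across available GPUs
    trust_remote_code=True,       # may or may not be needed, depending on version
)

messages = [{"role": "user", "content": "Write a one-line docstring for a bubble sort."}]
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```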
Community feedback on DBRX
There are some hot discussions around DBRX on Reddit, with users sharing what they like and don't like about the new model. Here's a quick summary of people's thoughts:
- Accuracy on niche topics: Users noted DBRX's accurate handling of specialized topics without making up answers, which is a significant improvement over other models.
- Efficiency and hardware requirements: Despite its large size, DBRX is described as lean and potentially runnable on systems with 64GB of RAM when quantized, making it more accessible (see the sketch after this list).
- Practical application and technical support: Discussions include how to run DBRX in Databricks environments and its performance on complex programming tasks, like writing specific algorithms in Python.
- Open-source accessibility: DBRX is praised for being freely available on GitHub and Hugging Face, encouraging widespread experimentation and innovation without financial barriers.
- Comparative performance: Users compare DBRX to GPT-3.5, Gemini 1.0 Pro, and CodeLLaMA-70B, highlighting its competitive edge in efficiency and performance.
- Broader implications for Databricks: Conversations also delve into Databricks' strategy post-DBRX release, including its potential impact on the company's position in the AI and data analytics market.
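For the quantization point above, one hedged option is a 4-bit load through transformers with bitsandbytes, sketched below. The community's 64GB-RAM reports generally refer to CPU inference with quantized GGUF builds in llama.cpp rather than this exact path, and whether the model actually fits depends entirely on your setup, so treat the snippet as a starting point under those assumptions.

```python
# Hedged sketch: 4-bit quantized load via transformers + bitsandbytes.
# At 4 bits the weights alone are roughly 132e9 * 0.5 bytes ≈ 66 GB, so this
# still assumes a large GPU (or multi-GPU) machine.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "databricks/dbrx-instruct"
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Explain mixture-of-experts routing in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=80)[0], skip_special_tokens=True))
```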
Overall, the community is hyped about what Databricks may bring to the AI landscape with DBRX.
Wrapping up
Databricks' DBRX isn't just making waves for its open-access philosophy; it's genuinely setting new standards in AI capability. Outshining competitors like Mixtral MoE, LLaMA2-70B, and Grok-1 in key areas such as language understanding, programming, and math, DBRX demonstrates what's possible when innovation meets execution. Equally impressive is that Databricks trained such a sophisticated model in just two months on a roughly $10M budget, showing its commitment to pushing the envelope in AI development. As we dive into exploring DBRX, its blend of strong performance and community-focused accessibility promises to spark a new wave of innovation, inviting everyone to contribute to the rapidly evolving landscape of artificial intelligence.