SuperAnnotate announces $36M Series B funding round

My brother and I started SuperAnnotate in 2019 based on my PhD research in image segmentation. We realized that the research could help companies accelerate the tedious work of pixel-precise annotation, thereby streamlining one of the most time-consuming parts of building computer vision systems. We have come a long way since then: SuperAnnotate has been recognized as the best annotation platform on G2 and as one of the top startup employers in the US by Forbes. As generative AI emerged in 2023, we moved beyond traditional data labeling and evolved into a leading enterprise software provider for creating and managing large-scale multimodal AI datasets.

Today, we're thrilled to announce the close of our $36M Series B funding round. The round was led by Socium Ventures, a VC fund backed by Cox Enterprises, with additional participation from NVIDIA, Databricks Ventures, Lionel Messi's Play Time Ventures, Glynn Capital, and several existing investors. The new capital will accelerate our growth and further establish our role as a key player in enterprise GenAI dataset creation, management, and orchestration. Trusted by several Fortune 50 companies, our platform enables organizations to create complex AI models using the highest quality multimodal datasets.

Data quality in modern AI applications

Starting with the launch of ChatGPT, the impact of generative AI has reshaped the ML landscape and set new standards for data quality. Large foundation models and applications like chatbots, multimodal models, retrieval-augmented generation (RAG) systems, and AI agents require training data of high complexity and quality. However, existing solutions were largely ill-equipped to handle these needs. As a result, companies building AI products had to either create their own in-house systems, adapt their projects to fit outdated SaaS offerings, or even manage complex projects with basic tools like Excel—similar to the state of the computer vision annotation market before SuperAnnotate emerged.

We quickly responded to this gap, evolving our platform to cater to the specific needs of multimodal AI applications, complex agentic data flows, and other GenAI-specific use cases. Requirements for high-quality datasets were becoming more stringent, and each organization had multiple use cases requiring specialized data. Recognizing this need, we saw an opportunity to build an easy-to-use no-code/low-code platform, a Swiss Army knife for modern AI training data, capable of fully adapting to enterprises' evolving AI data needs.

The new AI data wall

Recent discussions (e.g., Ilya Sutskever, The Information) have pointed out a slowdown in the performance improvements of foundation models in certain areas. This is mainly because these models are facing a "data wall" caused by the limited availability of publicly accessible data. This plateau presents a unique opportunity for enterprises with access to vast stores of proprietary, domain-specific data that these foundation models haven't been trained on. By leveraging this data, enterprises can gain a significant competitive advantage by building AI products tailored to their specific needs through fine-tuned models, retrieval-augmented generation (RAG), or custom AI agents.

Unlocking the AI data wall for enterprises

The challenge in breaking through this data wall lies primarily in transforming the data in an existing data store, data lake, or data warehouse into high-quality, training-ready data, which we call SuperData. It's no secret that extracting and transforming raw data into SuperData often takes more than 80% of the AI development timeline. As enterprises leverage their data to develop more automated solutions and more intelligent services and products for their customers, creating SuperData remains a significant bottleneck. At SuperAnnotate, we address this challenge by providing enterprises with the infrastructure they need to create their own SuperData. Since every use case differs slightly from the next, we provide building blocks that users can easily adapt to their needs to accelerate and automate the SuperData creation process.

Use cases covered by SuperAnnotate

There are several ways to convert raw enterprise data into data that can be leveraged for building cutting-edge AI products. Some of the most common use cases we're seeing today are extracting information from a large knowledge base with the help of subject matter experts, creating high-quality multiturn conversations for various LLMs, and evaluating and ranking the outputs of different models or agentic systems. Below is a list with more details on some of the many use cases we are helping our enterprise customers with:

Training data for foundation models: After learning foundational language skills and knowledge on vast pre-training datasets, foundation models are adapted to use cases and human preferences in another data-intensive process called post-training. Leading foundation models need high-complexity, high-quality datasets, which in turn demand excellent quality control tooling, workflow orchestration, management tools, and more, all of which SuperAnnotate provides.
Enhancing RAG systems: Retrieval-augmented generation (RAG) is becoming an essential application for language models in enterprise settings. However, many organizations face challenges in achieving the expected performance. SuperAnnotate provides companies with evaluation data to make informed choices on model selection and prompt strategies, as well as training data to fine-tune embedding models for specific domains, improving RAG outcomes.
Evaluating agentic systems: Interest in agentic systems, where foundation models get access to tools and are set up to perform tasks autonomously, is growing among our enterprise clients. However, evaluating these systems is complex, as many reasoning steps remain hidden from the user. SuperAnnotate enables evaluators to visualize each step in the reasoning process, providing insights beyond just final output ratings.
Model evaluations: While public benchmarks on general datasets provide a baseline, they offer limited insight into how models perform on proprietary, domain-specific data in enterprise environments. SuperAnnotate empowers companies to conduct thorough evaluations using third-party evaluators or internal domain experts, ensuring reliable performance before deploying models in customer-facing or internal applications.
Synthetic data generation: As AI performance advances, model-generated synthetic data is increasingly used to enrich human-curated training datasets. However, effectively integrating synthetic data requires care, especially in more complex domains. SuperAnnotate's platform enables companies to establish hybrid data pipelines, blending synthetic and human-labeled data for optimal results.
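
To make the last item concrete, here is a minimal Python sketch of what blending synthetic and human-labeled data can look like. It is an illustration only, not SuperAnnotate's API: the `Example` record, the `blend` function, and the `synthetic_share` parameter are all hypothetical names, and a real pipeline would add quality checks and human review on top.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """Hypothetical training record; not a SuperAnnotate data type."""
    text: str
    label: str
    source: str  # "human" or "synthetic"


def blend(human, synthetic, synthetic_share, seed=0):
    """Keep every human example and add a random sample of synthetic ones
    so that synthetic data makes up about `synthetic_share` of the result."""
    if not 0.0 <= synthetic_share < 1.0:
        raise ValueError("synthetic_share must be in [0, 1)")
    # Solve s / (h + s) = share for s, rounding down to a whole example.
    wanted = int(len(human) * synthetic_share / (1.0 - synthetic_share))
    rng = random.Random(seed)  # fixed seed keeps the mix reproducible
    # Never ask for more synthetic examples than are available.
    picked = rng.sample(synthetic, min(wanted, len(synthetic)))
    mixed = list(human) + picked
    rng.shuffle(mixed)
    return mixed


human = [Example(f"human text {i}", "ok", "human") for i in range(8)]
synthetic = [Example(f"generated text {i}", "ok", "synthetic") for i in range(10)]
dataset = blend(human, synthetic, synthetic_share=0.2)
```

In this sketch the human-labeled examples are always kept in full and the synthetic share is capped, which reflects the idea that synthetic data enriches a human-curated dataset rather than replacing it.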

***

As generative AI evolves, so does the demand for better, smarter enterprise data solutions. Today's funding boosts our capacity to enhance our multimodal data offerings—the building block for enterprises to build intelligent AI systems. The future of AI starts with the right data, and SuperAnnotate is here to be the data partner you can count on.

Vahan Petrosyan
