The story about vision of vision
Author: Tigran Petrosyan, Co-founder and CEO at SuperAnnotate
I know the title may be misleading… This post is not about the models from journals. I want to talk about the models that are a key part of something arguably sexier and more magical for many, including my team and me.
Luckily, nature awarded us human beings two fascinating biological cameras. They may not do a great job of recording, storing, and analyzing all the visual information in their sight over time, but hey, they help us see objects, orient ourselves in space, be fascinated by art, etc. What if we had multiple eyes that we could place anywhere we want: in buildings, streets, cars, drones, robots, satellites, literally anywhere? Imagine they could give us any information we need at any time. Cars could analyze their surroundings and drive themselves; robots could overcome obstacles, sort different types of waste, or pick fruits and vegetables; or you could check how your kid is doing in their room. I am sure you can think of countless ways such “eyes” with visual perception capabilities could make our lives easier.
Back in the mid-2010s, when I was a PhD student in Applied Physics and Biomedical Imaging in Switzerland, I could clearly see several use cases where computer vision (CV) could help radiologists give faster and better diagnoses and save even more lives. With that thought in mind, I gave a TEDx talk in the beautiful town of Luzern in 2016 to show how AI and automation can bring about positive transformation. During that time, my brother was doing his PhD research in Sweden, working on a cutting-edge image segmentation technique based on superpixels to improve the state of the art and speed up image annotation, the most time-consuming job in CV. Annotation is the backbone of any CV-based object detection task and is very labor intensive. To give some perspective, tens of thousands to millions of images need to be collected, labeled, and managed for a single CV project. My brother’s results quickly got better and better, and at some point in mid-2018 it became apparent that his algorithms could accelerate CV not only for biomedical imaging but also for countless other applications. This led my brother and me to drop out of our PhDs and start the adventure we called SuperAnnotate, providing an image annotation service with our superpixel-based tooling.
At that time, I couldn’t even imagine how an innovative superpixel algorithm would become the start of something much bigger. Just after starting the company, we were very fortunate to meet some of the most prominent figures and visionaries in CV and ML, including professors Pieter Abbeel and Trevor Darrell, and later Gary Bradski, the founder of OpenCV, the largest open-source CV library. They eventually joined our advisory board and helped us see the bigger picture that shaped our long-term mission and vision. Early on, we were also very fortunate to partner with Point Nine Capital, the lead investor in our Seed funding round, who constantly pushed us further and encouraged us to always think about the bigger value. We quickly realized that there was much more value in the platform we were building, which we released shortly after our Seed round.
Although relatively new, computer vision applications are steadily spreading across every industry, from autonomous vehicles, robotics, and security to less obvious ones like medical imaging, retail, insurance, and agriculture, to name a few. Soon, every industry will be building systems with visual perception capabilities. We are just entering a new era of computer vision in which a large number of companies, from startups to larger enterprises that are not necessarily ML-first, are starting to develop their first CV applications. To succeed, these companies need the right tooling and infrastructure to make CV application development easier and faster.
Data preparation and labeling solutions alone already made up a $3.8B market in 2020, and that market is expected to double in the next 4 years. While the overall market for computer vision is estimated to be on the order of tens of billions of dollars, we can see how computer vision will become a key layer of any business infrastructure in 10 years or so, akin to Amazon Web Services or Stripe. What we see now is just the tip of the iceberg compared to what is yet to come.
Yet, building computer vision pipelines is not a piece of cake, to say the least. According to multiple sources, over 80% of computer vision projects fail and never end up in production. This is predominantly due to the substantial challenge of creating and managing tens of thousands to millions of high-quality labeled images and videos, which is still mostly manual labor. More importantly, monitoring, improving, and versioning such a large number of datasets and their corresponding ML model performance becomes a huge pain without the right infrastructure in place. Previously, production-level CV was mostly accessible only to relatively large companies with dedicated in-house teams of engineers who could build and maintain the deeply integrated systems it requires.
We started with an annotation platform, but now we are addressing all of those problems with a first-of-its-kind end-to-end application development platform for building computer vision pipelines.
- First, we have built an image and video annotation platform streamlining workflows between CV engineers, managers, and labeling teams thanks to our advanced QA system and data management infrastructure.
- Second, we have created a managed marketplace of service providers to match our customers with the best annotation team for their projects to ensure the high quality of annotations.
- Third, we enable CV engineers to create custom models with our NoCode neural network training, which lets them not only test the model on the go but also semi-automate the labeling process.
- Ultimately, thanks to our data curation system, we are closing the loop by allowing our users to query and review subsets of the data and to monitor and improve model performance, to make sure the model is ready to push into production.
With this, we want to enable every company, from academic researchers and CV enthusiasts to startups and enterprises, to build, test, and push their CV products into production 3-5x faster and make them successful.
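For readers who want a concrete picture of what semi-automating the labeling process can look like, here is a minimal sketch. It is not SuperAnnotate's API or implementation: the `Prediction` class, the `route_predictions` function, and the 0.9 confidence threshold are all hypothetical names and values chosen for illustration. The idea is simply that a trained model proposes labels, confident proposals are kept as pre-labels, and the rest go to a human annotator.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """One model-proposed label for an image (hypothetical structure)."""
    image_id: str
    label: str
    confidence: float  # model score in [0, 1]


def route_predictions(predictions, threshold=0.9):
    """Split predictions into auto-accepted pre-labels and items for human review.

    The threshold is an illustrative assumption, not a recommended value.
    """
    auto_accepted, needs_review = [], []
    for pred in predictions:
        if pred.confidence >= threshold:
            auto_accepted.append(pred)
        else:
            needs_review.append(pred)
    return auto_accepted, needs_review


if __name__ == "__main__":
    # Made-up example batch, for illustration only.
    batch = [
        Prediction("img_001", "car", 0.97),
        Prediction("img_002", "pedestrian", 0.62),
        Prediction("img_003", "car", 0.91),
    ]
    accepted, review = route_predictions(batch)
    print("auto-accepted:", [p.image_id for p in accepted])
    print("needs review:", [p.image_id for p in review])
```

In a setup like this, the items corrected by annotators would typically be fed back into the next round of training, which is one way to picture the "closing the loop" step described above.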
Because we focus solely on computer vision, we are convinced that dealing only with visual data lets us build all the essential components for the successful completion of CV projects, and that success lies in seamlessly linking these components together within the same platform. For those who want to learn more about this, I recently wrote a comprehensive overview of the 7 key considerations for successfully building and scaling computer vision pipelines.
And we are now at an exciting point: we just raised $14.5M in our Series A round of funding from world-class investors, led by Base10 Partners. Interestingly, we first met Base10’s managing partner TJ Nahigian when we had just started our company and were presenting our superpixel solution at the HIVE Ventures Summit in Yerevan. It was thrilling to reconnect with TJ 18 months later and realize how much we had evolved since then. I am more than convinced that TJ’s enthusiasm and sheer knowledge of our space and its future will be a big asset for our further development and growth, and we are more than happy to have him and the whole Base10 team with us on this uplifting journey.
The funds will allow us to expand our team in the US, Europe, and Asia and to keep advancing our platform so that our users can advance their own computer vision products. There is still a lot of work to do to create more automation, more integrations, advanced analytics, etc., to stay on top of the rapid development of CV and make sure our users create the most state-of-the-art magic.
About SuperAnnotate
SuperAnnotate is helping companies build the next generation of computer vision products with its end-to-end platform and integrated marketplace of managed annotation service teams. SuperAnnotate provides comprehensive annotation tooling, robust collaboration and quality management systems, NoCode neural network training and automation, and a data review and curation system to successfully develop and scale computer vision projects. Everyone from researchers to startups to enterprises all over the world trusts SuperAnnotate to build higher-quality training datasets up to 10x faster while significantly improving model performance. SuperAnnotate was recognized as one of the world’s top 100 AI companies in 2021 by CB Insights.