Working with hundreds of clients and spending thousands of hours annotating data, we've seen first-hand the major pain points machine learning engineers encounter at the very first step of building AI models: data annotation. That's why we've decided to launch a series of webinars covering the questions most often discussed in our AI community.
In this webinar, we address some of the most daunting questions that tend to slow down the annotation process:
- How do you select the right data to annotate?
- How do you prioritize data to improve model performance after deployment?
And that’s where we need active learning.
What is active learning?
Active learning is a machine learning approach in which an algorithm interactively queries a user or other information source to label new data points with the desired outputs; in other words, it lets your model pick its own training data. We know that deep learning models need a lot of data, but in reality we face constraints like budget and timing. Active learning helps us get the most out of existing data and, as a result, reduce the dataset size needed for comparable performance, or increase performance with a similar dataset size.
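As a concrete illustration, one of the most common query strategies is uncertainty sampling: score every unlabeled sample by how unsure the model is about it, and prioritize the most uncertain ones for labeling. Below is a minimal sketch using predictive entropy as the uncertainty score; it assumes a classifier that outputs softmax probabilities, and all names are illustrative rather than taken from the webinar.

```python
import numpy as np

def entropy_uncertainty(probs: np.ndarray) -> np.ndarray:
    """Predictive entropy per sample; higher means the model is less sure.

    probs: array of shape (n_samples, n_classes) with rows summing to 1.
    """
    eps = 1e-12  # guard against log(0)
    return -np.sum(probs * np.log(probs + eps), axis=1)
```

Least-confidence and margin sampling are common alternatives to entropy; all three rank samples by how undecided the model's predicted distribution is.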
During the webinar, we introduce the two most common use cases of active learning: one before deployment and one after.
Pool setting
You're starting a project with a large pool of data, far too much to label in full. How do you select the right samples?
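In the pool setting, a simple recipe is to score the entire pool once and send the k most uncertain samples to annotation. Here is a hedged sketch building on the entropy helper above; `model.predict_proba` is an assumed scikit-learn-style interface, not a SuperAnnotate API.

```python
import numpy as np

def select_from_pool(model, pool: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k most uncertain samples in the pool."""
    probs = model.predict_proba(pool)     # shape (n_samples, n_classes)
    scores = entropy_uncertainty(probs)   # helper from the sketch above
    return np.argsort(scores)[-k:][::-1]  # top-k, most uncertain first
```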
Stream setting
After deployment, the model is out in production and receives a constant stream of data. You want to keep improving it, but there is far too much data to manually decide which samples should be passed to storage for labeling.
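In the stream setting, the decision is made per sample as it arrives: forward a sample to storage for labeling only when the model's uncertainty exceeds a threshold. Again a sketch under the same assumptions as above; the threshold value is illustrative and would be calibrated against your labeling budget in practice.

```python
def should_label(model, sample, threshold: float = 1.0) -> bool:
    """Decide on the fly whether one incoming sample is worth labeling."""
    probs = model.predict_proba(sample.reshape(1, -1))  # batch of one
    # Reuses entropy_uncertainty from the earlier sketch.
    return float(entropy_uncertainty(probs)[0]) > threshold
```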
During the webinar, we walk through these use cases in detail and show you how to use active learning in practice. We also take the time to answer some of your questions:
- What kind of approach do we use to measure uncertainty in neural networks?
- How many classes can active learning be used with?
- What are the best practices for the quality assurance process?
- When else does active learning become an important part of the workflow?
- How well does active learning scale with combinations of datasets from different sources?
- Are there any other metrics that can be used besides uncertainty and class distribution?
In a nutshell, after watching this webinar, you’ll learn how to:
- Set up a way to effectively select samples to label when you have a large pool of data.
- Select the right incoming data when you run your model in production and fine-tune it.
- Find a way to be data-centric with limited resources.
Please leave your email below to access the webinar recording.
Who we are
We are SuperAnnotate — the world’s leading platform for building the highest quality training datasets for computer vision and NLP. With advanced tooling and QA, ML and automation features, data curation, robust SDK, offline access, and integrated annotation services, we enable machine learning teams to build incredibly accurate datasets 3-5x faster.
By bringing our annotation tool and professional annotators together, we've built a unified annotation environment, optimized to provide an integrated software and services experience that leads to higher-quality data and more efficient data pipelines.