Behind every autonomous driving technology, every online generator of artificial faces and every massive facial recognition system lies the enormous task of labeling the data that will later be used to train the corresponding AI systems. It is heavy work that requires thousands of people to spend their days identifying features in an image, a video or an audio file.
Although attention usually goes to the technology built on this data, some companies are also innovating in the labeling process itself. One of them is Scale AI, a startup founded in 2016 (when its current CEO was only 19 years old), which has just raised 100 million dollars in its Series C funding round (more than five times what it raised in its Series B last year).
The latest ‘unicorn’ to join the club
After this, the company has been valued at 1 billion dollars, which makes it a member of the select club of ‘unicorns’ in the technology sector. “We are proud of what we have built in these last three years,” explains Alexandr Wang, the CEO of Scale AI.
Peter Thiel, technology investor and former adviser to Donald Trump, is clear about the importance of this company: “AI companies come and go as they compete to find the most effective machine learning applications. But Scale AI is going to last longer, thanks to the fact that it is the provider of the central infrastructure of the main players in this market.” Among these ‘players’ are many of the big Silicon Valley companies: its clients include Waymo, Uber, OpenAI, Airbnb and Lyft.
According to Wang, “Billions of data samples are required for AI systems to reach a human-like level of performance. There is a big gap between a few giant companies that can afford this kind of training and the many that can’t.” But the amount of data available is not the only challenge companies face: the data must also be of good quality.
Not all of this data requires the same labeling work. In the case of photos intended for training autonomous driving systems, the task can take between 10 minutes and two hours: the person who analyzes and labels each image must trace with the mouse pointer the outline of each car, pedestrian, building or traffic light that appears in it. For other clients, the task may be to analyze a wide variety of texts to improve natural language processing.
Graphic showing the role of data labeling in training artificial intelligences. (Via: Waymo)
Scale AI’s objective is to optimize the labeling process for both humans and machines. On the machine side, it has developed a set of software tools that automatically label images in a first pass; the results are then handed to its network of 30,000 freelance workers, who only need to review and correct this initial labeling. “Tasks that used to take hours end up being solved in just a couple of minutes.”
Regarding the human factor, they have also developed software to identify the best labellers. “Humans have a critical role to play in what we’re doing because they’re there to make sure that all the data we provide is really high quality,” Wang explains.
If Scale AI stands out, it is not for lack of competition: two months ago, Uber acquired another startup with a similar profile (Mighty AI), and other companies, both established (Amazon, through Mechanical Turk) and newly created (such as Hive), do a similar job. But according to Scale AI investors, its tools stand out for being more accurate, faster and cheaper.