Figure Eight Announces Datasets, Video Object Tracking, and Smart Bounding Box Annotation to Accelerate the Adoption of AI
The Human-in-the-Loop leader combines machine learning and human intelligence to create the high-quality, large-scale, structured data necessary to make AI work in the real world
Figure Eight, the essential Human-in-the-Loop artificial intelligence platform for data science and machine learning teams, today unveiled new offerings at its annual Train AI conference to accelerate the adoption of AI by more businesses: Figure Eight Datasets, Video Object Tracking, and Smart Bounding Box Annotation.
Figure Eight Datasets is a free, curated repository of versioned, open-source training data the industry needs for benchmarking and advancing critical machine learning deployments and research. Today, the repository launches with the publication of eight originally developed and researched training datasets constructed from millions of human-generated labels. Publicly available, high-quality training datasets of this size and type are rare, and these sets also include an unusual but critical transparency of methodology: each dataset contains the raw input data, the original settings for Figure Eight’s human-in-the-loop workflow, and the logic that produced the curated, validated dataset. The initial eight Figure Eight Datasets are:
Open Images Dataset V4 (Bounding Boxes)
A set of 1.7 million images, annotated with bounding boxes for 600 classes of objects, served in collaboration with Google.
Medical Images for Nucleus Segmentation
21,000 nuclei from several different organ types annotated by medical experts.
Handwritten Name Transcription
Transcriptions of 400,000 handwritten names for Optical Character Recognition (OCR).
San Francisco Parking Sign Detection
Parking sign detection and parsing from images of San Francisco streets.
Medical Speech, Transcription and Intent (English)
A collection of audio utterances for common medical symptoms.
Medical Information Extraction
A dataset of relationships between medical terms in PubMed articles, for relation extraction and related natural language processing tasks.
Multilingual Disaster Response Messages
A set of messages related to disaster response, covering multiple languages, suitable for text categorization and related natural language processing tasks.
Swahili Health Translation, Speech, Transcription and Topics
A collection of health-related audio recordings in Swahili created in collaboration with Translators Without Borders and the Red Cross.
“Today data scientists struggle to find high-quality, relevant benchmark datasets for testing their machine learning algorithms. In fact, most existing datasets are limited in the languages and cultures that they cover,” said Robert Munro, CTO of Figure Eight. “These eight training datasets were selected because they represent real-world problems that are important to solve or because they are tough machine learning problems. The Figure Eight Machine Learning Team selected these datasets from a broad range of candidates. We think it will enrich the machine learning community to have more datasets to work on, and it will enrich the world because those datasets will help make real-world AI available to more people.”
Figure Eight is also announcing new platform capabilities, the first of which is Video Object Tracking. Figure Eight Video Object Tracking allows machine learning teams to annotate an object within a video frame and have that annotation persist across subsequent frames, while ensuring that every frame requiring high-quality annotation is accurately reviewed by a human. This object tracking capability is essential for annotating video content at scale in applications such as autonomous vehicles, security surveillance, and media entertainment. Without it, the cost and time required to annotate individual video frames would be prohibitive, making AI applications that need to understand objects moving through time and space untenable. Video is a growing data format: over 500,000 hours of video are uploaded and 1 billion hours of video are consumed on YouTube every day. Figure Eight Video Object Tracking is available now as a private beta and will become generally available to all customers in Q3. Customers who want to join the private beta can register here.
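To make the idea of an annotation "persisting across frames" concrete, here is a minimal sketch in Python. It assumes one plausible mechanism — humans label a few keyframes and boxes are linearly interpolated between them, with the interpolated frames then queued for human review — which is an illustration of the general technique, not Figure Eight's actual implementation.

```python
# Hypothetical sketch: persist a bounding-box annotation across video frames
# by linearly interpolating between human-labeled keyframes. Illustrative
# only; not Figure Eight's actual tracking algorithm.

def interpolate_boxes(keyframes):
    """keyframes: dict mapping frame index -> (x, y, w, h) drawn by a human.

    Returns a dict with a box for every frame from the first keyframe to the
    last, with in-between frames filled in by linear interpolation.
    """
    frames = sorted(keyframes)
    boxes = {}
    for start, end in zip(frames, frames[1:]):
        b0, b1 = keyframes[start], keyframes[end]
        span = end - start
        for f in range(start, end):
            t = (f - start) / span  # fraction of the way to the next keyframe
            boxes[f] = tuple(a + t * (b - a) for a, b in zip(b0, b1))
    boxes[frames[-1]] = keyframes[frames[-1]]
    return boxes

# A human annotates frames 0 and 10; frames 1-9 are interpolated and could
# then be routed to human reviewers where high-quality annotation is required.
tracked = interpolate_boxes({0: (10, 10, 50, 50), 10: (30, 20, 50, 50)})
print(tracked[5])  # (20.0, 15.0, 50.0, 50.0)
```

The human-in-the-loop step is what distinguishes this from fully automatic tracking: interpolation supplies a cheap first guess, and people only correct the frames where the guess drifts.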
The second capability is Figure Eight Smart Bounding Box Annotation. This feature allows machine learning teams to leverage the power of deep learning to accurately identify objects in computer vision applications. The Figure Eight Smart Bounding Box Annotation capability comprises two new features: Predictive Bounding Boxes and Intelligent Bounding Box Aggregation.
The Predictive Bounding Boxes feature greatly reduces the human effort needed to identify, label, and draw bounding boxes around objects in images. High-confidence bounding boxes are created by Figure Eight’s deep learning model rather than by human annotators, freeing annotators to confirm, adjust, or remove each predicted box to ensure it correctly labels an object of interest. Intelligent Bounding Box Aggregation addresses the problem of determining the ‘correct’ bounding box when multiple people have drawn slightly different boxes around the same object. Optimized over millions of past bounding box jobs on the Figure Eight platform, it combines deep learning computer vision with expertise in quality control for human annotation: it uses the bounding boxes created by any number of people, the past accuracy of those people, and the image content itself to create a single, optimized bounding box for each object. The resulting box placement is accurate down to a single pixel, giving Figure Eight customers the most accurate possible human-driven object detection for their computer vision models. The Figure Eight Smart Bounding Box Annotation capability is available now as a private beta and will become generally available to all customers in Q3. Customers who want to join the private beta can register here.
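One simple way to combine boxes from several annotators while accounting for each person's track record is an accuracy-weighted average of box coordinates. The sketch below illustrates that baseline idea only; Figure Eight's actual aggregation also incorporates the image content itself via deep learning, which this toy function does not attempt.

```python
# Hypothetical sketch: merge several annotators' bounding boxes into one box,
# weighting each contribution by that annotator's historical accuracy score.
# Illustrative baseline only; not Figure Eight's actual aggregation method.

def aggregate_boxes(annotations):
    """annotations: list of (box, accuracy) pairs, where box = (x, y, w, h)
    and accuracy is the annotator's historical quality score in (0, 1].

    Returns a single box whose coordinates are the accuracy-weighted
    average of the input boxes' coordinates.
    """
    total = sum(acc for _, acc in annotations)
    return tuple(
        sum(box[i] * acc for box, acc in annotations) / total
        for i in range(4)
    )

# Three annotators drew slightly different boxes around the same object;
# the historically less reliable annotator pulls the result less.
boxes = [
    ((100, 100, 50, 50), 0.9),  # highly accurate annotator
    ((104, 102, 48, 50), 0.8),
    ((90, 95, 60, 55), 0.3),    # historically less reliable annotator
]
print(aggregate_boxes(boxes))
```

Weighting by past accuracy means a single careless annotation cannot drag the aggregated box far off target, which is the quality-control intuition the feature description points to.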