Microsoft AI for Earth and SpaceNet Training Data Now Available on Radiant Earth Foundation’s Open Repository for Geospatial Training Data
Radiant MLHub Is the World’s First Cloud-Based Open Library Dedicated to Earth Observation Training Data for Use with Machine Learning Algorithms
Radiant Earth Foundation announced the availability of Microsoft AI for Earth’s Chesapeake Bay Land Cover and SpaceNet’s Roads and Buildings training datasets through Radiant MLHub, an open digital training data repository that debuted earlier this week with “crop type” labels for major crops in Kenya, Tanzania and Uganda.
Designed to encourage widespread data collaboration, Radiant MLHub allows anyone to access, store, register and/or share open training datasets for high-quality Earth observations. Shared data and models are accessible via a standardized API, and can therefore move across organizations, governments and sectors to unlock new opportunities for data-based insights. Moreover, Radiant MLHub features a global map of geospatial training data locations that can be used to identify under-represented geographical areas from which more training data are needed.
The addition to Radiant MLHub of the “Chesapeake Bay Land Cover” and “SpaceNet Roads and Buildings” training datasets will make it easier for individuals and organizations working on conservation, land cover and land use change, urban planning, rural development and related issues to discover and access data for use in training their machine learning algorithms and validating their models for accuracy.
Radiant MLHub is an interoperable solution for sharing training data and is compatible with all commercial and private cloud repositories. The new “Chesapeake Bay Land Cover” and “SpaceNet Roads and Buildings” training datasets are stored and managed by Microsoft AI for Earth and SpaceNet, respectively.
The “Chesapeake Bay Land Cover” dataset can be used to assess how well land cover classification methods generalize, for example whether a model trained on data from one U.S. state, such as Maryland, can classify land cover across the rest of the Chesapeake Bay region. It includes land cover classifications based on six classes, high-resolution USDA NAIP imagery, USGS Landsat 8 medium-resolution imagery and associated land cover classification, as well as Bing building masks. The dataset supports work on model generalization and could serve as a basis for similar watershed datasets.
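For readers who want a concrete picture of the generalization test described above, the following is a minimal sketch, not the method used by Microsoft AI for Earth or the Chesapeake Conservancy. It trains a simple nearest-centroid classifier on samples from one state and reports accuracy on the others. The function names, the two-band features, the class labels and every sample value are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import dist


def fit_centroids(samples):
    """Return {label: mean feature vector} from (features, label) pairs."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Assign the class whose centroid is closest in feature space."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


def accuracy(centroids, samples):
    """Fraction of samples whose predicted class matches the label."""
    correct = sum(predict(centroids, f) == label for f, label in samples)
    return correct / len(samples)


def generalization_report(samples_by_region, train_region):
    """Train on one region, then score every other region separately."""
    centroids = fit_centroids(samples_by_region[train_region])
    return {
        region: accuracy(centroids, samples)
        for region, samples in samples_by_region.items()
        if region != train_region
    }


# Hypothetical toy data: (red, near-infrared) reflectance pairs per pixel.
# These values are invented and do not come from the actual dataset.
TOY_SAMPLES = {
    "Maryland": [
        ((0.10, 0.60), "forest"),
        ((0.12, 0.55), "forest"),
        ((0.05, 0.02), "water"),
        ((0.04, 0.03), "water"),
    ],
    "Virginia": [((0.11, 0.58), "forest"), ((0.06, 0.04), "water")],
    "Pennsylvania": [((0.09, 0.50), "forest"), ((0.20, 0.30), "water")],
}

report = generalization_report(TOY_SAMPLES, "Maryland")
for region, score in report.items():
    print(f"{region}: {score:.2f}")
```

In this toy setup the model trained on Maryland scores lower on one held-out state than on the other, which is the kind of gap a cross-region evaluation is meant to expose.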
The “SpaceNet Roads and Buildings” dataset, on the other hand, focuses on the problem of object detection and classification in high-resolution imagery. SpaceNet LLC is a nonprofit organization run collaboratively by IQT CosmiQ Works, Maxar Technologies, Amazon Web Services, Intel AI, Capella Space, Topcoder, and IEEE GRSS. SpaceNet develops and open sources datasets of labeled, high-resolution satellite imagery over 10 urban areas including Shanghai, Khartoum, Mumbai, and Dar es Salaam. It uses those datasets in public data science challenges focused on automated building footprint identification and road network extraction for routing. To date, SpaceNet has hosted five challenges, open sourced 23 winning algorithms, and distributed $250,000 in prizes. The datasets and open source computer vision models will help disaster response workers and other researchers identify buildings quickly using automated techniques.
“Radiant MLHub is an asset for data scientists and geospatial professionals globally,” says Radiant Earth Foundation Chief Data Scientist Dr. Hamed Alemohammad. “Considering the growing volume of multifaceted Earth observations that is available for research and applications, developing innovative, open and collaborative tools for geospatial analysis is a must. Radiant MLHub is built on this principle, and we are excited to partner with leading organizations that also invest in high-quality training data generation, such as Microsoft AI for Earth and SpaceNet.”
Microsoft AI for Earth Program Director Dr. Dan Morris says, “Deep learning is poised to accelerate geospatial analysis workflows, but we’re not there yet. The most important way to ‘move the needle’ is to make curated, labeled training data available to get the machine learning community working on these problems, and to allow for standardized evaluation of algorithms. Radiant MLHub is a huge step in this direction, and we’re fortunate that the Chesapeake Conservancy has been willing to contribute their invaluable data to this effort through their work with AI for Earth.”
“We’ve seen through the years how building a robust ecosystem of developers, data scientists, and researchers across academia and industry accelerates creation of AI-enabled geospatial product,” says IQT CosmiQ Works Senior Data Scientist and SpaceNet LLC Challenge Manager Dr. Nick Weir. “Incorporating the SpaceNet datasets of open source labeled very high-resolution satellite imagery into Radiant MLHub will help achieve this goal. In the coming months we will be extending our datasets with new modalities (including synthetic aperture radar), and we plan to continue to work with Radiant Earth to ensure those data are made available through Radiant MLHub.”