The Role of AI, Data and Analysis in True Digital Transformation in Materials and Chemistry R&D
AI for chemistry is a fast-growing domain, driven by expanding capabilities in deep learning and Big Data. A substantial body of chemistry research points to the improved effectiveness of innovations developed with AI, machine learning, and deep learning. These new-generation technologies encourage the adoption of agile strategies for progressive and sustainable digital transformation within the industry.
The global materials science industry is undergoing a massive transformation, powered by emerging capabilities in Artificial Intelligence (AI) and machine learning, data management, and analysis. Nearly every organization is pursuing the digital transformation of its operations, and deploying AI alongside Big Data is often seen as the fastest route to those digitization goals. Digital transformation remains a buzzword in every sector, and materials science and chemistry R&D, although behind the curve, is no exception. The prospects are very attractive, yet the reality is far harder. The status of this transformation, the enabling solutions, and the unresolved pain points are all examined below. IDTechEx has released a report, “Materials Informatics 2022-2032”, on the data-centric approaches to materials R&D that build data infrastructures and leverage machine learning solutions; it provides a comprehensive commercial overview of the field.
There are three main considerations in truly enabling a digital transformation in materials science R&D: data entry and management, physical or computational experimental data, and AI-driven screening and analysis.
Data Entry and Management
This is the genesis of any digital transformation, and it shows just how far chemistry and materials science must go. Before getting anywhere, data must be electronically available in suitable formats, yet even today the move to electronic lab notebooks and the elimination of data silos are not widespread. There are plenty of solutions available, but until this is tackled the transformation can only go so far, as Sasha Novakovich, CEO of Alchemy Cloud, stated to IDTechEx:
“For years firms have lauded the benefits of AI in material science: accelerated discovery; faster derivative product development; streamlined material compatibility. The dirty secret you likely didn’t hear: none of these outcomes are possible with mismanaged, unstructured data. If you bought into AI thinking that decades’ worth of stitched together Excel files were the magic wand, you’re likely on a longer (and more expensive) journey than required. From day 1, driven by our insight that better data quality yields better predictive accuracy, Alchemy has been building software solutions to optimize for AI-ready data. AI-ready, in the context of the lab, means data that is automatically validated, formatted, labeled based on your custom ontology, and connected to other data in real-time without the need for chemists to do anything other than their routine work.”
Many notable companies have made this change, particularly in Japan, with some even going back to convert their extensive historical data into a usable format.
Internal data is an essential part of any company’s IP, but a digital approach to R&D can also leverage external data sources. Public and private data repositories are increasingly common and vary from highly specific to very broad. In addition, many consortia are being established to pool knowledge and establish best practices, even among traditional competitors. A prime example is the Materials Open Platform (MOP), a joint effort to establish a polyolefin database that involves data sharing between Mitsubishi Chemical, Sumitomo Chemical, Asahi Kasei, and Mitsui Chemicals, with NIMS at its core.
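To make the idea of “AI-ready” data in this section more concrete, the following is a minimal sketch of what automatic validation, formatting, and ontology-based labeling of a lab entry might look like. It is not how any vendor named here implements it: the ontology, the plausibility ranges, the `Measurement` record, and the `to_ai_ready` function are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical ontology: maps free-text property names, as a chemist might
# type them, to a canonical label and unit. Illustrative only.
ONTOLOGY = {
    "tg": ("glass_transition_temperature", "degC"),
    "glass transition": ("glass_transition_temperature", "degC"),
    "visc": ("viscosity", "mPa.s"),
    "viscosity": ("viscosity", "mPa.s"),
}

# Hypothetical plausibility ranges used to reject obviously bad entries.
VALID_RANGE = {
    "glass_transition_temperature": (-150.0, 500.0),
    "viscosity": (0.0, 1e7),
}


@dataclass(frozen=True)
class Measurement:
    sample_id: str
    prop: str
    value: float
    unit: str


def to_ai_ready(sample_id, raw_name, raw_value):
    """Return (Measurement, None) on success or (None, reason) on failure."""
    key = raw_name.strip().lower()
    if key not in ONTOLOGY:
        return None, f"unknown property {raw_name!r}"
    prop, unit = ONTOLOGY[key]
    try:
        value = float(raw_value)
    except (TypeError, ValueError):
        return None, f"non-numeric value {raw_value!r}"
    low, high = VALID_RANGE[prop]
    if not low <= value <= high:
        return None, f"{prop}={value} outside [{low}, {high}]"
    return Measurement(sample_id, prop, value, unit), None


# Rows as they might appear in a stitched-together spreadsheet.
rows = [("S-001", " Tg ", "105.2"), ("S-002", "Visc", "n/a"), ("S-003", "colour", "3")]
results = [to_ai_ready(*row) for row in rows]
clean = [m for m, err in results if m is not None]
rejected = [err for m, err in results if err is not None]

print(clean)
print(rejected)
```

Returning a reason alongside each rejected row, rather than silently dropping it, keeps the original entry traceable so that it can be corrected at the source.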
Physical and/or Computational Experimental Data
How the data is stored is important, but materials informatics is nothing without the data itself. One routine challenge is data that is sparse, high-dimensional, biased, and noisy. Combinatorial chemistry, high-throughput screening, and general laboratory automation have become much more common for physical experiments, with numerous tailored instruments available. However, this is still not mature and in many cases remains an overly manual process, which leaves scientists carrying out repetitive tasks rather than applying their domain expertise. Innovations are still arising to tackle this problem, as shown by Stoicheia, a spin-out from Northwestern University in the USA. Professor Chad Mirkin stated to IDTechEx that:
“In the race to bring AI to materials discovery, it is all about the size and quality of the datasets. Stoicheia, with its proprietary MeglibraryTM technology (high-density chips that contain upwards of billions of positionally-encoded materials) and companion screening tools, is uniquely positioned to mine and monetize the materials genome.”
Modifying the synthetic composition or process is essential, but characterization and quality control are also key pieces of the puzzle and must not become a bottleneck. Computational chemistry and materials science have come a long way in the past decade and can provide key insights and data inputs to this data-centric approach. The challenge is the speed and degree of complexity that can be achieved with current computing technology; it is no surprise that chemistry is considered one of the key applications for quantum computers. The modeling field continues to advance at pace, typified by the likes of OTI Lumionics, which is initially applying its approach to advanced materials for OLED displays.
Dr. Michael Helander, President and CEO of OTI Lumionics, stated to IDTechEx: “The key bottleneck in materials informatics, regardless of the approach, algorithms or AI/ML used, is access to large, high-quality datasets, which is typically lacking in chemistry and materials science. Highly accurate simulations of chemical and materials properties are thus required to accelerate the digital transformation of these fields. Quantum computing techniques, which offer a path to high accuracy simulations in reasonable compute time, are therefore an essential element of future materials informatics roadmaps.”
AI-Driven Screening and Analysis
The third piece of the equation is the one that grabs the headlines: using artificial intelligence in R&D to drive screening, guide experimentation, and enhance analysis. Machine learning can be used in several ways: to optimize multiple properties across many variables, to learn new structure-property relationships, or to screen virtually for a desired candidate. A wide range of bespoke supervised and unsupervised learning techniques have been deployed, and although there are success stories, many are tackling the same challenging data problems. Two central themes arise across many approaches to handling challenging materials datasets: understanding the uncertainty in the model and leveraging domain knowledge. There are numerous emerging companies offering materials informatics platforms. One of the most prominent is Citrine Informatics, whose CEO, Greg Mulholland, stated to IDTechEx that:
“At Citrine, our approach to AI is tailor-made for materials and chemicals. We know that our customers don’t have enormous volumes of structured like big tech companies, and often their data is sparse or incomplete, so we’ve developed a modeling approach that leverages uncertainty quantification for each prediction and allows our customers to incorporate their scientific knowledge into the modeling process in the form of analytical relationships and materials-specific descriptor libraries. These capabilities, plus “no-code” AI model creation and deployment, allow our customers to develop new materials up to 98% faster than traditional product development approaches.”
This problem is very different from the deep learning advances many envisage when considering autonomous cars or search engines. Some applications do have access to reasonable datasets and more known inputs, but that is not the case for most real-world materials problems, and the lack of data is often perceived as an insurmountable barrier.
Intellegens is another company tackling this problem across numerous sectors. Steven Warde commented: “To fully leverage this technology there is a perception that vast amounts of data will be required, however, in the real-world data is often limited and of poor quality, especially in industrial or experimental settings. Whilst many organizations are moving ahead with digitalization strategies to collect and manage this data in an attempt to improve the quality there are options for working with the limited data they do have now. Using techniques that clearly highlight uncertainty will enable insights from available data no matter how small, using models from limited data to help guide which data is most important to collect next and merging with other information sources, internal or external will add more understanding to the models.”
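The idea of using uncertainty from a model built on limited data to guide which data to collect next can be illustrated with a small sketch. This is not the method of any company quoted here; it assumes a simple bootstrap ensemble of straight-line fits as a stand-in for a real model, and the six data points, the candidate compositions, and all function names are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    x_mean = statistics.fmean(xs)
    y_mean = statistics.fmean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_mean - b * x_mean, b


def bootstrap_models(xs, ys, n_models=200, seed=0):
    """Fit one line per resampled (with replacement) copy of the data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    while len(models) < n_models:
        idx = [rng.randrange(n) for _ in range(n)]
        if len({xs[i] for i in idx}) < 2:
            continue  # a line needs at least two distinct x values
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_with_uncertainty(models, x):
    """Mean prediction and ensemble spread (standard deviation) at x."""
    preds = [a + b * x for a, b in models]
    return statistics.fmean(preds), statistics.stdev(preds)


# Hypothetical small dataset: additive fraction vs. a measured property.
xs = [0.10, 0.18, 0.25, 0.33, 0.41, 0.50]
ys = [1.22, 1.33, 1.54, 1.63, 1.85, 1.97]

# Hypothetical candidate experiments that have not been run yet.
candidates = [0.20, 0.30, 0.60, 0.90]

models = bootstrap_models(xs, ys)
scored = {x: predict_with_uncertainty(models, x) for x in candidates}

# Pure exploration: run the experiment the ensemble disagrees on most.
next_x = max(candidates, key=lambda x: scored[x][1])

for x in candidates:
    mean, spread = scored[x]
    print(f"x={x:.2f}  predicted={mean:.3f}  spread={spread:.3f}")
print("suggested next experiment:", next_x)
```

Choosing the candidate with the largest spread is only one possible strategy. In practice the selection rule would usually trade off predicted performance against uncertainty, and the model would incorporate domain knowledge rather than a straight line.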
Interestingly, both Citrine Informatics and Intellegens have partnered with large companies known primarily for engineering software: Siemens and Ansys, respectively. This shows these capabilities progressing towards inverse design and reversing the relationship between design engineers and their material suppliers. It should also be noted that this analysis does not have to stop at physical properties; it could also account for supply chain variations, toxicity, and price.
So, What’s the End Goal?
The ideal end state is one where this transformation is no longer discussed and a materials informatics solution is simply part of any scientist’s toolkit. There is a long way to go, but not starting that journey could be catastrophic as more agile and disruptive R&D divisions emerge.
One idealized solution combines all three considerations into a self-driving, or autonomous, laboratory. There are a handful of exciting university demonstrations, and the concept is now starting to enter the commercial sphere. A start-up at the forefront here is Kebotix, whose CEO, Dr Jill Baker, stated to IDTechEx that:
“Our secure cloud and data technologies coupled with AI, machine learning, physical modeling, and automation enable Kebotix’s revolutionary autonomous, ‘self-driving lab’ to deliver an efficient, optimized R&D process — something I like to call ‘Fast R&D @Scale’. By discovering novel materials quicker, Kebotix offers inventive solutions that create new, disruptive chemistries and materials at a rapid pace while reducing costs. Even better, our sustainable solutions are addressing some of the world’s most urgent needs.”
Where are the Early Successes?
Many will read this article and think this is great, but wonder how to get involved. End-users can take numerous strategic approaches, and external providers are deploying a range of business models, each with its own strengths and weaknesses. As seen, this is an attractive space for young companies: not only is there plenty of interest in AI, but rather than requiring considerable funds and 10+ years to bring a new material to market and generate notable revenue, a start-up with a small amount of computing power can begin bringing in consulting revenue almost immediately or progress towards an MI subscription platform. Many end-users are looking to build these capabilities in-house, and there are even external companies, such as Enthought, looking to support this training. Many of these companies have been highlighted throughout the article, and a comprehensive list is available in the market report. A more recent trend is companies focused on specific applications, such as Matmerize and Polymerize (for polymers) and Aionics (for battery electrolytes).
The other key question is where the demonstrated success stories with a clear value-add are. This is always challenging to prove: despite claims from early adopters of sharply reduced research hours and expenditure, a genuine side-by-side comparison is practically never seen.
IDTechEx has reported on these case studies and believes that, given the status of the technology, the most promising fields are thin-film materials and liquid formulations. The latter is certainly where most of the commercial activity is seen, in polymers, coatings, lubricants, and electrolytes. That is not to say there will not be increasing results and adoption elsewhere; there are early wins in metal alloys, heterogeneous catalysts, superconductors, and many more. Rather than considering material families, it can also be useful to look at the types of problem where this approach has succeeded, such as screening for a band gap, mapping a phase diagram, or reducing computational load.
The digital transformation of chemistry and materials science R&D is behind the times and in many ways only just waking up to the technological advances that the first two decades of the 21st century have offered. This will change considerably over the next two decades as the revolution begins.