MatterGen's Breakthroughs: How AI Shapes the Future of Materials Science

Tian Xie from Microsoft Research talks about the transformative power of AI in materials science and the future of sustainable technologies

In January 2024, Microsoft Research published a paper on MatterGen, a generative model for inorganic materials design. The model uses diffusion, the same approach that models like DALL-E use to generate images, but it generates novel and potentially stable inorganic material structures instead. Unlike previous models, it focuses on generating materials that are likely to be synthesizable. MatterGen can also be fine-tuned to create materials with target properties, such as a given chemistry, symmetry, or electronic and magnetic characteristics. This could accelerate the discovery of new materials for energy solutions, catalysis, and advanced technology. We invited Tian Xie, Principal Research Manager at Microsoft Research, to talk more about MatterGen and his views on the role of AI in the future of sustainable technologies.

What led you to the intersection of deep learning and materials discovery? What was the moment of epiphany when you realized the potential of AI in this field?

When I started my PhD, AlphaGo had just come out and beaten the Go world champion Lee Sedol. I was deeply impressed because I know that mastering Go requires years of training to build human intuition. Designing new materials is similar: it still largely depends on a materials scientist’s intuition gained from years of experience. This was the moment of epiphany for me, when I realized that AI would have a major impact on materials design in the coming years. It prompted me to choose AI for materials design as my PhD topic, even though at that time very few people around the world were working on it.

MatterGen represents a significant leap in materials design, akin to the evolution seen in generative AI for text and images. What inspired its creation, and what specific challenges in materials science does it aim to address?

The traditional approach to materials design is based on screening a limited set of known materials and filtering for candidates that satisfy the design requirements of a specific application. However, the number of known materials is only around 10^5, a tiny portion of a hypothetical material space of at least 10^10 to 10^12. MatterGen was created to explore this significantly larger hypothetical material space by generating novel materials directly, guided by a broad set of design requirements. It enables the discovery of more diverse and better-performing materials than those that could be found via screening.

MatterGen operates as a diffusion model, a term more commonly associated with image generation. Can you explain in layman's terms how this model works in the context of generating materials with desired properties?

During training, MatterGen takes the 3D structure of a known material and iteratively adds noise to corrupt its atom types, positions, and periodic lattice until it approaches a random structure. Then, it trains a score network to denoise the corrupted 3D structure back to the original structure. During generation, MatterGen samples a random structure and iteratively applies the score network to generate a novel structure. You can also train individual conditional score networks to guide the generation towards materials satisfying different conditions, like electronic, magnetic, and mechanical properties, chemical elements, and symmetry.
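To make the two phases concrete, here is a minimal toy sketch in Python. It is not MatterGen's implementation: the function names, the noise parameters (`sigma`, `p_swap`), and especially `toy_score` are assumptions made for illustration. A real score network is learned from data, whereas the stand-in below simply points back toward a known structure so that the loop can run.

```python
import numpy as np

def wrapped_diff(a, b):
    """Shortest displacement from b to a inside a periodic unit cell."""
    return (a - b + 0.5) % 1.0 - 0.5

def corrupt(frac_coords, atom_types, lattice, sigma, p_swap, n_types, rng):
    """One forward (noising) step on positions, atom types, and lattice."""
    # Positions: Gaussian noise on fractional coordinates, wrapped into the cell.
    coords = (frac_coords + sigma * rng.standard_normal(frac_coords.shape)) % 1.0
    # Atom types: each atom is resampled uniformly with probability p_swap.
    swap = rng.random(atom_types.shape) < p_swap
    types = np.where(swap, rng.integers(0, n_types, atom_types.shape), atom_types)
    # Lattice: Gaussian noise on the 3x3 cell matrix.
    cell = lattice + sigma * rng.standard_normal(lattice.shape)
    return coords, types, cell

def toy_score(noisy_coords, clean_coords):
    """Hypothetical stand-in for a trained score network: it points from
    the noisy positions toward a known clean structure."""
    return wrapped_diff(clean_coords, noisy_coords)

def generate(start_coords, clean_coords, n_steps=50, step_size=0.2):
    """Reverse process: start from random positions and denoise step by step."""
    coords = start_coords.copy()
    for _ in range(n_steps):
        coords = (coords + step_size * toy_score(coords, clean_coords)) % 1.0
    return coords

rng = np.random.default_rng(0)
clean = rng.random((4, 3))   # 4 atoms, fractional coordinates
start = rng.random((4, 3))   # random starting structure
sample = generate(start, clean)
print(np.abs(wrapped_diff(sample, clean)).max())  # residual distance to the clean structure
```

In this toy version only the atom positions are denoised. The description above covers atom types and the lattice as well, and a conditional variant would pass the target property to the score function as an extra input.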

Implementing diffusion models for something as complex as material generation must come with its own set of challenges. Could you highlight some of the technical hurdles you faced and how you overcame them?

There are many unique challenges, and I’ll use this opportunity to highlight two of them. First, as in most scientific domains, we are operating in a limited-data regime, so we need to build materials-specific inductive biases into our model. We use geometrically equivariant networks to ensure the score network satisfies the required symmetries of materials. We also employ specific designs to handle the periodicity of crystalline materials in both the diffusion process and the architecture of our score network. These designs helped to significantly improve our generation quality. Second, it is generally much more computationally expensive to simulate material properties than to simulate structures, so we have a much smaller dataset of labeled data. We employed techniques from ControlNet to fine-tune our unconditional score network to generate materials given different conditions, which significantly improves the quality of conditional samples.
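The ControlNet idea mentioned here can be sketched in a few lines of Python. What follows is an illustration under stated assumptions, not MatterGen's architecture: `ToyBase` and `ConditionedNet` are hypothetical names, and the networks are single layers. The point it shows is the zero-initialized projection: the condition branch contributes nothing at the start of fine-tuning, so the conditional model begins as an exact copy of the pretrained unconditional one and can then learn from the small labeled dataset.

```python
import numpy as np

class ToyBase:
    """Hypothetical pretrained (unconditional) network, kept frozen."""
    def __init__(self, dim, rng):
        self.W = rng.standard_normal((dim, dim)) / np.sqrt(dim)

    def __call__(self, x):
        return np.tanh(x @ self.W)

class ConditionedNet:
    """Frozen base plus a trainable condition branch with a zero-initialized output."""
    def __init__(self, base, cond_dim, dim, rng):
        self.base = base
        # Trainable embedding of the condition (e.g. a target property value).
        self.W_cond = rng.standard_normal((cond_dim, dim)) / np.sqrt(cond_dim)
        # Zero-initialized projection: the branch outputs zeros before fine-tuning.
        self.W_zero = np.zeros((dim, dim))

    def __call__(self, x, cond):
        adapter = np.tanh(cond @ self.W_cond) @ self.W_zero
        return self.base(x) + adapter

rng = np.random.default_rng(0)
base = ToyBase(8, rng)
net = ConditionedNet(base, cond_dim=2, dim=8, rng=rng)
x = rng.standard_normal((5, 8))      # 5 toy inputs
cond = rng.standard_normal((5, 2))   # 5 toy condition vectors
same_at_init = np.allclose(net(x, cond), base(x))
print(same_at_init)
```

During fine-tuning, only `W_cond` and `W_zero` would be updated, which keeps the number of parameters that must be learned from labeled data small.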

Your work mentions that the results are verified via density functional theory (DFT), with real-world experimental verification as the next frontier. How hard is it to bridge this gap, and how close are we to seeing materials designed by AI being widely used in everyday applications?

We’ve already seen lots of examples of DFT-validated materials being synthesized and experimentally tested for different applications, including batteries, alloys, and catalysis. For example, Microsoft’s Azure Quantum team synthesized a novel solid-state electrolyte material in collaboration with the Pacific Northwest National Laboratory (PNNL). It is not a trivial process, because it usually involves iterating on candidates with experimentalists and domain experts over several rounds. However, I am optimistic that we will see materials designed by a generative model being synthesized and tested in the near future. In the beginning, maybe 1-2 out of 10 candidates will be successful, but it will gradually improve, and people will build more trust in these generative models. This is precisely what has been happening in generative AI for small-molecule drug discovery over the past few years.

Generative AI has revolutionized several domains. What are the unique opportunities (maybe those you haven’t tackled yet!) in applying these technologies to the design of novel materials?

I think there are opportunities to expand the diffusion model to other classes of materials beyond crystalline materials, like polymers, metal-organic frameworks, etc. This would allow the model to tackle a broader set of questions, like finding recyclable plastics or carbon capture materials.

With global sustainability challenges mounting, how do you envision deep learning and generative AI playing a role in crafting solutions? 

I think this is one of the most exciting frontiers in materials design. Many sustainability challenges are bottlenecked by finding suitable materials. If we can find a cheap material to absorb and recycle CO2 from the atmosphere efficiently, for example, it can be used to create carbon capture plants at scale and be a critical part of our solution to address climate change. (Note: Tian Xie is currently working on Project Carbonix.) If we can find a catalyst that electrochemically reduces iron ore, it can help electrify the production of steel, which is responsible for around 7% of global CO2 emissions. Generative AI can help find better materials for many problems like these and speed up our transition to a carbon-zero future.

The development of MatterGen was a collaborative effort involving a diverse team. How does interdisciplinary collaboration fuel innovation in AI research, particularly in niche fields like materials science?

Interdisciplinary collaboration is crucial in fueling AI research in fields like materials science. In fields like vision or NLP, it is relatively easy to judge the performance of a model because the tasks are based on a more universal understanding of images and language. In materials science and many other scientific domains, we need strong domain expertise to contextualize the results and understand whether the AI models are making a real-world impact. Furthermore, strong domain expertise is needed to define the tasks and generate data for building AI models to solve scientific problems.

As AI begins to play a pivotal role in materials discovery, what ethical considerations come to the fore? How does Microsoft Research navigate these?

Beyond the domain of materials discovery, Microsoft believes that when you create technologies that can change the world, you must also ensure that the technology is used responsibly. Our work is guided by a core set of principles: fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability. We put those principles into practice in Microsoft Research with a cross-company approach through cutting-edge research, best-of-breed engineering systems, and excellence in policy and governance.

Are there areas of research outside your main interests that you keep an eye on?

The application of AI to other challenges in the healthcare and life sciences space is showing amazing progress and potential. There are a lot of commonalities between drug discovery and materials discovery. I’ve been inspired by my colleagues’ work in precision medicine as well. 

What book would you recommend to aspiring data scientists/ML engineers? (It doesn’t necessarily have to be about ML!)

I always recommend that data scientists/ML engineers read the most basic textbooks about ML. There are lots of good options, like the recent “Deep Learning: Foundations and Concepts” by Chris Bishop, who leads the AI4Science lab at Microsoft Research, and “Deep Learning” by Ian Goodfellow et al. The key is to spend time and fully understand the basic concepts of ML.
