For years, the industry has talked enthusiastically about the potential for artificial intelligence to usher in a myriad of benefits. At the grandest level, predictive chemistry should accelerate discovery. It should speed new and derivative product development. It should facilitate assessments of material compatibility and manufacturability. In general, predictive chemistry should speed our understanding of the cause-and-effect relationships between ingredients and technical or functional performance, as well as of structure-property relationships.

The potential value is everywhere and enterprises across the globe have added “pursue AI” or “technology innovation” to their corporate strategies. Chief Innovation Officers and teams of data scientists are already working to bring this vision to life. But it’s going more slowly, and with spottier results, than many had hoped.

So, is predictive chemistry possible? Why is it taking so long to turn the vision into reality? And what can companies be doing today to accelerate their predictive chemistry strategies?

This article will dive into the following topics in detail:

1. Why product development is so important right now.

2. Why predictive chemistry will be a game-changer for the industry.

3. What factors make it difficult to actualize?

4. What you can do to make predictive chemistry a reality.

Why product development is so important right now.

Humankind is facing existential threats.

World Economic Forum data from the last couple of decades shows us that our industry is currently faced with a number of existential threats. Historically those threats have been much more geopolitical and economic in nature. Now we’re experiencing many more environmental and technological threats like extreme weather, human-made environmental disasters and cyber attacks that can have serious impacts on our business. Chemicals and materials science companies can directly address these threats in two ways:

  1. Defensive Reformulation: Reformulating existing technology in response to increased regulatory hurdles and market demands.
  2. Offensive New Product Development: Core platform R&D that leads to new products with new revenue streams.

Developing and monetizing new chemicals and materials is time-consuming and expensive with no guarantee of success.

Historically, the “Innovation Process” has started with objectives and hypotheses, typically based on years of education and experience, that need to be tested, analyzed and sampled. The scientific process isn’t predictable. A plan of action is formed that may involve only one person or numerous teams of people. Certain processes are fundamentally stochastic in nature, which affects yield and other outcomes. A new product may take tens, hundreds or even thousands of iterations and require seemingly endless cycles of formulating, testing, sampling, reformulating, retesting and resampling to eventually develop a new formulation or a new molecule that meets market or customer needs.

Nothing is guaranteed. While opportunities abound, only some can be addressed with existing products. Others are straightforward enhancements to existing formulations or materials. Still others require breakthrough advancements. And many are simply impossible to address.

Competition is fierce. Companies of all sizes, startups and even university lab teams are developing breakthrough materials and products. First movers create competitive moats with patent protection and commercial contracts that lock up large sources of demand. Being first therefore brings practical advantages in knowledge and experience as well as systemic advantages, both legal and financial.

Market and technology familiarity drive product development success.

A study by McKinsey looked at 118 companies in the chemical and materials science industries and analyzed various dimensions of operating performance as they relate to new product development. The study found that companies doing product line extensions into existing markets with familiar technology see success rates of 40–50%, compared to just 15–20% for new product launches into new markets.

At the same time, the highest margins also come with the highest risk. The same study shows that companies launching new products into new markets (higher risk) see significantly higher margins than those with product-line extensions into existing markets (lower risk). Additionally, the average internal rate of return tends to be higher for companies focusing on familiar technology. These are the “safer bets.” For innovation to continue in our industry, we need more companies who are willing to take risks on new product categories that don’t exist today, and we believe predictive chemistry will be a big part of that process.

Why predictive chemistry will be a game-changer for the industry.

Customers know they can increase their odds of getting the best material or formulation at the lowest price by sharing their needs and requesting samples from multiple companies at once.

Companies need to master two distinct processes to be commercially successful. The first is more commercial, requiring technical expertise in the sales process to effectively match customer requests to current products based on technical and functional “fit.” The second is more deeply technical: developing a new formulation or material that meets performance or property requirements and getting it to market, which can often take quarters or years. In both scenarios, the fastest to the finish line is often the one that wins the business or gains the first-mover advantage in the market.

Of course, these two processes are on top of core platform R&D, which is the lifeblood of future product development and therefore future revenue streams.

All this makes quality and speed the two biggest opportunities for specialty chemicals and materials science companies to differentiate their products and services and win new business.

  1. Quality: Being able to accurately match your products to specific customer applications and being able to develop the advanced formulations or materials that provide the specific properties each client needs to advance their own products.
  2. Speed: How quickly you can get customers samples that meet their needs.

In theory, artificial intelligence-powered predictive chemistry solutions can improve both: they can make R&D, product development and application testing more effective, and perform all three faster than legacy routes. If this represents the biggest opportunity for chemicals and materials companies, and artificial intelligence technology already exists, why is it taking so long for AI to power predictive chemistry?

What factors make it difficult to actualize AI?

Artificial intelligence has been around for quite a few years and has already made a big impact in other industries. It is commonplace in product recommendations and fraud detection in retail, algorithmic trading in financial services, and, of course, in the technology industry where we all likely take for granted the AI in ubiquitous names like Alexa from Amazon, Google Assistant from Google and Siri from Apple.

So why isn’t predictive chemistry more of a reality in the specialty chemicals industry?

For companies to develop predictive chemistry models they first need to have AI-ready data. The challenge is that getting AI-ready data for chemistry is harder than in many other industries.

Clean, comprehensive datasets to train your AI models

Often, there is no structured organizational process to capture lab data. Each chemist or scientist pursues their own approach, with a personal lab notebook and a variety of software tools used inconsistently across the company.

“Dirty” or incomplete data is a common problem across many industries. And some AI solutions can work around – or without – perfect data. But for AI to successfully predict properties or recommend formulations, data challenges go well beyond just some missing or inaccurate data.

Every chemical and material science company has expertise in certain product categories where each product category is likely based on its own underlying chemistry. They, therefore, need product category-specific and chemistry-specific data to train AI models to get the maximum benefit of predictive chemistry.

The biggest problem with data in the chemicals industry is that the data for opportunities, lab projects, formulations, test methods, test data, customer feedback, commercial success and SKU’d products are all stored in completely different systems, and are often not structured, complete, labeled consistently or connected in any way at all.

This data doesn’t just have to be accurate, complete, consistently labeled and formatted. It must also be connected across opportunities, lab projects, formulations, test methods, test result data, customer feedback and ideally even commercial results. For AI to recommend the best formulation for a particular use case, training data sets must include:

  • Desired properties and/or technical and functional specifications.
  • Every trial formulation created in pursuit of a formulation or material with desired properties.
  • Every final formulation with the desired properties.
  • The process for preparing or creating each formulation.
  • All test data for all trial and final formulations, including the test method, steps, prep, lab equipment, and any information that might not make every data point in a column equivalent.
  • Customer feedback on each sample shared.
  • Commercial result, if sold. 
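One way to picture the connected training set listed above is as a small set of linked records. The following is only a minimal sketch under stated assumptions: the class names (`Project`, `Formulation`, `Measurement`), every field name, and the sample values are hypothetical and chosen for illustration, not taken from any particular product.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Measurement:
    method: str          # test method name or ID
    value: float
    unit: str
    # Prep, equipment and environment: anything that affects comparability.
    conditions: dict = field(default_factory=dict)


@dataclass
class Formulation:
    formulation_id: str
    ingredients: dict    # ingredient name -> weight fraction
    process_notes: str   # how the formulation was prepared
    is_final: bool = False
    test_results: list = field(default_factory=list)
    customer_feedback: list = field(default_factory=list)


@dataclass
class Project:
    project_id: str
    target_specs: dict   # property name -> (min, max) target range
    formulations: list = field(default_factory=list)
    commercial_result: Optional[str] = None  # None if not sold or unknown


def training_rows(project):
    """Flatten one connected project into one row per measurement."""
    rows = []
    for f in project.formulations:
        for t in f.test_results:
            rows.append({
                "project_id": project.project_id,
                "formulation_id": f.formulation_id,
                "is_final": f.is_final,
                **{f"ingredient:{k}": v for k, v in f.ingredients.items()},
                "method": t.method,
                "value": t.value,
                "unit": t.unit,
                **{f"condition:{k}": v for k, v in t.conditions.items()},
            })
    return rows


# Made-up example values, for illustration only.
project = Project("P-001", {"dry_time_min": (0, 30)})
trial = Formulation("F-001", {"resin": 0.6, "solvent": 0.4}, "mixed 10 min")
trial.test_results.append(
    Measurement("dry-time", 42.0, "min", {"temperature_c": 23, "humidity_pct": 50})
)
project.formulations.append(trial)
rows = training_rows(project)
```

Because every measurement stays attached to its formulation, conditions and project, each flattened row carries the context a model would need to compare it fairly with other rows.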

With this type of training data, your AI model can quickly ascertain the cause-and-effect relationships that will drive its predictions or recommendations. This is a necessary and very important first step. But running with just training data would be equivalent to sending a lab-size sample to the plant to make at scale without first assessing manufacturability.

Connecting your ongoing project, formulation, test results and customer feedback data

It is not enough to simply clean up historical data to train your models.

You also need to ensure that your future data coming in from your labs is clean, complete and interconnected so that your chemists can actually run their in-process work through the predictive chemistry models and accelerate their work in real-time.

But this would require:

  • Digital technical and functional specifications: a systematic way to capture requirements in each product category that are related to associated lab projects.
  • Digital lab project: as a node for the scientific exploration required to assess whether you have a product that meets the specifications and/or you are able to create one.
  • Digital formulations: a standard digital method for capturing and storing formulations as well as key statistics about them.
  • Digital test data: a consistent way to capture all test result data – tied to each formulation at a granular level including the test method, steps, prep, lab equipment, and any information that might not make every cell in a table equivalent.
  • Analyses: often key insights result not from the test data itself, but from the resulting analyses done on the test data - these analyses should also be captured in a consistently structured and labeled format.
  • Digital workflows: digital best practice processes that ensure the right data and analyses are captured and reviewed at each stage of primary work types, e.g., research, development, product enhancements, competitive offsets, application and compatibility testing, alternative raw materials assessments, etc.
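The idea of a digital workflow that ensures the right data is captured at each stage can be sketched as a simple completeness check. This is a hedged illustration only: the stage names and required fields in `REQUIRED_FIELDS` are assumptions made up for the example, and a real workflow would define its own.

```python
# Hypothetical stages and required fields; not taken from any real system.
REQUIRED_FIELDS = {
    "specification": ["product_category", "target_properties"],
    "formulation": ["ingredients", "process_steps"],
    "testing": ["test_method", "equipment", "result", "unit"],
    "analysis": ["summary", "linked_test_ids"],
}


def missing_fields(stage, record):
    """Return the required fields that are absent or empty for a workflow stage."""
    required = REQUIRED_FIELDS[stage]
    return [name for name in required if record.get(name) in (None, "", [], {})]


def can_advance(stage, record):
    """A record may move to the next stage only when nothing is missing."""
    return not missing_fields(stage, record)


# A test record that forgot to note which lab equipment was used.
incomplete = {"test_method": "dry-time", "result": 42.0, "unit": "min"}
gaps = missing_fields("testing", incomplete)

complete = dict(incomplete, equipment="drying recorder")
ready = can_advance("testing", complete)
```

Checking at capture time, rather than during a later cleanup, is what keeps incomplete records from ever reaching the training set.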

Scientific lab work will always be hypothesis-driven and iterative. Nonetheless, it is possible to accelerate this innovation cycle if the right data is captured and connected throughout the scientific process. If you can capture all this interconnected data as the work is being done, it could flow directly into predictive chemistry models as soon as they are ready to be run at scale.


What can you do to make predictive chemistry a reality?

There are currently four options for AI execution in the chemical and material science fields:

1. Bolt-on, third-party chemistry-specific AI (requires AI-ready datasets)

2. Generic AI platform (not chemistry-specific)

3. Consolidate disparate data in a data lake, clean it, and run AI there

4. Run AI in an integrated lab ELN+LIMS platform

Bolt-on, third-party chemistry-specific AI

There are numerous stand-alone chemistry AI solutions that have built predictive chemistry models for different product groups based on the underlying chemistry. If you already have fully interconnected, AI-ready data you can download it and send it to them to run in these bolt-on solutions and get their AI-generated recommendations or predictions.

Therein lies the challenge. Perhaps one group has been extremely diligent and has extremely clean, consistently labeled, connected, contextualized data. However, most groups, and in fact most companies, do NOT have AI-ready data at scale. Even today, companies are spending millions of dollars and years of time on data science and business intelligence teams attempting to clean and connect all their data after the fact. This is not just expensive but also time-consuming, with a low likelihood of success.

Generic AI platform

There is a range of generic AI solutions sold to enterprises across all industries. Their vendors have large teams of data scientists and implementation teams that will take your data and try to develop predictive models for you. Many of these companies and implementation teams don’t have the scientific education, professional experience or product development process expertise to produce valuable predictive chemistry models.

Most damagingly, their AI technology is limited by the incomplete and disconnected data they are able to get from your teams.

Consolidate disparate data in a data lake, clean it, and run AI there

Another approach involves significant systems integration work and setting up a data lake. Specifically, you could do numerous integration projects between the various software tools your lab staff uses for everything from innovation pipeline management, digital lab work requests, formulations and product catalogs to ELN, LIMS, statistical software, PPM and LAN file storage. The point of the integrations is to digitally transfer and transform data from siloed software tools into a data lake that runs either on your premises or in the cloud. Many assume that because all the data is now in one place, it will be easier to run bolt-on AI from here.

The primary challenge with this approach is that the data is not (or was not) sanitized during collection, so it will pollute your data lake with invalid data, errors and missing information. That makes it very difficult, if not impossible, to run AI with acceptable accuracy. Second, if you capture test data without context, the data will have low to no value.

For example, if you capture dry time but fail to capture humidity or temperature, the dry time data may be worthless and require retesting to capture these conditions. Much like the bolt-on AI approach, the data lake approach does not address upstream data collection deficiencies. Finally, this is the most expensive and time-consuming option as there is an investment in the data lake, all the integrations that need to be built and maintained, as well as the AI itself. 
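The dry-time example above can be sketched as a check that separates measurements with full context from those that would need retesting. This is a minimal sketch: the field names (`temperature_c`, `humidity_pct`, `dry_time_min`) and the sample values are hypothetical and made up for illustration.

```python
REQUIRED_CONTEXT = ("temperature_c", "humidity_pct")  # hypothetical field names


def split_by_context(measurements, required=REQUIRED_CONTEXT):
    """Separate measurements that carry every required condition from those that do not."""
    usable, needs_retest = [], []
    for m in measurements:
        if all(m.get(key) is not None for key in required):
            usable.append(m)
        else:
            needs_retest.append(m)
    return usable, needs_retest


# Made-up records, for illustration only.
dry_times = [
    {"sample": "A", "dry_time_min": 35, "temperature_c": 23, "humidity_pct": 50},
    {"sample": "B", "dry_time_min": 28, "temperature_c": 23},  # humidity missing
    {"sample": "C", "dry_time_min": 41},                       # no context at all
]
usable, needs_retest = split_by_context(dry_times)
```

Only sample A can be compared with other results; B and C hold a dry time but not the conditions that give the number meaning, which is exactly the gap a data lake cannot repair after the fact.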

Run AI in an integrated lab ELN+LIMS platform

An integrated lab ELN+LIMS platform can both capture AI-ready data as your teams work in the lab and also use the data to run high-quality predictive chemistry models that can either make formulation recommendations or predict technical or functional performance parameters as your teams work.
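To make "predicting a performance parameter" concrete, here is a deliberately simple sketch that estimates a property of an untested formulation from its nearest tested neighbours. It is an assumption-laden illustration, not how any particular platform works: the technique (distance-weighted k-nearest neighbours), the function name `predict_property` and the data are all chosen for the example, and the numbers are made up.

```python
import math


def predict_property(history, candidate, k=2):
    """Distance-weighted k-nearest-neighbour estimate of a property value.

    history:   list of (ingredient_fractions, measured_value) pairs
    candidate: ingredient_fractions dict for the untested formulation
    """
    def distance(a, b):
        # Euclidean distance over the union of ingredients; absent means 0.
        keys = set(a) | set(b)
        return math.sqrt(sum((a.get(key, 0.0) - b.get(key, 0.0)) ** 2 for key in keys))

    nearest = sorted(history, key=lambda item: distance(item[0], candidate))[:k]

    # An exact match returns its measured value directly (avoids dividing by zero).
    for fractions, value in nearest:
        if distance(fractions, candidate) == 0.0:
            return value

    weights = [1.0 / distance(fractions, candidate) for fractions, _ in nearest]
    values = [value for _, value in nearest]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Made-up history: ingredient fractions -> measured dry time in minutes.
history = [
    ({"resin": 0.6, "solvent": 0.4}, 40.0),
    ({"resin": 0.5, "solvent": 0.5}, 30.0),
    ({"resin": 0.3, "solvent": 0.7}, 15.0),
]
estimate = predict_property(history, {"resin": 0.55, "solvent": 0.45})
```

Even this toy estimator only works if every historical value was measured under comparable conditions, which is why the data capture described above matters more than the choice of model.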

Using an integrated predictive chemistry platform, enterprises can ensure that clean, complete, contextualized data is always captured and build up an ever-growing database of AI-ready data and also deploy high-quality and scalable predictive chemistry AI.

This solution provides the following benefits:

  • Chemistry-specific so you capture all the relevant information for each type of product you are developing and supporting in the market. Horizontal AI solutions will be insufficient.
  • Fully customizable so you can add fields, calculations and new use cases as your business evolves.
  • Backward compatible so you can accurately fill in new fields or calculations on past records without additional algorithms or assumptions.
  • Workflow specific so you can apply it to different use cases in priority order.
  • 100% interconnected so that every set of objectives, every sample, every formulation process, every test result, every piece of customer feedback, and every project is appropriately connected and cross-referenced so the AI gets the richest possible perspective of the scientific process.
  • Real-time with integrations to lab equipment, CRM, ERP and other business systems with relevant information that should be pulled into your AI-ready data set.
  • 100% consistent so you always get the full and connected data flowing into your predictive model.
  • 100% adjustable so that processes can flex and adjust as they evolve in real life without breaking your records, while allowing your trained models to be retrained on demand to take advantage of the new data points, thus improving the accuracy of predictions.

With this approach, your organization will be able to dramatically accelerate 1) core platform research and product development, 2) quicker-turn product enhancements and custom formulation work, and 3) the application and compatibility testing required for selling current product lines. Together these are the primary sources of revenue at any chemical or materials company.

Alchemy is an integrated predictive chemistry platform

Our Digital Lab [OS] product makes it possible, for the first time ever, to capture all of the interconnected data across the product development lifecycle, from objectives and hypotheses through formulation and testing processes to customer feedback, in one place. Instead of storing all your trial and final formulations in a mix of paper lab notebooks, XLS and ELN, and all your test result data in your LIMS, our Digital Lab [OS] product gives you one digital platform for everything.

  • The AI-ready data captured in our Digital Lab [OS] product is used to train product and chemistry-specific predictive chemistry models.
  • Chemists and scientists can run their projects through our Predictive Chemistry product to get property predictions and formulation recommendations that accelerate the formulation process.

Alchemy is built specifically for enterprises that want to accelerate innovation by running digital labs with built-in predictive chemistry. To learn even more about predictive chemistry in the specialty chemicals and materials science industries, watch the webinar below or visit
