40 Questions Pharmaceutical and Life Sciences Companies Should Ask Before Trusting AI
Your researchers rely on AI outputs from Schrödinger, Insilico Medicine, and BenevolentAI to make decisions that affect drug candidates and patient safety. Without asking the right questions, you risk building programmes on recommendations you cannot defend to regulators or justify to your team.
These are suggestions. Use the ones that fit your situation.
Drug Discovery and Compound Selection
1. When Schrödinger predicts a compound will have better binding affinity, what experimental evidence does the model rely on, and is that evidence from your target protein or from similar proteins?
2. Has a senior chemist without access to the AI model independently reviewed the top 5 candidates before you commit synthesis resources?
3. If Insilico Medicine suggests a novel chemical series, can your team explain why that series was ranked above others, or is the reasoning opaque?
4. What happens if the AI model was trained on published data that contained reporting bias towards successful compounds?
5. Does your team still run counterscreens on compounds the AI deprioritised, or have you stopped testing those?
6. When BenevolentAI identifies a disease mechanism from literature, have you verified that the connections it found are supported by recent primary literature, not just co-occurrence in abstracts?
7. Are your medicinal chemists learning to question AI predictions, or are they learning to accept them?
8. If an AI tool recommends skipping a particular structural modification because it predicts poor properties, do you know whether that prediction is based on mechanistic understanding or pattern matching?
9. Has anyone in your organisation tested the AI model on compounds you already know failed in the lab to see if it would have predicted that failure? (A sketch of such a backtest follows this question block.)
10. When you move a compound forward based on AI ranking, are you documenting what alternative candidates were rejected and why, in case you need to explain the decision later?
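To make question 9 concrete, here is a minimal sketch of such a backtest in Python, assuming you can export the model's compound scores to a CSV and that you keep a list of compound IDs that failed in the lab. The file name, column names, and top-fraction cut-off are all hypothetical placeholders, not part of any vendor's interface.

```python
# Hypothetical backtest: would the model have advanced compounds we already
# know failed in the lab? Column names and the top-fraction cut-off are
# placeholders for whatever your scoring export actually contains.
import pandas as pd

def backtest_known_failures(scores_csv: str, failed_ids: set[str],
                            top_fraction: float = 0.1) -> None:
    """Report how many known lab failures the model would have ranked
    inside its top-scoring fraction (i.e. would have been advanced)."""
    df = pd.read_csv(scores_csv)  # expects columns: compound_id, model_score
    df = df.sort_values("model_score", ascending=False)
    top_n = max(1, int(len(df) * top_fraction))
    advanced = set(df.head(top_n)["compound_id"])
    missed_failures = advanced & failed_ids
    print(f"Compounds the model would have advanced: {top_n}")
    print(f"Known failures among them: {len(missed_failures)}")
    for cid in sorted(missed_failures):
        print(f"  would have advanced known failure: {cid}")

# Illustrative call; substitute your own export and failure list.
backtest_known_failures("model_scores.csv",
                        failed_ids={"CMPD-0041", "CMPD-0107"})
```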
Clinical Trial Design and Patient Selection
11. If an AI system recommends a specific patient population for your Phase 2 trial, what clinical evidence does it base that recommendation on, and have your physicians independently reviewed whether it matches your mechanism of action?
12. When ChatGPT or another large language model helps draft inclusion criteria, who is checking that those criteria do not inadvertently exclude patient subgroups that might benefit?
13. Has your trial design team run the AI-optimised endpoint set past clinicians who work with your target disease to ask whether those endpoints will actually matter to patients?
14. If an AI tool suggests a shorter trial duration to reduce costs, do you understand whether that duration is safe or whether it simply reduces the chance of detecting delayed adverse events?
15. When an AI system proposes a dose escalation schedule, can you see the trial data it learned from, and is that data relevant to your specific patient population?
16. Are you using AI to predict which patients will drop out of trials, and if so, have you asked whether the model might be identifying patients who truly cannot tolerate your drug rather than simply predicting attrition?
17. Does your statistical team independently verify an AI recommendation for sample size, or are you relying on the model's calculation? (A sketch of one independent check follows this section's questions.)
18. If an AI system recommends dropping a secondary endpoint because it shows low variability in historical data, have you checked whether that low variability is real or an artefact of how previous trials reported results?
19. When designing your control arm, did an AI system suggest it, and if so, have you confirmed that choice aligns with current clinical practice and regulatory expectations?
20. Has anyone asked the AI tool to explain why a particular safety monitoring strategy is sufficient, and can you understand its reasoning?
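For question 17, one way to verify a sample-size recommendation independently is a standard power calculation. Here is a minimal sketch using statsmodels, assuming a two-arm trial with a continuous endpoint; the effect size, alpha, and power shown are illustrative and should come from your protocol, not from this example.

```python
# Independent sample-size check for a two-arm trial with a continuous
# endpoint. The effect size, alpha, and power below are illustrative;
# substitute the assumptions stated in your own protocol.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_arm = analysis.solve_power(
    effect_size=0.4,   # standardised difference (Cohen's d) from your protocol
    alpha=0.05,        # two-sided significance level
    power=0.9,         # target power
    ratio=1.0,         # equal allocation between arms
    alternative="two-sided",
)
print(f"Required sample size per arm: {n_per_arm:.0f}")
# Compare this figure with the AI tool's recommendation; investigate any gap.
```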
Regulatory Strategy and Dossier Preparation
21. If an AI tool generated summaries of your preclinical safety data for the regulatory dossier, has a qualified toxicologist independently reviewed those summaries for accuracy and completeness?
22. When ChatGPT helps you draft a regulatory strategy letter, who reviews it to ensure it reflects your actual development data and does not make claims the regulator will question?
23. If IBM Watson Health or similar tools flag a potential regulatory risk in your programme, do you understand how the tool identified that risk, or are you treating it as a black box warning?
24. Has your regulatory team tested whether an AI-generated competitive analysis is missing older drugs that might still be relevant precedent?
25. When an AI system recommends a particular regulatory pathway (e.g. accelerated assessment), have you independently verified that your data truly supports that pathway, or are you deferring to the algorithm?
26. If an AI tool scanned your clinical data and found no safety signals in a particular organ system, have you asked whether the tool was configured to detect signals at the sensitivity your regulator would expect?
27. Are you using AI to generate sections of your Common Technical Document, and if so, is a document expert checking that the structure and cross-references meet regulatory requirements?
28. When preparing response documents to regulatory questions, does your team draft the response first and then use AI to check it, or does an AI tool draft the response and you only review it?
29. If an AI system analysed competitor regulatory submissions to benchmark your programme, do you know whether it accessed publicly available documents or whether it made inferences from limited data?
30. Has anyone in your organisation asked a regulator informally whether they have seen the specific AI tools or analytical methods you plan to describe in your dossier?
Model Validation and Ongoing Scrutiny
31. When you first implemented Schrödinger or Insilico Medicine in your organisation, did you run a validation study comparing model predictions to your own historical lab results, or did you rely on the vendor's validation data?
32. Do you track cases where an AI prediction was wrong, and do you feed those failures back into discussions about when the model is reliable?
33. If an AI tool was trained on data from larger molecules, can you justify using it to make decisions about much smaller compounds, or are you extrapolating beyond the model's training set? (A rough applicability-domain check is sketched after this section's questions.)
34. Has your data science team reviewed the training data for the AI tools you use most, and are you confident it does not contain proprietary information from competitors that could bias results?
35. When an AI system makes a recommendation that contradicts your prior experience, do you investigate why, or do you assume the AI knows something you do not?
36. Are you monitoring whether the AI tools your organisation uses are being updated or retrained, and do you have a process to revalidate them if they change?
37. If a regulator later questions the validity of an AI prediction you relied on, do you have documentation of your validation process and your assumptions?
38. Does your team perform regular spot checks on AI outputs by manually working through the underlying data to see if the conclusion is sound?
39. When an AI tool makes a recommendation that will affect patient safety, have you identified a human expert who is responsible for independently evaluating that recommendation?
40. Are researchers in your organisation encouraged to challenge AI outputs, or is the culture shifting towards accepting AI suggestions as truth?
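For question 33, a rough way to check whether a compound lies inside the model's applicability domain is to measure its fingerprint similarity to the training set, assuming you have access to the training structures, which is often not the case with vendor models. Here is a minimal sketch using RDKit; the 0.4 Tanimoto cut-off is a common rule of thumb, not a validated threshold.

```python
# Rough applicability-domain check: how similar is a query compound to the
# model's training set? A low maximum Tanimoto similarity suggests the
# prediction is an extrapolation. The 0.4 cut-off is a rule of thumb only.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def max_similarity_to_training(query_smiles: str,
                               training_smiles: list[str]) -> float:
    fp = AllChem.GetMorganFingerprintAsBitVect(
        Chem.MolFromSmiles(query_smiles), 2, nBits=2048)
    best = 0.0
    for smi in training_smiles:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue  # skip unparsable training records
        tfp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
        best = max(best, DataStructs.TanimotoSimilarity(fp, tfp))
    return best

# Illustrative structures; substitute your training export and query compound.
sim = max_similarity_to_training("CCOc1ccccc1",
                                 training_smiles=["CCO", "c1ccccc1"])
print(f"Max Tanimoto similarity to training set: {sim:.2f}")
if sim < 0.4:
    print("Query likely lies outside the model's applicability domain.")
```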
How to use these questions
Assign one senior scientist per AI tool to be the 'interrogator' for that tool. Their job is to question every recommendation and document why it was accepted or rejected. This guards against the atrophy of scientific judgement across the team.
Before you scale AI use across drug discovery, run a retrospective validation study. Feed historical compounds into the model and see whether it would have correctly ranked your best candidates. If it would have missed them, recalibrate your confidence in its predictions.
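Here is a minimal sketch of such a retrospective check, assuming you can line up the model's predicted scores with measured outcomes for the same historical compounds. The values shown are illustrative; Spearman's rank correlation is one simple way to quantify how well the model's ranking would have held up.

```python
# Quantify retrospective agreement between model ranking and lab outcomes.
# Inputs are illustrative: predicted scores and measured potencies for the
# same historical compounds, in the same order.
from scipy.stats import spearmanr

predicted = [0.91, 0.85, 0.77, 0.60, 0.42, 0.31]  # model scores, historical set
measured = [7.9, 8.2, 6.1, 6.4, 5.0, 4.7]         # e.g. measured pIC50 values

rho, p_value = spearmanr(predicted, measured)
print(f"Spearman rank correlation: {rho:.2f} (p = {p_value:.3f})")
# A weak or negative correlation means the model would not have ranked your
# best historical candidates highly; recalibrate confidence accordingly.
```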
When presenting an AI-based recommendation to your regulatory team or a steering committee, require the presenter to explain not just what the AI recommended but why an alternative recommendation would be weaker. This forces deeper engagement with the output.
Establish a rule that any decision affecting patient safety or regulatory strategy must include a written assessment from a human expert stating whether they agree with the AI recommendation and why. Store these assessments in your regulatory archive.
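One lightweight way to capture those written assessments is a structured record per decision, assuming a JSON file is acceptable to your regulatory archive. All field names and values in this sketch are illustrative.

```python
# Minimal structured record for a human expert's assessment of an AI
# recommendation. Field names and the file layout are illustrative; adapt
# them to whatever your regulatory archive requires.
import json
from dataclasses import dataclass, asdict
from datetime import date

@dataclass
class ExpertAssessment:
    decision_id: str     # your internal decision reference
    ai_tool: str         # which tool produced the recommendation
    recommendation: str  # what the AI recommended
    assessor: str        # named human expert responsible
    agrees: bool         # does the expert agree with the AI?
    rationale: str       # why, in the expert's own words
    assessed_on: str     # ISO date of the assessment

record = ExpertAssessment(
    decision_id="DEC-2024-017",
    ai_tool="compound ranking model",
    recommendation="advance series B to synthesis",
    assessor="J. Example, Senior Medicinal Chemist",
    agrees=True,
    rationale="Ranking consistent with SAR from prior campaigns.",
    assessed_on=date.today().isoformat(),
)

with open(f"{record.decision_id}.json", "w") as fh:
    json.dump(asdict(record), fh, indent=2)
```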
Every six months, conduct a meeting where your team reviews AI recommendations that turned out to be wrong. Use these to build institutional knowledge about the limits of each tool and the conditions under which it performs poorly.