By Steve Raju

For Data Scientists and ML Engineers

Cognitive Sovereignty Checklist for Data Scientists

About 20 minutes · Last reviewed March 2026

AutoML platforms and AI coding assistants make it easy to build models that pass benchmarks but fail in production. You risk losing the intuition that catches when a technically correct model is practically wrong. This checklist helps you stay in control of model selection and reasoning.

Tool names in this checklist are examples. If you use different software, the same principle applies. Check what is relevant to your workflow, mark what is not applicable, and ignore the rest.

These are suggestions. Take what fits, leave the rest.


Before You Train: Defend Your Problem Definition

Write down what success looks like in business terms before touching data (beginner)
ChatGPT and Claude will happily suggest metrics, but the metric you optimise becomes your outcome. Specify the real cost of false positives and false negatives in your domain before you train anything.
Refuse to accept the default evaluation metric from AutoML platforms (beginner)
AutoML defaults to accuracy or AUC because they are easy to compute, not because they match your business goal. Spend 30 minutes calculating what precision versus recall trade-off actually costs your organisation.
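One way to make that 30-minute calculation concrete is a tiny cost model. This is a sketch only: the per-error costs and the two confusion matrices below are made-up placeholders, not figures from any real system.

```python
# Sketch: compare two classifiers by business cost, not accuracy.
# Both example models below have identical accuracy (95%).
COST_FP = 50.0    # assumed cost of one false positive (e.g. wasted review)
COST_FN = 400.0   # assumed cost of one false negative (e.g. missed fraud)

def expected_cost(tp, fp, tn, fn, cost_fp=COST_FP, cost_fn=COST_FN):
    """Total error cost implied by one confusion matrix."""
    return fp * cost_fp + fn * cost_fn

model_a = dict(tp=80, fp=30, tn=870, fn=20)   # higher recall
model_b = dict(tp=60, fp=10, tn=890, fn=40)   # higher precision

print(expected_cost(**model_a))   # 30*50 + 20*400 = 9500.0
print(expected_cost(**model_b))   # 10*50 + 40*400 = 16500.0
```

With these assumed costs, the higher-recall model is clearly cheaper despite identical accuracy; with costs reversed, the conclusion flips. That flip is exactly why the default metric cannot decide for you.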
List three ways your model could fail quietly in production (intermediate)
Before you start feature engineering, imagine the edge cases and distribution shifts that benchmarks never test. This prevents you from building models that look good on validation data but break on real traffic.
Ask a domain expert to criticise your feature list (beginner)
Copilot and Claude will suggest statistically valid features that make no sense to the people who know the business. Get domain feedback early, before you have trained five models and become emotionally invested.
Record why you rejected each candidate model, not just which one won (intermediate)
Benchmark comparisons show you which model scored highest, but they hide the reasoning. Write down why you chose this model over others so future you can challenge that decision when something goes wrong.
Test your train-test split strategy against time (intermediate)
AutoML splits your data randomly because it is simple. If your data has time order or seasonal patterns, specify a temporal split and validate that your model does not leak future information.
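A minimal sketch of a temporal split with a built-in leakage check. The column names and toy data are invented for illustration; the point is the sort, the cut by time, and the assertion.

```python
import pandas as pd

# Sketch: hold out the last 20% of the timeline instead of a random sample.
df = pd.DataFrame({
    "event_date": pd.date_range("2024-01-01", periods=10, freq="D"),
    "feature": range(10),
    "target": [0, 1] * 5,
}).sort_values("event_date")

cut = int(len(df) * 0.8)                  # index of the temporal cutoff
train, test = df.iloc[:cut], df.iloc[cut:]

# Leakage check: no training row may postdate the earliest test row.
assert train["event_date"].max() < test["event_date"].min()
```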

During Development: Keep Statistical Reasoning Alive

Examine the top 10 most important features by hand (beginner)
SHAP values and feature importance plots tell you which features matter, but they do not tell you if the relationship is sensible. Plot the raw data and ask whether the pattern matches domain knowledge or looks like noise.
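One low-tech way to do that check, sketched with pandas. The column names, bin edges, and values below are invented: the idea is to bin a top feature and eyeball the raw target rate per bin against domain expectations.

```python
import pandas as pd

# Sketch: does churn really fall as tenure rises, as the business expects?
df = pd.DataFrame({
    "tenure_months": [1, 2, 3, 12, 24, 36, 48, 60],
    "churned":       [1, 1, 1, 0,  1,  0,  0,  0],
})

# Bin the feature, then look at the raw rate per bin: direction and
# shape should match domain knowledge, not just the importance plot.
bins = pd.cut(df["tenure_months"], bins=[0, 6, 24, 72])
print(df.groupby(bins, observed=True)["churned"].mean())
```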
Generate predictions on data that breaks your model assumptions (intermediate)
Your model assumes that training and production data come from the same distribution. Create a small test set with outliers, missing values, and new categorical levels that your training data never saw.
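A sketch of such a stress set. The columns are illustrative placeholders; the helper deliberately surfaces failures instead of hiding them, since a model that crashes on an unseen category is something you want to learn about before production.

```python
import numpy as np
import pandas as pd

# Sketch: a tiny assumption-breaking test set (columns are invented).
stress = pd.DataFrame({
    "amount":  [1e9, -5.0, np.nan],      # extreme outlier, negative, missing
    "country": ["Narnia", None, "US"],   # unseen category, missing, known
})

def predict_or_report(model, X):
    """Try to score the stress set; report the failure rather than hide it."""
    try:
        return model.predict(X)
    except Exception as exc:             # broad on purpose: we want the reason
        print(f"model broke on stress data: {exc}")
        return None
```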
Do not use a model complexity score as your only tiebreaker (advanced)
When two models have similar validation metrics, Occam's razor suggests the simpler one. But simpler models are sometimes too rigid for production. Compare them on holdout edge cases, not just on summary statistics.
Recompute key statistics without relying on libraries (intermediate)
Use pandas and numpy to manually calculate precision, recall, and F1 at least once. This catches bugs in how you set up your evaluation and keeps you from trusting sklearn without questioning it.
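A sketch of that exercise: the three metrics computed directly from arrays, with a small made-up example to sanity-check against `sklearn.metrics` later.

```python
import numpy as np

def prf1(y_true, y_pred):
    """Precision, recall and F1 from raw binary arrays, no sklearn."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# tp=2, fp=1, fn=1 → precision, recall and F1 all equal 2/3 here
p, r, f = prf1([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```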
Ask Claude or ChatGPT to argue against your chosen model (beginner)
Instead of asking AI tools to validate your choice, ask them to find weaknesses in it. This trains you to see models as tools with trade-offs, not winners of a benchmark.
Validate on three separate holdout sets, not just one (intermediate)
A single test set can hide overfitting or lucky splits. Hold back three different chunks of data and check that performance is consistent. Wide variance in results means your model is fragile.
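A sketch of that consistency check. The `fit_score` callable is an assumed hook: it is expected to train on the first pair of arrays and return a metric on the second, so you can plug in whatever model and metric you use.

```python
import numpy as np

def holdout_spread(X, y, fit_score, n_splits=3, test_frac=0.2, seed=0):
    """Score a model on three disjoint holdout chunks and report the spread.

    fit_score(X_tr, y_tr, X_te, y_te) -> float is assumed to train a model
    and return its metric on the held-out chunk.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    chunk = int(len(X) * test_frac)
    scores = []
    for i in range(n_splits):
        test_idx = idx[i * chunk:(i + 1) * chunk]      # disjoint chunks
        train_idx = np.setdiff1d(idx, test_idx)
        scores.append(fit_score(X[train_idx], y[train_idx],
                                X[test_idx], y[test_idx]))
    return float(np.mean(scores)), float(np.std(scores))
```

A large standard deviation across the three chunks is the fragility signal the checklist item describes.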
Build a simple baseline model first, by hand (beginner)
Before you explore AutoML, train a logistic regression or decision tree yourself. This grounds you in the data and gives you a reference point. Everything you build later should beat this baseline for clear reasons.
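A sketch of what "by hand" can mean at its most literal: logistic regression trained with plain gradient descent on synthetic data, compared against the majority-class rate. The data and learning rate are illustrative, not a recipe.

```python
import numpy as np

# Sketch: a from-scratch logistic-regression baseline on synthetic data.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

w, b = np.zeros(2), 0.0
for _ in range(500):                              # full-batch gradient descent
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))        # sigmoid predictions
    w -= 0.5 * (X.T @ (p - y) / len(y))
    b -= 0.5 * np.mean(p - y)

p = 1.0 / (1.0 + np.exp(-(X @ w + b)))            # final predictions
accuracy = np.mean((p > 0.5) == y)
majority = max(np.mean(y), 1 - np.mean(y))        # trivial reference point
print(f"baseline accuracy {accuracy:.2f} vs majority class {majority:.2f}")
```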

Before Production: Defend Your Deployment Judgement

Document the specific data your model was trained on (beginner)
Production data will drift. Write down the date range, geographic regions, customer segments, and any filters used to build your training set. Future you will need this to diagnose when performance drops.
Set performance thresholds that trigger a model review (intermediate)
Do not wait for a business team to notice a problem. Define what validation metrics you will monitor in production and what drop in performance forces a retrain or rollback.
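A minimal sketch of such a trigger, assuming an AUC-monitored model. The baseline value and drop thresholds are illustrative placeholders you would replace with numbers grounded in your own cost analysis.

```python
# Sketch: turn monitoring into an explicit decision rule (values assumed).
BASELINE_AUC = 0.82      # validation AUC recorded at deployment
RETRAIN_DROP = 0.03      # drop that forces a review and likely retrain
ROLLBACK_DROP = 0.08     # drop that forces an immediate rollback

def review_action(current_auc, baseline=BASELINE_AUC):
    """Map the observed production metric to a pre-agreed action."""
    drop = baseline - current_auc
    if drop >= ROLLBACK_DROP:
        return "rollback"
    if drop >= RETRAIN_DROP:
        return "retrain-review"
    return "ok"
```

Writing the rule down as code before launch is the point: the thresholds are agreed while you are calm, not negotiated during an incident.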
Explain your model to a colleague who did not build it (beginner)
If you cannot explain it without your code or a SHAP plot, you do not understand it well enough to defend it when it breaks. Use plain language to describe what features matter and why.
Create a confusion matrix for the business impact, not just the numbers (intermediate)
Translate false positives and false negatives into real consequences. What happens if your model wrongly approves 100 applicants? What if it wrongly rejects 100? Which is worse for your organisation?
Write down the manual process your model will replace (beginner)
If your model fails, can the business fall back to human review? If not, you need a much higher bar for deployment. Document what the old process did well so you do not optimise away something valuable.
Simulate the model's behaviour on your worst historical mistakes (advanced)
Find the decisions that your organisation regrets most in the past year. Retroactively score those decisions with your model. If your model would have made the same mistakes, you need a different approach.

