Stuff about Software Engineering

Tag: CRL

The Evolution of AI: From Frontier Models to Specialized Small Language Models

Where We Came From: The Frontier Model Plateau

Over the past 12–18 months, the large language model (LLM) ecosystem has continued to advance—but largely in an incremental, not disruptive, fashion. Models from OpenAI, Anthropic, and Google have steadily improved across reasoning, multimodality, and scientific benchmarks, yet the relative ordering and qualitative capabilities have remained broadly stable.

Public benchmark suites such as MMLU (Massive Multitask Language Understanding), GPQA (Graduate‑Level Google‑Proof Q&A), and HELM (Stanford Holistic Evaluation of Language Models) show year‑over‑year gains measured in percentage points rather than step‑function breakthroughs. This is not a criticism—these are remarkable systems—but it does indicate a phase of maturation rather than rupture. Frontier models are converging: better, more reliable, more general—but not fundamentally different.

For scientific research, this means frontier GenAI has become a dependable horizontal capability: excellent for literature synthesis, reasoning assistance, explanation, and orchestration—but no longer the sole locus of rapid innovation.

Where We Are Now: The Rise of Small and Specialized Models

In parallel, a very different dynamic is unfolding.

Small Language Models (SLMs) and domain‑specific foundation models are advancing rapidly, particularly in scientific domains such as genomics, protein science, chemistry, and materials research. These models fall broadly into two categories:

  1. Domain‑adapted language models – smaller LLMs fine‑tuned on specific scientific corpora (e.g. chemistry, biology, materials science).
  2. Non‑linguistic foundation models – transformer‑based models trained on alternative “languages” such as DNA, protein sequences, or molecular graphs (e.g. Evo2, ESM, AlphaFold‑class models).

These models are not generalists—and that is precisely their strength. They encode deep inductive bias for their domain, deliver strong signal from sparse data, and increasingly outperform general LLMs on narrowly scoped scientific tasks.

Critically, most of these models do not fit the SaaS GenAI paradigm. They are rarely available via Azure AI Foundry, AWS Bedrock, or similar managed services. Running them typically requires:

  • Dedicated GPU infrastructure (often NVIDIA‑specific)
  • Local fine‑tuning or adaptation
  • Tight coupling to data and experimental context

This creates a structural mismatch between where scientific model innovation is happening and where traditional enterprise AI platforms operate.

External Validation: SLMs as First-Class Scientific Tools

Recent academic work explicitly supports this shift toward small, specialized models. A 2025 paper, “SLMs as Scientific Tools” (arXiv:2512.15943), argues that capability in scientific AI is task-relative rather than size-relative. The authors show that domain-specialized SLMs can match or outperform frontier LLMs on constrained scientific tasks when correctness, structure, and tool integration matter more than linguistic breadth.

Several conclusions from the paper closely align with CRL’s direction:

  • Inference locality beats central intelligence: running models close to data improves latency, reproducibility, validation, and cost control—supporting local, HPC-adjacent, and desk-side deployment.
  • SLMs scale scientifically, not just economically: smaller models are easier to interpret, benchmark, and falsify—critical properties for hypothesis generation and experimental decision-making.
  • Tool integration matters more than prompt engineering: structured inputs and deterministic tool calls outperform free-form prompting in scientific workflows.

The paper ultimately reinforces a hybrid architectural stance: LLMs orchestrate; SLMs execute. This provides external validation that SLMs are not a compromise, but the correct abstraction for scientific computing.

A Practical Shift: From Cloud‑Only to Desk‑Side AI

This is where a meaningful, practical shift is occurring.

With the arrival of systems such as NVIDIA DGX Spark, small language models become physically accessible to individual researchers. Instead of renting over‑provisioned H100 or Grace‑Blackwell cloud instances, scientists can:

  • Run and fine‑tune SLMs locally
  • Experiment rapidly without cloud friction or cost surprises
  • Work directly with models that are otherwise unavailable as managed services

In effect, this enables a “small model on every scientist’s desk” paradigm. The value is not raw scale, but immediacy, ownership, and experimentation velocity.

At CRL, this aligns tightly with how scientific progress actually happens: iterative, exploratory, domain‑specific, and data‑proximate.

Looking Toward 2026: A Hybrid, Orchestrated Future

Looking ahead—without making speculative predictions—the most plausible trajectory is not LLMs versus SLMs, but LLMs plus SLMs.

A likely pattern is:

  • Frontier LLMs acting as generalist reasoning, planning, and orchestration layers
  • Specialized small models performing high‑fidelity domain work (genomics, proteins, chemistry, simulation)
  • Tool‑ and model‑calling as the primary integration mechanism

In this model, the LLM does not replace scientific models—it coordinates them. It becomes the interface and glue, while the real scientific signal is generated by specialized systems running locally or on targeted infrastructure.
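To make the "LLM coordinates, specialists execute" idea concrete, here is a minimal sketch of the dispatch step. Everything in it is an assumption for illustration: the `ToolCall` shape, the two stub functions, and their names stand in for whatever structured call format and local models you actually use.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ToolCall:
    """A structured request the orchestrating LLM would emit (hypothetical shape)."""
    tool: str
    arguments: dict


# Hypothetical stand-ins for specialist models running locally.
def predict_protein_stability(sequence: str) -> dict:
    # Placeholder logic: a real specialist model would run inference here.
    return {"sequence_length": len(sequence), "model": "protein-slm-stub"}


def score_dna_variant(reference: str, alternate: str) -> dict:
    # Placeholder logic standing in for a genomics model.
    return {"changed": reference != alternate, "model": "dna-slm-stub"}


SPECIALISTS: Dict[str, Callable[..., dict]] = {
    "predict_protein_stability": predict_protein_stability,
    "score_dna_variant": score_dna_variant,
}


def dispatch(call: ToolCall) -> dict:
    """Route a structured call to the matching specialist; reject unknown tools."""
    if call.tool not in SPECIALISTS:
        raise KeyError(f"unknown tool: {call.tool}")
    return SPECIALISTS[call.tool](**call.arguments)


result = dispatch(ToolCall("score_dna_variant", {"reference": "A", "alternate": "G"}))
print(result)
```

The LLM never produces the scientific answer itself in this sketch. It only chooses a tool name and arguments, and the registry decides what is allowed to run.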

This is not speculative technology. The building blocks already exist:

  • Tool‑calling and agent frameworks
  • Domain foundation models
  • Local GPU systems capable of running serious scientific workloads

What changes in 2026 is not the theory, but the accessibility.

Summary

  • Frontier LLMs are improving steadily, but incrementally
  • Scientific innovation is accelerating fastest in small, specialized models
  • These models do not fit cloud‑only GenAI platforms
  • Desk‑side systems like DGX Spark make SLMs practically accessible
  • The near‑term future is hybrid: generalist orchestration + specialist execution

Appendix: The Emerging Scientific SLM Ecosystem (snapshot as of 2026-01-21)

Vendor / Origin | Domain Focus | Representative Models | Typical Scientific Use Cases
NVIDIA | Biology, Chemistry, Climate | BioNeMo, ChemGPT, MegaMolBART, FourCastNet | Molecule generation, QSAR, virtual screening, protein design, weather & climate modeling
DeepMind | High-impact scientific modeling | AlphaFold 3, GraphCast | Protein structure prediction, climate forecasting, large-scale simulation
Meta | Proteins, Scientific Literature | ESMFold, ProtBERT, SciBERT | Protein folding, sequence modeling, scientific text analysis
Arc Institute / Profluent | DNA & Protein Design | Evo2, E1 | DNA sequence design, protein design, strain optimization
Academic & Research Consortia | Genomics, Materials Science | OpenFold, MaterialsBERT, MatSciBERT | Crystal property prediction, materials discovery
Emerging Vendors | Supply Chain & Optimization | SCGPT, Logistics-LLaMA, OR-LLM | Demand forecasting, route optimization, constraint planning

Notes

  • Most models listed above are open, open‑weight, or research‑licensed, and evolve in close collaboration with the scientific community.
  • The ecosystem is interoperable and tool‑oriented, designed to be embedded into pipelines rather than accessed via chat interfaces.
  • NVIDIA’s role is increasingly that of a horizontal scientific AI platform provider, spanning models, tooling, and local compute rather than acting as a single‑model vendor.
  • Unlike enterprise GenAI platforms, which are predominantly closed, managed, and productivity‑oriented, the scientific SLM ecosystem is characterized by open models, research licensing, and composability: properties that align naturally with exploratory research environments such as CRL.

From Tools to Orchestrators: A General Architecture for AI-Native Scientific Research

Abstract

Scientific computing has reached an inflection point. High-performance computing, cloud-native data platforms, and foundation models have dramatically accelerated individual steps in research workflows. Yet most scientific environments remain structurally fragmented: data is generated in one system, workflows execute in another, analytical summaries live elsewhere, and interpretation remains largely manual.

This post argues for a general architecture for AI-native scientific research in which artificial intelligence functions not as a standalone analytical tool, but as an orchestration layer across computation, metadata, and analytics systems. Rather than replacing existing infrastructure, this approach integrates it through structured interfaces and provenance-aware data layers. Although the architecture is illustrated through a genomics example, the principles generalize to any domain in which in-silico methods accelerate discovery.

The Real Bottleneck: Fragmentation

Across disciplines such as genomics, metabolomics, sensory science, spectroscopy, materials research, and fermentation science, a common pattern appears. Experimental data is generated within specialized platforms. Computational workflows are executed in separate environments. Results are stored as files in object storage or local servers. Cross-experiment comparison is often manual, and metadata capture is inconsistent. Reproducibility depends more on institutional memory than on system design.

In most research environments today, computational power is not the limiting factor. The constraint lies in orchestration, integration, and structured interpretation. Scientific acceleration increasingly depends on how effectively systems connect, not on how fast individual tools operate.

A Layered Architecture for AI-Native Research

The proposed architecture separates responsibilities into four conceptual layers, each with a clearly defined role.

The first layer is the execution layer, which remains the authoritative source of computational truth. This layer is responsible for heavy computation, workflow execution, and the generation of primary artifacts. Depending on the domain, it may consist of cloud-based genomic pipelines, HPC clusters, digital twin simulations of fermentation processes, robotics-controlled experimentation, or large-scale analytical workflows. The central principle is that this layer computes deterministically and preserves reproducibility. It is not replaced by AI; it is coordinated by it.

The second layer is the structured interpretation layer. Raw artifacts such as alignment files, chromatograms, spectral matrices, or process simulations are rarely suitable for reasoning across experiments. This layer extracts structured summaries, registers parameters and reference versions, and links findings to explicit provenance. In doing so, it transforms scientific reasoning from file-centric to finding-centric. The layer must remain lightweight, rebuildable, and explicit about version identity. Without it, any AI system attempting cross-run reasoning would be forced to reconstruct context from heterogeneous raw files, a fragile and non-scalable approach.
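A finding-centric record can be quite small. The sketch below is one possible shape, not a prescribed schema: the field names, the 12-character hash, and the example values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, asdict
import hashlib
import json


@dataclass(frozen=True)
class Provenance:
    run_id: str
    reference_version: str
    parameters_hash: str
    artifact_path: str


@dataclass(frozen=True)
class Finding:
    kind: str
    locus: str
    value: str
    provenance: Provenance


def hash_parameters(parameters: dict) -> str:
    """Stable short hash so two runs with identical parameters compare equal."""
    canonical = json.dumps(parameters, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def register_finding(kind, locus, value, run_id, reference_version, parameters, artifact_path):
    """Attach explicit provenance to a structured summary of a raw artifact."""
    prov = Provenance(run_id, reference_version, hash_parameters(parameters), artifact_path)
    return Finding(kind, locus, value, prov)


# Hypothetical example values.
finding = register_finding(
    "snv", "chr1:100", "A>G",
    run_id="run-001",
    reference_version="ref-v1",
    parameters={"min_quality": 30, "caller": "example-caller"},
    artifact_path="runs/run-001/variants.vcf",
)
print(asdict(finding))
```

Because the parameters are hashed in a canonical key order, two runs configured identically get the same `parameters_hash`, which makes parameter drift between runs easy to spot.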

The third layer is the analytical layer. Here, structured outputs are aggregated, modeled, visualized, and integrated across domains. Statistical workflows, machine learning pipelines, and reporting systems operate at this level. It supports exploration and synthesis but does not execute primary experimental computation. It complements the execution layer rather than replacing it.

The fourth and most transformative layer is the conversational orchestration layer. A large language model, connected through structured tool interfaces, interprets researcher intent and coordinates actions across the other layers. It translates natural language questions into structured queries, triggers workflows when appropriate, integrates results across systems, and documents reasoning paths. Importantly, it does not modify raw data or override execution engines. It orchestrates rather than computes.

When these layers are properly separated, AI evolves from a chatbot into a scientific coordinator.
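As a rough sketch of "orchestrates rather than computes", the orchestrator below only ever sees a read-only view of the artifact index and writes down each step it takes. The class name, the in-memory index, and the metric are hypothetical; a real system would sit on top of actual execution and interpretation layers.

```python
from types import MappingProxyType

# Hypothetical artifact index owned by the execution layer.
_RAW = {"run-001": {"aligned_reads": 1200}, "run-002": {"aligned_reads": 1350}}

# Read-only view handed to the orchestrator. Note: this guards the top level
# only; nested dicts would need the same treatment in a real system.
RAW_ARTIFACTS = MappingProxyType(_RAW)


class Orchestrator:
    """Coordinates read-only lookups and records each step it takes."""

    def __init__(self, artifacts):
        self._artifacts = artifacts
        self.trail = []  # documented reasoning path

    def compare_runs(self, run_a: str, run_b: str, metric: str) -> int:
        a = self._artifacts[run_a][metric]
        b = self._artifacts[run_b][metric]
        self.trail.append(f"read {metric} for {run_a} and {run_b}")
        self.trail.append(f"computed difference {b - a}")
        return b - a


orchestrator = Orchestrator(RAW_ARTIFACTS)
print(orchestrator.compare_runs("run-001", "run-002", "aligned_reads"))
print(orchestrator.trail)
```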

From Queries to Long-Running Co-Scientist Workflows

The next frontier is not single-prompt interaction but long-running, goal-directed research processes. An AI-native orchestrator can maintain contextual awareness across sessions, track hypotheses over time, coordinate multi-step analyses, and integrate intermediate results into evolving reasoning chains.

When domain-specific reasoning patterns are formalized into versioned and reusable “skills,” scientific workflows become auditable and collaborative. Instead of isolated prompts, research evolves into structured AI-mediated projects in which multiple scientists interact with shared computational guardrails. The system preserves reproducibility while accelerating iteration.
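One way to picture versioned, reusable skills is a registry keyed by name and version that logs every invocation. This is a sketch under my own assumptions: the `Skill` and `SkillRegistry` names, the coverage thresholds, and the audit record layout are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Skill:
    name: str
    version: str
    run: Callable[[dict], dict]


class SkillRegistry:
    """Stores skills by (name, version) and keeps an audit log of invocations."""

    def __init__(self):
        self._skills: Dict[Tuple[str, str], Skill] = {}
        self.audit_log: List[dict] = []

    def register(self, skill: Skill) -> None:
        key = (skill.name, skill.version)
        if key in self._skills:
            # Published versions are immutable: changes need a new version.
            raise ValueError(f"{skill.name}@{skill.version} is already registered")
        self._skills[key] = skill

    def invoke(self, name: str, version: str, inputs: dict, user: str) -> dict:
        output = self._skills[(name, version)].run(inputs)
        self.audit_log.append(
            {"skill": name, "version": version, "user": user,
             "inputs": inputs, "output": output}
        )
        return output


def make_low_coverage_skill(version: str, threshold: int) -> Skill:
    """Hypothetical skill: flag samples whose coverage falls below a threshold."""
    def run(inputs: dict) -> dict:
        flagged = sorted(s for s, depth in inputs["coverage"].items() if depth < threshold)
        return {"flagged": flagged}
    return Skill("flag_low_coverage", version, run)


registry = SkillRegistry()
registry.register(make_low_coverage_skill("1.0.0", threshold=20))
registry.register(make_low_coverage_skill("1.1.0", threshold=30))

coverage = {"coverage": {"S1": 15, "S2": 25, "S3": 40}}
print(registry.invoke("flag_low_coverage", "1.0.0", coverage, user="alice"))
print(registry.invoke("flag_low_coverage", "1.1.0", coverage, user="bob"))
```

Pinning the version in every call is what makes a result reproducible later: the audit log says exactly which reasoning pattern produced it.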

In this model, AI becomes a persistent scientific co-orchestrator rather than a transient assistant.

A Genomics Reference Implementation

One instantiation of this architecture can be observed in a genomics context. In that environment, a cloud-based execution engine processes sequencing data and generates alignment and variant artifacts. A lightweight, provenance-first interpretation layer structures variant findings across runs, capturing reference identities and parameter differences. An analytical platform aggregates results for cross-project exploration. A conversational AI interface connects them through structured tool interfaces.

Within such a system, scientists can compare variants across strains, identify changes in reference genomes between runs, detect parameter differences, trigger new workflows, and iteratively refine hypotheses without reopening raw alignment files or reconstructing workflow logs manually. Raw data remains immutable. Provenance remains explicit. Every step is traceable.
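The comparison a scientist asks for here boils down to a structured diff between two run records. The sketch below assumes each run has already been summarized into a small dict; the keys, variant strings, and parameter names are hypothetical.

```python
def compare_runs(run_a: dict, run_b: dict) -> dict:
    """Report reference changes, parameter differences, and variant overlap."""
    params_a, params_b = run_a["parameters"], run_b["parameters"]
    changed_parameters = {
        key: (params_a.get(key), params_b.get(key))
        for key in sorted(set(params_a) | set(params_b))
        if params_a.get(key) != params_b.get(key)
    }
    variants_a, variants_b = set(run_a["variants"]), set(run_b["variants"])
    return {
        "reference_changed": run_a["reference"] != run_b["reference"],
        "changed_parameters": changed_parameters,
        "only_in_a": sorted(variants_a - variants_b),
        "only_in_b": sorted(variants_b - variants_a),
        "shared": sorted(variants_a & variants_b),
    }


# Hypothetical structured summaries of two runs.
run_a = {
    "reference": "ref-v1",
    "parameters": {"min_quality": 30, "aligner": "example-aligner"},
    "variants": ["chr1:100:A>G", "chr2:250:C>T"],
}
run_b = {
    "reference": "ref-v2",
    "parameters": {"min_quality": 20, "aligner": "example-aligner"},
    "variants": ["chr1:100:A>G", "chr3:75:G>A"],
}

diff = compare_runs(run_a, run_b)
print(diff)
```

Nothing here touches an alignment file. The diff works entirely on the structured summaries, which is the point of the interpretation layer.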

Although domain-specific in its implementation, the architectural principles are domain-agnostic.

Generalization Across Scientific Domains

The same structure applies well beyond genomics. Laboratories working with LC-MS and GC-MS data face persistent challenges in analytical reproducibility and cross-instrument transfer. Sensory science groups contend with variability and latent structure in panel data. Spectroscopy platforms require ongoing calibration maintenance across instruments and environments.

In fermentation and ingredient characterization, digital twins and predictive process models increasingly complement physical experimentation, yet their outputs often remain isolated from historical runs and analytical metadata. The opportunity is not merely to build better models, but to connect those models into a structured reasoning fabric that spans experiments, instruments, and time.

In each of these domains, in-silico iteration accelerates discovery. The architectural shift lies not in introducing new models, but in enabling structured orchestration across existing systems.

Design Principles

Several principles emerge as foundational.

Execution engines remain authoritative and deterministic. Interpretation layers must be fully rebuildable from primary artifacts. Provenance must be first-class rather than implicit. AI orchestrates systems but does not own data. Reproducibility must be enforced architecturally rather than culturally.
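"Fully rebuildable" can be tested directly: throw the index away, recompute it from the primary artifacts, and check that you get the same thing. The sketch below keeps the artifacts as in-memory CSV text so it is self-contained; the column names and run IDs are assumptions.

```python
import csv
import io


def rebuild_index(artifacts: dict) -> dict:
    """Recompute the whole summary index from primary artifacts, with no hidden state."""
    index = {}
    for run_id in sorted(artifacts):
        rows = list(csv.DictReader(io.StringIO(artifacts[run_id])))
        index[run_id] = {
            "variant_count": len(rows),
            "loci": sorted(row["locus"] for row in rows),
        }
    return index


# Hypothetical primary artifacts, keyed by run ID.
ARTIFACTS = {
    "run-001": "locus,alt\nchr1:100,G\nchr2:250,T\n",
    "run-002": "locus,alt\nchr1:100,G\n",
}

first = rebuild_index(ARTIFACTS)
second = rebuild_index(ARTIFACTS)
print(first)
print(first == second)
```

If a rebuild ever produces a different index from the same artifacts, the interpretation layer has picked up state it should not own.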

When these principles are respected, AI-native research becomes both scalable and governable.

Conclusion

The future of scientific computing is unlikely to be another monolithic platform. Instead, it will be a layered architecture in which computation remains deterministic, metadata is structured, analytics are scalable, and AI coordinates interactions across systems.

The real competitive advantage will not belong to those who adopt the largest models, but to those who design systems where models can reason safely and coherently across structured scientific context.

Scientific acceleration, in this view, is no longer primarily a question of faster models. It is a question of who learns to build research environments that think.

Small Models, Real Acceleration: Notes from the Field

Over the past year, I’ve spent more time than I’d like to admit trying to make AI models actually work in scientific environments. Not as demos or slides, but in the kind of messy, disconnected setups that real research runs on. And after enough of those experiments, one thing keeps repeating itself: the smaller models often get the job done better.

Trying to fine-tune or adapt a multi-hundred-billion-parameter model sounds impressive until you’ve actually tried. The cost, the infrastructure, the data wrangling — it’s a full-time job for a team of specialists. Most research teams don’t have that. But give them a 3B or 7B model that runs locally, and suddenly they’re in control. These smaller models are fast, predictable, and easy to bend to specific problems. You can fine-tune them, distil from a larger one, or just shape them around the way your own data behaves.

That’s the difference between something theoretical and something usable. Scientists can now build domain-specific models on their own machines, without waiting for external infrastructure or a cloud budget to clear. You don’t need a new foundation model—you just need one that understands your work.

Working Close to the Data

Running models locally changes how you think about performance. When your data can’t leave the lab, a local model doesn’t just make sense—it’s the only option. And you start realizing that “good enough” isn’t vague at all. It’s measurable. In genomic analysis, it means sequence alignment accuracy within half a percent of a cloud model. In sensory analysis, it means predicted clusters that match what human panels taste nine times out of ten. That’s good enough to move forward.
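Those two thresholds are easy to write down as checks. In this sketch I read "within half a percent" as half a percentage point of accuracy, which is my assumption, and all the numbers and cluster labels are made up for illustration.

```python
def within_tolerance(local_accuracy: float, cloud_accuracy: float,
                     tolerance: float = 0.005) -> bool:
    """True when the local model is at most `tolerance` below the cloud model."""
    return (cloud_accuracy - local_accuracy) <= tolerance


def agreement_rate(predicted: list, panel: list) -> float:
    """Fraction of samples where the predicted cluster matches the panel's."""
    if not predicted or len(predicted) != len(panel):
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == q for p, q in zip(predicted, panel))
    return matches / len(predicted)


# Hypothetical accuracies for a local model and a cloud model.
print(within_tolerance(local_accuracy=0.986, cloud_accuracy=0.990))

# Hypothetical cluster labels: model prediction versus human sensory panel.
predicted = ["fruity", "malty", "hoppy", "fruity", "malty",
             "hoppy", "fruity", "malty", "hoppy", "fruity"]
panel = ["fruity", "malty", "hoppy", "fruity", "malty",
         "hoppy", "fruity", "malty", "hoppy", "malty"]
rate = agreement_rate(predicted, panel)
print(rate >= 0.9)
```

Writing the bar down as code is what keeps "good enough" from drifting. The threshold is agreed on before the model is run, not after.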

I’ve seen small models running on local hardware produce the same analytical outputs as flagship cloud models—only faster and without the overhead. That’s when you stop talking about scale and start talking about speed.

Collaboration is the Multiplier

The real unlock isn’t just the model size—it’s the mix of people using it. Scientists who can code, or who have access to someone who can, change the pace completely. Pair one scientist with one software engineer and you often get a tenfold increase in research velocity. That combination of curiosity and technical fluency is where acceleration really happens.

And the hardware helps. With a desk-side system like the NVIDIA DGX Spark, you can fine-tune a model on your own data, automate repetitive analysis, and test ideas before running a single physical experiment. It’s not about replacing scientists; it’s about removing the waiting time between ideas.

Where It’s Heading

This is the new normal for scientific computing:

  • Small, specialized models embedded directly into the research environment.
  • Agentic systems coordinating tools, data, and models in real time.
  • Scientists and engineers working side by side to shape AI tools that mirror experimental logic.

At some point, AI stops observing science and starts participating in it. And that’s where things start to get interesting.

Software is Eating Science – and That’s a Good Thing

Introduction

I love internet memos and essays like Jeff Bezos’s API mandate and Marc Andreessen’s 2011 prediction that “software is eating the world.” Over a decade later, it has devoured more than commerce and media. It is now eating science, and quite frankly, it’s about time.

Scientific research, especially in domains like biology, chemistry, and medicine, has historically been a software backwater. Experiments were designed in paper notebooks, data handled via Excel, and results shared through PowerPoint screenshots. It’s only recently that leading institutions began embedding software engineering at the core of how science gets done. And the results speak for themselves. The Nobel Prize in Chemistry 2024, awarded in part for protein structure prediction with AlphaFold, is a striking example of how software, developed and scaled by engineers, has become as fundamental to scientific breakthroughs as any wet-lab technique.

The Glue That Holds Modern Science Together

Software engineers aren’t just building tools. At institutions like the Broad Institute, Allen Institute, and EMBL-EBI, they’re building scientific platforms. Terra, Code Ocean, Benchling—these aren’t developer toys, they’re scientific instruments. They standardize experimentation, automate reproducibility, and unlock collaboration at scale.

The Broad Institute’s Data Sciences Platform employs over 200 engineers supporting a staff of 3,000. Recursion Pharmaceuticals operates with an almost 1:1 engineer-to-scientist ratio. These are not exceptions—they’re exemplars.

The Real Payoff: Research Acceleration

When you embed software engineers into scientific teams, magic happens:

  • Setup time drops by up to 70%
  • Research iteration speeds triple
  • Institutional knowledge gets preserved, not lost in SharePoint folders
  • AI becomes usable beyond ChatGPT prompts—supporting actual data analysis, modeling, and automation

These are not hypothetical. They’re documented results from public case studies and internal programs at peer institutions.

From Hype to Hypothesis

While many institutions obsess over full lab digitization (think IoT pipettes), the smarter move is prioritizing where digital already exists: in workflows, data, and knowledge. With tools like Microsoft Copilot and OpenAI Enterprise, genomic language models like Evo2, AlphaFold for protein structure prediction, and DeepVariant for variant calling, researchers are now unlocking years of buried insights and accelerating modeling at scale. But these tools only become truly impactful when they are integrated, orchestrated, and maintained by skilled engineers who understand both the research goals and the computational landscape.

Scientific software engineers are the missing link. Their work turns ad hoc experiments into reproducible pipelines. Their platforms turn pet projects into institutional capability. And their mindset—rooted in abstraction, testing, and scalability—brings scientific rigor to the scientific process itself.

What many underestimate is that building software—like conducting experiments—requires skill, discipline, and experience. Until AI is truly capable of writing production-grade code end-to-end (and it’s not—see Speed vs. Precision in AI Development), we need real software engineering best practices. Otherwise, biology labs will unknowingly recreate decades of software evolution from scratch—complete with Y2K-level tech debt, spaghetti code, and glaring security gaps.

What Now?

If you’re in research leadership and haven’t staffed up engineering talent, you’re already behind. A 1:3–1:5 engineer-to-scientist ratio is emerging as the new standard—at least in data-intensive fields like genomics, imaging, and molecular modeling—where golden-path workflows, scalable AI tools, and reproducible science demand deep software expertise.

That said, one size does not fit all. Theoretical physics or field ecology may have very different needs. What’s critical is not the exact ratio, but the recognition that modern science needs engineering—not just tools.

There are challenges. Many scientists weren’t trained to work with software engineers, and collaboration across disciplines takes time and mutual learning. There’s also a cultural risk of over-engineering—replacing rapid experimentation with too much process. But when done right, the gains are exponential.

Science isn’t just done in the lab anymore—it’s done in GitHub. And the sooner we treat software engineers as core members of scientific teams, not as service providers, the faster we’ll unlock the discoveries that matter.

Let’s stop treating software like overhead. It’s the infrastructure of modern science.

AI for Data (Not Data and AI)

Cold Open

Most companies get it backwards.

They say “Data and AI,” as if AI is dessert—something you get to enjoy only after you’ve finished your vegetables. And by vegetables, they mean years of data modeling, integration work, and master‑data management. AI ends up bolted onto the side of a data office that’s already overwhelmed.

That mindset isn’t just outdated—it’s actively getting in the way.

It’s time to flip the script. It’s not Data and AI. It’s AI for Data.

AI as a Data Appendage: The Legacy View

In most org charts, AI still reports to the head of data. That tells you everything: AI is perceived as a tool to be used on top of clean data. The assumption is that AI becomes useful only after you’ve reached some mythical level of data maturity.

So what happens? You wait. You delay. You burn millions building taxonomies and canonical models that never quite deliver. When AI finally shows up, it generates dashboards or slide‑deck summaries. Waste of potential.

What If AI Is Your Integration Layer?

Here’s the mental flip: AI isn’t just a consumer of data. It’s a synthesizer. A translator. An integrator. An enabler.

Instead of cleaning, mapping, and modeling everything up front, what if you simply exposed your data—as is—and let the AI figure it out?

That’s not fantasy. Today, you can feed an AI messy order tables, half‑finished invoice exports, inconsistent SKU lists—and it still works out the joins. Sales and finance data follow patterns the model has seen a million times.

The magic isn’t that AI understands perfect data. The magic is that it doesn’t need to.

MCP: OData for Agents

Remember OData? It promised introspectable, queryable APIs—you could ask the endpoint what it supported. Now meet MCP (Model Context Protocol). Think OData, but for AI agents.

With MCP, an agent can introspect a tool and learn what actions exist, what inputs they need, and what outputs to expect. No glue code. No brittle integrations. You expose a capability, and the AI takes it from there.

OData made APIs discoverable. MCP makes tools discoverable to AIs.

Expose your data with just enough structure, and let the agent reason. No mapping tables. No MDM. Just AI doing what it’s good at: figuring things out.
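To show what "discoverable to AIs" means in practice, here is a standard-library sketch of a self-describing tool. It is not the official MCP SDK and has no transport: it only mimics the general shape of MCP's tool listing (a `name`, a `description`, and a JSON Schema `inputSchema`) and a call step. The strain record and the `lookup_strain` tool are hypothetical.

```python
# Hypothetical data the tool exposes.
STRAINS = {
    "Y-001": {"species": "Saccharomyces pastorianus", "notes": "hypothetical record"},
}

TOOLS = {
    "lookup_strain": {
        "description": "Return stored metadata for a yeast strain.",
        "inputSchema": {
            "type": "object",
            "properties": {"strain_id": {"type": "string"}},
            "required": ["strain_id"],
        },
        "handler": lambda args: STRAINS[args["strain_id"]],
    },
}


def list_tools() -> list:
    """What an agent sees when it asks which capabilities exist."""
    return [
        {"name": name, "description": tool["description"], "inputSchema": tool["inputSchema"]}
        for name, tool in TOOLS.items()
    ]


def call_tool(name: str, arguments: dict) -> dict:
    """Check required inputs against the declared schema, then run the tool."""
    tool = TOOLS[name]
    missing = [field for field in tool["inputSchema"]["required"] if field not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return tool["handler"](arguments)


print(list_tools())
print(call_tool("lookup_strain", {"strain_id": "Y-001"}))
```

The agent needs no prior knowledge of the tool. The listing tells it what exists and what to send, which is the same promise OData made for APIs.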

Why It Works in Science—And Why It’ll Work in Business

Need proof? Look at biology.

Scientific data is built on shared, Latin‑based taxonomies. Tools like Claude or ChatGPT navigate these datasets without manual schema work. At Carlsberg we’ve shown an AI connecting yeast strains ➜ genes ➜ flavor profiles in minutes.

Business data is easier. You don’t need to teach AI what an invoice is. Or a GL account. These concepts are textbook. Give the AI access and it infers relationships. If it can handle yeast genomics, it can handle your finance tables.

Stop treating AI like glass. It’s ready.

The Dream: MCP‑Compliant OData Servers

Imagine every system—ERP, CRM, LIMS, SharePoint—exposing itself via an AI‑readable surface. No ETLs, no integration middleware, no months of project time.

Combine OData’s self‑describing endpoints with MCP’s agent capabilities. You don’t write connectors. You don’t centralize everything first. The AI layer becomes the system‑of‑systems—a perpetual integrator, analyst, translator.

Integration disappears. Master data becomes a footnote.

When Do You Still Need Clean Data?

Let’s address the elephant in the room: there are still scenarios where data quality matters deeply.

Regulatory reporting. Financial reconciliation. Mission-critical operations where a mistake could be costly. In these domains, AI is a complement to—not a replacement for—rigorous data governance.

But here’s the key insight: you can pursue both paths simultaneously. Critical systems maintain their rigor, while the vast majority of your data landscape becomes accessible through AI-powered approaches.

AI for Data: The Flip That Changes Everything

You don’t need perfect data to start using AI. That’s Data and AI thinking.

AI for Data starts with intelligence and lets structure emerge. Let your AI discover, join, and reason across your real‑world mess—not just your sanitized warehouse.

It’s a shift from enforcing models to exposing capabilities. From building integrations to unleashing agents. From waiting to acting while you learn.

If your organization is still waiting to “get the data right,” here’s your wake‑up call: you’re waiting for something AI no longer needs.

AI is ready. Your data is ready enough.

The only question left: Are you ready to flip the model?

Accelerating Research at Carlsberg Research Laboratory using Scientific Computing

Introduction

Scientific discovery is no longer just about what happens in the lab—it’s about how we enable research through computing, automation, and AI. At Carlsberg Research Laboratory (CRL), our Accelerate Research initiative is designed to remove bottlenecks and drive breakthroughs by embedding cutting-edge technology into every step of the scientific process.

The Five Core Principles of Acceleration

To ensure our researchers can spend more time on discovery, we are focusing on:

  • Digitizing the Laboratory – Moving beyond manual processes to automated, IoT-enabled research environments.
  • Data Platform – Creating scalable, accessible, and AI-ready data infrastructure that eliminates data silos.
  • Reusable Workflows – Standardizing and automating research pipelines to improve efficiency and reproducibility.
  • High-Performance Computing (HPC) – Powering complex simulations and large-scale data analysis. We are also preparing for the future of quantum computing, which promises to transform how we model molecular behavior and simulate complex biochemical systems at unprecedented speed and scale.
  • Artificial Intelligence – Enhancing data analysis, predictions, and research automation beyond just generative AI.

The Expected Impact

By modernizing our approach, we aim to:

  • Reduce research setup time by up to 70%
  • Accelerate experiment iteration by 3x
  • Improve cross-team collaboration efficiency by 5x
  • Unlock deeper insights through AI-driven analysis and automation

We’re not just improving research at CRL; we’re redefining how scientific computing fuels innovation. The future of research is fast, automated, and AI-driven.

© 2026 Peter Birkholm-Buch
