Peter Birkholm-Buch

Stuff about Software Engineering

From Tools to Orchestrators: A General Architecture for AI-Native Scientific Research

Abstract

Scientific computing has reached an inflection point. High-performance computing, cloud-native data platforms, and foundation models have dramatically accelerated individual steps in research workflows. Yet most scientific environments remain structurally fragmented: data is generated in one system, workflows execute in another, analytical summaries live elsewhere, and interpretation remains largely manual.

This post argues for a general architecture for AI-native scientific research in which artificial intelligence functions not as a standalone analytical tool, but as an orchestration layer across computation, metadata, and analytics systems. Rather than replacing existing infrastructure, this approach integrates it through structured interfaces and provenance-aware data layers. Although the architecture is illustrated through a genomics example, the principles generalize to any domain in which in-silico methods accelerate discovery.

The Real Bottleneck: Fragmentation

Across disciplines such as genomics, metabolomics, sensory science, spectroscopy, materials research, and fermentation science, a common pattern appears. Experimental data is generated within specialized platforms. Computational workflows are executed in separate environments. Results are stored as files in object storage or local servers. Cross-experiment comparison is often manual, and metadata capture is inconsistent. Reproducibility depends more on institutional memory than on system design.

In most research environments today, computational power is not the limiting factor. The constraint lies in orchestration, integration, and structured interpretation. Scientific acceleration increasingly depends on how effectively systems connect, not on how fast individual tools operate.

A Layered Architecture for AI-Native Research

The proposed architecture separates responsibilities into four conceptual layers, each with a clearly defined role.

The first layer is the execution layer, which remains the authoritative source of computational truth. This layer is responsible for heavy computation, workflow execution, and the generation of primary artifacts. Depending on the domain, it may consist of cloud-based genomic pipelines, HPC clusters, digital twin simulations of fermentation processes, robotics-controlled experimentation, or large-scale analytical workflows. The central principle is that this layer computes deterministically and preserves reproducibility. It is not replaced by AI; it is coordinated by it.

The second layer is the structured interpretation layer. Raw artifacts such as alignment files, chromatograms, spectral matrices, or process simulations are rarely suitable for reasoning across experiments. This layer extracts structured summaries, registers parameters and reference versions, and links findings to explicit provenance. In doing so, it transforms scientific reasoning from file-centric to finding-centric. The layer must remain lightweight, rebuildable, and explicit about version identity. Without it, any AI system attempting cross-run reasoning would be forced to reconstruct context from heterogeneous raw files, a fragile and non-scalable approach.
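To make the "finding-centric" idea concrete, here is a minimal sketch of what a provenance-aware finding record in the interpretation layer might look like. All names and fields are illustrative assumptions, not a prescribed schema:

```python
from dataclasses import dataclass, field

# Hypothetical "finding" record as the interpretation layer might store it.
# The point is that each finding carries explicit provenance: which run
# produced it, against which reference version, with which parameters.
@dataclass(frozen=True)
class Finding:
    finding_id: str
    run_id: str             # links back to the execution-layer workflow run
    reference_version: str  # e.g. the reference genome build used
    parameters: dict = field(default_factory=dict)
    summary: str = ""

    def provenance(self) -> dict:
        """Return the minimal context needed to reproduce this finding."""
        return {
            "run": self.run_id,
            "reference": self.reference_version,
            "parameters": dict(self.parameters),
        }

f = Finding("F-001", "run-42", "GRCh38.p14", {"min_depth": 30},
            "example variant summary")
```

Because records like these are derived entirely from primary artifacts, the whole layer can be rebuilt from scratch, which is what keeps it lightweight.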

The third layer is the analytical layer. Here, structured outputs are aggregated, modeled, visualized, and integrated across domains. Statistical workflows, machine learning pipelines, and reporting systems operate at this level. It supports exploration and synthesis but does not execute primary experimental computation. It complements the execution layer rather than replacing it.

The fourth and most transformative layer is the conversational orchestration layer. A large language model, connected through structured tool interfaces, interprets researcher intent and coordinates actions across the other layers. It translates natural language questions into structured queries, triggers workflows when appropriate, integrates results across systems, and documents reasoning paths. Importantly, it does not modify raw data or override execution engines. It orchestrates rather than computes.
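A structured tool interface can be sketched as a registry that maps a model's structured "tool call" onto system functions. This is an illustrative toy, with hypothetical tool names, showing how the orchestrator routes intent without itself computing anything:

```python
# Minimal sketch of a structured tool interface for an orchestrator.
# Tool names and return values are hypothetical placeholders.
TOOLS = {}

def tool(name):
    """Decorator registering a function as a callable tool."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("list_runs")
def list_runs(project: str) -> list:
    # In a real system this would query the interpretation layer,
    # not fabricate identifiers.
    return [f"{project}-run-1", f"{project}-run-2"]

def dispatch(call: dict):
    """Route a structured call like {'tool': ..., 'args': {...}}."""
    return TOOLS[call["tool"]](**call["args"])

result = dispatch({"tool": "list_runs", "args": {"project": "yeast"}})
```

The model only ever emits structured calls like the dictionary above; the registered functions own the actual side effects, which is how "orchestrates rather than computes" is enforced in practice.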

When these layers are properly separated, AI evolves from a chatbot into a scientific coordinator.

From Queries to Long-Running Co-Scientist Workflows

The next frontier is not single-prompt interaction but long-running, goal-directed research processes. An AI-native orchestrator can maintain contextual awareness across sessions, track hypotheses over time, coordinate multi-step analyses, and integrate intermediate results into evolving reasoning chains.

When domain-specific reasoning patterns are formalized into versioned and reusable “skills,” scientific workflows become auditable and collaborative. Instead of isolated prompts, research evolves into structured AI-mediated projects in which multiple scientists interact with shared computational guardrails. The system preserves reproducibility while accelerating iteration.

In this model, AI becomes a persistent scientific co-orchestrator rather than a transient assistant.

A Genomics Reference Implementation

One instantiation of this architecture can be observed in a genomics context. In that environment, a cloud-based execution engine processes sequencing data and generates alignment and variant artifacts. A lightweight, provenance-first interpretation layer structures variant findings across runs, capturing reference identities and parameter differences. An analytical platform aggregates results for cross-project exploration. A conversational AI interface connects them through structured tool interfaces.

Within such a system, scientists can compare variants across strains, identify changes in reference genomes between runs, detect parameter differences, trigger new workflows, and iteratively refine hypotheses without reopening raw alignment files or reconstructing workflow logs manually. Raw data remains immutable. Provenance remains explicit. Every step is traceable.

Although domain-specific in its implementation, the architectural principles are domain-agnostic.

Generalization Across Scientific Domains

The same structure applies well beyond genomics. Laboratories working with LC-MS and GC-MS data face persistent challenges in analytical reproducibility and cross-instrument transfer. Sensory science groups contend with variability and latent structure in panel data. Spectroscopy platforms require ongoing calibration maintenance across instruments and environments.

In fermentation and ingredient characterization, digital twins and predictive process models increasingly complement physical experimentation, yet their outputs often remain isolated from historical runs and analytical metadata. The opportunity is not merely to build better models, but to connect those models into a structured reasoning fabric that spans experiments, instruments, and time.

In each of these domains, in-silico iteration accelerates discovery. The architectural shift lies not in introducing new models, but in enabling structured orchestration across existing systems.

Design Principles

Several principles emerge as foundational.

  • Execution engines remain authoritative and deterministic.
  • Interpretation layers must be fully rebuildable from primary artifacts.
  • Provenance must be first-class rather than implicit.
  • AI orchestrates systems but does not own data.
  • Reproducibility must be enforced architecturally rather than culturally.

When these principles are respected, AI-native research becomes both scalable and governable.

Conclusion

The future of scientific computing is unlikely to be another monolithic platform. Instead, it will be a layered architecture in which computation remains deterministic, metadata is structured, analytics are scalable, and AI coordinates interactions across systems.

The real competitive advantage will not belong to those who adopt the largest models, but to those who design systems where models can reason safely and coherently across structured scientific context.

Scientific acceleration, in this view, is no longer primarily a question of faster models. It is a question of who learns to build research environments that think.

Understanding Agentic Architectures and Why They Differ Fundamentally from Event-Driven Design

Introduction

In recent months, an increasing number of vendors and practitioners have begun describing event-driven architectures (EDA) as the foundation for agentic systems.

While both paradigms involve distributed and asynchronous systems, they address entirely different architectural concerns:

  • Event-driven design enables reliable data movement and temporal decoupling across systems.
  • Agentic design enables autonomous reasoning, coordination, and adaptive decision-making.

This post clarifies these differences through the lens of established architectural literature and pattern theory, helping distinguish between data-flow infrastructure and cognitive control-flow systems.

Pattern Lineage and Conceptual Heritage

Software architecture has evolved through distinct, well-documented pattern families—each solving a different class of problems:

| Domain | Canonical Source | Architectural Concern |
| --- | --- | --- |
| Software Design Patterns | Gamma et al., Design Patterns (1994) | Structuring software components and behaviour |
| Enterprise Integration Patterns | Hohpe & Woolf, Enterprise Integration Patterns (2003) | Asynchronous communication and integration |
| Pattern-Oriented Software Architecture | Buschmann et al., Pattern-Oriented Software Architecture (1996–2007) | Component interaction, brokers, and coordination |
| Agent-Oriented Systems | Wooldridge, An Introduction to MultiAgent Systems (2009) | Reasoning, autonomy, and collaboration |

Each of these domains emerged to address a specific layer of system complexity. Event-driven architectures belong to the integration layer; agentic architectures operate at the reasoning and control layer.

Event-Driven Architecture: Integration and Temporal Decoupling

Event-driven architecture decouples producers and consumers through asynchronous communication. Common patterns include Publish–Subscribe, Event Bus, and Event Sourcing.

Its core strengths are:

  • High scalability and throughput
  • Loose coupling and resilience
  • Near-real-time responsiveness

EDA is therefore ideal for information propagation, but it does not address why or how a system acts. It transports information; it does not interpret or decide.
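The decoupling property can be shown in a few lines. This is a deliberately minimal in-process publish–subscribe sketch, not any particular broker's API:

```python
from collections import defaultdict

# Toy publish-subscribe: producers and consumers are decoupled by topic,
# which is the core EDA property described above.
subscribers = defaultdict(list)

def subscribe(topic, handler):
    subscribers[topic].append(handler)

def publish(topic, event):
    for handler in subscribers[topic]:
        handler(event)

received = []
subscribe("runs.completed", received.append)
publish("runs.completed", {"run_id": 42})
# The publisher never references the consumer: information moves,
# but nothing here decides *why* or *whether* to act on it.
```

Note what is absent: there is no goal, no plan, no state carried between events. That missing half is exactly what agentic design supplies.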

Agentic Architecture: Reasoning and Adaptation

Agentic systems focus on autonomous goal-directed behaviour. They implement a cognitive control loop:

Observe → Plan → Act → Learn

This structure—present since early multi-agent research—now underpins modern frameworks such as LangGraph, AutoGen, and Microsoft’s Autonomous Agents Framework.

Core principles include:

  • Control Flow: deciding what to do next based on context
  • Memory and Context: maintaining state across reasoning cycles
  • Tool Use: interacting with APIs or systems to execute plans
  • Collaboration: coordinating with other agents to achieve shared goals

Agentic architectures are thus control-graph frameworks, not messaging infrastructures.
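The Observe → Plan → Act → Learn loop can be sketched as a control structure. The goal, state, and trivial planning rule below are illustrative placeholders, not a real planner:

```python
# Minimal sketch of a cognitive control loop: the agent observes state,
# plans, acts, and records outcomes across cycles. The "plan" is a
# trivial rule here purely to keep the sketch self-contained.
def run_agent(goal, observe, act, steps=10):
    memory = []                      # state carried across reasoning cycles
    for _ in range(steps):
        obs = observe()              # Observe
        if obs >= goal:
            break                    # goal reached, stop acting
        plan = "increment"           # Plan: decide what to do next
        act(plan)                    # Act: execute via a tool/system
        memory.append((obs, plan))   # Learn: record outcome for next cycle
    return memory

state = {"value": 0}
history = run_agent(
    goal=3,
    observe=lambda: state["value"],
    act=lambda plan: state.update(value=state["value"] + 1),
)
```

The loop is control flow, not message flow: each iteration makes a decision based on context, which is precisely what a messaging substrate alone cannot do.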

Why “Event-Driven Agentic Architecture” Is a Conceptual Misstep

Confusing event-driven integration with agentic reasoning conflates communication with cognition.

| Common Assertion | Correct Interpretation |
| --- | --- |
| "Agents communicate through Kafka topics." | That describes data transport, not reasoning or collaboration. |
| "Event streaming enables autonomy." | Autonomy arises from goal-based planning and local state, not from asynchronous I/O. |
| "Event mesh = Agent mesh." | An event mesh routes bytes; an agent mesh coordinates intent. |
| "Streaming platforms enable multi-agent collaboration." | They enable message exchange; collaboration requires shared semantic context and decision logic. |

EDA can support agentic systems—for example, as a trigger or observation channel—but it does not constitute their architectural foundation.

Maintaining Conceptual Precision

Architectural vocabulary should map to the corresponding canonical lineage:

| Concern | Canonical Reference |
| --- | --- |
| Integration, routing, replay | Hohpe & Woolf, Enterprise Integration Patterns |
| Reasoning, autonomy, coordination | Wooldridge, An Introduction to MultiAgent Systems |
| System decomposition, blackboard, broker styles | Buschmann et al., Pattern-Oriented Software Architecture |
| Modern control-flow frameworks for AI agents | LangGraph, Microsoft Autonomous Agents Framework (2024–2025) |

Anchoring terminology to established pattern families preserves conceptual integrity and prevents marketing-driven drift.

Practical Implications

  1. Use event-driven design for system integration, data propagation, and observability.
  2. Use agentic design for autonomy, reasoning, and goal-oriented workflows.
  3. Keep a strict separation between data flow (how information moves) and control flow (how decisions are made).
  4. Evaluate vendor claims by tracing them back to canonical architectural literature.
  5. Foster literacy in software and integration pattern theory to maintain shared architectural clarity across teams.

Recommended Reading

  • Wooldridge, M. (2009). An Introduction to MultiAgent Systems (2nd ed.). Wiley.
  • Hohpe, G., & Woolf, B. (2003). Enterprise Integration Patterns. Addison-Wesley.
  • Buschmann, F. et al. (1996–2007). Pattern-Oriented Software Architecture Vols 1–5. Wiley.
  • Gamma, E. et al. (1994). Design Patterns. Addison-Wesley.
  • LangChain / LangGraph Documentation (2024–2025). “Agentic Design Patterns.”
  • Microsoft Autonomous Agents Framework (Preview 2025).

Conclusion

Architectural precision is not academic—it determines how systems scale, adapt, and remain intelligible.

Event-driven architectures will continue to serve as the backbone of data movement.

Agentic architectures will increasingly govern how intelligent systems reason, plan, and act.

Understanding where one ends and the other begins is essential for designing systems that are both well-connected and truly intelligent.

Making Sense of LLM Training

Introduction

We often talk about training large language models (LLMs) as if it’s one thing — but it really isn’t.

There are several distinct types of training, each with its own purpose, cost, and level of control.

Understanding the difference helps clarify what’s realistic to do in practice, and what should be left to the model labs with thousands of GPUs and power budgets larger than small towns.

Here’s a simple breakdown.

Base Training — Learning Language from Scratch

This is where it all begins.

The model learns to predict the next word in a sentence across trillions of examples. It’s how it develops a general understanding of language, reasoning, facts, and relationships between concepts.

Purpose: Build a general-purpose foundation.

Data: Huge, diverse datasets (Common Crawl, Wikipedia, code, books).

Cost: Astronomical — only done by major labs.

Analogy: Teaching a child how to speak and read.

Once complete, this produces what’s called a base model — capable, but not polite, safe, or even particularly helpful.

Post-Training — Teaching Behavior and Alignment

After base training, the model needs to learn how to behave.

This phase adjusts it to follow instructions, respond helpfully, and align with human preferences and safety policies.

It typically involves:

  • Supervised Fine-Tuning (SFT): The model learns from curated examples of correct input/output pairs.
  • Reinforcement Learning from Human Feedback (RLHF): Humans rank several model responses, and the model learns to prefer the higher-ranked ones.
  • Reinforcement Learning from AI Feedback (RLAIF): The same, but with AI systems acting as evaluators.

Purpose: Make the model cooperative and safe.

Analogy: Teaching manners, ethics, and social intelligence after language is learned.

All commercial models — GPT-4, Claude 3, Gemini, Llama 3 — go through this step before they ever reach users.

Fine-Tuning — Specializing for a Domain

Fine-tuning takes a general model and teaches it domain-specific knowledge: medicine, law, brewing, internal documentation — whatever your niche may be.

There are a few variants:

  • Full fine-tuning: retraining all model weights (rare and expensive).
  • Parameter-efficient fine-tuning (LoRA, QLoRA, PEFT): training small adapter layers on top of the frozen base model — vastly cheaper and reversible.

Purpose: Adapt to a domain or style.

Analogy: Sending a fluent speaker to medical school.

In practice, fine-tuning makes sense only if you have high-quality, well-structured data and a clear purpose — for example, improving factual recall in a specific knowledge domain or matching a company’s tone of voice.
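The idea behind parameter-efficient methods such as LoRA can be shown with toy matrices: the frozen weight W is augmented by a small trainable low-rank product, so only r × (d_in + d_out) parameters train instead of d_in × d_out. This is a conceptual pure-Python sketch, not the PEFT library's API:

```python
# Conceptual low-rank adapter sketch (the idea behind LoRA), using 2x2
# toy matrices. Real implementations use libraries such as PEFT.
def matmul(A, B):
    """Naive matrix multiply over lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def effective_weight(W, A, B, scale=1.0):
    """W_eff = W + scale * (B @ A); W stays frozen, only A and B train."""
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (2x2)
A = [[0.1, 0.2]]               # rank-1 factor, shape r x d_in (1x2)
B = [[1.0], [0.0]]             # rank-1 factor, shape d_out x r (2x1)
W_eff = effective_weight(W, A, B)
```

Because the base W is never modified, the adaptation is cheap to store and trivially reversible: dropping A and B recovers the original model, which is the "vastly cheaper and reversible" property noted above.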

Reinforcement Fine-Tuning — Teaching Preferences and Optimization

A newer development combines reinforcement learning with fine-tuning to optimize specific, measurable outcomes — such as factuality, brevity, or computational efficiency.

OpenAI’s Reinforcement Fine-Tuning (RFT) of the o1 model is one recent example: instead of relying on humans to rate outputs, the process automatically scores model responses using well-defined reward functions.

Purpose: Optimize behavior using measurable rewards.

Analogy: Practicing until performance metrics improve, rather than memorizing answers.

Retrieval-Augmented Generation (RAG) — Adding Knowledge Without Training

RAG isn’t training at all — it’s a retrieval technique.

The model stays frozen but is connected to a search index, database, or vector store.

When asked a question, it first retrieves relevant information and then generates an answer grounded in that content.

Purpose: Keep models current and connected to external knowledge.

Analogy: Looking something up rather than memorizing it.

RAG is ideal when data changes frequently, or when you can’t or shouldn’t embed sensitive data into the model itself.
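The retrieve-then-generate flow can be sketched in a few lines. The keyword scorer and string template below stand in for a real vector store and LLM call; the documents are invented examples:

```python
# Minimal RAG sketch: the "model" stays frozen; only the retrieved
# context changes between questions.
DOCS = [
    "The 2024 pipeline uses reference genome GRCh38.p14.",
    "Office hours are Tuesdays at 10:00.",
]

def retrieve(question, docs, k=1):
    """Rank documents by naive keyword overlap with the question."""
    words = set(question.lower().split())
    score = lambda d: len(words & set(d.lower().split()))
    return sorted(docs, key=score, reverse=True)[:k]

def answer(question):
    context = retrieve(question, DOCS)[0]
    # A real system would pass `context` to an LLM prompt; here we just
    # show that the reply is grounded in the retrieved text.
    return f"Based on: {context}"

reply = answer("Which reference genome does the pipeline use?")
```

Updating knowledge means updating `DOCS` (or the index behind it), with no retraining step at all, which is why RAG suits fast-changing or sensitive data.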

Prompt Tuning — Lightweight Personality Shaping

The lightest-weight form of adaptation is prompt tuning, sometimes called soft prompting.

Here, a small vector (or a few tokens) is trained to steer the model’s behavior without modifying its core weights.

Purpose: Adjust tone or persona without retraining.

Analogy: Giving the same person a new job description — “today you’re the legal assistant.”

Prompt tuning is useful when you want to offer multiple personalities or roles from a single model.

What’s Reasonable to Do (and What Isn’t)

To make this practical, here’s how the different kinds of training align with the Four Categories of AI Solutions, from simple Copilot-style automation to custom AI systems built from scratch.

| Goal | Technique | Cost & Effort | Category (from “Four Categories of AI Solutions”) | Typical Use |
| --- | --- | --- | --- | --- |
| General chatbot or Copilot | None (use aligned base model) | 🟢 Low | Category 1 — Copilot / built-in AI features | Office copilots, internal Q&A bots |
| Domain expertise | LoRA / adapter fine-tuning | 🟠 Medium | Category 2 — Configured / composable solutions | Industry copilots, internal assistants |
| Keep knowledge fresh | RAG or hybrid RAG + fine-tuning | 🟠 Medium | Category 3 — Integrated or extended AI systems | Research assistants, customer-facing search |
| Optimize measurable output | Reinforcement fine-tuning | 🔴 High | Category 4 — Custom AI / in-house LLMs | Scientific computing, advanced R&D |
| Create new model family | Base training | 🚫 Extreme | Beyond Category 4 | Reserved for foundation model labs |

Summary

  • Base training — teaches language.
  • Post-training (SFT/RLHF) — teaches behavior.
  • Fine-tuning — teaches domain knowledge.
  • Reinforcement fine-tuning — teaches optimization.
  • RAG / Prompt tuning — extend without retraining.

LLMs aren’t trained once — they’re trained in layers.

Each layer shapes a different aspect of intelligence: from raw linguistic intuition to helpful conversation, domain expertise, and ongoing adaptability.

Knowing which layer you’re working with isn’t just a technical detail — it’s the difference between using AI and building with it.

Small Models, Real Acceleration: Notes from the Field

Over the past year, I’ve spent more time than I’d like to admit trying to make AI models actually work in scientific environments. Not as demos or slides, but in the kind of messy, disconnected setups that real research runs on. And after enough of those experiments, one thing keeps repeating itself: the smaller models often get the job done better.

Trying to fine-tune or adapt a multi-hundred-billion-parameter model sounds impressive until you’ve actually tried. The cost, the infrastructure, the data wrangling — it’s a full-time job for a team of specialists. Most research teams don’t have that. But give them a 3B or 7B model that runs locally, and suddenly they’re in control. These smaller models are fast, predictable, and easy to bend to specific problems. You can fine-tune them, distil from a larger one, or just shape them around the way your own data behaves.

That’s the difference between something theoretical and something usable. Scientists can now build domain-specific models on their own machines, without waiting for external infrastructure or a cloud budget to clear. You don’t need a new foundation model—you just need one that understands your work.

Working Close to the Data

Running models locally changes how you think about performance. When your data can’t leave the lab, a local model doesn’t just make sense—it’s the only option. And you start realizing that “good enough” isn’t vague at all. It’s measurable. In genomic analysis, it means sequence alignment accuracy within half a percent of a cloud model. In sensory analysis, it means predicted clusters that match what human panels taste nine times out of ten. That’s good enough to move forward.

I’ve seen small models running on local hardware produce the same analytical outputs as flagship cloud models—only faster and without the overhead. That’s when you stop talking about scale and start talking about speed.

Collaboration is the Multiplier

The real unlock isn’t just the model size—it’s the mix of people using it. Scientists who can code, or who have access to someone who can, change the pace completely. Pair one scientist with one software engineer and you often get a tenfold increase in research velocity. That combination of curiosity and technical fluency is where acceleration really happens.

And the hardware helps. With a workstation-class GPU like the NVIDIA DGX Spark, you can fine-tune a model on your own data, automate repetitive analysis, and test ideas before running a single physical experiment. It’s not about replacing scientists—it’s about removing the waiting time between ideas.

Where It’s Heading

This is the new normal for scientific computing:

  • Small, specialized models embedded directly into the research environment.
  • Agentic systems coordinating tools, data, and models in real time.
  • Scientists and engineers working side by side to shape AI tools that mirror experimental logic.

At some point, AI stops observing science and starts participating in it. And that’s where things start to get interesting.

The Culture Prism: What We Tolerate Defines Us

Gartner’s “culture prism” hit me like a hammer:

https://media.licdn.com/dms/image/v2/D4E10AQHoCPXwpRWq5A/image-shrink_1280/B4EZoRm.5CGUAM-/0/1761232023589?e=1761897600&v=beta&t=OpbnuFproveGtCsQ4InFCXhmjK8cljiKQVkiUrEbo-U

I’ve always believed that leadership starts with example — that if I live the right values, others will follow.

But the prism made me realize something uncomfortable: I’ve spent years explaining what good looks like, and far too little time explaining what bad looks like — or having the hard conversations when it happens.

In society, we start with what’s not acceptable: you can’t kill, you can’t steal, you can’t harm others — and everything else is up to you.

In organizations, we flip it. We talk about performance, excellence, and continuous improvement, but we rarely say what we won’t accept.

The result? The wrong behaviors quietly take root because nobody said stop.

Silence is not neutrality. Silence is permission.

When leaders ignore people, withhold feedback, or use offence as defence, they’re signalling that learning is dangerous and honesty is punished. That’s the opposite of continuous improvement — it breaks both the First Way (never pass a defect downstream) and the Third Way (create a culture of continual learning and experimentation) from The Three Ways of DevOps (IT Revolution – Gene Kim).

Culture isn’t built by posters or handbooks; it’s built in the small moments where someone chooses to speak up — or not.

So maybe the next evolution of our leadership handbooks shouldn’t just describe the desired behaviors. It should also draw the hard lines:

  • We don’t ignore people.
  • We don’t punish those who raise problems.
  • We don’t weaponize authority.
  • We don’t stay silent when others do.

The prism reminds us that shaping culture isn’t just about promoting excellence — it’s about refusing mediocrity in character.

Conway’s Law and the Rise of Platform Engineering: Are We Just Fixing the Silos We Created?

I recently came across a Danish article from Globeteam about how Platform Engineering can drive growth and efficiency. The experts they interviewed weren’t wrong—PE absolutely can deliver those benefits. But reading it made me think about why Platform Engineering has become such a hot topic in the first place.

Melvin Conway observed back in 1968 that “any organization that designs a system will produce a design whose structure is a copy of the organization’s communication structure.” Over the decades, this became Conway’s Law, dutifully cited in architecture presentations everywhere. But I think we’re living through its most ironic chapter yet: the rise of Developer Platforms and Platform Engineering as desperate attempts to fix the very silos we designed into our organizations.

When Engineers Are Organized in Silos

When you organize engineers into business-aligned tribes or product domains, they inevitably build siloed systems. Not because they’re being difficult, but because that’s what the structure incentivizes. Each team starts building its own tools, pipelines, and cloud configurations. Cross-team collaboration becomes an act of heroism instead of the default way of working.

The Spotify-inspired model accelerated this problem. It optimized for autonomy but not alignment. When everyone owns their piece of the world, no one owns the whole. I’ve written before in Balancing Autonomy and Alignment in Engineering Teams about why I organize engineers into a single reporting line rather than under product ownership—it’s specifically to avoid this fragmentation.

The Platform That’s Also a Silo

Eventually, the fragmentation becomes impossible to ignore. Someone draws a diagram showing duplicate CI/CD pipelines, dozens of competing Terraform modules, three different secrets managers, and five ways of provisioning a Kubernetes cluster. So naturally, someone says we need a Developer Platform to unify all of this.

But here’s the problem: the team building that platform usually sits inside its own silo. Another specialized function, another reporting line, another backlog disconnected from product delivery. The result is that we now have siloed platforms, each optimized for its own part of the business but still lacking a shared engineering identity.

Platform Engineering’s Promise and Paradox

This is where Platform Engineering enters the picture—building infrastructure and tooling that cuts across silos to standardize, simplify, and accelerate development. The Globeteam article emphasizes exactly these benefits: reduced manual work, faster time-to-market, better developer experience.

And those benefits are real. When we built Gaia at Carlsberg, we absolutely achieved them. We went from infrastructure provisioning taking weeks to taking minutes. We achieved an 80% reduction in manual DevOps work. Developers got self-service capabilities embedded directly into their GitHub workflow.

But even here, Conway’s Law lurks. Most organizations create a Platform Engineering department that is itself a silo. They end up maintaining a shared platform for the organization rather than with it. We’ve just added another layer, another interface, another team managing integration—effectively encoding organizational fragmentation into the technology stack.

What Actually Worked for Us

The reason Gaia succeeded wasn’t just the technology. It was because we didn’t treat the platform team as a separate silo. The platform engineering team is part of the broader engineering organization, working with the same standards, participating in the same guilds, aligned on the same methods. When we built Gaia’s golden path, it wasn’t a platform team dictating to developers—it was engineers building tools for other engineers based on shared understanding.

Conway’s Law wasn’t meant to be a trap. The point is that we can design our structures deliberately to achieve the systems we want. If our goal is coherent systems, then the organization itself must be coherent. That means engineers need to be organized as engineers, not divided by business lines or pseudo-tribes where technical collaboration is optional.

Platform Engineering as Symptom, Not Just Solution

Platform Engineering didn’t rise because we suddenly discovered a better way to do DevOps. It rose because many organizations lost sight of what engineering fundamentally is: a collaborative discipline. We created silos, then tried to fix them with technology. But every time we add another layer without fixing the underlying organizational structure, we risk making the problem worse.

The best developer experience doesn’t come from layers of abstraction or governance. It comes from removing the barriers that make collaboration difficult in the first place. When engineers work together as peers across products, domains, and technologies, you don’t need to build elaborate platforms to unify them. Their shared way of working becomes the platform.

This aligns with what I wrote in Balancing Autonomy and Alignment in Engineering Teams—alignment isn’t the opposite of autonomy, it’s what makes autonomy sustainable. You can give teams independence precisely because they’re working from a shared foundation of methods, tools, and standards. Our DevEx analysis against Gartner benchmarks showed this approach scoring 4.3/5, with particular strength in the areas of autonomy and cultural alignment.

So yes, Platform Engineering can absolutely drive growth and efficiency, as the Globeteam experts argue. But only if we recognize it as both a solution and a symptom. A symptom of organizational structures that work against collaboration rather than enabling it. The next evolution might not be another platform at all—it might just be building organizations where engineers can work together by default, not by exception.

Different Roles and Responsibilities for an IT Architect

Introduction

In IT, being an “Architect” means something different to almost everyone, and the role and responsibilities vary between industries, countries, and continents. So here are my €0.02 on this.

I like to divide architects into the following groups/layers:

  • Enterprise Architecture (EA)
  • Solution Architecture (SA)
  • Infrastructure Architecture (IA)

They can, of course, be divided even further, but in my experience, this works at a high level. I firmly believe that EA, SA, and IA should remain as distinct functions within an organization, each with its own reporting structure. This separation ensures that Enterprise Architecture (EA) focuses on strategic governance, Solution Architecture (SA) remains embedded in product teams, and Infrastructure Architecture (IA) continues to provide the necessary operational foundation.

This approach aligns with Svyatoslav Kotusev’s research on enterprise architecture governance, which suggests that keeping these disciplines distinct leads to better strategic focus, executional efficiency, and organizational alignment. Additionally, insights from “Enterprise Architecture as Strategy” (Ross, Weill, Robertson) emphasize that EA should focus on high-level strategic direction rather than detailed execution. “Fundamentals of Software Architecture” (Richards, Ford) further supports the distinction between EA and SA, reinforcing that Solution Architects must remain closely aligned with engineering teams for execution. “Team Topologies” (Skelton, Pais) highlights the importance of structuring architecture teams effectively to support flow and autonomy, while “The Art of Scalability” (Abbott, Fisher) underscores how separating governance from execution helps organizations scale more efficiently.

By structuring these functions independently, organizations can maintain a balance between governance and execution while ensuring that architecture decisions remain both strategic and practical. This separation fosters alignment between business strategy, technology execution, and infrastructure stability, ensuring that architecture is an enabler rather than a bottleneck.

Enterprise Architecture

From Wikipedia:

Enterprise architecture (EA) is an analytical discipline that provides methods to comprehensively define, organize, standardize, and document an organization’s structure and interrelationships in terms of certain critical business domains (physical, organizational, technical, etc.) characterizing the entity under analysis.

The goal of EA is to create an effective representation of the business enterprise that may be used at all levels of stewardship to guide, optimize, and transform the business as it responds to real-world conditions.

EA serves to capture the relationships and interactions between domain elements as described by their processes, functions, applications, events, data, and employed technologies.

This means that EA exists in the grey area between business and IT. It’s neither one nor the other, but it requires insight into both to understand how the business is affected by IT and vice versa.

Because of its close proximity to the business, it’s usually EA that writes strategies for cross-organizational, multi-year efforts. This ensures proper anchoring of strategies, since IT and the business (finance) must agree on business direction.

I’ve seen EA divided into something like the following:

  • Overall solution, application, integration, API etc. architectures
  • Data: Master Data Management & Analytics
  • Hosting: Cloud, hybrid, edge, HCI, Managed and Onprem
  • Security: Physical, Intellectual, IT etc.
  • Processes: IAM, SIEM, ITIL etc.
  • Special areas from the business depending on industry like: Logistics, brewing, manufacturing, R&D, IoT etc.

Solution Architecture

From Wikipedia:

Software development is the process of conceiving, specifying, designing, programming, documenting, testing, and bug fixing involved in creating and maintaining applications, frameworks, or other software components.

Software development involves writing and maintaining the source code, but in a broader sense, it includes all processes from the conception of the desired software through to the final manifestation of the software, typically in a planned and structured process.

Software development also includes research, new development, prototyping, modification, reuse, re-engineering, maintenance, or any other activities that result in software products

I like to add an additional role, Software Architect, to the Solution Architecture layer, and I differentiate between the two as follows:

  • A Solution Architect is in charge of the overall solution architecture of a solution that may span multiple IT and business domains using different technologies and software architecture patterns.
  • A Software Architect is in charge of a part of the overall solution usually within a single business domain and technology stack.

Although both roles are highly technical, the Solution Architect is a bit more of a generalist, while the Software Architect is a specialist within a certain technology stack.

Depending on the size of a solution, you may need anything from a single person handling everything to multiple people in both roles. Usually there’s a single Solution Architect in charge.

I’ve seen SA divided into the following:

  • Building things from scratch
  • Customizing existing platforms
  • Non-Cloud and Cloud Architecture Focus
  • Microsoft 365 (Workplace) Architecture
  • Mega corporation stuff like SAP, Salesforce etc

Successful organizations ensure that EA remains a strategic function rather than absorbing all architects into a single unit. Solution and Infrastructure Architects must be embedded in product teams and technology groups, ensuring a continuous feedback loop between strategy and execution. Without this distinction, architecture becomes detached from real business needs, leading to governance-heavy, execution-poor outcomes.

Svyatoslav Kotusev’s [1] research on enterprise architecture governance supports this view, emphasizing that EA should function as a decision-support structure rather than an operational execution layer. His empirical studies highlight that centralizing all architects within EA leads to inefficiencies, as solution and infrastructure architects require proximity to delivery teams to ensure architectural decisions remain practical and aligned with business realities.

Infrastructure Architecture

From Wikipedia:

Information technology operations, or IT operations, are the set of all processes and services that are both provisioned by an IT staff to their internal or external clients and used by themselves, to run themselves as a business.

With some additional skills like:

  • Data center-infrastructure management (DCIM) is the integration of IT and facility management disciplines to centralize monitoring, management and intelligent capacity planning of a data center’s critical systems. Achieved through the implementation of specialized software, hardware and sensors, DCIM enables common, real-time monitoring and management platform for all interdependent systems across IT and facility infrastructures.
  • Data center asset management is the set of business practices that join financial, contractual and inventory functions to support life cycle management and strategic decision making for the IT environment. Assets include all elements of software and hardware that are found in the business environment.

The IA is responsible for the foundation upon which all IT solutions depend. Without IT infrastructure, nothing in IT works.

Designing, implementing and maintaining IT infrastructure spans everything from your internet router at home to the enormous physical size and complexity of cloud data centres with undersea network connections.

The IA takes their requirements from EA and SA and implements accordingly.

I’ve seen IA divided into the following:

  • Infrastructure Experts
    • Cloud (AWS, Azure, GCP etc)
    • On-premises
  • IaC and Monitoring
  • Hosting: Cloud, hybrid, edge, HCI, Managed and Onprem
  • Management and support teams for managed services

Cross Organizational Collaboration

To make sure that everyone knows what is going on, all architecture is governed through the Architecture Forum, where the leads from each discipline meet and have the final say over the usage of technology and the implementation of policies.

Example of how an Architecture Forum could be organized – the specific areas are examples

References

  1. “The Practice of Enterprise Architecture: A Modern Approach to Business and IT Alignment” – Svyatoslav Kotusev
  2. “Enterprise Architecture as Strategy” – Jeanne W. Ross, Peter Weill, David Robertson
  3. “Fundamentals of Software Architecture” – Mark Richards, Neal Ford
  4. “Team Topologies” – Matthew Skelton, Manuel Pais
  5. “The Art of Scalability” – Martin L. Abbott, Michael T. Fisher

For more references see https://birkholm-buch.dk/2021/02/12/useful-resources-on-software-systems-architecture/

Software is Eating Science – and That’s a Good Thing

Introduction

I love internet memos like the one from Jeff Bezos about APIs and Marc Andreessen’s 2011 prediction that “software is eating the world.” Over a decade later, it’s devoured more than commerce and media—it’s now eating science, and quite frankly, it’s about time.

Scientific research, especially in domains like biology, chemistry, and medicine, has historically been a software backwater. Experiments were designed in paper notebooks, data handled via Excel, and results shared through PowerPoint screenshots. It’s only recently that leading institutions began embedding software engineering at the core of how science gets done. And the results speak for themselves. The Nobel Prize in Chemistry 2024, awarded for the use of AlphaFold in solving protein structures, is a striking example of how software—developed and scaled by engineers—has become as fundamental to scientific breakthroughs as any wet-lab technique.

The Glue That Holds Modern Science Together

Software engineers aren’t just building tools. At institutions like the Broad Institute, Allen Institute, and EMBL-EBI, they’re building scientific platforms. Terra, Code Ocean, Benchling—these aren’t developer toys, they’re scientific instruments. They standardize experimentation, automate reproducibility, and unlock collaboration at scale.

The Broad Institute’s Data Sciences Platform employs over 200 engineers supporting a staff of 3,000. Recursion Pharmaceuticals operates with an almost 1:1 engineer-to-scientist ratio. These are not exceptions—they’re exemplars.

The Real Payoff: Research Acceleration

When you embed software engineers into scientific teams, magic happens:

  • Setup time drops by up to 70%
  • Research iteration speeds triple
  • Institutional knowledge gets preserved, not lost in SharePoint folders
  • AI becomes usable beyond ChatGPT prompts—supporting actual data analysis, modeling, and automation

These are not hypothetical. They’re documented results from public case studies and internal programs at peer institutions.

From Hype to Hypothesis

While many institutions obsess over full lab digitization (think IoT pipettes), the smarter move is prioritizing where digital already exists: in workflows, data, and knowledge. Tools like Microsoft Copilot, OpenAI Enterprise, Evo2 for genomic language modeling, AlphaFold for protein structure prediction, and DeepVariant for variant calling only become truly impactful when integrated, orchestrated, and maintained by skilled engineers who understand both the research goals and the computational landscape. With that engineering in place, researchers are unlocking years of buried insights and accelerating modeling at scale.

Scientific software engineers are the missing link. Their work turns ad hoc experiments into reproducible pipelines. Their platforms turn pet projects into institutional capability. And their mindset—rooted in abstraction, testing, and scalability—brings scientific rigor to the scientific process itself.

What many underestimate is that building software—like conducting experiments—requires skill, discipline, and experience. Until AI is truly capable of writing production-grade code end-to-end (and it’s not—see Speed vs. Precision in AI Development), we need real software engineering best practices. Otherwise, biology labs will unknowingly recreate decades of software evolution from scratch—complete with Y2K-level tech debt, spaghetti code, and glaring security gaps.

What Now?

If you’re in research leadership and haven’t staffed up engineering talent, you’re already behind. A 1:3–1:5 engineer-to-scientist ratio is emerging as the new standard—at least in data-intensive fields like genomics, imaging, and molecular modeling—where golden-path workflows, scalable AI tools, and reproducible science demand deep software expertise.

That said, one size does not fit all. Theoretical physics or field ecology may have very different needs. What’s critical is not the exact ratio, but the recognition that modern science needs engineering—not just tools.

There are challenges. Many scientists weren’t trained to work with software engineers, and collaboration across disciplines takes time and mutual learning. There’s also a cultural risk of over-engineering—replacing rapid experimentation with too much process. But when done right, the gains are exponential.

Science isn’t just done in the lab anymore—it’s done in GitHub. And the sooner we treat software engineers as core members of scientific teams, not as service providers, the faster we’ll unlock the discoveries that matter.

Let’s stop treating software like overhead. It’s the infrastructure of modern science.

Old Wine, New Bottles: Why MCP Might Succeed Where UDDI and OData Failed

Introduction

Anthropic’s Model Context Protocol (MCP) is gaining traction as a way to standardize how AI applications connect to external tools and data sources. Looking at the planned MCP Registry and the protocol itself, I noticed some familiar patterns from earlier integration standards.

The MCP Registry bears striking resemblances to UDDI (Universal Description, Discovery and Integration) from the early 2000s. Both aim to be centralized registries for service discovery, storing metadata to help developers find available capabilities. MCP’s approach to standardized data access also echoes Microsoft’s OData protocol.

We see this pattern frequently in technology—old concepts repackaged with new implementation approaches. Sometimes the timing or context makes all the difference between success and failure.

UDDI and MCP Registry: What’s Different This Time

UDDI tried to create a universal business service registry during the early web services era. Companies could theoretically discover and connect to each other’s services automatically. The concept was sound, but adoption remained limited.

MCP Registry targets a narrower scope—AI tool integrations—in an ecosystem that already has working implementations. More importantly, AI provides the missing piece that UDDI lacked: the “magic sauce” in the middle.

With UDDI, human developers still needed to understand service interfaces and write integration code manually. With MCP, AI agents can potentially discover services, understand their capabilities, and use them automatically. The AI layer handles the complexity that made UDDI impractical for widespread adoption.

OData: The Complexity Problem

OData never achieved broad adoption despite solving a real problem—standardized access to heterogeneous data sources. The specification became complex with advanced querying, batch operations, and intricate metadata schemas.

MCP deliberately keeps things simpler: tools, resources, and prompts. That’s the entire model. The OpenAPI Specification closed some of the gap OData left, but you still can’t easily connect to an API programmatically and start using it automatically. MCP’s lower barrier to entry might be what was missing the first time around.
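To make that simplicity concrete, here is a minimal sketch of the tool half of the model. MCP runs over JSON-RPC 2.0 with methods like `tools/list` (discovery) and `tools/call` (invocation); the `get_weather` tool, its schema, and the dispatcher below are illustrative stand-ins, not the official SDK:

```python
# A hypothetical tool catalog: name, description, and a JSON Schema for
# inputs, mirroring the shape an MCP server advertises via tools/list.
TOOLS = {
    "get_weather": {
        "description": "Return the weather for a city",
        "inputSchema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
}

def get_weather(city: str) -> str:
    # Stand-in implementation; a real server would call an actual service.
    return f"Sunny in {city}"

def handle(request: dict) -> dict:
    """Dispatch a JSON-RPC 2.0 request for the two core tool methods."""
    method = request["method"]
    if method == "tools/list":
        result = {"tools": [{"name": n, **meta} for n, meta in TOOLS.items()]}
    elif method == "tools/call":
        params = request["params"]
        text = get_weather(**params["arguments"])
        result = {"content": [{"type": "text", "text": text}]}
    else:
        return {"jsonrpc": "2.0", "id": request.get("id"),
                "error": {"code": -32601, "message": "Method not found"}}
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}

# A client (or AI agent) first discovers the tools, then invokes one.
listing = handle({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
call = handle({"jsonrpc": "2.0", "id": 2, "method": "tools/call",
               "params": {"name": "get_weather",
                          "arguments": {"city": "Copenhagen"}}})
print(listing["result"]["tools"][0]["name"])
print(call["result"]["content"][0]["text"])
```

That a working server fits in a few dozen lines is the point: the barrier to entry is closer to REST than to SOAP or OData.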

Timing and Context Matter

Several factors suggest MCP might succeed where its predecessors struggled:

AI as the Integration Layer: AI agents can handle the complexity that overwhelmed human developers with previous standards. They can discover services, understand capabilities, and generate appropriate calls automatically. As I discussed in “AI for Data (Not Data and AI)”, AI works as a synthesizer and translator – it doesn’t need perfect, pre-cleaned interfaces. This makes automatic integration practical in ways that weren’t possible with human developers manually writing integration code.

Proven Implementation First: Companies like Cloudflare and Atlassian already have production MCP servers running. This contrasts with UDDI’s “build the registry and they will come” approach.

Focused Problem Domain: Instead of trying to solve all integration problems, MCP focuses specifically on AI-tool integration. This narrower scope increases the chances of getting the abstraction right.

Simple Core Protocol: Three concepts versus OData’s extensive specification or SOAP’s complexity. The most successful protocols (HTTP, REST) stayed simple.
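The first of these factors is the decisive one, and it can be sketched in code. Because a discovered tool already carries a machine-readable JSON Schema, it can be translated mechanically into the function-calling format most LLM APIs accept, so no human writes per-service glue code. The shapes below are illustrative, not any vendor’s exact API:

```python
# Convert MCP-style tool metadata into a generic LLM function-calling
# definition. Discovery plus this translation replaces hand-written glue.
def to_llm_tool(tool: dict) -> dict:
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": tool["inputSchema"],  # already JSON Schema
        },
    }

# What a tools/list response might contain (hypothetical tool).
discovered = [
    {"name": "get_weather",
     "description": "Return the weather for a city",
     "inputSchema": {"type": "object",
                     "properties": {"city": {"type": "string"}},
                     "required": ["city"]}},
]

llm_tools = [to_llm_tool(t) for t in discovered]
print(llm_tools[0]["function"]["name"])
```

With UDDI, this translation step was a developer reading WSDL and writing code; here it is a loop, and the model decides when and how to call each tool.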

The Historical Pattern

We’ve seen this “universal integration layer” vision repeatedly:

  • CORBA (1990s): Universal object access
  • SOAP/WSDL (early 2000s): Universal web services
  • UDDI (early 2000s): Universal service discovery
  • OData (2010s): Universal data access
  • GraphQL (2010s): Universal query language
  • MCP (2020s): Universal AI integration

Each generation promised to solve integration complexity. Each had technical merit. Each faced adoption friction and complexity creep.

Why This Matters

The recurring nature of this pattern suggests the underlying problem is real and persistent. Integration complexity remains a significant challenge in software development.

MCP has advantages its predecessors lacked—AI as an intermediary layer, working implementations before standardization, and deliberate simplicity. Whether it can maintain focus and resist the complexity trap that caught earlier standards remains to be seen.

The next few years will be telling. If MCP stays simple and solves real problems, it might finally deliver on the promise of standardized integration. If it expands scope and adds complexity, it will likely follow the same path as its predecessors.

The Future of Consulting: How Value Delivery Models Drive Better Client Outcomes

Introduction

The consulting industry, particularly within software engineering, is shifting away from traditional hourly billing towards value-driven and outcome-focused engagements. This evolution aligns with broader trends identified by industry analysts like Gartner and McKinsey, emphasizing outcomes and measurable impacts over time-based compensation (Gartner Hype Cycle for Consulting Services, 2023 and McKinsey: The State of Organizations 2023).

Shift from Hourly Rates to Value Delivery

Traditional hourly billing often incentivizes cost reduction at the expense of quality, creating tension between minimizing expenses and achieving meaningful results. The emerging approach—value-based consulting—aligns compensation directly with specified outcomes or deliverables. For instance, many consulting firms now employ fixed-price projects or performance-based contracts that clearly link payment to the achievement of specific business results, improving alignment and encouraging deeper collaboration between clients and consultants.

According to McKinsey’s 2023 report ‘The State of Organizations 2023’, approximately 40% of consulting engagements are shifting towards value-based models, highlighting the industry’s evolution (https://www.mckinsey.com/capabilities/people-and-organizational-performance/our-insights/the-state-of-organizations-2023).

Leveraging Scrum and Agile Methodologies

Agile methodologies, especially Scrum, facilitate this shift by naturally aligning consulting work with measurable outputs and iterative improvements. In Scrum, value is measured through clearly defined user stories, regular sprint reviews, and tangible deliverables evaluated continuously by stakeholders. These iterative deliveries provide clear visibility into incremental progress, effectively replacing hourly tracking with meaningful metrics of success.

Challenges in Adopting Value-Based Models

Transitioning to value-based consulting is not without its challenges. Firms may encounter difficulties in accurately defining and measuring value upfront, aligning expectations, and managing the inherent risks of outcome-based agreements. Overcoming these challenges typically requires transparent communication, clear contract terms, and robust stakeholder engagement from project inception.

AI and Human Oversight

While there is significant enthusiasm and concern surrounding AI, its role remains primarily augmentative rather than fully autonomous, particularly in high-stakes decision-making. Human oversight ensures AI-driven solutions remain precise and contextually appropriate, directly supporting high-quality, outcome-focused consulting. This perspective aligns with insights discussed in Responsible AI: Enhance Human Judgment, Don’t Replace It.

Balancing Speed and Precision

AI offers substantial gains in speed but often involves trade-offs in precision. Certain fields, such as financial services or critical infrastructure, require exactness, making human judgment essential in balancing these considerations. This topic, explored in detail in Speed vs. Precision in AI Development, highlights how value-driven consulting must thoughtfully integrate AI to enhance outcomes without sacrificing accuracy.

Conclusion

The shift to outcome-focused consulting models, supported by agile frameworks and thoughtful AI integration, represents a significant evolution in the industry. By prioritizing measurable value and clearly defined outcomes over hourly rates, consulting engagements become more impactful and sustainable.


© 2026 Peter Birkholm-Buch
