Peter Birkholm-Buch

Stuff about Software Engineering

The Bottleneck Moved

For years, the dominant constraint in software engineering was implementation cost. Writing software was expensive. Changing it was expensive. The whole discipline organized itself around that fact — estimation, sprints, backlogs, delivery teams — because throughput was the bottleneck.

AI is dismantling that bottleneck faster than most organizations have been able to think about what replaces it.

But there’s a less-noticed shift happening alongside the capability story. Quietly, and then quite visibly, the major AI vendors are moving coding workflows toward consumption-based pricing. GitHub Copilot has introduced premium requests. Anthropic has repeatedly adjusted Claude Code access as demand exploded. OpenAI has separated agent-style workflows into explicit credit models for enterprise use. The details differ — different token pools, different rate limits, different structures — but the direction is consistent: frontier inference is not economically unlimited, and the vendors are starting to price it that way.

This matters more than it might appear, for one reason: AI-assisted development is not reducing demand for software. It’s accelerating it. Developers produce more. Non-developers can now produce some. Agents execute implementation work in parallel. If software was already eating the world, AI is increasing the bite rate.

Which means total inference demand is likely to rise faster than efficiency gains bring costs down. Even as models get cheaper per token, the sheer volume of generated code, tests, pipelines, reports, and scaffolding will grow. Organizations routing large amounts of that work through frontier reasoning models are building a cost structure that may become uncomfortable quickly.
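The arithmetic behind that claim is simple. If total spend C is price per token p times tokens consumed V, spend only falls when price declines outpace volume growth. With purely hypothetical numbers, halving the per-token price while generated volume grows fourfold still doubles spend:

```latex
C = p \cdot V, \qquad
\frac{C_{\text{new}}}{C_{\text{old}}}
  = \frac{p_{\text{new}}}{p_{\text{old}}} \cdot \frac{V_{\text{new}}}{V_{\text{old}}}
  = 0.5 \times 4 = 2
```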


The current pattern of AI-assisted development tends to look like this: open a long-running session with a flagship model, iterate continuously, keep large context windows alive, regenerate as needed. It works. The problem is what it’s spending expensive reasoning capacity on.

A significant portion of that work is not actually ambiguous. It’s repetitive. Deterministic. Structurally predictable. Generating a stable HTML report structure for the fifteenth time doesn’t require the same model that helped you design the architecture. Implementing a transformation you’ve already fully specified doesn’t require frontier-level reasoning. But because the workflow isn’t designed to distinguish between the two, it routes everything through the same expensive path.

That’s partly an economic inefficiency. But it’s also a signal. If implementation continuously requires frontier-level reasoning, that’s usually not a model problem — it’s a decomposition problem. It means ambiguity that should have been resolved upstream is still alive in the work.

Historically, you could absorb that ambiguity because implementation throughput was the real constraint anyway. When implementation gets cheap, the constraint moves. What’s left is ambiguity, architectural clarity, decomposition quality, validation, and organizational alignment. That’s where the work now actually is.


The architectural implication is straightforward, even if acting on it isn’t. Use stronger models where ambiguity and interpretation are genuinely high — architecture, requirements, tradeoff analysis, decomposition, synthesis. Once that work is done and structure has been established, the remaining implementation should be deterministic enough that it doesn’t need the same reasoning capacity.
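As a rough illustration of that placement principle, the sketch below routes each unit of work to a model tier based on whether it still carries unresolved ambiguity. The tier names, the `Task` fields, and the routing rule are hypothetical assumptions for illustration. They do not describe any vendor's API or how Rupify or Speckify work internally.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    # Hypothetical model tiers; a real setup would map these to concrete models.
    FRONTIER = "frontier"
    SMALL = "small"


# Assumed categories of work that are inherently interpretive.
INTERPRETIVE_KINDS = {"architecture", "requirements", "tradeoff", "decomposition", "synthesis"}


@dataclass
class Task:
    kind: str                # e.g. "architecture" or "implementation"
    has_spec: bool           # True if a canonical specification already exists for this task
    open_questions: int = 0  # unresolved ambiguities still attached to the task


def route(task: Task) -> Tier:
    """Send interpretive or still-ambiguous work to the frontier tier; the rest to a small model."""
    if task.kind in INTERPRETIVE_KINDS:
        return Tier.FRONTIER
    if not task.has_spec or task.open_questions > 0:
        # Implementation that still needs interpretation signals a decomposition gap upstream.
        return Tier.FRONTIER
    return Tier.SMALL


if __name__ == "__main__":
    tasks = [
        Task("architecture", has_spec=False),
        Task("implementation", has_spec=True),
        Task("implementation", has_spec=True, open_questions=2),
    ]
    for t in tasks:
        print(t.kind, t.open_questions, route(t).value)
```

The second branch is the interesting one: when specified implementation work keeps landing on the frontier tier, that is the decomposition signal described above, not a reason to buy a bigger model.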

I’ve been exploring what this looks like in practice through two experimental repositories, Rupify and Speckify. Rupify concentrates expensive reasoning at the front of a workflow: AI-assisted stakeholder interviews, requirements normalization, formal specification generation. The goal is to resolve ambiguity deliberately and early, producing canonical artifacts that downstream work can execute against. Speckify then takes those specifications and decomposes them into atomic, traceable implementation units — work that’s structured enough that smaller, cheaper models can handle it reliably.
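To make "atomic, traceable implementation units" concrete, here is a minimal sketch under assumed data shapes: each requirement carries an ID and acceptance criteria, and decomposition emits one unit per criterion that keeps a trace back to its requirement. The `Requirement` and `WorkUnit` types and the one-unit-per-criterion rule are hypothetical; they are not Speckify's actual format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    req_id: str
    title: str
    acceptance: tuple[str, ...]  # acceptance criteria already resolved upstream


@dataclass(frozen=True)
class WorkUnit:
    unit_id: str
    traces_to: str   # ID of the requirement this unit implements
    criterion: str   # the single acceptance criterion the unit must satisfy


def decompose(reqs: list[Requirement]) -> list[WorkUnit]:
    """Split each requirement into one unit per acceptance criterion (an assumed rule)."""
    units = []
    for req in reqs:
        for i, criterion in enumerate(req.acceptance, start=1):
            units.append(WorkUnit(f"{req.req_id}.{i}", req.req_id, criterion))
    return units


def untraced(reqs: list[Requirement], units: list[WorkUnit]) -> set[str]:
    """Return requirement IDs with no implementing unit, i.e. gaps in traceability."""
    covered = {u.traces_to for u in units}
    return {r.req_id for r in reqs} - covered


if __name__ == "__main__":
    spec = [
        Requirement("UC-01", "Export report", ("HTML has a summary table", "Totals match source data")),
        Requirement("UC-02", "Archive report", ()),  # no criteria yet: still ambiguous
    ]
    units = decompose(spec)
    print([u.unit_id for u in units])  # ['UC-01.1', 'UC-01.2']
    print(untraced(spec, units))       # {'UC-02'}
```

A requirement that produces no units, like `UC-02` here, is exactly the kind of leftover ambiguity that belongs back at the front of the workflow rather than in front of a small model.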

The point isn’t “better AI coding.” It’s redesigning the workflow so that frontier reasoning gets used once, where it’s genuinely needed, rather than continuously throughout the entire delivery cycle. The economics look very different when you do that.


For a while, the dominant question in AI-assisted engineering was capability: which model, which agent, which coding assistant. That question isn’t going away. But another constraint is now becoming visible alongside it: the cost of spending frontier reasoning on work that no longer needs it.

The organizations that scale this well probably won’t be the ones that put AI everywhere. They’ll be the ones that figure out where ambiguity actually lives in their workflows, structure everything else aggressively, and stop paying frontier-model prices for work that stopped being uncertain three steps ago.

The competitive advantage isn’t more AI. It’s better placement.

When Implementation Becomes Cheap: Rethinking Value in Software Consulting

Introduction

There was a time when building software was the work.

Methods like Use Case Points (UCP) gave us a structured way to estimate implementation effort—because implementation was the dominant cost.

That assumption is now broken. AI coding agents have collapsed implementation time by an order of magnitude. At the same time, approaches like Rupify turn specifications into executable, verifiable inputs that can steer those agents [1].

This sets up a new tension.

The Real Tension: Speed vs. Correctness

The Wiggum Loop shows that you can brute-force progress with AI through rapid iteration [2]. But it also shows where that breaks: when failures are silent, slow, or irreversible. That is exactly the regime most enterprise systems operate in. So this is the core tension:

The Wiggum Loop is powerful precisely where it is dangerous: fast iteration in domains where incorrect systems are costly. This is the tension Rupify is designed to resolve. It does not slow the loop down. It constrains it.

It makes fast iteration safe enough to use in high-stakes environments by making intent explicit and verifiable.
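One way to picture "constraining the loop rather than slowing it" is a bounded generate-and-check loop in which acceptance is decided only by checks derived from the specification. In this minimal sketch, `toy_generate` stands in for an AI agent and the spec is a dictionary of executable checks; both are hypothetical placeholders, not Rupify's interface or the Wiggum Loop as described in [2].

```python
from typing import Callable, Optional

Candidate = Callable[[int], int]     # the artifact under test: here, a small function
Check = Callable[[Candidate], bool]  # one executable statement of intent from the spec


def constrained_loop(
    generate: Callable[[int, list[str]], Candidate],
    checks: dict[str, Check],
    max_iterations: int = 10,
) -> Optional[Candidate]:
    """Iterate quickly, but accept a candidate only when every specification check passes."""
    feedback: list[str] = []
    for attempt in range(max_iterations):
        candidate = generate(attempt, feedback)  # feedback = names of checks that failed last time
        feedback = [name for name, check in checks.items() if not check(candidate)]
        if not feedback:
            return candidate  # intent verified, not merely "looks done"
    return None  # stop loudly instead of shipping a silently wrong result


def toy_generate(attempt: int, feedback: list[str]) -> Candidate:
    # Hypothetical stand-in for an AI agent: the first attempt is wrong, later ones are right.
    return (lambda x: x) if attempt == 0 else abs


if __name__ == "__main__":
    # Intent: "return the absolute value", expressed as executable checks.
    spec_checks: dict[str, Check] = {
        "negative inputs become positive": lambda f: f(-3) == 3,
        "positive inputs are unchanged": lambda f: f(5) == 5,
        "zero maps to zero": lambda f: f(0) == 0,
    }
    result = constrained_loop(toy_generate, spec_checks)
    print(result(-7) if result else "no candidate satisfied the spec")  # 7
```

The design choice worth noting is the `None` return: when the checks never pass, the loop stops rather than handing over something that merely appears complete, which is the failure mode described in the next section.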

The Failure Mode: Faster Divergence

Without that constraint, AI does not give you better outcomes. It just gives you incorrect systems, delivered quickly.

Organizationally, this looks like:

  • Systems that appear complete but encode the wrong logic
  • Silent misalignment between business intent and system behavior
  • Accelerated rework cycles where errors propagate faster than they are detected

The result is not efficiency. It is amplified waste. This is why specification becomes the control point.

The Inversion

This creates a structural inversion:

  • UCP still estimates human implementation effort
  • AI reduces actual implementation time to a fraction
  • Rupify ensures the output remains aligned with intent

So the traditional model—where effort ≈ implementation ≈ value—no longer holds.

UCP Was Measuring the Wrong Thing

You can still calculate UCP. But it no longer answers the original question: How long will this take to build?

That question is now nearly irrelevant. What remains is something more fundamental: How complex is the problem space?

UCP was always approximating this. So AI did not make UCP obsolete; it revealed what UCP was actually measuring all along.

Knowing complexity is crucial for coding with AI agents [5].
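For readers who have not used UCP, the sketch below follows Gustav Karner's commonly cited formulation: weighted counts of use cases and actors, scaled by a technical complexity factor (TCF = 0.6 + 0.01 × TFactor) and an environmental factor (ECF = 1.4 − 0.03 × EFactor). Everything up to the UCP value describes the problem; only the last step converts it into hours, here with the conventional 20 hours per UCP. The example inputs are made up, and TFactor and EFactor are taken as precomputed sums to keep the sketch short.

```python
# Karner-style weights: use cases by transaction count, actors by interface type.
USE_CASE_WEIGHTS = {"simple": 5, "average": 10, "complex": 15}
ACTOR_WEIGHTS = {"simple": 1, "average": 2, "complex": 3}


def use_case_points(use_cases: dict[str, int], actors: dict[str, int],
                    t_factor: float, e_factor: float) -> float:
    """Size the problem: UCP = (UUCW + UAW) * TCF * ECF."""
    uucw = sum(USE_CASE_WEIGHTS[kind] * count for kind, count in use_cases.items())
    uaw = sum(ACTOR_WEIGHTS[kind] * count for kind, count in actors.items())
    tcf = 0.6 + 0.01 * t_factor  # technical factor (TFactor = weighted sum of 13 ratings)
    ecf = 1.4 - 0.03 * e_factor  # environmental factor (EFactor = weighted sum of 8 ratings)
    return (uucw + uaw) * tcf * ecf


def effort_hours(ucp: float, hours_per_ucp: float = 20.0) -> float:
    """Convert problem size into human implementation effort (the assumption AI is eroding)."""
    return ucp * hours_per_ucp


if __name__ == "__main__":
    # Made-up inputs: 2 simple, 4 average, 1 complex use case; 1 simple, 1 average, 2 complex actors.
    ucp = use_case_points({"simple": 2, "average": 4, "complex": 1},
                          {"simple": 1, "average": 1, "complex": 2},
                          t_factor=40, e_factor=20)
    print(round(ucp, 1), round(effort_hours(ucp)))  # 59.2 1184
```

Nothing in `use_case_points` depends on who or what writes the code; only `effort_hours` does, and that is the conversion AI agents have made unreliable.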

A New Model

We end up with a new structure:

  • UCP measures problem complexity
  • Rupify translates complexity into executable intent [1]
  • AI agents handle implementation at near-zero marginal cost

What Consulting Becomes

This connects directly to outcome-based value models [4].

In this new model, consulting is the discipline of reducing ambiguity. That is not a slogan. It is a structural shift.

It changes:

  • Staffing → fewer implementers, more domain modelers and specification engineers
  • Pricing → from time-based delivery to value of clarified and executable intent
  • Differentiation → ability to make complex systems unambiguous, not ability to build them

The scarce role is the person who can:

  • Extract intent from messy organizational reality
  • Structure it into a precise model
  • Express it in a form that machines can execute correctly

That capability becomes the bottleneck.

The Final Constraint: Can It Be Safely Realized?

Even perfect specifications are not sufficient.

They must be realized through a trustworthy system [3].

A correctly specified system built through a compromised pipeline is still a compromised system.

So the full model becomes:

  • Clarity of intent (Rupify)
  • Controllability of generation (AI agents)
  • Trustworthiness of realization (supply chain)

Remove any one of these, and the system fails.
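The third element, trustworthiness of realization, can be illustrated with one small control: refusing to deploy an artifact whose digest does not match the one pinned when it was built. This is a minimal sketch using SHA-256 from Python's standard library; pinned digests are one illustrative control among several (signing and provenance attestation are others), and the approach is not taken from [3].

```python
import hashlib
import hmac


def sha256_digest(data: bytes) -> str:
    """Return the hex SHA-256 digest of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(data: bytes, pinned_digest: str) -> bool:
    """Accept the artifact only if it matches the digest recorded at build time."""
    # compare_digest performs a constant-time comparison of the two hex strings.
    return hmac.compare_digest(sha256_digest(data), pinned_digest.lower())


if __name__ == "__main__":
    built = b"release-1.0 binary contents"
    pinned = sha256_digest(built)  # recorded by the trusted build
    tampered = built + b" injected"
    print(verify_artifact(built, pinned))     # True
    print(verify_artifact(tampered, pinned))  # False
```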

The Shift

Software engineering is no longer about building systems.

It is about:

  • Describing them correctly
  • Constraining how they are generated
  • Ensuring they can be safely realized

Implementation has not disappeared, but it has lost its position as the center of value – and when that happens, everything around it has to be rethought.

Conclusion

Implementation is no longer the primary driver of cost, time, or value.

Value is created by reducing ambiguity, expressing intent precisely, and ensuring that intent can be safely realized.

  • AI accelerates execution
  • Rupify constrains it
  • UCP reveals the true complexity underneath

Consulting shifts from delivering software to making systems unambiguous and executable.

References

[1] https://birkholm-buch.dk/2026/04/09/rupify-executable-specifications-for-ai-assisted-software-engineering

[2] https://birkholm-buch.dk/2026/04/05/the-wiggum-loop-brute-forcing-business-with-ai/

[3] https://birkholm-buch.dk/2026/03/13/move-the-security-boundary-to-the-software-supply-chain/

[4] https://birkholm-buch.dk/2025/05/05/the-future-of-consulting-how-value-delivery-models-drive-better-client-outcomes/

[5] https://birkholm-buch.dk/2024/12/12/speed-vs-precision-in-ai-development/

AI Only Creates Value When Integrated into Execution

Introduction

Over time, I’ve written a number of posts on AI, software engineering, leadership, and how we apply these in practice at Carlsberg Research Laboratory. They were not intended as a single narrative—but taken together, a clear pattern emerges.

Across these posts, the same themes keep surfacing: AI is not the hard part, execution is. The real constraints are people, organizational capability, and how effectively we integrate new technology into how work actually gets done. Individual topics—AI patterns, governance, developer experience, and scientific computing—are all facets of the same underlying problem.

Looking across this body of work, it naturally clusters into six themes. Together, they describe a simple idea:

AI only creates value when it is systematically integrated into execution.

The sections below outline these themes and link to the underlying posts.

The Convergence of AI and Execution

AI is no longer scarce. Models are broadly accessible, and capabilities are rapidly commoditizing. That shifts the source of advantage away from the model itself and toward how effectively it is deployed, integrated, and scaled across real workflows.

The organizations that win are not those experimenting the most, but those embedding AI into execution—where it consistently improves outcomes, handles edge cases, and survives contact with reality. This is also why earlier frameworks sometimes need reinterpretation: what made sense as a classification of solutions or a discussion of trade-offs starts to look different once distribution and operational integration become the real differentiator.

Posts:

  • When Implementation Becomes Cheap: Rethinking Value in Software Consulting

  • AI Is Everywhere. Value Is Not (And It’s Not a Data Problem Either)

  • When the Model Breaks

AI as a Capability Enabler

AI should not be approached as a collection of isolated use cases or one-off solutions. It is better understood as a set of reusable capabilities—classification, generation, retrieval, summarization, reasoning, and automation—that can be composed into systems and patterns.

The shift is from “building AI features” to “building AI-enabled systems,” where value comes from combining these capabilities with data, workflows, and developer experience in a repeatable way. When approached this way, AI becomes an enabler that can strengthen existing platforms and practices rather than a separate, exotic layer of technology.

Posts:

  • Patterns for Artificial Intelligence Solutions

  • GitHub Copilot drives better Developer Experience

  • Four Categories of AI Solutions

The Human Factor: People Skills and Organizational Capability

The primary constraint in AI adoption is not technology. It is people and organizational capability. New roles emerge, expectations shift, and the ability to continuously learn becomes critical as tools, models, and practices evolve faster than most organizations are used to.

This creates pressure not only on hiring and role design, but also on time itself. If teams are run at full utilization, they lose the capacity to learn, adapt, and absorb change. Success depends on building teams that can translate between domain, technology, and business, while also creating enough room for skills to evolve before they become obsolete.

Posts:

  • The Half-Life of Skills: Why 100% Utilization Can Destroy Your Future

  • AI-Engineer: A Distinct and Essential Skillset

  • AI-Engineers: Why People Skills Are Central to AI Success

Governance and Human Oversight

AI introduces new risks, but also creates an opportunity to rethink governance. The goal is not to control adoption through heavy process. The goal is to create guardrails that enable safe, fast, and responsible use.

Human accountability remains central. AI should augment judgment, not replace it. Good governance connects policy, security, and developer experience so that the responsible path is also the practical one. Done well, governance becomes an enabler of adoption rather than a brake on it.

Posts:

  • Responsible AI: Enhance Human Judgment, Don’t Replace It

  • The Intersection of DevEx and DevSecOps: We need a New Way Forward

  • Building a Better Software Practice: A Guide to Policies, Rules, Standards, Processes, Guidelines and Governance

Dual-Track Strategy: Core vs. Strategic AI Projects

AI portfolios need to balance immediate value with long-term positioning. Some initiatives should focus on proven patterns, broad accessibility, and fast adoption. Others should explore new capabilities and areas of differentiation, even when they involve more uncertainty and a longer payback period.

Managing this duality is essential. Over-indexing on core initiatives leads to incrementalism. Over-indexing on strategic bets leads to fragmentation and delivery risk. A pragmatic AI strategy requires both tracks to exist at the same time, with clarity about which type of problem is being solved.

Posts:

  • AI doesn’t create advantage – distribution does

  • Patterns for Artificial Intelligence Solutions

  • Four Categories of AI Solutions

Quantifiable Impact and Future Vision

AI adoption must ultimately be judged by its impact on real outcomes: speed, quality, cost, learning, and innovation. Early productivity gains matter, but the larger transformation comes from integrating AI into end-to-end systems where improvements compound over time.

That is where the future vision becomes clearer. The long-term value is not a collection of isolated AI wins, but a broader shift in how development, research, and organizational workflows operate. In that sense, measurable productivity improvements are only the first visible signal of a much larger change.

Posts:

  • The Evolution of AI: From Frontier Models to Specialized Small Language Models

  • Accelerating Research at Carlsberg Research Laboratory using Scientific Computing

  • GitHub Copilot Probably Saves 50% of Time for Developers

AI Is Everywhere. Value Is Not (And It’s Not a Data Problem Either)

Introduction

Over the past year, AI adoption has exploded. In the Nordics, nearly every company now reports that it has implemented AI in some form. On paper, that should translate into a wave of productivity, growth, and competitive advantage. Only it doesn’t.

A recent BCG study (The Nordic AI Inflection Point: Value Creation or Value Bubble?) shows that while 99% of Nordic companies have adopted AI, only around 4% report significant returns on their investments. At the same time, executives expect AI to deliver 25–30% improvements in both revenue and cost.

This gap between adoption and value is not subtle, and it’s not limited to the Nordics. A global enterprise study shows the same pattern (Enterprise AI adoption in 2026: Why 79% face challenges despite high investment):

  • Near-universal AI adoption
  • Heavy usage across employees and executives
  • Only a minority seeing real business impact

More strikingly, over half of executives report that AI adoption is creating internal tension rather than clarity — exposing gaps in strategy, ownership, and execution.

AI is not just failing quietly. It is actively stressing organizations that are not designed to absorb it. Which raises an uncomfortable question: Are we creating value — or a value bubble?

This Is Not a New Problem

In a previous post, I argued that AI doesn’t create advantage, distribution does. That argument rested on three observations:

  • AI is becoming commoditized
  • Models are widely accessible
  • Tools are rapidly diffusing

So advantage cannot come from AI itself. It must come from how AI is embedded, scaled, and operationalized. The BCG findings are a direct confirmation of this. So AI is everywhere, but execution is not.

The Wrong Debate: Data Before AI

At the same time, many organizations seem to be stuck in a different discussion: “We need better data before we can scale AI.”

I’ve argued the opposite in “AI for data — not data before AI”: waiting for perfect data is one of the most reliable ways to delay value indefinitely.

Data improves when it is used:

  • In real workflows
  • Under real decisions
  • With real feedback loops

So we end up with two truths:

  • AI alone does not create advantage
  • Data alone does not unlock AI

And yet, most organizations behave as if one of them will.

The Two Traps Killing AI Value

What we see in practice is a predictable pattern.

1. The Tool Trap

Companies deploy AI as tools:

  • Copilots
  • Assistants
  • Automation add-ons

These deliver local gains, but they don’t change outcomes, they don’t scale, and they don’t compound.

2. The Foundation Trap

Others go the opposite direction:

  • Multi-year data programs
  • Master data management initiatives
  • Platform modernization

AI becomes a future promise and not a present capability.

The False Choice

This leads to a false dichotomy:

  • AI first
  • Or data first

The reality is neither.

  • You don’t get better data before AI
  • You don’t get value from AI without execution

Both positions assume a linear path, and AI value is not linear.

What Actually Works: AI in the Loop

The companies that are capturing real value are doing something different.

They are not thinking in steps like: Data → Platform → AI → Value

They are building feedback systems: AI → Usage → Better Data → Better Workflows → Scale → Value

But this only becomes real when you look at how it is designed.

A repeatable pattern looks like this:

  • Start with a concrete workflow (e.g. demand planning, pricing, campaign execution)
  • Apply AI to improve one critical decision point
  • Use the output to expose data gaps and inconsistencies
  • Fix only the data that matters for that workflow
  • Expand AI across adjacent steps
  • Gradually connect the process end-to-end

For example:

  • Deploy AI in demand forecasting
  • Uncover inconsistencies in product hierarchies and sales signals
  • Fix those selectively
  • Extend into inventory and replenishment

Over time, the workflow becomes:

  • More accurate
  • More automated
  • More integrated

This is not just iteration.

It is system design.

Good AI systems are not built top-down. They are grown through use — and then engineered for scale.

From Tools to Workflows

The BCG report highlights a critical distinction:

  • Most companies invest in tools
  • Leaders invest in workflows

That difference matters.

Because:

  • AI applied to tasks creates efficiency
  • AI embedded in workflows creates advantage

Why Most Companies Stall

When AI fails to scale, it’s rarely about the models.

It’s about the system.

  • Tool Trap → Fragmentation
  • Foundation Trap → Delay

Both lead to the same result:

  • Pilots everywhere
  • Duplication of effort
  • No compounding value

The deeper causes are structural:

  • Fragmented data
  • Decentralized ownership
  • Unclear decision rights
  • Limited execution capacity
  • AI treated as IT

The system is not designed to absorb and scale AI.

So AI remains additive and not transformative.

AI Doesn’t Fail. Systems Do.

AI is not underdelivering; organizations are.

Or more precisely: AI doesn’t fail; it exposes systems that were already failing.

What we are seeing is not an AI gap; it’s a system gap:

  • Between ambition and execution
  • Between tools and transformation
  • Between experiments and scale

The Trilogy

Across three posts, the pattern becomes clear:

  1. AI doesn’t create advantage — distribution does
  2. AI for data — not data before AI
  3. AI is everywhere. Value is not

Together: AI value is not created by technology or data alone; it’s created by systems that connect them.

The Next Phase of AI

The next phase will not be defined by better models. It will be defined by better systems.

Today’s tools are built as standalone assistants:

  • Copilots
  • Chat interfaces
  • Isolated automation

They optimize individuals and not systems.

The tools themselves reinforce the Tool Trap. Organizations are not just using AI incorrectly; they are also buying products that make correct usage harder.

What This Means in Practice

If you want to capture AI value:

  • Stop measuring progress by tools
  • Stop waiting for perfect data
  • Stop layering AI on top

Instead:

  • Start with workflows
  • Build feedback loops
  • Design for reuse
  • Treat AI as part of the operating model

This is not a maturity curve; it’s a design choice.

Conclusion

You don’t win with AI because you have access to it. You don’t win because your data is perfect. You win when your organization can turn AI into systems that scale.

Advantage comes from system design:

  • Not tools
  • Not data in isolation
  • Not default ways of working

Because in the end, AI doesn’t create advantage; distribution does, and distribution is built through systems, whether you design them intentionally or not.

The difference is simple: Some companies design them, but most don’t.

The Evolution of AI: From Frontier Models to Specialized Small Language Models

Where We Came From: The Frontier Model Plateau

Over the past 12–18 months, the large language model (LLM) ecosystem has continued to advance—but largely in an incremental, not disruptive, fashion. Models from OpenAI, Anthropic, and Google have steadily improved across reasoning, multimodality, and scientific benchmarks, yet the relative ordering and qualitative capabilities have remained broadly stable.

Public benchmark suites such as MMLU (Massive Multitask Language Understanding), GPQA (Graduate‑Level Google‑Proof Q&A), and HELM (Stanford Holistic Evaluation of Language Models) show year‑over‑year gains measured in percentage points rather than step‑function breakthroughs. This is not a criticism—these are remarkable systems—but it does indicate a phase of maturation rather than rupture. Frontier models are converging: better, more reliable, more general—but not fundamentally different.

For scientific research, this means frontier GenAI has become a dependable horizontal capability: excellent for literature synthesis, reasoning assistance, explanation, and orchestration—but no longer the sole locus of rapid innovation.

Where We Are Now: The Rise of Small and Specialized Models

In parallel, a very different dynamic is unfolding.

Small Language Models (SLMs) and domain‑specific foundation models are advancing rapidly, particularly in scientific domains such as genomics, protein science, chemistry, and materials research. These models fall broadly into two categories:

  1. Domain‑adapted language models – smaller LLMs fine‑tuned on specific scientific corpora (e.g. chemistry, biology, materials science).
  2. Non‑linguistic foundation models – transformer‑based models trained on alternative “languages” such as DNA, protein sequences, or molecular graphs (e.g. Evo2, ESM, AlphaFold‑class models).

These models are not generalists—and that is precisely their strength. They encode deep inductive bias for their domain, deliver strong signal from sparse data, and increasingly outperform general LLMs on narrowly scoped scientific tasks.

Critically, most of these models do not fit the SaaS GenAI paradigm. They are rarely available via Azure AI Foundry, Amazon Bedrock, or similar managed services. Running them typically requires:

  • Dedicated GPU infrastructure (often NVIDIA‑specific)
  • Local fine‑tuning or adaptation
  • Tight coupling to data and experimental context

This creates a structural mismatch between where scientific model innovation is happening and where traditional enterprise AI platforms operate.

External Validation: SLMs as First-Class Scientific Tools

Recent academic work explicitly supports this shift toward small, specialized models. A 2025 paper, “SLMs as Scientific Tools” (arXiv:2512.15943), argues that capability in scientific AI is task-relative rather than size-relative. The authors show that domain-specialized SLMs can match or outperform frontier LLMs on constrained scientific tasks when correctness, structure, and tool integration matter more than linguistic breadth.

Several conclusions from the paper closely align with CRL’s direction:

  • Inference locality beats central intelligence: running models close to data improves latency, reproducibility, validation, and cost control—supporting local, HPC-adjacent, and desk-side deployment.
  • SLMs scale scientifically, not just economically: smaller models are easier to interpret, benchmark, and falsify—critical properties for hypothesis generation and experimental decision-making.
  • Tool integration matters more than prompt engineering: structured inputs and deterministic tool calls outperform free-form prompting in scientific workflows.

The paper ultimately reinforces a hybrid architectural stance: LLMs orchestrate; SLMs execute. This provides external, academic validation that SLMs are not a compromise, but the correct abstraction for scientific computing.

A Practical Shift: From Cloud‑Only to Desk‑Side AI

This is where a meaningful, practical shift is occurring.

With the arrival of systems such as NVIDIA DGX Spark, small language models become physically accessible to individual researchers. Instead of renting over‑provisioned H100 or Grace‑Blackwell cloud instances, scientists can:

  • Run and fine‑tune SLMs locally
  • Experiment rapidly without cloud friction or cost surprises
  • Work directly with models that are otherwise unavailable as managed services

In effect, this enables a “small model on every scientist’s desk” paradigm. The value is not raw scale, but immediacy, ownership, and experimentation velocity.

At CRL, this aligns tightly with how scientific progress actually happens: iterative, exploratory, domain‑specific, and data‑proximate.

Looking Toward 2026: A Hybrid, Orchestrated Future

Looking ahead—without making speculative predictions—the most plausible trajectory is not LLMs versus SLMs, but LLMs plus SLMs.

A likely pattern is:

  • Frontier LLMs acting as generalist reasoning, planning, and orchestration layers
  • Specialized small models performing high‑fidelity domain work (genomics, proteins, chemistry, simulation)
  • Tool‑ and model‑calling as the primary integration mechanism

In this model, the LLM does not replace scientific models—it coordinates them. It becomes the interface and glue, while the real scientific signal is generated by specialized systems running locally or on targeted infrastructure.
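The orchestration pattern can be sketched as a dispatcher: the generalist model emits structured tool calls, and each call is routed to a registered specialist. Here the plan is hard-coded JSON standing in for LLM output, and the two "specialists" are trivial deterministic functions standing in for domain models such as a genomics SLM. The tool names and the plan format are assumptions for illustration, not any agent framework's API.

```python
import json
from typing import Any, Callable

# Registry of specialist executors. In practice these would wrap locally hosted
# domain models; here they are simple deterministic stand-ins.
SPECIALISTS: dict[str, Callable[..., Any]] = {}


def specialist(name: str):
    """Decorator that registers a function as a callable specialist tool."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        SPECIALISTS[name] = fn
        return fn
    return register


@specialist("gc_content")
def gc_content(sequence: str) -> float:
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


@specialist("reverse_complement")
def reverse_complement(sequence: str) -> str:
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(sequence.upper()))


def execute_plan(plan_json: str) -> list[Any]:
    """Run each structured tool call from the orchestrator; reject unknown tools outright."""
    results = []
    for call in json.loads(plan_json):
        tool = SPECIALISTS.get(call["tool"])
        if tool is None:
            raise ValueError(f"unknown specialist: {call['tool']}")
        results.append(tool(**call["args"]))
    return results


if __name__ == "__main__":
    # Stand-in for what an orchestrating LLM might emit as tool calls.
    plan = json.dumps([
        {"tool": "gc_content", "args": {"sequence": "ATGCGC"}},
        {"tool": "reverse_complement", "args": {"sequence": "ATGCGC"}},
    ])
    print(execute_plan(plan))  # [0.6666666666666666, 'GCGCAT']
```

Keeping the specialist side deterministic and schema-bound is what makes the LLM the interface and glue: the orchestrator decides what to call, but the answers come from the specialists.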

This is not speculative technology. The building blocks already exist:

  • Tool‑calling and agent frameworks
  • Domain foundation models
  • Local GPU systems capable of running serious scientific workloads

What changes in 2026 is not the theory, but the accessibility.

Summary

  • Frontier LLMs are improving steadily, but incrementally
  • Scientific innovation is accelerating fastest in small, specialized models
  • These models do not fit cloud‑only GenAI platforms
  • Desk‑side systems like DGX Spark make SLMs practically accessible
  • The near‑term future is hybrid: generalist orchestration + specialist execution

Appendix: The Emerging Scientific SLM Ecosystem (snapshot as of 2026-01-21)

Vendor / Origin | Domain Focus | Representative Models | Typical Scientific Use Cases
NVIDIA | Biology, Chemistry, Climate | BioNeMo, ChemGPT, MegaMolBART, FourCastNet | Molecule generation, QSAR, virtual screening, protein design, weather & climate modeling
DeepMind | High-impact scientific modeling | AlphaFold 3, GraphCast | Protein structure prediction, climate forecasting, large-scale simulation
Meta | Proteins, Scientific Literature | ESMFold, ProtBERT, SciBERT | Protein folding, sequence modeling, scientific text analysis
Arc Institute / Profluent | DNA & Protein Design | Evo2, E1 | DNA sequence design, protein design, strain optimization
Academic & Research Consortia | Genomics, Materials Science | OpenFold, MaterialsBERT, MatSciBERT | Crystal property prediction, materials discovery
Emerging Vendors | Supply Chain & Optimization | SCGPT, Logistics-LLaMA, OR-LLM | Demand forecasting, route optimization, constraint planning

Notes

  • Most models listed above are open, open‑weight, or research‑licensed, and evolve in close collaboration with the scientific community.
  • The ecosystem is interoperable and tool‑oriented, designed to be embedded into pipelines rather than accessed via chat interfaces.
  • In contrast, enterprise GenAI platforms primarily target closed, managed, productivity‑oriented workloads.
  • NVIDIA’s role is increasingly that of a horizontal scientific AI platform provider, spanning models, tooling, and local compute rather than acting as a single‑model vendor.
  • Unlike enterprise GenAI platforms, which are predominantly closed and productivity-oriented, the scientific SLM ecosystem is characterized by open models, research licensing, and composability: properties that align naturally with exploratory research environments such as CRL.

Rupify: Executable Specifications for AI-Assisted Software Engineering

Abstract

AI-assisted development has dramatically increased implementation speed, but not correctness. Rupify addresses this gap by turning requirements into executable, structured specifications that can be directly used by AI systems. Rather than relying on informal descriptions or heavyweight formal methods, Rupify operationalizes specifications as artifacts that can be generated, validated, and continuously enforced throughout development. Rupify is open source and available on GitHub: https://github.com/peterbb148/rupify

Why the name Rupify (RUP, UML, UCP)

Rupify takes its name from the Rational Unified Process (RUP), a structured approach to software engineering that emphasizes well-defined artifacts, traceability, and model-driven development. RUP uses the Unified Modeling Language (UML) to describe systems precisely through use cases, domain models, interaction diagrams, state machines, and deployment views. On top of this, Use Case Points (UCP) provide a way to estimate system size and effort based on functional structure rather than code.

Rupify operationalizes this chain—RUP for structure, UML for representation, and UCP for measurement—by turning it into an executable pipeline. Instead of producing documentation, it produces machine-interpretable models that AI systems can use directly for generation, validation, and estimation.

The Problem

AI systems are highly effective at generating, refining, and reviewing code, but they are only as good as their input, and that input is often incomplete requirements, ambiguous intent, and inconsistent structure. This creates a fundamental mismatch: high-capability implementation systems operating on low-fidelity input.

The consequences are predictable. There is drift between intent and implementation, outputs vary across iterations, and correctness cannot be verified in a systematic way. Speed increases, but confidence does not.

The Idea Behind Rupify

Rupify introduces a structured, executable middle layer between intent and implementation. The process moves from interview to structured model, from model to executable artifacts, and from there into implementation and continuous validation.

The core idea is simple but fundamental. Specifications are not written primarily for humans; they are compiled for machines. Instead of acting as passive documentation, they become active inputs to the system.

What Rupify Does

Rupify provides a deterministic pipeline that starts with understanding a problem and ends with verifiable artifacts. Requirements are captured through structured interviews and translated into a canonical project model. From this model, Rupify generates RUP-aligned artifacts such as use cases, domain models, interaction diagrams, state models, and deployment views.

These artifacts are not static descriptions. They form the basis for use case point estimation and enable continuous validation against the original intent. The output is not just text, but a model that can be executed, tested, and checked.
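Use case point estimation is a well-documented formula (Karner's method), so a small sketch helps show what "estimates derived from the model" can mean in practice. This is not Rupify's code; it is a minimal Python sketch of the classic Karner calculation, assuming the technical and environmental factor sums (TFactor, EFactor) have already been scored. The names `UcpInputs`, `use_case_points`, and `effort_hours` are illustrative.

```python
from dataclasses import dataclass

# Karner's actor weights: simple (system via API), average (protocol or text UI), complex (human via GUI).
ACTOR_WEIGHTS = {"simple": 1, "average": 2, "complex": 3}


def use_case_weight(transactions: int) -> int:
    """Weight a use case by its number of transactions (Karner's bands)."""
    if transactions <= 3:
        return 5   # simple
    if transactions <= 7:
        return 10  # average
    return 15      # complex


@dataclass
class UcpInputs:
    actors: list[str]     # complexity label per actor: "simple", "average" or "complex"
    use_cases: list[int]  # transaction count per use case
    t_factor: float       # sum of the 13 weighted technical factor ratings
    e_factor: float       # sum of the 8 weighted environmental factor ratings


def use_case_points(inp: UcpInputs) -> float:
    uaw = sum(ACTOR_WEIGHTS[a] for a in inp.actors)        # unadjusted actor weight
    uucw = sum(use_case_weight(t) for t in inp.use_cases)  # unadjusted use case weight
    tcf = 0.6 + 0.01 * inp.t_factor                        # technical complexity factor
    ecf = 1.4 - 0.03 * inp.e_factor                        # environmental complexity factor
    return (uucw + uaw) * tcf * ecf


def effort_hours(ucp: float, hours_per_ucp: float = 20.0) -> float:
    """Convert UCP to effort; 20 hours per UCP is Karner's original figure."""
    return ucp * hours_per_ucp


if __name__ == "__main__":
    inputs = UcpInputs(actors=["simple", "average", "complex"], use_cases=[2, 5, 9], t_factor=40, e_factor=20)
    ucp = use_case_points(inputs)
    print(f"{ucp:.1f} UCP, {effort_hours(ucp):.0f} hours")
```

For the example inputs this works out to (30 + 6) × 1.0 × 0.8, roughly 28.8 UCP, or about 576 hours. The hours-per-UCP figure is a calibration parameter and is best replaced with a team's own historical data.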

Positioning

Rupify sits in the space between informal and formal approaches. On one side are notes, tickets, and lightweight specification formats. On the other are formal methods such as Z, TLA+, Alloy, and RAISE.

It provides structure without requiring full formalization, making it practical for real-world teams that need both speed and rigor. It is designed for environments where AI is already part of the workflow, but where correctness still matters.

Why This Matters Now

AI has shifted the bottleneck in software development. Writing code is no longer the primary constraint; defining correctness is. Without a structured specification layer, AI amplifies ambiguity rather than resolving it. Increased speed leads to increased drift, and verification becomes reactive instead of proactive.

Rupify addresses this by making correctness part of the input rather than an afterthought.

From Specification to Execution

Rupify enables a direct path from specification to execution. The generated artifacts are testable, traceable, and reproducible. Requirements can be followed through to implementation, estimates can be derived consistently using use case points, and systems can be continuously checked for conformance.

This allows AI agents to operate within clearly defined constraints instead of improvising from loosely defined prompts.
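Traceability becomes checkable once the model is structured. The sketch below assumes a deliberately simplified, hypothetical model (requirement IDs, use cases that realize them, tests that cover use cases); Rupify's actual canonical model is richer and its schema is not shown here. It reports requirements that nothing realizes, use cases that nothing tests, and references to IDs that do not exist.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectModel:
    """Hypothetical, simplified project model (IDs only)."""
    requirements: set[str]
    use_cases: dict[str, set[str]]  # use case ID -> requirement IDs it realizes
    tests: dict[str, set[str]] = field(default_factory=dict)  # test ID -> use case IDs it covers


def trace_gaps(model: ProjectModel) -> dict[str, set[str]]:
    """Report where the chain requirement -> use case -> test is broken."""
    realized = set().union(*model.use_cases.values())
    covered = set().union(*model.tests.values())
    return {
        "unrealized_requirements": model.requirements - realized,
        "untested_use_cases": set(model.use_cases) - covered,
        "dangling_references": (realized - model.requirements) | (covered - set(model.use_cases)),
    }


if __name__ == "__main__":
    model = ProjectModel(
        requirements={"R1", "R2", "R3"},
        use_cases={"UC1": {"R1"}, "UC2": {"R2", "R4"}},
        tests={"T1": {"UC1"}},
    )
    for gap, ids in trace_gaps(model).items():
        print(gap, sorted(ids))
```

In the example, R3 is unrealized, UC2 is untested, and R4 is a dangling reference. Run on every change, a check like this turns conformance from a review activity into a failing build.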

Practical Workflow

A typical workflow begins with a structured interview to capture intent. This is transformed into a canonical model, which in turn produces RUP artifacts. From these, estimation is derived and implementation is guided or generated. Throughout the process, validation is continuous and tied back to the specification.

The important shift is that every step is machine-interpretable and part of a coherent system.

Beyond Documentation

Traditional specifications are written, read, and eventually become outdated. Rupify specifications are generated, executed, and remain active parts of the system. They do not sit beside the implementation; they shape and constrain it.

Outlook

Rupify represents an early step toward a broader shift in software engineering. It points toward specification-driven development, where AI systems operate within executable intent and validation is built into the workflow.

The long-term direction is a move away from code-first development toward systems where specifications define, generate, and continuously validate the implementation.

Skills as a Supply Chain Risk

We’ve Seen This Before

We’ve been here before. First with open source packages, then CI/CD, then infrastructure-as-code. Each time we optimized for speed and reuse, and only later realized the real risk wasn’t what we built, but what we pulled in.

Now it’s happening again. This time with “skills.”

Skills Are a Supply Chain

Skills are emerging as reusable units in the AI stack—installable capabilities executed by agents with access to tools, data, and decisions.

They can contain code. Which means the moment you install and execute them, you’ve created a supply chain.

Early Evidence, Familiar Patterns

A recent large-scale study analyzed more than 238,000 skills across marketplaces and GitHub and found a measurable fraction to be malicious [1]. The numbers are not dramatic, but they are real. Roughly half a percent of skills were confirmed malicious after filtering noise.

More importantly, the attack patterns are familiar. The same study identifies hijacking of skills hosted in abandoned GitHub repositories as an active attack vector [1].

In other words, this is not new risk. It is old risk in a new place.

The Difference Is Execution

What is new is how these components run.

Skills are not just libraries sitting in your build. They are instructions plus executable code, often running with the same privileges as the agent invoking them, and selected dynamically at runtime [2].

That changes the boundary. You are no longer just managing dependencies. You are allowing a system to choose and execute code on your behalf.

Why This Matters

Traditional controls assume stable systems: known dependencies, predictable execution paths, and validation at build time.

That model breaks here.

When selection is dynamic and execution happens at runtime, static analysis and dependency scanning still help, but they no longer describe the system you are actually running. Broader studies of the ecosystem already show that a significant portion of skills contain security weaknesses, including supply chain-style vulnerabilities and privilege escalation paths [3].

This Is Still Fixable

None of this requires new principles.

Treat skills as untrusted code.

  • Use only skills from trusted sources, and scan their code for security issues
  • Limit what agents can do by default
  • Isolate execution
  • Require provenance
  • Observe behavior at runtime

This is just software engineering discipline applied at the right boundary.
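Two of the controls listed above, provenance and default-deny permissions, are easy to sketch. The Python below is a hypothetical admission check, not any agent framework's API: a skill runs only if it is on an allowlist, its content matches the SHA-256 digest recorded when it was reviewed, and it receives only the capabilities approved at review time. Isolation and runtime observation still have to come from the execution environment.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PinnedSkill:
    source: str                   # where the skill was reviewed (e.g. an internal registry)
    sha256: str                   # digest of the exact content that was reviewed
    capabilities: frozenset[str]  # capabilities approved at review time


# Hypothetical allowlist: skill name -> reviewed, pinned version. Unknown skills are rejected.
ALLOWLIST: dict[str, PinnedSkill] = {}


def digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def admit_skill(name: str, content: bytes, requested: set[str]) -> frozenset[str]:
    """Return the capabilities to grant, or raise PermissionError if the skill must not run."""
    pinned = ALLOWLIST.get(name)
    if pinned is None:
        raise PermissionError(f"{name}: not on the allowlist")
    if digest(content) != pinned.sha256:
        # Content changed since review, e.g. a hijacked or updated upstream repository.
        raise PermissionError(f"{name}: content does not match the reviewed digest")
    # Default deny: grant only capabilities that are both requested and approved.
    return frozenset(requested) & pinned.capabilities
```

Pinning by digest means that a hijacked repository pushing new content under a known skill name fails the check instead of silently running with the agent's privileges.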

Final Thought

Skills are not just features; they are code executing on your behalf.

We’ve learned how to manage this before. The only question is how quickly we apply those lessons this time.

References

[1] Malicious or Not: Measuring the Security of Agent Skill Ecosystems. https://doi.org/10.48550/arXiv.2603.16572

[2] Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study. https://doi.org/10.48550/arXiv.2602.06547

[3] Agent Skills in the Wild: Vulnerabilities and Supply Chain Risks. https://doi.org/10.48550/arXiv.2601.10338

[4] On the Security of LLM Agents: Prompt Injection and Skill-Based Attacks. https://doi.org/10.48550/arXiv.2602.20156

When the Model Breaks

Introduction

Over the past year, I’ve written three posts that—at the time—felt consistent.

First, I described four categories of AI solutions, arguing that complexity determines where AI works. Then I introduced the trade-off between speed and precision, where fast systems are imprecise and precise systems are slow.

Both were true at the time.

Lastly, I introduced the Wiggum Loop, which argues that institutional memory is useless.

The original model

The underlying assumption in the first two posts was simple. AI is most effective when problems are well-bounded, precision requirements are low, and iteration costs are small. It struggles when precision is critical, domain knowledge is deep, and errors are expensive. In other words, AI accelerates simple work, while humans remain essential for complex work.

The crack in the model

The Wiggum Loop challenges that assumption. If solutions can be reached through repeated iteration rather than upfront understanding, then precision is no longer a prerequisite—it becomes something you converge on. This changes the equation. Complexity no longer blocks AI in the same way; it simply increases the number of iterations required.

From capability to convergence

The original model was about capability—what AI can do well. The emerging model is about convergence—how quickly a system can explore the solution space and arrive at something that works. Once iteration is cheap and automated, the constraint shifts. It is no longer about whether we can solve a problem, but whether we can recognize when it has been solved.

Reinterpreting the three posts

Seen together, the three posts describe a transition. 

The model does not disappear—it shifts.

The new boundary

The real boundary is no longer complexity or precision. It is whether a problem can be expressed in a way that supports iteration. That requires a clearly defined outcome, explicit constraints, and a way to evaluate results. If those exist, iteration can often replace deep understanding; if they do not, it cannot.

This does not remove expertise—it relocates it. The hard part is no longer solving the problem directly, but defining what success looks like, encoding the right constraints, and deciding how results are evaluated.

What this means for organizations

This is not just a technical shift—it changes how organizations create value. Historically, value came from expertise, experience, and accumulated knowledge. Increasingly, it comes from defining problems clearly, encoding constraints explicitly, and running and governing iterative systems. The center of gravity moves.

The uncomfortable alignment

Taken together, the three posts lead to a slightly uncomfortable conclusion. Much of what we treat as essential organizational knowledge is actually context-bound constraint—decisions made under conditions that no longer apply.

If iteration can rediscover solutions faster than we can recall them, then memory becomes less valuable than exploration. That has consequences. Expertise shifts from knowing answers to defining problems and constraints. Institutional memory becomes less of an authority and more of a hypothesis archive—useful, but not decisive. Roles built around recall and experience start to erode, while roles focused on framing, validation, and governance become more central.

This does not remove humans, but it changes what humans are for—from remembering why things failed to defining what success looks like.

Where this leaves us

The original model still holds, but it is no longer the full picture. AI is not just a tool for solving known problems faster—it is becoming a system for exploring unknown solutions through iteration.

There is a subtle tension here. This trilogy itself depends on cumulative understanding, where each post builds on the last—a small act of institutional memory arguing against institutional memory. Exploration does not replace memory entirely; it changes what kind of memory matters. Constraint-memory becomes less valuable, while model-building and interpretation become more important.

Final thought

We started by asking where AI works. We then asked how precise it needs to be. The emerging question is different: how fast can we iterate—and how well can we recognize success?

That is the thread connecting all three posts, and it is where the model begins to break.

The Wiggum Loop: Brute-Forcing Business with AI

What if persistence beats knowledge?

We’ve spent decades optimizing how organizations think. We built processes, governance structures, architecture reviews, and layers of institutional knowledge. Entire careers are built on knowing why something won’t work.

But what if the fastest path to solving a problem is no longer thinking harder—but trying more? Not smarter. Not deeper. Just… more.

This pattern—often referred to as the Ralph Wiggum loop in AI coding circles—is already well established (https://www.leanware.co/insights/ralph-wiggum-ai-coding). What’s interesting is not the name, but what happens when we apply the same idea outside of coding.

The shift: from knowing to looping

AI coding agents, orchestration platforms, and cheap, elastic compute have changed the economics of problem solving. What used to require deep domain expertise and careful design can now be approached differently. Instead of relying on understanding upfront, we can define the outcome, set guardrails—legal, ethical, and architectural—let agents iterate, and then select what works. This can be repeated at scale.

It is already visible in modern coding workflows, where agents generate, test, and refine code in loops, where skills and tools extend capabilities dynamically, and where tasks can be scheduled, retried, and recomposed. We are no longer limited by how fast we can think, but by how fast we can iterate.

The Wiggum Loop

Named after Ralph Wiggum from The Simpsons, this approach embraces a simple idea:

Try. Fail. Try again. Repeat until something works.

At scale, this stops being naive and starts becoming powerful.

Because the world changes. What failed before may succeed now as technology evolves, constraints shift, data improves, costs drop, and interfaces change. Organizational memory often encodes past constraints as permanent truths, but the Wiggum Loop ignores that and re-attempts relentlessly.

Removing the wrong human from the loop

This is not about removing humans entirely. It is about removing a specific role humans play in organizations—the carrier of historical constraints.

This is the person who says, “We’ve tried that before.” In many cases, that statement is technically correct and strategically wrong.

The Wiggum Loop removes this layer from execution. Humans define the goal and the boundaries, while machines explore the solution space. Humans still decide, but they no longer prematurely constrain.

From knowledge-driven to search-driven organizations

Traditionally, organizations solve problems by gathering expertise, modeling the problem, designing the solution, and then executing.

The Wiggum Loop flips this. Instead, we define the outcome, encode constraints—a kind of “constitution”—generate and test many solutions, and keep what works.

This represents a shift from knowledge-driven systems to search-driven systems. Where knowledge is incomplete or outdated, search wins.
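The define, constrain, generate, keep cycle fits in a few lines. The following is a generic Python sketch under simplifying assumptions: candidates come from a `generate` function, the guardrails are boolean predicates, and the outcome is a numeric score with a threshold that counts as "good enough". All names are illustrative, not taken from any particular agent platform.

```python
import random
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wiggum_loop(
    generate: Callable[[random.Random], T],
    constraints: list[Callable[[T], bool]],
    score: Callable[[T], float],
    good_enough: float,
    max_attempts: int = 10_000,
    seed: int = 0,
) -> Optional[T]:
    """Try, fail, try again: return the best candidate that satisfies every constraint."""
    rng = random.Random(seed)
    best: Optional[T] = None
    best_score = float("-inf")
    for _ in range(max_attempts):
        candidate = generate(rng)                         # try
        if not all(ok(candidate) for ok in constraints):  # outside the guardrails: discard
            continue
        s = score(candidate)                              # evaluate against the outcome
        if s > best_score:                                # keep what works best so far
            best, best_score = candidate, s
        if best_score >= good_enough:                     # success recognized: stop
            break
    return best


if __name__ == "__main__":
    # Toy problem: a multiple of 7 in [0, 100] as close to 50 as possible.
    result = wiggum_loop(
        generate=lambda rng: rng.randint(0, 100),
        constraints=[lambda x: x % 7 == 0],
        score=lambda x: -abs(x - 50),
        good_enough=-1,
    )
    print(result)
```

The hard work sits outside the loop: writing `constraints` and `score` so they actually capture the outcome. The `max_attempts` bound and the `good_enough` threshold are what keep brute force from running forever.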

When search beats knowledge—and when it doesn’t

This only works under specific conditions.

Search dominates when outcomes are testable, feedback loops are fast, and failures are cheap or reversible. This describes a large portion of business problems—optimization, configuration, planning, and software-enabled processes.

But the loop breaks when failures are silent or slow, when consequences are irreversible, or when correctness cannot be evaluated. In these cases, iteration can outrun detection, and brute force becomes risk.

The point is not that knowledge disappears. It is that in many domains, it is no longer the primary constraint.

Why this is suddenly viable

Three things have changed at the same time.

  1. Agents can act. They do not just generate outputs but can execute, test, retry, and adapt.
  2. Loops are native, meaning iterative workflows can be run programmatically rather than manually.
  3. Compute is cheap enough that brute force is no longer absurd—it is often practical.

Together, these changes enable systematic, automated exploration of solution spaces at scale.

A practical example: procurement

Consider procurement. Traditionally, sourcing decisions rely heavily on experience, supplier relationships, and historical outcomes, which also means they inherit historical biases and constraints.

Now imagine a Wiggum Loop approach. The objective is defined in terms of cost, reliability, sustainability, and risk. Constraints such as contracts, regulations, and policies are encoded. Agents then explore supplier combinations, simulate scenarios, generate negotiation strategies, and rerun the process with variations.

This results in thousands of iterations, where most will be wrong, but some will be better than anything previously attempted. Crucially, no one needs to remember why something didn’t work in 2018.
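A toy version of the procurement loop makes the mechanics concrete. Everything here is hypothetical: the `Offer` fields, the weights, and the two hard constraints (blocked suppliers, for contract or regulatory reasons, and a cap on how many parts one supplier may provide, as a concentration-risk rule). For a small space it simply enumerates every combination; a realistic search space would need sampling or heuristics instead.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    supplier: str
    part: str
    unit_cost: float
    reliability: float  # hypothetical on-time delivery rate, 0..1
    co2_kg: float       # hypothetical emissions per unit


Plan = tuple[Offer, ...]


def plan_score(plan: Plan, weights: dict[str, float]) -> float:
    """Weighted cost of a plan; lower is better."""
    cost = sum(o.unit_cost for o in plan)
    unreliability = sum(1 - o.reliability for o in plan)
    co2 = sum(o.co2_kg for o in plan)
    return weights["cost"] * cost + weights["reliability"] * unreliability + weights["co2"] * co2


def allowed(plan: Plan, blocked: frozenset[str], max_parts_per_supplier: int) -> bool:
    """Hard constraints: no blocked suppliers, and a cap on supplier concentration."""
    suppliers = [o.supplier for o in plan]
    if any(s in blocked for s in suppliers):
        return False
    return all(suppliers.count(s) <= max_parts_per_supplier for s in set(suppliers))


def best_plans(
    offers_by_part: dict[str, list[Offer]],
    weights: dict[str, float],
    blocked: frozenset[str] = frozenset(),
    max_parts_per_supplier: int = 1,
    top: int = 3,
) -> list[Plan]:
    # One offer per part, drop plans that break a constraint, rank the rest.
    candidates = [
        plan for plan in itertools.product(*offers_by_part.values())
        if allowed(plan, blocked, max_parts_per_supplier)
    ]
    return sorted(candidates, key=lambda p: plan_score(p, weights))[:top]


if __name__ == "__main__":
    offers = {
        "A": [Offer("S1", "A", 10, 0.95, 2), Offer("S2", "A", 8, 0.80, 3)],
        "B": [Offer("S1", "B", 5, 0.99, 1), Offer("S3", "B", 6, 0.90, 1)],
    }
    weights = {"cost": 1.0, "reliability": 10.0, "co2": 0.5}
    for plan in best_plans(offers, weights):
        print([o.supplier for o in plan], round(plan_score(plan, weights), 2))
```

With this data the top plan takes part A from S2 and part B from S1. Changing the weights or constraints and rerunning is the "rerun with variations" step; nothing in the code remembers why a combination was rejected before.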

Governance without paralysis

This approach only works if guardrails are explicit—and this is the hard part.

Think of it as a constitution that defines what is allowed, what is forbidden, and what must be optimized. Instead of embedding constraints in people, we embed them in systems.

In practice, this means turning intent into executable constraints—tests, policies, specifications, and evaluation criteria that can be applied automatically at scale. We are early in this transition, and most organizations are not yet good at it.

Without this, the loop becomes chaos. With it, the loop becomes power.
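One way to read "turning intent into executable constraints" is a set of named rules plus an objective, applied to every candidate the loop produces. The sketch below is an assumption about how such a constitution could be structured, not an established format: `must` rules define what is required, `must_not` rules define what is forbidden, and only proposals with no violations are scored. Returning named violations keeps each rejection auditable.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

Proposal = dict[str, Any]


@dataclass(frozen=True)
class Rule:
    name: str
    check: Callable[[Proposal], bool]


@dataclass(frozen=True)
class Constitution:
    must: tuple[Rule, ...]                  # required: every check must return True
    must_not: tuple[Rule, ...]              # forbidden: no check may return True
    objective: Callable[[Proposal], float]  # what to optimize (higher is better)

    def review(self, proposal: Proposal) -> tuple[list[str], Optional[float]]:
        """Return named violations, and a score only if there are none."""
        violations = [f"missing: {r.name}" for r in self.must if not r.check(proposal)]
        violations += [f"forbidden: {r.name}" for r in self.must_not if r.check(proposal)]
        return violations, (None if violations else self.objective(proposal))


if __name__ == "__main__":
    # Hypothetical rules and placeholder country codes, for illustration only.
    constitution = Constitution(
        must=(Rule("signed framework contract", lambda p: p["contract_signed"]),),
        must_not=(Rule("sanctioned country", lambda p: p["country"] in {"XX"}),),
        objective=lambda p: -p["cost"],
    )
    print(constitution.review({"contract_signed": True, "country": "DK", "cost": 120.0}))
    print(constitution.review({"contract_signed": False, "country": "XX", "cost": 90.0}))
```

The first proposal passes and is scored; the second is rejected with both violations named and no score, however cheap it is.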

The uncomfortable implication

If this works, it challenges something fundamental: how much of organizational value is knowledge, and how much is inertia?

A significant portion of what we call “knowledge” is accumulated constraint—decisions made under conditions that no longer apply. When those constraints are encoded in people, they persist long after the world has changed.

If problems can be solved through clear intent, explicit constraints, and massive iteration, then much of that embedded knowledge becomes optional.

This does not remove humans, but it changes what humans are for—from remembering why things failed, to defining what success looks like.

So the real question is not technical

We already have agents, loops, orchestration, and compute.

The real question is cultural: do we have the courage to try again? To ignore “we’ve done that before,” to let systems explore without prematurely shutting them down, and to trust iteration over intuition—at least long enough to see what emerges.

Final thought

The Wiggum Loop is not about being careless. It is about being relentless in a changing world.

And maybe—just maybe—the organizations that win won’t be the ones that know the most, but the ones that search the best.

From Roles to Work: What Each IT Architect Actually Does

Introduction

In a previous post (Different Roles and Responsibilities for an IT Architect), I outlined the different roles in architecture. The natural next question is: what work actually sits with each role?

This is where I see organizations struggle: not because roles are unclear, but because the work boundaries are.

A useful lens here comes from Svyatoslav Kotusev’s The Practice of Enterprise Architecture, where architecture is described not as a set of roles, but as practices operating at different levels of the organization.

What follows is a practical way to make that explicit.

Note: In my previous post I also included Infrastructure Architects. They are intentionally left out here to keep the focus on how application and solution-level architecture work is split. Infrastructure Architecture operates with similar principles, but across platform and environment concerns.

The Core Principle

For clarity on naming:

  • Enterprise Architect (EA)
  • Domain Architect (DA) — equivalent to what many organizations call Solution Architect
  • Software Architect (SA) — equivalent to Tech Lead

The SA abbreviation is overloaded in many organizations, so in this post SA refers to Software Architect, not Solution Architect.

Each role operates on a different level of abstraction and time horizon:

  • Enterprise Architecture (EA) → direction and constraints: sets business-driven direction and guardrails that shape all downstream decisions.
  • Domain Architecture (DA) → alignment and structure: translates direction into coherent structures and boundaries across a business area.
  • Software Architecture (SA) → design and execution: turns structures into concrete, implementable systems and makes final design decisions.

Enterprise is horizontal across the organization (cross-cutting capabilities, standards, and direction), while Domain and Software architecture are vertical (aligned to specific business areas and initiatives).

Examples:

  • Enterprise Architects look at things like Customer Management, Product Management, Order Management, Finance, or Supply Chain across all business areas.
  • Domain Architects work within a specific area or initiative and ensure systems in that context fit together.
  • Software Architects decide on software architecture implementation patterns.

If those are confused, enterprise architects turn into domain or software architects—and everything fragments.

Enterprise Architect — The Direction Layer

This layer focuses on business-driven direction and constraints.

Primary work:

  • Define architectural principles and guardrails
  • Align architecture with business strategy and operating model
  • Set direction based on business capabilities and needs
  • Establish governance and decision frameworks

Artifacts:

  • Principles
  • Target architecture at capability level (e.g. Customer Management, Product Management, Order Management, Finance, or Supply Chain as cross-cutting business capabilities shared across the organization), not specific systems or tools
  • Strategic direction

What it’s not:

  • Deciding architectural styles (e.g. event-driven vs request/response)
  • Choosing integration patterns or technologies
  • Designing systems or interactions
  • Translating direction into technical solutions

Enterprise architecture answers why and in which direction, not how.

Domain Architect — The Alignment, Design, and Execution Layer

This is where architecture becomes concrete.

Primary work:

  • Shape how business capabilities are realized across systems in a given domain or initiative
  • Ensure consistency and coherence across solutions
  • Design the solution end-to-end
  • Translate enterprise direction into a working architecture
  • Make concrete design choices (e.g. event-driven vs request/response)
  • Define APIs, data flows, and interactions
  • Make trade-offs under real constraints
  • Ensure compliance with standards and principles

This is where architectural intent meets real delivery and must align with defined rules and processes.

Artifacts:

  • Solution designs
  • Architecture decision records
  • Reference patterns (within the context of the domain/initiative)

What it’s not:

  • Defining enterprise-wide principles
  • Working purely at strategy level without delivery responsibility
  • Escalating every decision upward

This is the level where decisions like event-driven vs request/response, Kafka vs REST, data ownership, and consistency models are actually made.

Software Architect — The Reality Check

This is where architecture meets code.

Primary work:

  • Translate architecture into implementation
  • Own technical quality and execution
  • Challenge designs based on reality
  • Ensure operability

What it’s not:

  • Redefining architecture because it’s inconvenient
  • Ignoring constraints set at higher levels
  • Acting only as a senior developer

How the Work Connects

  1. Enterprise (EA) defines direction
  2. Domain (DA) shapes, designs, and makes decisions
  3. Software Architect (SA) ensures it works in practice

The key is that decisions are made at the lowest responsible level.

If Enterprise work is not protected, it will collapse into Domain work.

Final Thought

Architecture breaks down when decisions are made at the wrong level:

  • If enterprise architects decide on Kafka, you lose flexibility.
  • If domain architects define enterprise principles, you lose coherence.

Kotusev’s point is simple: architecture is a system of practices, and the value comes from keeping those practices separate but connected.


© 2026 Peter Birkholm-Buch
