Stuff about Software Engineering


The Bottleneck Moved

For years, the dominant constraint in software engineering was implementation cost. Writing software was expensive. Changing it was expensive. The whole discipline organized itself around that fact — estimation, sprints, backlogs, delivery teams — because throughput was the bottleneck.

AI is dismantling that bottleneck faster than most organizations have been able to think about what replaces it.

But there’s a less-noticed shift happening alongside the capability story. Quietly, and then quite visibly, the major AI vendors are moving coding workflows toward consumption-based pricing. GitHub Copilot has introduced premium requests. Anthropic has repeatedly adjusted Claude Code access as demand exploded. OpenAI has separated agent-style workflows into explicit credit models for enterprise use. The details differ — different token pools, different rate limits, different structures — but the direction is consistent: frontier inference is not economically unlimited, and the vendors are starting to price it that way.

This matters more than it might appear, for one reason: AI-assisted development is not reducing demand for software. It’s accelerating it. Developers produce more. Non-developers can now produce some. Agents execute implementation work in parallel. If software was already eating the world, AI is increasing the bite rate.

Which means total inference demand is likely to rise faster than efficiency gains bring costs down. Even as models get cheaper per token, the sheer volume of generated code, tests, pipelines, reports, and scaffolding will grow. Organizations routing large amounts of that work through frontier reasoning models are building a cost structure that may become uncomfortable quickly.
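
A back-of-the-envelope sketch makes the arithmetic concrete. The growth and price-decline rates below are purely hypothetical assumptions, not vendor figures, and the function name is illustrative. The only point is that spend rises whenever volume grows faster than the unit price falls.

```python
def projected_spend(base_tokens: float, base_price_per_mtok: float,
                    volume_growth: float, price_decline: float,
                    years: int) -> list:
    """Return yearly inference spend under compounding growth and price decline.

    All inputs are illustrative assumptions:
    - volume_growth: yearly multiplier on token volume (3.0 means it triples)
    - price_decline: yearly fractional drop in price per token (0.5 means it halves)
    """
    spend = []
    tokens, price = base_tokens, base_price_per_mtok
    for _ in range(years + 1):
        # Spend for the current year: millions of tokens times price per million.
        spend.append(tokens / 1_000_000 * price)
        tokens *= volume_growth
        price *= (1.0 - price_decline)
    return spend


if __name__ == "__main__":
    # Hypothetical: 1B tokens per year at $10 per million tokens,
    # volume triples yearly while the price halves yearly.
    for year, cost in enumerate(projected_spend(1e9, 10.0, 3.0, 0.5, 3)):
        print(f"year {year}: ${cost:,.0f}")
```

With these made-up inputs the per-token price halves every year, yet total spend still grows by half each year, because the net yearly multiplier is 3.0 × 0.5 = 1.5.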


The current pattern of AI-assisted development tends to look like this: open a long-running session with a flagship model, iterate continuously, keep large context windows alive, regenerate as needed. It works. The problem is what it’s spending expensive reasoning capacity on.

A significant portion of that work is not actually ambiguous. It’s repetitive. Deterministic. Structurally predictable. Generating a stable HTML report structure for the fifteenth time doesn’t require the same model that helped you design the architecture. Implementing a transformation you’ve already fully specified doesn’t require frontier-level reasoning. But because the workflow isn’t designed to distinguish between the two, it routes everything through the same expensive path.

That’s partly an economic inefficiency. But it’s also a signal. If implementation continuously requires frontier-level reasoning, that’s usually not a model problem — it’s a decomposition problem. It means ambiguity that should have been resolved upstream is still alive in the work.

Historically, you could absorb that ambiguity because implementation throughput was the real constraint anyway. When implementation gets cheap, the constraint moves. What’s left is ambiguity, architectural clarity, decomposition quality, validation, and organizational alignment. That’s where the work now actually is.


The architectural implication is straightforward, even if acting on it isn’t. Use stronger models where ambiguity and interpretation are genuinely high — architecture, requirements, tradeoff analysis, decomposition, synthesis. Once that work is done and structure has been established, the remaining implementation should be deterministic enough that it doesn’t need the same reasoning capacity.
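
As a minimal sketch of that placement rule, assume a hypothetical two-tier setup. The task fields, tier names, and the routing rule itself are illustrative assumptions, not how any specific vendor or tool classifies work.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    FRONTIER = "frontier"   # expensive reasoning model (hypothetical label)
    SMALL = "small"         # cheaper model for fully specified work


@dataclass(frozen=True)
class Task:
    name: str
    has_full_spec: bool     # inputs, outputs, and acceptance checks are written down
    open_questions: int     # unresolved design or requirement questions


def route(task: Task) -> Tier:
    """Send a task to the frontier tier only while ambiguity remains."""
    if task.has_full_spec and task.open_questions == 0:
        return Tier.SMALL
    return Tier.FRONTIER


if __name__ == "__main__":
    tasks = [
        Task("design service boundaries", has_full_spec=False, open_questions=4),
        Task("render weekly HTML report", has_full_spec=True, open_questions=0),
    ]
    for t in tasks:
        print(f"{t.name} -> {route(t).value}")
```

The useful signal is less the routing itself than what keeps failing it: a task that repeatedly lands on the frontier tier still carries ambiguity that was not resolved upstream.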

I’ve been exploring what this looks like in practice through two experimental repositories, Rupify and Speckify. Rupify concentrates expensive reasoning at the front of a workflow: AI-assisted stakeholder interviews, requirements normalization, formal specification generation. The goal is to resolve ambiguity deliberately and early, producing canonical artifacts that downstream work can execute against. Speckify then takes those specifications and decomposes them into atomic, traceable implementation units — work that’s structured enough that smaller, cheaper models can handle it reliably.

The point isn’t “better AI coding.” It’s redesigning the workflow so that frontier reasoning gets used once, where it’s genuinely needed, rather than continuously throughout the entire delivery cycle. The economics look very different when you do that.


For a while, the dominant question in AI-assisted engineering was capability: which model, which agent, which coding assistant. That question isn’t going away. But another constraint is now becoming visible alongside it.

The organizations that scale this well probably won’t be the ones that put AI everywhere. They’ll be the ones that figure out where ambiguity actually lives in their workflows, structure everything else aggressively, and stop paying frontier-model prices for work that stopped being uncertain three steps ago.

The competitive advantage isn’t more AI. It’s better placement.

AI Only Creates Value When Integrated into Execution

Introduction

Over time, I’ve written a number of posts on AI, software engineering, leadership, and how we apply these in practice at Carlsberg Research Laboratory. They were not intended as a single narrative—but taken together, a clear pattern emerges.

Across these posts, the same themes keep surfacing: AI is not the hard part, execution is. The real constraints are people, organizational capability, and how effectively we integrate new technology into how work actually gets done. Individual topics—AI patterns, governance, developer experience, and scientific computing—are all facets of the same underlying problem.

Looking across this body of work, it naturally clusters into six themes. Together, they describe a simple idea:

AI only creates value when it is systematically integrated into execution.

The sections below outline these themes and link to the underlying posts.

The Convergence of AI and Execution

AI is no longer scarce. Models are broadly accessible, and capabilities are rapidly commoditizing. That shifts the source of advantage away from the model itself and toward how effectively it is deployed, integrated, and scaled across real workflows.

The organizations that win are not those experimenting the most, but those embedding AI into execution—where it consistently improves outcomes, handles edge cases, and survives contact with reality. This is also why earlier frameworks sometimes need reinterpretation: what made sense as a classification of solutions or a discussion of trade-offs starts to look different once distribution and operational integration become the real differentiator.

Posts:

  • When Implementation Becomes Cheap: Rethinking Value in Software Consulting

  • AI Is Everywhere. Value Is Not (And It’s Not a Data Problem Either)

  • When the Model Breaks

AI as a Capability Enabler

AI should not be approached as a collection of isolated use cases or one-off solutions. It is better understood as a set of reusable capabilities—classification, generation, retrieval, summarization, reasoning, and automation—that can be composed into systems and patterns.

The shift is from “building AI features” to “building AI-enabled systems,” where value comes from combining these capabilities with data, workflows, and developer experience in a repeatable way. When approached this way, AI becomes an enabler that can strengthen existing platforms and practices rather than a separate, exotic layer of technology.

Posts:

  • Patterns for Artificial Intelligence Solutions

  • GitHub Copilot drives better Developer Experience

  • Four Categories of AI Solutions

The Human Factor: People Skills and Organizational Capability

The primary constraint in AI adoption is not technology. It is people and organizational capability. New roles emerge, expectations shift, and the ability to continuously learn becomes critical as tools, models, and practices evolve faster than most organizations are used to.

This creates pressure not only on hiring and role design, but also on time itself. If teams are run at full utilization, they lose the capacity to learn, adapt, and absorb change. Success depends on building teams that can translate between domain, technology, and business, while also creating enough room for skills to evolve before they become obsolete.

Posts:

  • The Half-Life of Skills: Why 100% Utilization Can Destroy Your Future

  • AI-Engineer: A Distinct and Essential Skillset

  • AI-Engineers: Why People Skills Are Central to AI Success

Governance and Human Oversight

AI introduces new risks, but also creates an opportunity to rethink governance. The goal is not to control adoption through heavy process. The goal is to create guardrails that enable safe, fast, and responsible use.

Human accountability remains central. AI should augment judgment, not replace it. Good governance connects policy, security, and developer experience so that the responsible path is also the practical one. Done well, governance becomes an enabler of adoption rather than a brake on it.

Posts:

  • Responsible AI: Enhance Human Judgment, Don’t Replace It

  • The Intersection of DevEx and DevSecOps: We need a New Way Forward

  • Building a Better Software Practice: A Guide to Policies, Rules, Standards, Processes, Guidelines and Governance

Dual-Track Strategy: Core vs. Strategic AI Projects

AI portfolios need to balance immediate value with long-term positioning. Some initiatives should focus on proven patterns, broad accessibility, and fast adoption. Others should explore new capabilities and areas of differentiation, even when they involve more uncertainty and a longer payback period.

Managing this duality is essential. Over-indexing on core initiatives leads to incrementalism. Over-indexing on strategic bets leads to fragmentation and delivery risk. A pragmatic AI strategy requires both tracks to exist at the same time, with clarity about which type of problem is being solved.

Posts:

  • AI doesn’t create advantage - distribution does

  • Patterns for Artificial Intelligence Solutions

  • Four Categories of AI Solutions

Quantifiable Impact and Future Vision

AI adoption must ultimately be judged by its impact on real outcomes: speed, quality, cost, learning, and innovation. Early productivity gains matter, but the larger transformation comes from integrating AI into end-to-end systems where improvements compound over time.

That is where the future vision becomes clearer. The long-term value is not a collection of isolated AI wins, but a broader shift in how development, research, and organizational workflows operate. In that sense, measurable productivity improvements are only the first visible signal of a much larger change.

Posts:

  • The Evolution of AI: From Frontier Models to Specialized Small Language Models

  • Accelerating Research at Carlsberg Research Laboratory using Scientific Computing

  • GitHub Copilot Probably Saves 50% of Time for Developers

AI Is Everywhere. Value Is Not (And It’s Not a Data Problem Either)

Introduction

Over the past year, AI adoption has exploded. In the Nordics, nearly every company now reports that it has implemented AI in some form. On paper, that should translate into a wave of productivity, growth, and competitive advantage. Only it doesn’t.

A recent BCG study (The Nordic AI Inflection Point: Value Creation or Value Bubble?) shows that while 99% of Nordic companies have adopted AI, only around 4% report significant returns on their investments. At the same time, executives expect AI to deliver 25–30% improvements in both revenue and cost.

This gap between adoption and value is not subtle, and it’s not limited to the Nordics. A global enterprise study shows the same pattern (Enterprise AI adoption in 2026: Why 79% face challenges despite high investment):

  • Near-universal AI adoption
  • Heavy usage across employees and executives
  • Only a minority seeing real business impact

More strikingly, over half of executives report that AI adoption is creating internal tension rather than clarity — exposing gaps in strategy, ownership, and execution.

AI is not just failing quietly. It is actively stressing organizations that are not designed to absorb it. Which raises an uncomfortable question: Are we creating value — or a value bubble?

This Is Not a New Problem

In a previous post, I argued that AI doesn’t create advantage, distribution does. That argument rested on three observations:

  • AI is becoming commoditized
  • Models are widely accessible
  • Tools are rapidly diffusing

So advantage cannot come from AI itself. It must come from how AI is embedded, scaled, and operationalized. The BCG findings directly confirm this: AI is everywhere, but execution is not.

The Wrong Debate: Data Before AI

At the same time, many organizations seem to be stuck in a different discussion: “We need better data before we can scale AI.”

I’ve argued the opposite in “AI for data - not data before AI”: waiting for perfect data is one of the most reliable ways to delay value indefinitely.

Data improves when it is used:

  • In real workflows
  • Under real decisions
  • With real feedback loops

So we end up with two truths:

  • AI alone does not create advantage
  • Data alone does not unlock AI

And yet, most organizations behave as if one of them will.

The Two Traps Killing AI Value

What we see in practice is a predictable pattern.

1. The Tool Trap

Companies deploy AI as tools:

  • Copilots
  • Assistants
  • Automation add-ons

These deliver local gains, but they don’t change outcomes, they don’t scale, and they don’t compound.

2. The Foundation Trap

Others go the opposite direction:

  • Multi-year data programs
  • Master data management initiatives
  • Platform modernization

AI becomes a future promise and not a present capability.

The False Choice

This leads to a false dichotomy:

  • AI first
  • Or data first

The reality is neither.

  • You don’t get better data before AI
  • You don’t get value from AI without execution

Both positions assume a linear path, and AI value is not linear.

What Actually Works: AI in the Loop

The companies that are capturing real value are doing something different.

They are not thinking in steps like: Data → Platform → AI → Value

They are building feedback systems: AI → Usage → Better Data → Better Workflows → Scale → Value

But this only becomes real when you look at how it is designed.

A repeatable pattern looks like this:

  • Start with a concrete workflow (e.g. demand planning, pricing, campaign execution)
  • Apply AI to improve one critical decision point
  • Use the output to expose data gaps and inconsistencies
  • Fix only the data that matters for that workflow
  • Expand AI across adjacent steps
  • Gradually connect the process end-to-end

For example:

  • Deploy AI in demand forecasting
  • Uncover inconsistencies in product hierarchies and sales signals
  • Fix those selectively
  • Extend into inventory and replenishment
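
The “fix those selectively” step can be sketched in a few lines. This is an illustration under stated assumptions: the record shape, the SKU and category names, and the function name are all hypothetical. It flags product-hierarchy conflicts only for the SKUs the forecasting step actually consumes and ignores everything else.

```python
from collections import defaultdict


def hierarchy_conflicts(records, skus_in_workflow):
    """Return SKUs used by the workflow that map to more than one category.

    records: iterable of (sku, category) pairs collected from source systems.
    skus_in_workflow: the SKUs the forecasting step actually consumes.
    """
    categories = defaultdict(set)
    for sku, category in records:
        if sku in skus_in_workflow:      # skip data the workflow never touches
            categories[sku].add(category)
    return {sku: sorted(cats) for sku, cats in categories.items() if len(cats) > 1}


if __name__ == "__main__":
    # Hypothetical sample data.
    records = [
        ("SKU-1", "Lager"), ("SKU-1", "Beer/Lager"),
        ("SKU-2", "Cider"),
        ("SKU-9", "Soft drinks"), ("SKU-9", "Soda"),  # inconsistent, but not forecast
    ]
    print(hierarchy_conflicts(records, {"SKU-1", "SKU-2"}))
```

SKU-9 is just as inconsistent as SKU-1, but it is not part of the workflow, so it stays off the fix list until the workflow expands to include it.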

Over time, the workflow becomes:

  • More accurate
  • More automated
  • More integrated

This is not just iteration.

It is system design.

Good AI systems are not built top-down. They are grown through use — and then engineered for scale.

From Tools to Workflows

The BCG report highlights a critical distinction:

  • Most companies invest in tools
  • Leaders invest in workflows

That difference matters.

Because:

  • AI applied to tasks creates efficiency
  • AI embedded in workflows creates advantage

Why Most Companies Stall

When AI fails to scale, it’s rarely about the models.

It’s about the system.

  • Tool Trap → Fragmentation
  • Foundation Trap → Delay

Both lead to the same result:

  • Pilots everywhere
  • Duplication of effort
  • No compounding value

The deeper causes are structural:

  • Fragmented data
  • Decentralized ownership
  • Unclear decision rights
  • Limited execution capacity
  • AI treated as IT

The system is not designed to absorb and scale AI.

So AI remains additive, not transformative.

AI Doesn’t Fail. Systems Do.

AI is not underdelivering; organizations are.

Or more precisely: AI doesn’t fail, it exposes systems that were already failing.

What we are seeing is not an AI gap, it’s a system gap:

  • Between ambition and execution
  • Between tools and transformation
  • Between experiments and scale

The Trilogy

Across three posts, the pattern becomes clear:

  1. AI doesn’t create advantage — distribution does
  2. AI for data — not data before AI
  3. AI is everywhere. Value is not

Together: AI value is not created by technology or data alone, it’s created by systems that connect them.

The Next Phase of AI

The next phase will not be defined by better models. It will be defined by better systems.

Today’s tools are built as standalone assistants:

  • Copilots
  • Chat interfaces
  • Isolated automation

They optimize for individuals, not systems.

The tools themselves reinforce the Tool Trap. Which means organizations are not just using AI incorrectly; they are buying products that make correct usage harder.

What This Means in Practice

If you want to capture AI value:

  • Stop measuring progress by tools
  • Stop waiting for perfect data
  • Stop layering AI on top

Instead:

  • Start with workflows
  • Build feedback loops
  • Design for reuse
  • Treat AI as part of the operating model

This is not a maturity curve, it’s a design choice.

Conclusion

You don’t win with AI because you have access to it. You don’t win because your data is perfect. You win when your organization can turn AI into systems that scale.

Advantage comes from system design:

  • Not tools
  • Not data in isolation
  • Not default ways of working

Because in the end, AI doesn’t create advantage, distribution does. And distribution is built through systems, whether you design them intentionally or not.

The difference is simple: Some companies design them, but most don’t.

The Evolution of AI: From Frontier Models to Specialized Small Language Models

Where We Came From: The Frontier Model Plateau

Over the past 12–18 months, the large language model (LLM) ecosystem has continued to advance—but largely in an incremental, not disruptive, fashion. Models from OpenAI, Anthropic, and Google have steadily improved across reasoning, multimodality, and scientific benchmarks, yet the relative ordering and qualitative capabilities have remained broadly stable.

Public benchmark suites such as MMLU (Massive Multitask Language Understanding), GPQA (Graduate‑Level Google‑Proof Q&A), and HELM (Stanford Holistic Evaluation of Language Models) show year‑over‑year gains measured in percentage points rather than step‑function breakthroughs. This is not a criticism—these are remarkable systems—but it does indicate a phase of maturation rather than rupture. Frontier models are converging: better, more reliable, more general—but not fundamentally different.

For scientific research, this means frontier GenAI has become a dependable horizontal capability: excellent for literature synthesis, reasoning assistance, explanation, and orchestration—but no longer the sole locus of rapid innovation.

Where We Are Now: The Rise of Small and Specialized Models

In parallel, a very different dynamic is unfolding.

Small Language Models (SLMs) and domain‑specific foundation models are advancing rapidly, particularly in scientific domains such as genomics, protein science, chemistry, and materials research. These models fall broadly into two categories:

  1. Domain‑adapted language models – smaller LLMs fine‑tuned on specific scientific corpora (e.g. chemistry, biology, materials science).
  2. Non‑linguistic foundation models – transformer‑based models trained on alternative “languages” such as DNA, protein sequences, or molecular graphs (e.g. Evo2, ESM, AlphaFold‑class models).

These models are not generalists—and that is precisely their strength. They encode deep inductive bias for their domain, deliver strong signal from sparse data, and increasingly outperform general LLMs on narrowly scoped scientific tasks.

Critically, most of these models do not fit the SaaS GenAI paradigm. They are rarely available via Azure AI Foundry, AWS Bedrock, or similar managed services. Running them typically requires:

  • Dedicated GPU infrastructure (often NVIDIA‑specific)
  • Local fine‑tuning or adaptation
  • Tight coupling to data and experimental context

This creates a structural mismatch between where scientific model innovation is happening and where traditional enterprise AI platforms operate.

External Validation: SLMs as First-Class Scientific Tools

Recent academic work explicitly supports this shift toward small, specialized models. A 2025 paper, “SLMs as Scientific Tools” (arXiv:2512.15943), argues that capability in scientific AI is task-relative rather than size-relative. The authors show that domain-specialized SLMs can match or outperform frontier LLMs on constrained scientific tasks when correctness, structure, and tool integration matter more than linguistic breadth.

Several conclusions from the paper closely align with the direction at Carlsberg Research Laboratory (CRL):

  • Inference locality beats central intelligence: running models close to data improves latency, reproducibility, validation, and cost control—supporting local, HPC-adjacent, and desk-side deployment.
  • SLMs scale scientifically, not just economically: smaller models are easier to interpret, benchmark, and falsify—critical properties for hypothesis generation and experimental decision-making.
  • Tool integration matters more than prompt engineering: structured inputs and deterministic tool calls outperform free-form prompting in scientific workflows.

The paper ultimately reinforces a hybrid architectural stance: LLMs orchestrate; SLMs execute. This provides external validation that SLMs are not a compromise, but the correct abstraction for scientific computing.

A Practical Shift: From Cloud‑Only to Desk‑Side AI

This is where a meaningful, practical shift is occurring.

With the arrival of systems such as NVIDIA DGX Spark, small language models become physically accessible to individual researchers. Instead of renting over‑provisioned H100 or Grace‑Blackwell cloud instances, scientists can:

  • Run and fine‑tune SLMs locally
  • Experiment rapidly without cloud friction or cost surprises
  • Work directly with models that are otherwise unavailable as managed services

In effect, this enables a “small model on every scientist’s desk” paradigm. The value is not raw scale, but immediacy, ownership, and experimentation velocity.

At CRL, this aligns tightly with how scientific progress actually happens: iterative, exploratory, domain‑specific, and data‑proximate.

Looking Toward 2026: A Hybrid, Orchestrated Future

Looking ahead—without making speculative predictions—the most plausible trajectory is not LLMs versus SLMs, but LLMs plus SLMs.

A likely pattern is:

  • Frontier LLMs acting as generalist reasoning, planning, and orchestration layers
  • Specialized small models performing high‑fidelity domain work (genomics, proteins, chemistry, simulation)
  • Tool‑ and model‑calling as the primary integration mechanism

In this model, the LLM does not replace scientific models—it coordinates them. It becomes the interface and glue, while the real scientific signal is generated by specialized systems running locally or on targeted infrastructure.
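
A minimal sketch of that coordination pattern, assuming the orchestrating LLM emits structured tool calls. The registry, the call format, and the toy `gc_content` specialist are all hypothetical stand-ins; a real setup would wrap a locally hosted domain model behind each registered name.

```python
from typing import Callable, Dict

# Registry of specialist "tools". Each entry would normally wrap a domain
# model running locally; here they are hypothetical stand-ins.
SPECIALISTS: Dict[str, Callable[[str], dict]] = {}


def specialist(name: str):
    """Decorator that registers a function as a callable specialist tool."""
    def register(fn: Callable[[str], dict]) -> Callable[[str], dict]:
        SPECIALISTS[name] = fn
        return fn
    return register


@specialist("gc_content")
def gc_content(sequence: str) -> dict:
    """Toy stand-in for a genomics model: fraction of G/C bases in a DNA string."""
    seq = sequence.upper()
    gc = sum(base in "GC" for base in seq)
    return {"gc_fraction": gc / len(seq) if seq else 0.0}


def dispatch(tool_call: dict) -> dict:
    """Execute a structured tool call of the form {"tool": ..., "input": ...}.

    The orchestrating LLM is assumed to emit this structure; it is not
    modeled here.
    """
    name = tool_call["tool"]
    if name not in SPECIALISTS:
        raise KeyError(f"unknown specialist: {name}")
    return SPECIALISTS[name](tool_call["input"])


if __name__ == "__main__":
    print(dispatch({"tool": "gc_content", "input": "ATGCGC"}))
```

The design choice worth noting is that the orchestrator never computes the scientific result itself. It only selects a registered specialist and passes structured input, which keeps the execution path deterministic and testable.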

This is not speculative technology. The building blocks already exist:

  • Tool‑calling and agent frameworks
  • Domain foundation models
  • Local GPU systems capable of running serious scientific workloads

What changes in 2026 is not the theory, but the accessibility.

Summary

  • Frontier LLMs are improving steadily, but incrementally
  • Scientific innovation is accelerating fastest in small, specialized models
  • These models do not fit cloud‑only GenAI platforms
  • Desk‑side systems like DGX Spark make SLMs practically accessible
  • The near‑term future is hybrid: generalist orchestration + specialist execution

Appendix: The Emerging Scientific SLM Ecosystem (snapshot as of 2026-01-21)

Vendor / Origin | Domain Focus | Representative Models | Typical Scientific Use Cases
--- | --- | --- | ---
NVIDIA | Biology, Chemistry, Climate | BioNeMo, ChemGPT, MegaMolBART, FourCastNet | Molecule generation, QSAR, virtual screening, protein design, weather & climate modeling
DeepMind | High-impact scientific modeling | AlphaFold 3, GraphCast | Protein structure prediction, climate forecasting, large-scale simulation
Meta | Proteins, Scientific Literature | ESMFold, ProtBERT, SciBERT | Protein folding, sequence modeling, scientific text analysis
Arc Institute / Profluent | DNA & Protein Design | Evo2, E1 | DNA sequence design, protein design, strain optimization
Academic & Research Consortia | Genomics, Materials Science | OpenFold, MaterialsBERT, MatSciBERT | Crystal property prediction, materials discovery
Emerging Vendors | Supply Chain & Optimization | SCGPT, Logistics-LLaMA, OR-LLM | Demand forecasting, route optimization, constraint planning

Notes

  • Most models listed above are open, open‑weight, or research‑licensed, and evolve in close collaboration with the scientific community.
  • The ecosystem is interoperable and tool‑oriented, designed to be embedded into pipelines rather than accessed via chat interfaces.
  • In contrast, enterprise GenAI platforms primarily target closed, managed, productivity‑oriented workloads.
  • NVIDIA’s role is increasingly that of a horizontal scientific AI platform provider, spanning models, tooling, and local compute rather than acting as a single‑model vendor.
  • Open models, research licensing, and composability are properties that align naturally with exploratory research environments such as CRL.

Rupify: Executable Specifications for AI-Assisted Software Engineering

Abstract

AI-assisted development has dramatically increased implementation speed, but not correctness. Rupify addresses this gap by turning requirements into executable, structured specifications that can be directly used by AI systems. Rather than relying on informal descriptions or heavyweight formal methods, Rupify operationalizes specifications as artifacts that can be generated, validated, and continuously enforced throughout development. Rupify is open source and available on GitHub: https://github.com/peterbb148/rupify

Why the name Rupify (RUP, UML, UCP)

Rupify takes its name from the Rational Unified Process (RUP), a structured approach to software engineering that emphasizes well-defined artifacts, traceability, and model-driven development. RUP uses the Unified Modeling Language (UML) to describe systems precisely through use cases, domain models, interaction diagrams, state machines, and deployment views. On top of this, Use Case Points (UCP) provide a way to estimate system size and effort based on functional structure rather than code.

Rupify operationalizes this chain—RUP for structure, UML for representation, and UCP for measurement—by turning it into an executable pipeline. Instead of producing documentation, it produces machine-interpretable models that AI systems can use directly for generation, validation, and estimation.

The Problem

AI systems are highly effective at generating, refining, and reviewing code, but what they are given to work from is still incomplete requirements, ambiguous intent, and inconsistent structure. This creates a fundamental mismatch where high-capability implementation systems operate on low-fidelity input.

The consequences are predictable. There is drift between intent and implementation, outputs vary across iterations, and correctness cannot be verified in a systematic way. Speed increases, but confidence does not.

The Idea Behind Rupify

Rupify introduces a structured, executable middle layer between intent and implementation. The process moves from interview to structured model, from model to executable artifacts, and from there into implementation and continuous validation.

The core idea is simple but fundamental. Specifications are not written primarily for humans; they are compiled for machines. Instead of acting as passive documentation, they become active inputs to the system.

What Rupify Does

Rupify provides a deterministic pipeline that starts with understanding a problem and ends with verifiable artifacts. Requirements are captured through structured interviews and translated into a canonical project model. From this model, Rupify generates RUP-aligned artifacts such as use cases, domain models, interaction diagrams, state models, and deployment views.

These artifacts are not static descriptions. They form the basis for use case point estimation and enable continuous validation against the original intent. The output is not just text, but a model that can be executed, tested, and checked.
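
For reference, the classic Use Case Points calculation (Karner’s formulation) can be sketched as follows. The weights and the 20 hours-per-point productivity factor are the commonly cited defaults, and the function and constant names are illustrative. This is not necessarily how Rupify itself computes its estimates.

```python
USE_CASE_WEIGHTS = {"simple": 5, "average": 10, "complex": 15}
ACTOR_WEIGHTS = {"simple": 1, "average": 2, "complex": 3}
HOURS_PER_UCP = 20  # Karner's original suggestion; calibrate against your own data


def use_case_points(use_cases, actors, technical_factor, environmental_factor):
    """Compute Use Case Points from classified use cases and actors.

    use_cases / actors: lists of complexity labels ("simple", "average", "complex").
    technical_factor: weighted sum of the 13 technical ratings (TFactor).
    environmental_factor: weighted sum of the 8 environmental ratings (EFactor).
    """
    uucw = sum(USE_CASE_WEIGHTS[c] for c in use_cases)   # unadjusted use case weight
    uaw = sum(ACTOR_WEIGHTS[a] for a in actors)          # unadjusted actor weight
    tcf = 0.6 + 0.01 * technical_factor                  # technical complexity factor
    ecf = 1.4 - 0.03 * environmental_factor              # environmental complexity factor
    return (uucw + uaw) * tcf * ecf


if __name__ == "__main__":
    # Hypothetical project: four use cases, two actors, made-up factor sums.
    ucp = use_case_points(
        use_cases=["simple", "average", "average", "complex"],
        actors=["simple", "complex"],
        technical_factor=40,
        environmental_factor=10,
    )
    print(f"UCP = {ucp:.1f}, effort = {ucp * HOURS_PER_UCP:.0f} hours")
```

Because the inputs are counts and classifications taken from the model rather than from code, the same model always yields the same estimate, which is what makes the estimate reproducible.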

Positioning

Rupify sits in the space between informal and formal approaches. On one side are notes, tickets, and lightweight specification formats. On the other are formal methods such as Z, TLA+, Alloy, and RAISE.

It provides structure without requiring full formalization, making it practical for real-world teams that need both speed and rigor. It is designed for environments where AI is already part of the workflow, but where correctness still matters.

Why This Matters Now

AI has shifted the bottleneck in software development. Writing code is no longer the primary constraint; defining correctness is. Without a structured specification layer, AI amplifies ambiguity rather than resolving it. Increased speed leads to increased drift, and verification becomes reactive instead of proactive.

Rupify addresses this by making correctness part of the input rather than an afterthought.

From Specification to Execution

Rupify enables a direct path from specification to execution. The generated artifacts are testable, traceable, and reproducible. Requirements can be followed through to implementation, estimates can be derived consistently using use case points, and systems can be continuously checked for conformance.
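
A rough sketch of what a traceability check over such a model might look like. The classes, field names, and IDs are hypothetical and do not reflect Rupify’s actual schema; the sketch only shows the kind of gap a machine-readable specification makes checkable.

```python
from dataclasses import dataclass, field


@dataclass
class Requirement:
    req_id: str
    text: str


@dataclass
class UseCase:
    uc_id: str
    covers: list                                  # requirement ids this use case realizes
    tests: list = field(default_factory=list)     # ids of tests that verify it


def conformance_gaps(requirements, use_cases):
    """Report requirements without a use case, dangling references, and untested use cases."""
    known = {r.req_id for r in requirements}
    covered = {req_id for uc in use_cases for req_id in uc.covers}
    return {
        "uncovered_requirements": sorted(known - covered),
        "unknown_references": sorted(covered - known),
        "untested_use_cases": sorted(uc.uc_id for uc in use_cases if not uc.tests),
    }


if __name__ == "__main__":
    # Hypothetical model content.
    reqs = [Requirement("R1", "Export report"), Requirement("R2", "Audit log"),
            Requirement("R3", "Role-based access")]
    ucs = [UseCase("UC1", covers=["R1"], tests=["T1"]),
           UseCase("UC2", covers=["R2", "R9"])]
    print(conformance_gaps(reqs, ucs))
```

Run on every change, a check like this turns traceability from a documentation habit into a failing build.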

This allows AI agents to operate within clearly defined constraints instead of improvising from loosely defined prompts.

Practical Workflow

A typical workflow begins with a structured interview to capture intent. This is transformed into a canonical model, which in turn produces RUP artifacts. From these, estimation is derived and implementation is guided or generated. Throughout the process, validation is continuous and tied back to the specification.

The important shift is that every step is machine-interpretable and part of a coherent system.

Beyond Documentation

Traditional specifications are written, read, and eventually become outdated. Rupify specifications are generated, executed, and remain active parts of the system. They do not sit beside the implementation; they shape and constrain it.

Outlook

Rupify represents an early step toward a broader shift in software engineering. It points toward specification-driven development, where AI systems operate within executable intent and validation is built into the workflow.

The long-term direction is a move away from code-first development toward systems where specifications define, generate, and continuously validate the implementation.

Skills as a Supply Chain Risk

We’ve Seen This Before

We’ve been here before. First with open source packages, then CI/CD, then infrastructure-as-code. Each time we optimized for speed and reuse, and only later realized the real risk wasn’t what we built, but what we pulled in.

Now it’s happening again. This time with “skills.”

Skills Are a Supply Chain

Skills are emerging as reusable units in the AI stack—installable capabilities executed by agents with access to tools, data, and decisions.

They can contain code. Which means the moment you install and execute them, you’ve created a supply chain.

Early Evidence, Familiar Patterns

A recent large-scale study analyzed more than 238,000 skills across marketplaces and GitHub and found a measurable fraction to be malicious [1]. The numbers are not dramatic, but they are real. Roughly half a percent of skills were confirmed malicious after filtering noise.

More importantly, the attack patterns are familiar. The same study identifies hijacking of skills hosted in abandoned GitHub repositories as an active attack vector [1].

In other words, this is not new risk. It is old risk in a new place.

The Difference Is Execution

What is new is how these components run.

Skills are not just libraries sitting in your build. They are instructions plus executable code, often running with the same privileges as the agent invoking them, and selected dynamically at runtime [2].

That changes the boundary. You are no longer just managing dependencies. You are allowing a system to choose and execute code on your behalf.

Why This Matters

Traditional controls assume stable systems: known dependencies, predictable execution paths, and validation at build time.

That model breaks here.

When selection is dynamic and execution happens at runtime, static analysis and dependency scanning still help—but they no longer describe the system you are actually running. Broader studies of the ecosystem already show a significant portion of skills contain security weaknesses, including supply chain-style vulnerabilities and privilege escalation paths [3].

This Is Still Fixable

None of this requires new principles.

Treat skills as untrusted code.

  • Use only skills from trusted sources with security code scanning
  • Limit what agents can do by default
  • Isolate execution
  • Require provenance
  • Observe behavior at runtime

This is just software engineering discipline applied at the right boundary.
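One way to make "require provenance" and "limit by default" concrete is to pin each reviewed skill to a content hash and refuse anything that is unknown or has changed. The sketch below is a minimal illustration under that assumption: the function names and the shape of the pin table are hypothetical, and it does not cover isolation or runtime observation.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a skill archive in chunks so large files are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_approved(name: str, path: Path, pinned: dict) -> bool:
    """Return True only if the skill is pinned and its content is unchanged.

    `pinned` is a hypothetical allowlist: skill name -> SHA-256 hex digest
    recorded when the skill was reviewed.
    """
    expected = pinned.get(name)
    if expected is None:
        return False  # deny by default: unknown skills are never loaded
    return hmac.compare_digest(sha256_of(path), expected)
```

An agent runtime would call `is_approved` before loading a skill. A hijacked repository that ships new content under the same name then fails the check instead of executing.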

Final Thought

Skills are not just features; they are code executing on your behalf.

We’ve learned how to manage this before. The only question is how quickly we apply those lessons this time.

References

[1] Malicious or Not: Measuring the Security of Agent Skill Ecosystems. https://doi.org/10.48550/arXiv.2603.16572

[2] Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study. https://doi.org/10.48550/arXiv.2602.06547

[3] Agent Skills in the Wild: Vulnerabilities and Supply Chain Risks. https://doi.org/10.48550/arXiv.2601.10338

[4] On the Security of LLM Agents: Prompt Injection and Skill-Based Attacks. https://doi.org/10.48550/arXiv.2602.20156

When the Model Breaks

Introduction

Over the past year, I’ve written three posts that—at the time—felt consistent.

First, I described four categories of AI solutions, arguing that complexity determines where AI works. Then I introduced the trade-off between speed and precision, where fast systems are imprecise and precise systems are slow.

Both were true at the time.

Lastly, I introduced the Wiggum Loop, which argues that institutional memory is useless.

The original model

The underlying assumption in the first two posts was simple. AI is most effective when problems are well-bounded, precision requirements are low, and iteration costs are small. It struggles when precision is critical, domain knowledge is deep, and errors are expensive. In other words, AI accelerates simple work, while humans remain essential for complex work.

The crack in the model

The Wiggum Loop challenges that assumption. If solutions can be reached through repeated iteration rather than upfront understanding, then precision is no longer a prerequisite—it becomes something you converge on. This changes the equation. Complexity no longer blocks AI in the same way; it simply increases the number of iterations required.

From capability to convergence

The original model was about capability—what AI can do well. The emerging model is about convergence—how quickly a system can explore the solution space and arrive at something that works. Once iteration is cheap and automated, the constraint shifts. It is no longer about whether we can solve a problem, but whether we can recognize when it has been solved.

Reinterpreting the three posts

Seen together, the three posts describe a transition. 

The model does not disappear—it shifts.

The new boundary

The real boundary is no longer complexity or precision. It is whether a problem can be expressed in a way that supports iteration. That requires a clearly defined outcome, explicit constraints, and a way to evaluate results. If those exist, iteration can often replace deep understanding; if they do not, it cannot.

This does not remove expertise—it relocates it. The hard part is no longer solving the problem directly, but defining what success looks like, encoding the right constraints, and deciding how results are evaluated.

What this means for organizations

This is not just a technical shift—it changes how organizations create value. Historically, value came from expertise, experience, and accumulated knowledge. Increasingly, it comes from defining problems clearly, encoding constraints explicitly, and running and governing iterative systems. The center of gravity moves.

The uncomfortable alignment

Taken together, the three posts lead to a slightly uncomfortable conclusion. Much of what we treat as essential organizational knowledge is actually context-bound constraint—decisions made under conditions that no longer apply.

If iteration can rediscover solutions faster than we can recall them, then memory becomes less valuable than exploration. That has consequences. Expertise shifts from knowing answers to defining problems and constraints. Institutional memory becomes less of an authority and more of a hypothesis archive—useful, but not decisive. Roles built around recall and experience start to erode, while roles focused on framing, validation, and governance become more central.

This does not remove humans, but it changes what humans are for—from remembering why things failed to defining what success looks like.

Where this leaves us

The original model still holds, but it is no longer the full picture. AI is not just a tool for solving known problems faster—it is becoming a system for exploring unknown solutions through iteration.

There is a subtle tension here. This trilogy itself depends on cumulative understanding, where each post builds on the last—a small act of institutional memory arguing against institutional memory. Exploration does not replace memory entirely; it changes what kind of memory matters. Constraint-memory becomes less valuable, while model-building and interpretation become more important.

Final thought

We started by asking where AI works. We then asked how precise it needs to be. The emerging question is different: how fast can we iterate—and how well can we recognize success?

That is the thread connecting all three posts, and it is where the model begins to break.

The Wiggum Loop: Brute-Forcing Business with AI

What if persistence beats knowledge?

We’ve spent decades optimizing how organizations think. We built processes, governance structures, architecture reviews, and layers of institutional knowledge. Entire careers are built on knowing why something won’t work.

But what if the fastest path to solving a problem is no longer thinking harder—but trying more? Not smarter. Not deeper. Just… more.

This pattern—often referred to as the Ralph Wiggum loop in AI coding circles—is already well established (https://www.leanware.co/insights/ralph-wiggum-ai-coding). What’s interesting is not the name, but what happens when we apply the same idea outside of coding.

The shift: from knowing to looping

AI coding agents, orchestration platforms, and cheap, elastic compute have changed the economics of problem solving. What used to require deep domain expertise and careful design can now be approached differently. Instead of relying on understanding upfront, we can define the outcome, set guardrails—legal, ethical, and architectural—let agents iterate, and then select what works. This can be repeated at scale.

It is already visible in modern coding workflows, where agents generate, test, and refine code in loops, where skills and tools extend capabilities dynamically, and where tasks can be scheduled, retried, and recomposed. We are no longer limited by how fast we can think, but by how fast we can iterate.

The Wiggum Loop

Named after Ralph Wiggum from The Simpsons, this approach embraces a simple idea:

Try. Fail. Try again. Repeat until something works.

At scale, this stops being naive and starts becoming powerful.

Because the world changes. What failed before may succeed now as technology evolves, constraints shift, data improves, costs drop, and interfaces change. Organizational memory often encodes past constraints as permanent truths, but the Wiggum Loop ignores that and re-attempts relentlessly.

Removing the wrong human from the loop

This is not about removing humans entirely. It is about removing a specific role humans play in organizations—the carrier of historical constraints.

This is the person who says, “We’ve tried that before.” In many cases, that statement is technically correct and strategically wrong.

The Wiggum Loop removes this layer from execution. Humans define the goal and the boundaries, while machines explore the solution space. Humans still decide, but they no longer prematurely constrain.

From knowledge-driven to search-driven organizations

Traditionally, organizations solve problems by gathering expertise, modeling the problem, designing the solution, and then executing.

The Wiggum Loop flips this. Instead, we define the outcome, encode constraints—a kind of “constitution”—generate and test many solutions, and keep what works.

This represents a shift from knowledge-driven systems to search-driven systems. Where knowledge is incomplete or outdated, search wins.

When search beats knowledge—and when it doesn’t

This only works under specific conditions.

Search dominates when outcomes are testable, feedback loops are fast, and failures are cheap or reversible. This describes a large portion of business problems—optimization, configuration, planning, and software-enabled processes.

But the loop breaks when failures are silent or slow, when consequences are irreversible, or when correctness cannot be evaluated. In these cases, iteration can outrun detection, and brute force becomes risk.

The point is not that knowledge disappears. It is that in many domains, it is no longer the primary constraint.

Why this is suddenly viable

Three things have changed at the same time.

  1. Agents can act. They do not just generate outputs but can execute, test, retry, and adapt.
  2. Loops are native, meaning iterative workflows can be run programmatically rather than manually.
  3. Compute is cheap enough that brute force is no longer absurd—it is often practical.

Together, these changes enable systematic, automated exploration of solution spaces at scale.

A practical example: procurement

Consider procurement. Traditionally, sourcing decisions rely heavily on experience, supplier relationships, and historical outcomes, which also means they inherit historical biases and constraints.

Now imagine a Wiggum Loop approach. The objective is defined in terms of cost, reliability, sustainability, and risk. Constraints such as contracts, regulations, and policies are encoded. Agents then explore supplier combinations, simulate scenarios, generate negotiation strategies, and rerun the process with variations.

This results in thousands of iterations, where most will be wrong, but some will be better than anything previously attempted. Crucially, no one needs to remember why something didn’t work in 2018.
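The procurement loop can be sketched as plain generate-and-test: propose a candidate, reject it if it breaks an encoded constraint, score it against the objective, and keep the best so far. All supplier figures, the 70% share cap, and the scoring weights below are invented for illustration, and random proposals stand in for whatever an agent would actually generate.

```python
import random

# Hypothetical supplier data; none of these figures come from a real case.
SUPPLIERS = {
    "north": {"cost": 100.0, "reliability": 0.99, "approved": True},
    "delta": {"cost": 80.0, "reliability": 0.92, "approved": True},
    "quick": {"cost": 60.0, "reliability": 0.70, "approved": False},
}
MAX_SHARE = 0.7  # assumed policy: no supplier carries more than 70% of volume


def propose(rng: random.Random) -> dict:
    """Generate one random allocation of volume shares (sums to 1)."""
    chosen = [n for n in SUPPLIERS if rng.random() < 0.5]
    if not chosen:
        chosen = [rng.choice(list(SUPPLIERS))]
    raw = {n: rng.random() + 0.01 for n in chosen}
    total = sum(raw.values())
    return {n: v / total for n, v in raw.items()}


def satisfies_constraints(allocation: dict) -> bool:
    """Encoded guardrails: approved suppliers only, capped concentration."""
    if any(not SUPPLIERS[n]["approved"] for n in allocation):
        return False
    return max(allocation.values()) <= MAX_SHARE


def score(allocation: dict) -> float:
    """Objective to minimise: blended cost plus a penalty for unreliability."""
    cost = sum(share * SUPPLIERS[n]["cost"] for n, share in allocation.items())
    reliability = sum(
        share * SUPPLIERS[n]["reliability"] for n, share in allocation.items()
    )
    return cost + 200.0 * (1.0 - reliability)


def search(iterations: int, seed: int = 0):
    """Try, fail, try again; keep the best candidate that passes the guardrails."""
    rng = random.Random(seed)
    best, best_score = None, float("inf")
    for _ in range(iterations):
        candidate = propose(rng)
        if not satisfies_constraints(candidate):
            continue  # most attempts are wrong, and that is expected
        candidate_score = score(candidate)
        if candidate_score < best_score:
            best, best_score = candidate, candidate_score
    return best, best_score
```

Calling `search(2000)` returns the best feasible allocation found and its score. Nothing in the loop consults past outcomes; the only things it knows are the objective and the constraints.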

Governance without paralysis

This approach only works if guardrails are explicit—and this is the hard part.

Think of it as a constitution that defines what is allowed, what is forbidden, and what must be optimized. Instead of embedding constraints in people, we embed them in systems.

In practice, this means turning intent into executable constraints—tests, policies, specifications, and evaluation criteria that can be applied automatically at scale. We are early in this transition, and most organizations are not yet good at it.

Without this, the loop becomes chaos. With it, the loop becomes power.

The uncomfortable implication

If this works, it challenges something fundamental: how much of organizational value is knowledge, and how much is inertia?

A significant portion of what we call “knowledge” is accumulated constraint—decisions made under conditions that no longer apply. When those constraints are encoded in people, they persist long after the world has changed.

If problems can be solved through clear intent, explicit constraints, and massive iteration, then much of that embedded knowledge becomes optional.

This does not remove humans, but it changes what humans are for—from remembering why things failed, to defining what success looks like.

So the real question is not technical

We already have agents, loops, orchestration, and compute.

The real question is cultural: do we have the courage to try again? To ignore “we’ve done that before,” to let systems explore without prematurely shutting them down, and to trust iteration over intuition—at least long enough to see what emerges.

Final thought

The Wiggum Loop is not about being careless. It is about being relentless in a changing world.

And maybe—just maybe—the organizations that win won’t be the ones that know the most, but the ones that search the best.

From Roles to Work: What Each IT Architect Actually Does

Introduction

In a previous post (Different Roles and Responsibilities for an IT Architect), I outlined the different roles in architecture. The natural next question is: what work actually sits with each role?

This is where I see organizational struggle—not because roles are unclear, but because the work boundaries are.

A useful lens here comes from Svyatoslav Kotusev’s The Practice of Enterprise Architecture, where architecture is described not as a set of roles, but as practices operating at different levels of the organization.

What follows is a practical way to make that explicit.

Note: In my previous post I also included Infrastructure Architects. They are intentionally left out here to keep the focus on how application and solution-level architecture work is split. Infrastructure Architecture operates with similar principles, but across platform and environment concerns.

The Core Principle

For clarity on naming:

  • Enterprise Architect (EA)
  • Domain Architect (DA) — equivalent to what many organizations call Solution Architect
  • Software Architect (SA) — equivalent to Tech Lead

The SA abbreviation is overloaded in many organizations, so in this post SA refers to Software Architect, not Solution Architect.

Each role operates on a different level of abstraction and time horizon:

  • Enterprise Architecture (EA) → direction and constraints  — Sets business-driven direction and guardrails that shape all downstream decisions.
  • Domain Architecture (DA) → alignment and structure  — Translates direction into coherent structures and boundaries across a business area.
  • Software Architecture (SA) → design and execution  — Turns structures into concrete, implementable systems and makes final design decisions.

Enterprise is horizontal across the organization (cross-cutting capabilities, standards, and direction), while Domain/Software is vertical (aligned to specific business areas and initiatives).

Examples:

  • Enterprise looks at things like Customer Management, Product Management, Order Management, Finance, or Supply Chain across all business areas.
  • Domain Architects work within a specific area or initiative and ensure systems in that context fit together.
  • Software Architects decide on software architecture implementation patterns.

If those are confused, enterprise architects turn into domain or software architects—and everything fragments.

Enterprise Architect — The Direction Layer

This layer focuses on business-driven direction and constraints.

Primary work:

  • Define architectural principles and guardrails
  • Align architecture with business strategy and operating model
  • Set direction based on business capabilities and needs
  • Establish governance and decision frameworks

Artifacts:

  • Principles
  • Target architecture (at capability level — e.g. Customer Management, Product Management, Order Management, Finance, or Supply Chain as cross-cutting business capabilities shared across the organization — not specific systems or tools)
  • Strategic direction

What it’s not:

  • Deciding architectural styles (e.g. event-driven vs request/response)
  • Choosing integration patterns or technologies
  • Designing systems or interactions
  • Translating direction into technical solutions

Enterprise architecture answers why and in which direction, not how.

Domain Architect — The Alignment, Design, and Execution Layer

This is where architecture becomes concrete.

Primary work:

  • Shape how business capabilities are realized across systems in a given domain or initiative
  • Ensure consistency and coherence across solutions
  • Design the solution end-to-end
  • Translate enterprise direction into a working architecture
  • Make concrete design choices (e.g. event-driven vs request/response)
  • Define APIs, data flows, and interactions
  • Make trade-offs under real constraints
  • Ensure compliance with standards and principles

This is where architectural intent meets real delivery and must align with defined rules and processes.

Artifacts:

  • Solution designs
  • Architecture decision records
  • Reference patterns (within the context of the domain/initiative)

What it’s not:

  • Defining enterprise-wide principles
  • Working purely at strategy level without delivery responsibility
  • Escalating every decision upward

This is the level where decisions like event-driven vs request/response, Kafka vs REST, data ownership, and consistency models are actually made.

Software Architect — The Reality Check

This is where architecture meets code.

Primary work:

  • Translate architecture into implementation
  • Own technical quality and execution
  • Challenge designs based on reality
  • Ensure operability

What it’s not:

  • Redefining architecture because it’s inconvenient
  • Ignoring constraints set at higher levels
  • Acting only as a senior developer

How the Work Connects

  1. Enterprise (EA) defines direction
  2. Domain (DA) shapes, designs, and makes decisions
  3. Software Architect (SA) ensures it works in practice

The key is that decisions are made at the lowest responsible level.

If Enterprise work is not protected, it will collapse into Domain work.

Final Thought

Architecture breaks down when decisions are made at the wrong level:

  • If enterprise architects decide on Kafka, you lose flexibility.
  • If domain architects define enterprise principles, you lose coherence.

Kotusev’s point is simple: architecture is a system of practices and the value comes from keeping those practices separate—and connected.

AI doesn’t create advantage - distribution does

Introduction

In the early 20th century, factories did not gain much by simply replacing steam engines with electric motors. The real gains came later, when they reorganized how work was done—redesigning layouts, workflows, and roles to take advantage of distributed power [9]. AI is following the same pattern. The technology itself is not the differentiator. How it is distributed inside the organization is.

Across domains, the pattern is already visible.

In scientific research, systems like AlphaFold and other AI models in biology and chemistry are shifting the frontier of what good looks like. Researchers who integrate these tools into their workflows move faster, explore more hypotheses, and expand output. Others are not just slower—they are operating below a moving baseline.

In software engineering, the dynamic is different but related. AI compresses the time required to produce code, but also compresses the time required to produce failure. Teams that combine strong engineering practices with AI accelerate safely. Teams that rely on generated output without discipline introduce risk at speed.

In both cases, the effect is not uniform improvement. It is divergence.

The hidden failure mode: uneven distribution

What is emerging is not a lack of AI capability, but an uneven distribution of it.

Some individuals and teams gain early access, experiment, and build fluency through use. Others wait for guidance, are constrained by governance, or never fully integrate AI into how they work. Over time, this creates a gap in capability that compounds.

This is where the A and B teams begin to appear—not as a deliberate strategy, but as a consequence of how access, learning, and incentives are structured.

AI literacy beats AI elites

Organizations that scale AI successfully distribute capability rather than concentrate it [1][2].

When AI is centralized, teams depend on specialists. Demand exceeds capacity, and most of the organization remains passive. When capability is distributed, teams solve problems locally, and learning happens through application rather than instruction.

McKinsey consistently finds that only a minority of companies capture meaningful value from AI, and those that do embed it across functions rather than isolating it [1]. Experimental evidence reinforces that productivity gains depend on how individuals integrate AI into their work, not just whether they have access to it [11][12].

The constraint is not the model. It is whether people know how to use it effectively in context.

The Center of Excellence trap

The default enterprise response is to centralize AI into a Center of Excellence. This improves oversight and consistency, but it also creates a structural bottleneck. Every team now depends on a central unit for access, prioritization, and delivery, which does not scale with demand.

More importantly, it concentrates knowledge. Patterns, practices, and hard-won lessons accumulate inside the CoE rather than flowing through the organization. Capability becomes something you request, not something you build.

This is why many organizations are exploring federated and embedded operating models [3][4], though the transition is often incomplete and uneven. The goal is not just to distribute execution—it is to distribute capability.

This is where platform engineering provides a better mental model. Instead of acting as a delivery function, the central team builds golden paths: paved, opinionated ways of working that make the right thing the easy thing. Tooling, templates, guardrails, and reusable components are exposed directly to teams, enabling them to move independently while staying within defined boundaries.

The difference is fundamental. A CoE pulls work toward itself. A platform pushes capability outward. One creates queues. The other creates flow.

If AI is treated as a centralized service, it will scale linearly at best. If it is treated as a platform, it can scale with the organization.

AI creates uneven gains, not uniform uplift

Research consistently shows average productivity gains in the range of 10–20%, combined with substantial variation across users and tasks [5][10][12]. The variation is the important part.

In some contexts, less experienced workers benefit significantly because AI transfers best practices and reduces barriers to entry. In others, highly skilled workers gain more when operating within the effective frontier of the technology. Outcomes depend on skill, task, and how well AI is integrated into the workflow.

The result is not a level playing field, but a changing gradient. People and teams that adapt effectively accelerate. Those who do not fall behind, even when they have access to the same tools.

Governance is becoming the bottleneck

Organizations respond to AI risk by increasing control: approvals, restrictions, and policy layers. While necessary, this often introduces systemic friction.

Industry and institutional research consistently identify organizational barriers—not technical limitations—as the primary constraint on AI value creation [1][3]. The issue is less about building capability and more about enabling its use.

A more effective approach is proportional governance. Low-risk, individual use cases require minimal control. Team-level workflows benefit from lightweight oversight. High-impact, enterprise-critical systems require full governance. This aligns with risk-based approaches such as those from the OECD [8].
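Proportional governance can be expressed as a small lookup from scope and impact to required controls. The sketch below is only an illustration: the three scopes mirror the tiers described above, while the control names are placeholders and not drawn from the OECD material or any specific framework.

```python
from enum import Enum


class Scope(Enum):
    INDIVIDUAL = 1
    TEAM = 2
    ENTERPRISE = 3


# Hypothetical control sets; the names are placeholders for illustration.
MINIMAL = ("acceptable-use policy",)
LIGHTWEIGHT = MINIMAL + ("team-level review",)
FULL = LIGHTWEIGHT + ("risk assessment", "formal approval", "ongoing monitoring")


def required_controls(scope: Scope, high_impact: bool = False) -> tuple:
    """Map a use case to controls in proportion to its scope and impact."""
    if high_impact or scope is Scope.ENTERPRISE:
        return FULL
    if scope is Scope.TEAM:
        return LIGHTWEIGHT
    return MINIMAL
```

Keeping the mapping explicit means a low-risk individual use case never waits in the same approval queue as an enterprise-critical system.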

Without this proportionality, governance becomes a bottleneck rather than a safeguard.

How the divide compounds

The gap between A and B teams develops through small, compounding differences in access, learning environments, and culture.

Some teams have direct access to tools and are encouraged to experiment. Others operate through restricted interfaces and formal processes. Some learn through iteration; others wait for approval.

Over time, these differences accumulate. One part of the organization develops new capabilities and ways of working, while another continues with established practices. Eventually, they are no longer operating at the same level.

Distribution requires giving up some control

Avoiding this outcome requires accepting a degree of decentralization. Teams need the ability to experiment locally, and organizations need to tolerate variation in tools and approaches.

This introduces a temporary phase where things feel less controlled and less consistent. That phase is where learning happens. Eliminating it too early suppresses adoption and reinforces the divide.

AI as infrastructure

If AI remains confined to specialists, organizations create internal inequality and limit their ability to adapt. If it becomes embedded in everyday work—more like electricity than expertise—it enables continuous, distributed improvement.

The objective is not to build a stronger AI team, but to remove the distinction altogether. Because the organizations that benefit most from AI will not be those with the most advanced models, but those where its use is widespread, routine, and integrated into how work gets done.

References

[1] McKinsey & Company – The State of AI
https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai

[2] Boston Consulting Group – Artificial Intelligence Capabilities
https://www.bcg.com/capabilities/artificial-intelligence

[3] Deloitte – State of AI in the Enterprise
https://www2.deloitte.com/us/en/insights/focus/cognitive-technologies/state-of-ai-and-intelligent-automation-in-business-survey.html

[4] Gartner – How to Scale AI in the Enterprise
https://www.gartner.com/en/articles/how-to-scale-ai-in-the-enterprise

[5] National Bureau of Economic Research – Generative AI at Work
https://www.nber.org/papers/w31161

[8] OECD – AI Principles
https://oecd.ai/en/ai-principles

[9] Paul A. David – The Dynamo and the Computer: An Historical Perspective on the Modern Productivity Paradox
https://doi.org/10.3386/w5099

[10] Quarterly Journal of Economics – Generative AI at Work
https://academic.oup.com/qje/article/140/2/889/7990658

[11] MIT Sloan – How Generative AI Can Boost Highly Skilled Workers’ Productivity
https://mitsloan.mit.edu/ideas-made-to-matter/how-generative-ai-can-boost-highly-skilled-workers-productivity

[12] MIT Economics – Experimental Evidence on Generative AI
https://economics.mit.edu/sites/default/files/inline-files/Noy_Zhang_1.pdf


© 2026 Peter Birkholm-Buch
