Stuff about Software Engineering

Category: Quantifiable Impact and Future Vision

AI adoption must ultimately be judged by its impact on real outcomes: speed, quality, cost, learning, and innovation. Early productivity gains matter, but the larger transformation comes from integrating AI into end-to-end systems where improvements compound over time.

That is where the future vision becomes clearer. The long-term value is not a collection of isolated AI wins, but a broader shift in how development, research, and organizational workflows operate. In that sense, measurable productivity improvements are only the first visible signal of a much larger change.

The Evolution of AI: From Frontier Models to Specialized Small Language Models

Where We Came From: The Frontier Model Plateau

Over the past 12–18 months, the large language model (LLM) ecosystem has continued to advance—but largely in an incremental, not disruptive, fashion. Models from OpenAI, Anthropic, and Google have steadily improved across reasoning, multimodality, and scientific benchmarks, yet the relative ordering and qualitative capabilities have remained broadly stable.

Public benchmark suites such as MMLU (Massive Multitask Language Understanding), GPQA (Graduate‑Level Google‑Proof Q&A), and HELM (Stanford Holistic Evaluation of Language Models) show year‑over‑year gains measured in percentage points rather than step‑function breakthroughs. This is not a criticism—these are remarkable systems—but it does indicate a phase of maturation rather than rupture. Frontier models are converging: better, more reliable, more general—but not fundamentally different.

For scientific research, this means frontier GenAI has become a dependable horizontal capability: excellent for literature synthesis, reasoning assistance, explanation, and orchestration—but no longer the sole locus of rapid innovation.

Where We Are Now: The Rise of Small and Specialized Models

In parallel, a very different dynamic is unfolding.

Small Language Models (SLMs) and domain‑specific foundation models are advancing rapidly, particularly in scientific domains such as genomics, protein science, chemistry, and materials research. These models fall broadly into two categories:

  1. Domain‑adapted language models – smaller LLMs fine‑tuned on specific scientific corpora (e.g. chemistry, biology, materials science).
  2. Non‑linguistic foundation models – transformer‑based models trained on alternative “languages” such as DNA, protein sequences, or molecular graphs (e.g. Evo2, ESM, AlphaFold‑class models).

These models are not generalists—and that is precisely their strength. They encode deep inductive bias for their domain, deliver strong signal from sparse data, and increasingly outperform general LLMs on narrowly scoped scientific tasks.

Critically, most of these models do not fit the SaaS GenAI paradigm. They are rarely available via Azure AI Foundry, AWS Bedrock, or similar managed services. Running them typically requires:

  • Dedicated GPU infrastructure (often NVIDIA‑specific)
  • Local fine‑tuning or adaptation
  • Tight coupling to data and experimental context

This creates a structural mismatch between where scientific model innovation is happening and where traditional enterprise AI platforms operate.

External Validation: SLMs as First-Class Scientific Tools

Recent academic work explicitly supports this shift toward small, specialized models. A 2025 paper, “SLMs as Scientific Tools” (arXiv:2512.15943), argues that capability in scientific AI is task-relative rather than size-relative. The authors show that domain-specialized SLMs can match or outperform frontier LLMs on constrained scientific tasks when correctness, structure, and tool integration matter more than linguistic breadth.

Several conclusions from the paper closely align with CRL’s direction:

  • Inference locality beats central intelligence: running models close to data improves latency, reproducibility, validation, and cost control—supporting local, HPC-adjacent, and desk-side deployment.
  • SLMs scale scientifically, not just economically: smaller models are easier to interpret, benchmark, and falsify—critical properties for hypothesis generation and experimental decision-making.
  • Tool integration matters more than prompt engineering: structured inputs and deterministic tool calls outperform free-form prompting in scientific workflows.

The paper ultimately reinforces a hybrid architectural stance: LLMs orchestrate; SLMs execute. This provides external validation that SLMs are not a compromise but a sound abstraction for scientific computing.
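To make the point about structured inputs and deterministic tool calls concrete, here is a minimal Python sketch. It is not taken from the paper: the request type `ProteinScoreRequest` and the function `score_protein` are hypothetical names, and the "model" is a trivial stand-in so the example runs without a GPU. The idea it illustrates is that a typed, validated request rejects malformed input before any model is invoked, which free-form prompting cannot guarantee.

```python
from dataclasses import dataclass

# The 20 standard amino acids, as one-letter codes.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


@dataclass(frozen=True)
class ProteinScoreRequest:
    """Hypothetical structured input for a specialist protein model."""
    sequence: str
    max_length: int = 1024

    def __post_init__(self):
        # Validate up front so the model never sees malformed input.
        if not self.sequence:
            raise ValueError("sequence must not be empty")
        if len(self.sequence) > self.max_length:
            raise ValueError("sequence exceeds max_length")
        invalid = set(self.sequence) - VALID_RESIDUES
        if invalid:
            raise ValueError(f"invalid residues: {sorted(invalid)}")


def score_protein(request: ProteinScoreRequest) -> dict:
    """Stand-in for a local SLM call (assumption: no real model is used).

    A real deployment would run inference here. This stub only reports the
    fraction of hydrophobic residues so the example stays deterministic.
    """
    hydrophobic = sum(request.sequence.count(r) for r in "AILMFVW")
    return {
        "length": len(request.sequence),
        "hydrophobic_fraction": hydrophobic / len(request.sequence),
    }
```

The same input always yields the same output, and an invalid sequence fails loudly at construction time rather than producing a plausible-looking answer.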

A Practical Shift: From Cloud‑Only to Desk‑Side AI

This is where a meaningful, practical shift is occurring.

With the arrival of systems such as NVIDIA DGX Spark, small language models become physically accessible to individual researchers. Instead of renting over‑provisioned H100 or Grace‑Blackwell cloud instances, scientists can:

  • Run and fine‑tune SLMs locally
  • Experiment rapidly without cloud friction or cost surprises
  • Work directly with models that are otherwise unavailable as managed services

In effect, this enables a “small model on every scientist’s desk” paradigm. The value is not raw scale, but immediacy, ownership, and experimentation velocity.

At CRL, this aligns tightly with how scientific progress actually happens: iterative, exploratory, domain‑specific, and data‑proximate.

Looking Toward 2026: A Hybrid, Orchestrated Future

Looking ahead—without making speculative predictions—the most plausible trajectory is not LLMs versus SLMs, but LLMs plus SLMs.

A likely pattern is:

  • Frontier LLMs acting as generalist reasoning, planning, and orchestration layers
  • Specialized small models performing high‑fidelity domain work (genomics, proteins, chemistry, simulation)
  • Tool‑ and model‑calling as the primary integration mechanism

In this model, the LLM does not replace scientific models—it coordinates them. It becomes the interface and glue, while the real scientific signal is generated by specialized systems running locally or on targeted infrastructure.
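The coordination pattern can be sketched in a few lines of Python. Everything here is illustrative: the registry `SPECIALISTS`, the decorator `specialist`, the tool names, and the hard-coded `plan` function are assumptions made for the example. In a real system `plan` would be a frontier LLM emitting structured tool calls, and each registered function would wrap a domain model running locally.

```python
from typing import Callable, Dict, List

# Registry of specialist "tools". Each entry stands in for a local SLM.
SPECIALISTS: Dict[str, Callable[[str], dict]] = {}


def specialist(name: str):
    """Register a function as a callable specialist under `name`."""
    def register(fn):
        SPECIALISTS[name] = fn
        return fn
    return register


@specialist("dna.gc_content")
def gc_content(sequence: str) -> dict:
    # Placeholder for a genomics model: computes the GC fraction directly.
    seq = sequence.upper()
    return {"gc_fraction": (seq.count("G") + seq.count("C")) / len(seq)}


@specialist("protein.length")
def protein_length(sequence: str) -> dict:
    # Placeholder for a protein model.
    return {"length": len(sequence)}


def plan(task: str) -> List[dict]:
    """Stub for the orchestrating LLM: maps a task to structured tool calls.

    The plan is hard-coded here so the example is deterministic.
    """
    if task.startswith("dna:"):
        return [{"tool": "dna.gc_content", "input": task[len("dna:"):]}]
    if task.startswith("protein:"):
        return [{"tool": "protein.length", "input": task[len("protein:"):]}]
    raise ValueError(f"no plan for task: {task!r}")


def run(task: str) -> List[dict]:
    """Execute each planned call against the registered specialist."""
    return [SPECIALISTS[call["tool"]](call["input"]) for call in plan(task)]
```

The orchestrator never computes a scientific result itself; it only decides which specialist to call and with what input. That keeps the generalist and specialist layers independently replaceable.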

This is not speculative technology. The building blocks already exist:

  • Tool‑calling and agent frameworks
  • Domain foundation models
  • Local GPU systems capable of running serious scientific workloads

What changes in 2026 is not the theory, but the accessibility.

Summary

  • Frontier LLMs are improving steadily, but incrementally
  • Scientific innovation is accelerating fastest in small, specialized models
  • These models do not fit cloud‑only GenAI platforms
  • Desk‑side systems like DGX Spark make SLMs practically accessible
  • The near‑term future is hybrid: generalist orchestration + specialist execution

Appendix: The Emerging Scientific SLM Ecosystem (snapshot as of 2026-01-21)

Vendor / Origin | Domain Focus | Representative Models | Typical Scientific Use Cases
NVIDIA | Biology, Chemistry, Climate | BioNeMo, ChemGPT, MegaMolBART, FourCastNet | Molecule generation, QSAR, virtual screening, protein design, weather & climate modeling
DeepMind | High-impact scientific modeling | AlphaFold 3, GraphCast | Protein structure prediction, climate forecasting, large-scale simulation
Meta | Proteins, Scientific Literature | ESMFold, ProtBERT, SciBERT | Protein folding, sequence modeling, scientific text analysis
Arc Institute / Profluent | DNA & Protein Design | Evo2, E1 | DNA sequence design, protein design, strain optimization
Academic & Research Consortia | Genomics, Materials Science | OpenFold, MaterialsBERT, MatSciBERT | Crystal property prediction, materials discovery
Emerging Vendors | Supply Chain & Optimization | SCGPT, Logistics-LLaMA, OR-LLM | Demand forecasting, route optimization, constraint planning

Notes

  • Most models listed above are open, open‑weight, or research‑licensed, and evolve in close collaboration with the scientific community.
  • The ecosystem is interoperable and tool‑oriented, designed to be embedded into pipelines rather than accessed via chat interfaces.
  • In contrast, enterprise GenAI platforms primarily target closed, managed, productivity‑oriented workloads.
  • NVIDIA’s role is increasingly that of a horizontal scientific AI platform provider, spanning models, tooling, and local compute rather than acting as a single‑model vendor.
  • Unlike enterprise GenAI platforms, which are predominantly closed and productivity-oriented, the scientific SLM ecosystem is characterized by open models, research licensing, and composability: properties that align naturally with exploratory research environments such as CRL.

Accelerating Research at Carlsberg Research Laboratory using Scientific Computing

Introduction

Scientific discovery is no longer just about what happens in the lab—it’s about how we enable research through computing, automation, and AI. At Carlsberg Research Laboratory (CRL), our Accelerate Research initiative is designed to remove bottlenecks and drive breakthroughs by embedding cutting-edge technology into every step of the scientific process.

The Five Core Principles of Acceleration

To ensure our researchers can spend more time on discovery, we are focusing on:

  • Digitizing the Laboratory – Moving beyond manual processes to automated, IoT-enabled research environments.
  • Data Platform – Creating scalable, accessible, and AI-ready data infrastructure that eliminates data silos.
  • Reusable Workflows – Standardizing and automating research pipelines to improve efficiency and reproducibility.
  • High-Performance Computing (HPC) – Powering complex simulations and large-scale data analysis. We are also preparing for the future of quantum computing, which promises to transform how we model molecular behavior and simulate complex biochemical systems at unprecedented speed and scale.
  • Artificial Intelligence – Enhancing data analysis, predictions, and research automation beyond just generative AI.

The Expected Impact

By modernizing our approach, we aim to:

  • Reduce research setup time by up to 70%
  • Accelerate experiment iteration by 3x
  • Improve cross-team collaboration efficiency by 5x
  • Unlock deeper insights through AI-driven analysis and automation

We’re not just improving research at CRL; we’re redefining how scientific computing fuels innovation. The future of research is fast, automated, and AI-driven.

GitHub Copilot Probably Saves 50% of Time for Developers

Introduction

GitHub recently released the GitHub Copilot Metrics API, which lets customers see how Copilot is used, and, as usual, someone has created an open source tool to view the data: github-copilot-resources/copilot-metrics-viewer.

So let’s take a look at Copilot usage in Software Engineering at Carlsberg from the end of May to the end of June 2024.

I’m focusing on the following three metrics:

  • Total Suggestions
  • Total Lines Suggested
  • Acceptance Rate

I think these are useful for understanding how effective Copilot is, and I would like to get closer to an actual understanding of the usefulness of Copilot than the broad statement, offered by both GitHub and our own developers, that it saves 50% of their time.

The missing data in the charts is due to an error in the GitHub data pipeline at the time of writing and data will be made available at a later stage.

The low usage in the middle of June is due to some public holidays with lots of people taking time off.

Total Suggestions

Total Lines Suggested: Showcases the total number of lines of code suggested by GitHub Copilot. This gives an idea of the volume of code generation and assistance provided.

Total Lines Suggested

Total Lines Accepted: The total lines of code accepted by users (full acceptances), offering insight into how much of the suggested code is actually incorporated into the codebase.

Acceptance Rate

Acceptance Rate: This metric represents the ratio of accepted lines to the total lines suggested by GitHub Copilot. This rate is an indicator of the relevance and usefulness of Copilot’s suggestions.
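As a minimal sketch of that ratio, the Python below computes the acceptance rate over a period from per-day records. The field names `total_lines_suggested` and `total_lines_accepted` are assumed to mirror the per-day usage summaries returned by GitHub's Copilot usage endpoint at the time; check them against the API response you actually receive. The sample numbers are hypothetical and are not Carlsberg's data.

```python
from typing import Iterable


def acceptance_rate(days: Iterable[dict]) -> float:
    """Return accepted lines divided by suggested lines over all days."""
    suggested = 0
    accepted = 0
    for day in days:
        suggested += day["total_lines_suggested"]
        accepted += day["total_lines_accepted"]
    # Guard against periods with no suggestions (for example, gaps in the data).
    return accepted / suggested if suggested else 0.0


# Hypothetical sample records, used only to exercise the function.
sample_days = [
    {"day": "2024-06-03", "total_lines_suggested": 100, "total_lines_accepted": 20},
    {"day": "2024-06-04", "total_lines_suggested": 0, "total_lines_accepted": 0},
    {"day": "2024-06-05", "total_lines_suggested": 300, "total_lines_accepted": 60},
]

print(f"{acceptance_rate(sample_days):.0%}")  # prints 20%
```

Summing lines across the whole period before dividing, rather than averaging daily rates, keeps quiet days such as public holidays from skewing the result.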

Conclusion

The overall acceptance rate is about 20%, which resonates with my experience: Copilot tends to slightly miss the objective, be verbose, or both, so you have to trim or change a lot of code. So if Copilot suggests 100 lines of code, you end up accepting 20.

Does this then align with the statements from developers in Software Engineering and from GitHub, which claim that you save 50% of your time using Copilot?

Clearly, reviewing and changing code is faster than writing it, so even if you end up using only 20% of the suggested code, you will save time.

Unfortunately we don’t track actual time to complete tasks in Jira, so we don’t have hard data to prove the claim.

But is the claim true? Probably – however, I’m 100% convinced that GitHub Copilot drives better Developer Experience.

GitHub Copilot drives better Developer Experience

Introduction

This is part 3 of a series explaining in more detail how Carlsberg unifies development on GitHub and accelerates innovation with Copilot.

By integrating GitHub Copilot into our development workflow, Carlsberg has significantly enhanced the developer experience. Copilot acts as an intelligent coding assistant, offering real-time suggestions and code completions. This seamless integration enables our developers to write more efficient and error-free code. From a business perspective, this translates to accelerated development cycles and a boost in productivity, allowing us to bring innovations to market faster and maintain a competitive edge.

Understand Code Faster

GitHub Copilot transcends simple code suggestions by providing developers with the ability to quickly understand existing codebases and even entire projects. This feature is invaluable for onboarding new team members and tackling complex legacy systems. By asking Copilot to explain intricate code, developers can rapidly grasp functionality without deep-diving into documentation or consulting peers. For Carlsberg, this means reduced ramp-up times for new projects and more efficient utilization of developer time, leading to cost savings and faster project deliveries.

Spend Less Time on Scaffolding

Scaffolding, while necessary, often consumes valuable time that could be better spent on developing business-critical features. GitHub Copilot streamlines this process by generating the foundational code structures automatically. This allows our developers at Carlsberg to concentrate on crafting the unique aspects of our solutions that drive real business value. The direct result is a more agile development process, with resources optimally allocated towards innovation and creating competitive advantages.

Lower the Learning Curve

Adopting new frameworks and technologies is a constant challenge in the fast-paced tech environment. GitHub Copilot lowers the learning curve for our developers by suggesting how to effectively use new frameworks. This guidance reduces the time spent on trial and error, enabling our team to leverage the latest technologies confidently. For Carlsberg, this capability ensures that we are always at the forefront of technology adoption, enhancing our agility and ability to respond to market changes swiftly.

Reduce Monotonous Work

Monotonous work, such as writing unit tests, is critical for ensuring code quality but can be tedious and time-consuming. GitHub Copilot addresses this by generating unit tests, which developers can then review and refine. This automation not only speeds up the development process but also helps maintain a high standard of code quality. At Carlsberg, leveraging Copilot for unit testing means our developers can focus more on developing features that add value to the business, while still maintaining a robust and reliable codebase.

Improve Documentation

Well-crafted documentation is crucial for maintainability and scalability but is often overlooked due to the time it requires. GitHub Copilot aids in this aspect by automatically generating meaningful comments and documentation during code commits or pull request reviews. This not only saves time but also enhances the quality of our documentation, making it easier for developers to understand and work with our code. At Carlsberg, improved documentation directly translates to reduced maintenance costs and smoother collaboration among teams, further driving operational efficiency.

Developer Experience = Productivity + Impact + Satisfaction

At Carlsberg, our integration of GitHub Copilot into our development workflow has not just been about improving individual elements of the coding process—it’s about a holistic enhancement of the overall developer experience. 

GitHub frames Developer Experience as the sum of Productivity, Impact, and Satisfaction. Here’s how Copilot aligns with these components:

  • Productivity: By automating and accelerating parts of the development cycle, Copilot directly boosts productivity. In the “Spend Less Time on Scaffolding” and “Reduce Monotonous Work” sections, we explored how Copilot streamlines tasks that traditionally consume significant time and resources. This allows our developers to focus on higher-value work, speeding up our overall project timelines and making our workflow more efficient.
  • Impact: The true measure of any tool or process change is the impact it has on the business and its goals. As discussed in “Understand Code Faster” and “Improve Documentation,” Copilot helps our team tackle complex systems more effectively and maintain better documentation. This not only enhances our current projects but also secures our long-term ability to adapt and grow, significantly impacting our operational success and market competitiveness.
  • Satisfaction: A satisfied developer is a productive and innovative developer. Through features like lowering the learning curve for new technologies and reducing the drudgery of repetitive tasks, as highlighted in “Lower the Learning Curve” and “Reduce Monotonous Work,” Copilot increases job satisfaction. This leads to a more engaged team, ready to innovate and push boundaries in pursuit of new solutions.

By investing in tools that elevate these aspects of the developer experience, Carlsberg is not just improving our software; we are fostering a culture of efficiency, innovation, and satisfaction. This commitment not only enhances our current team’s morale and output but also positions us as a forward-thinking leader in leveraging technology to drive business success.

Conclusion

GitHub Copilot has revolutionized the way we approach software development at Carlsberg, significantly enhancing the overall developer experience. By automating repetitive tasks, simplifying complex codebases, and expediting the learning process for new technologies, Copilot has allowed our developers to focus on what they do best: creating innovative solutions that drive real business value. This not only leads to a more satisfied and engaged development team but also accelerates our time-to-market and improves our competitive stance. The integration of GitHub Copilot into our workflow is a testament to Carlsberg’s commitment to leveraging cutting-edge technology to foster a culture of efficiency, innovation, and continuous improvement. It’s clear that by investing in tools that enhance the developer experience, we’re not just improving our software; we’re building a stronger foundation for our business’s future success.

Carlsberg unifies development on GitHub and accelerates innovation with Copilot

I’m honored that GitHub have chosen to do a customer story on how we’re transforming software development in Carlsberg: https://github.com/customer-stories/carlsberg-group and a friggin awesome movie.

The movie was also used by Satya Nadella in the Build 2024 Keynote:

To provide a bit more background information I’ve written some posts:

© 2026 Peter Birkholm-Buch
