Category: AI

Making Sense of LLM Training

2025-11-03 / Peter Birkholm-Buch

Introduction

We often talk about training large language models (LLMs) as if it’s one thing — but it really isn’t.

There are several distinct types of training, each with its own purpose, cost, and level of control.

Understanding the difference helps clarify what’s realistic to do in practice, and what should be left to the model labs with thousands of GPUs and power budgets larger than small towns.

Here’s a simple breakdown.

Base Training — Learning Language from Scratch

This is where it all begins.

The model learns to predict the next word (strictly, the next token, often a word fragment) across trillions of tokens of text. That single objective is how it develops a general understanding of language, reasoning, facts, and relationships between concepts.

Purpose: Build a general-purpose foundation.

Data: Huge, diverse datasets (Common Crawl, Wikipedia, code, books).

Cost: Astronomical — only done by major labs.

Analogy: Teaching a child how to speak and read.

Once complete, this produces what’s called a base model — capable, but not polite, safe, or even particularly helpful.
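To make "predict the next word" concrete: training minimizes cross-entropy, the negative log-probability the model assigned to the token that actually came next. Here is a minimal sketch, assuming a toy four-word vocabulary and made-up logits (the raw scores a model outputs before softmax). Real models do this over vocabularies of tens of thousands of tokens and trillions of positions.

```typescript
// Toy illustration of the base-training objective: next-token cross-entropy.
// The vocabulary and logits below are hypothetical, chosen only for illustration.

// Convert raw scores (logits) into probabilities that sum to 1.
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits); // subtract the max for numerical stability
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Cross-entropy for one position: -log p(actual next token).
function nextTokenLoss(logits: number[], targetIndex: number): number {
  const probs = softmax(logits);
  return -Math.log(probs[targetIndex]);
}

const vocab = ["beer", "is", "brewed", "bread"];
// Hypothetical model scores for the token following "Beer is ..."
const logits = [0.1, 0.2, 3.0, 0.5];

const goodGuess = nextTokenLoss(logits, vocab.indexOf("brewed")); // model agreed: low loss
const badGuess = nextTokenLoss(logits, vocab.indexOf("bread")); // model disagreed: high loss
console.log(goodGuess.toFixed(3), badGuess.toFixed(3));
```

Training nudges the weights so that this loss, averaged over every position in the corpus, goes down.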

Post-Training — Teaching Behavior and Alignment

After base training, the model needs to learn how to behave.

This phase adjusts it to follow instructions, respond helpfully, and align with human preferences and safety policies.

It typically involves:

Supervised Fine-Tuning (SFT): The model learns from curated examples of correct input/output pairs.
Reinforcement Learning from Human Feedback (RLHF): Humans rank several model responses, a reward model is trained on those rankings, and the model is then optimized to produce responses the reward model scores highly.
Reinforcement Learning from AI Feedback (RLAIF): The same, but with AI systems acting as evaluators.

Purpose: Make the model cooperative and safe.

Analogy: Teaching manners, ethics, and social intelligence after language is learned.

The chat versions of all major models, such as GPT-4, Claude 3, Gemini, and Llama 3, go through this step before they ever reach users.
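In RLHF the human rankings are usually not fed to the model directly. A common recipe first trains a separate reward model on pairs of responses, then optimizes the chat model against it. The sketch below shows only the pairwise ranking loss often used for that reward model (a Bradley-Terry style objective). The reward scores are hypothetical numbers, not outputs of a real model.

```typescript
// Pairwise preference loss for a reward model (Bradley-Terry style):
// loss = -log(sigmoid(r_preferred - r_rejected)).
// Small when the reward model already scores the human-preferred response higher,
// large when it ranks the pair the wrong way round.

function sigmoid(x: number): number {
  return 1 / (1 + Math.exp(-x));
}

function preferenceLoss(rewardPreferred: number, rewardRejected: number): number {
  return -Math.log(sigmoid(rewardPreferred - rewardRejected));
}

// Hypothetical reward scores for two responses to the same prompt.
const agreesWithHumans = preferenceLoss(2.0, -1.0); // ranked correctly
const disagreesWithHumans = preferenceLoss(-1.0, 2.0); // ranked the wrong way
console.log(agreesWithHumans.toFixed(3), disagreesWithHumans.toFixed(3));
```

In the typical RLAIF setup the same loss applies. Only the source of the preference labels changes, from human raters to an AI evaluator.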

Fine-Tuning — Specializing for a Domain

Fine-tuning takes a general model and teaches it domain-specific knowledge: medicine, law, brewing, internal documentation — whatever your niche may be.

There are a few variants:

Full fine-tuning: updating all model weights (rare and expensive).
Parameter-efficient fine-tuning (PEFT), such as LoRA and QLoRA: training small adapter weights on top of the frozen base model. This is vastly cheaper, and it is reversible because the original weights are never changed.

Purpose: Adapt to a domain or style.

Analogy: Sending a fluent speaker to medical school.

In practice, fine-tuning makes sense only if you have high-quality, well-structured data and a clear purpose — for example, improving factual recall in a specific knowledge domain or matching a company’s tone of voice.
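Some simple arithmetic shows why LoRA is so much cheaper. Full fine-tuning updates a d×k weight matrix W directly. LoRA instead freezes W and trains two small matrices, B (d×r) and A (r×k), and uses W + (α/r)·BA as the effective weight, where α is a scaling hyperparameter. This sketch assumes a 4096×4096 layer and rank r = 8. These are illustrative sizes, not taken from any specific model.

```typescript
// LoRA parameter arithmetic for one weight matrix (illustrative sizes).
// Full fine-tuning updates all d*k weights; LoRA trains only B (d x r) and A (r x k).

function fullParams(d: number, k: number): number {
  return d * k;
}

function loraParams(d: number, k: number, r: number): number {
  return r * (d + k); // d*r for B plus r*k for A
}

// Effective weight with the adapter merged: W + (alpha / r) * (B @ A).
// W itself is never modified, which is why an adapter can be removed or swapped.
function mergeLoRA(W: number[][], B: number[][], A: number[][], alpha: number): number[][] {
  const rank = A.length;
  const scale = alpha / rank;
  return W.map((row, i) =>
    row.map((w, j) => {
      let delta = 0;
      for (let t = 0; t < rank; t++) delta += B[i][t] * A[t][j];
      return w + scale * delta;
    }),
  );
}

const d = 4096, k = 4096, r = 8;
const full = fullParams(d, k); // 16,777,216 trainable weights
const lora = loraParams(d, k, r); // 65,536 trainable weights
console.log(`LoRA trains ${((100 * lora) / full).toFixed(2)}% of this layer's weights`);

// Tiny 2x2 example with rank 1: only B and A would be learned.
const W = [[1, 0], [0, 1]];
const B = [[1], [2]]; // 2 x 1
const A = [[0.5, 0.5]]; // 1 x 2
console.log(mergeLoRA(W, B, A, 1)); // alpha = 1 and rank = 1, so scale = 1
```

QLoRA applies the same idea but also stores the frozen base weights in 4-bit precision to cut memory further.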

Reinforcement Fine-Tuning — Teaching Preferences and Optimization

A newer development combines reinforcement learning with fine-tuning to optimize specific, measurable outcomes — such as factuality, brevity, or computational efficiency.

OpenAI’s Reinforcement Fine-Tuning (RFT) of the o1 model is one recent example: instead of relying on humans to rate outputs, the process automatically scores model responses using well-defined reward functions.

Purpose: Optimize behavior using measurable rewards.

Analogy: Practicing until performance metrics improve, rather than memorizing answers.
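Here is what a "well-defined reward function" can look like. A grader is ordinary code that takes a response and returns a score, and the training loop reinforces responses that score higher. This minimal sketch assumes a task with a known reference answer and a word budget. The weights and scoring rules are hypothetical and do not follow OpenAI's grader format.

```typescript
// A hypothetical automatic grader: rewards correctness first, then brevity.
// Scores fall in [0, 1]; a reinforcement fine-tuning loop would favour
// responses that earn higher scores, with no human rater involved.

interface GradeInput {
  response: string;
  referenceAnswer: string;
  maxWords: number;
}

function grade({ response, referenceAnswer, maxWords }: GradeInput): number {
  const normalize = (s: string) => s.trim().toLowerCase();
  // Correctness: does the response contain the reference answer?
  const correct = normalize(response).includes(normalize(referenceAnswer)) ? 1 : 0;
  // Brevity: full credit within the word budget, decaying linearly beyond it.
  const words = response.trim().split(/\s+/).filter(Boolean).length;
  const brevity = words <= maxWords ? 1 : Math.max(0, 1 - (words - maxWords) / maxWords);
  // Weighted sum (illustrative weights): correctness matters most.
  return 0.8 * correct + 0.2 * brevity;
}

const task = { referenceAnswer: "1847", maxWords: 10 };
const concise = grade({ ...task, response: "Carlsberg was founded in 1847." });
const rambling = grade({
  ...task,
  response:
    "Great question! Carlsberg, the famous Danish brewery, was founded a long time ago in 1847, in Copenhagen.",
});
const wrong = grade({ ...task, response: "It was founded in 1900." });
console.log(concise, rambling, wrong);
```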

Retrieval-Augmented Generation (RAG) — Adding Knowledge Without Training

RAG isn’t training at all — it’s a retrieval technique.

The model stays frozen but is connected to a search index, database, or vector store.

When a question comes in, the system first retrieves relevant information and then has the model generate an answer grounded in that content.

Purpose: Keep models current and connected to external knowledge.

Analogy: Looking something up rather than memorizing it.

RAG is ideal when data changes frequently, or when you can’t or shouldn’t embed sensitive data into the model itself.
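The retrieve-then-generate flow fits in a few lines. To keep the example self-contained, this version scores documents by simple word overlap rather than the embedding similarity a real vector store would use. It also stops at building the grounded prompt instead of calling a model. The documents and the prompt wording are hypothetical.

```typescript
// Minimal retrieve-then-generate sketch. Retrieval here is naive word overlap,
// standing in for the embedding/vector search a production RAG system would use.

interface Doc {
  id: string;
  text: string;
}

const tokenize = (s: string): Set<string> =>
  new Set(s.toLowerCase().match(/[a-z0-9]+/g) ?? []);

// Score each document by how many query words it shares; return the top k.
function retrieve(query: string, docs: Doc[], k: number): Doc[] {
  const q = tokenize(query);
  return docs
    .map((doc) => {
      const words = tokenize(doc.text);
      let overlap = 0;
      q.forEach((w) => {
        if (words.has(w)) overlap++;
      });
      return { doc, overlap };
    })
    .filter((s) => s.overlap > 0)
    .sort((a, b) => b.overlap - a.overlap)
    .slice(0, k)
    .map((s) => s.doc);
}

// The frozen model only sees what we put in the prompt: retrieved context + question.
function buildGroundedPrompt(query: string, context: Doc[]): string {
  const sources = context.map((d) => `[${d.id}] ${d.text}`).join("\n");
  return `Answer using only the sources below and cite their ids.\n${sources}\n\nQuestion: ${query}`;
}

const docs: Doc[] = [
  { id: "policy-7", text: "Travel expenses must be submitted within 30 days." },
  { id: "faq-2", text: "The canteen opens at 8 am on weekdays." },
];
const question = "How many days do I have to submit travel expenses?";
const groundedPrompt = buildGroundedPrompt(question, retrieve(question, docs, 1));
console.log(groundedPrompt);
```

When the policy document changes, only the index changes. The model itself is never retrained.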

Prompt Tuning — Lightweight Personality Shaping

The lightest-weight form of adaptation is prompt tuning, sometimes called soft prompting.

Here, a small set of trainable vectors (so-called virtual tokens) is prepended to the input and trained to steer the model's behavior without modifying its core weights.

Purpose: Adjust tone or persona without retraining.

Analogy: Giving the same person a new job description — “today you’re the legal assistant.”

Prompt tuning is useful when you want to offer multiple personalities or roles from a single model.
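Mechanically, a soft prompt is a handful of learned vectors placed in front of the input's token embeddings. During prompt tuning only those vectors receive updates, and the model's weights stay frozen. This sketch shows only the prepend step. It assumes tiny 4-dimensional embeddings with made-up values, and the persona names are hypothetical. Storing one soft prompt per role is what lets a single frozen model serve several personalities.

```typescript
// Soft prompt / prompt tuning sketch: learned "virtual token" vectors are
// prepended to the real token embeddings before they enter the frozen model.
// Dimensions and values are hypothetical; real embeddings have thousands of dimensions.

type Vector = number[];

// One learned soft prompt per persona (only these vectors would be trained).
const softPrompts: Record<string, Vector[]> = {
  legalAssistant: [
    [0.12, -0.4, 0.33, 0.05],
    [0.0, 0.21, -0.17, 0.48],
  ],
  brewingExpert: [
    [-0.3, 0.09, 0.6, -0.11],
    [0.27, -0.05, 0.14, 0.02],
  ],
};

// Build the model input: [virtual tokens..., real tokens...].
function withSoftPrompt(persona: string, tokenEmbeddings: Vector[]): Vector[] {
  const prefix = softPrompts[persona];
  if (!prefix) throw new Error(`No soft prompt trained for persona "${persona}"`);
  return [...prefix, ...tokenEmbeddings];
}

// Hypothetical embeddings for a two-token user input.
const userInput: Vector[] = [
  [0.5, 0.1, -0.2, 0.3],
  [-0.1, 0.4, 0.0, 0.2],
];
const modelInput = withSoftPrompt("legalAssistant", userInput);
console.log(modelInput.length); // 4: 2 virtual tokens + 2 real tokens
```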

What’s Reasonable to Do (and What Isn’t)

To make this practical, here’s how the different kinds of training align with the “Four Categories of AI Solutions”, from simple Copilot-style automation to custom AI systems built from scratch.

Goal	Technique	Cost & Effort	Category (from “Four Categories of AI Solutions”)	Typical Use
General chatbot or Copilot	None (use an off-the-shelf post-trained model)	🟢 Low	Category 1 — Copilot / built-in AI features	Office copilots, internal Q&A bots
Domain expertise	LoRA / adapter fine-tuning	🟠 Medium	Category 2 — Configured / composable solutions	Industry copilots, internal assistants
Keep knowledge fresh	RAG or hybrid RAG + fine-tuning	🟠 Medium	Category 3 — Integrated or extended AI systems	Research assistants, customer-facing search
Optimize measurable output	Reinforcement fine-tuning	🔴 High	Category 4 — Custom AI / in-house LLMs	Scientific computing, advanced R&D
Create new model family	Base training	🚫 Extreme	Beyond Category 4	Reserved for foundation model labs

Summary

Base training — teaches language.
Post-training (SFT/RLHF) — teaches behavior.
Fine-tuning — teaches domain knowledge.
Reinforcement fine-tuning — teaches optimization.
RAG / Prompt tuning — extend without retraining.

LLMs aren’t trained once — they’re trained in layers.

Each layer shapes a different aspect of intelligence: from raw linguistic intuition to helpful conversation, domain expertise, and ongoing adaptability.

Knowing which layer you’re working with isn’t just a technical detail — it’s the difference between using AI and building with it.

AI for Data (Not Data and AI)

2025-04-24 / Peter Birkholm-Buch

Cold Open

Most companies get it backwards.

They say “Data and AI,” as if AI is dessert—something you get to enjoy only after you’ve finished your vegetables. And by vegetables, they mean years of data modeling, integration work, and master‑data management. AI ends up bolted onto the side of a data office that’s already overwhelmed.

That mindset isn’t just outdated—it’s actively getting in the way.

It’s time to flip the script. It’s not Data and AI. It’s AI for Data.

AI as a Data Appendage: The Legacy View

In most org charts, AI still reports to the head of data. That tells you everything: AI is perceived as a tool to be used on top of clean data. The assumption is that AI becomes useful only after you’ve reached some mythical level of data maturity.

So what happens? You wait. You delay. You burn millions building taxonomies and canonical models that never quite deliver. When AI finally shows up, it generates dashboards or slide‑deck summaries. Waste of potential.

What If AI Is Your Integration Layer?

Here’s the mental flip: AI isn’t just a consumer of data. It’s a synthesizer. A translator. An integrator. An enabler!

Instead of cleaning, mapping, and modeling everything up front, what if you simply exposed your data—as is—and let the AI figure it out?

That’s not fantasy. Today, you can feed an AI messy order tables, half‑finished invoice exports, inconsistent SKU lists—and it still works out the joins. Sales and finance data follow patterns the model has seen a million times.

The magic isn’t that AI understands perfect data. The magic is that it doesn’t need to.

MCP: OData for Agents

Remember OData? It promised introspectable, queryable APIs—you could ask the endpoint what it supported. Now meet MCP (Model Context Protocol). Think OData, but for AI agents.

With MCP, an agent can ask a server which tools it offers, what each one does, what inputs it needs (described as JSON Schema), and what outputs to expect. No glue code. No brittle integrations. You expose a capability, and the AI takes it from there.

OData made APIs discoverable. MCP makes tools discoverable to AIs.

Expose your data with just enough structure, and let the agent reason. No mapping tables. No MDM. Just AI doing what it’s good at: figuring things out.
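Under the hood, MCP messages are JSON-RPC 2.0. A client asks a server for its tools with tools/list and gets back each tool's name, description, and a JSON Schema for its inputs. It then invokes one with tools/call. The sketch below fakes a server in memory so it runs without the official SDK. The query_invoices tool, its fields, and the sample data are hypothetical, and a real server would be reached over stdio or HTTP.

```typescript
// In-memory sketch of MCP-style tool discovery and invocation (JSON-RPC 2.0 shapes).
// The query_invoices tool and its data are hypothetical; no SDK or transport is used.

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: { name?: string; arguments?: Record<string, unknown> };
}

interface Tool {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string }>;
    required?: string[];
  };
}

// Hypothetical backing data, exposed "as is".
const invoices = [
  { id: "INV-1", customer: "Acme", amount: 1200 },
  { id: "INV-2", customer: "Globex", amount: 300 },
];

const tools: Tool[] = [
  {
    name: "query_invoices",
    description: "Return invoices whose amount is at least minAmount",
    inputSchema: {
      type: "object",
      properties: { minAmount: { type: "number" } },
      required: ["minAmount"],
    },
  },
];

// The "server": describes itself on tools/list and executes tools/call.
function handle(req: JsonRpcRequest): unknown {
  if (req.method === "tools/list") {
    return { jsonrpc: "2.0", id: req.id, result: { tools } };
  }
  if (req.method === "tools/call") {
    const params = req.params;
    if (!params || params.name !== "query_invoices") {
      // JSON-RPC "invalid params" error for an unknown tool
      return { jsonrpc: "2.0", id: req.id, error: { code: -32602, message: "Unknown tool" } };
    }
    const minAmount = Number(params.arguments?.minAmount ?? 0);
    const rows = invoices.filter((inv) => inv.amount >= minAmount);
    return {
      jsonrpc: "2.0",
      id: req.id,
      result: { content: [{ type: "text", text: JSON.stringify(rows) }] },
    };
  }
  return { jsonrpc: "2.0", id: req.id, error: { code: -32601, message: "Method not found" } };
}

// The "agent": discovers what exists before deciding what to call.
const listed = handle({ jsonrpc: "2.0", id: 1, method: "tools/list" }) as {
  result: { tools: Tool[] };
};
console.log(listed.result.tools.map((t) => `${t.name}: ${t.description}`));

const called = handle({
  jsonrpc: "2.0",
  id: 2,
  method: "tools/call",
  params: { name: "query_invoices", arguments: { minAmount: 1000 } },
}) as { result: { content: { type: string; text: string }[] } };
console.log(called.result.content[0].text);
```

The agent never needed a hand-written connector for invoices. It learned the tool's existence and input shape from the server's own description.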

Why It Works in Science—And Why It’ll Work in Business

Need proof? Look at biology.

Scientific data is built on shared, Latin‑based taxonomies. Tools like Claude or ChatGPT navigate these datasets without manual schema work. At Carlsberg we’ve shown an AI connecting yeast strains ➜ genes ➜ flavor profiles in minutes.

Business data is easier. You don’t need to teach AI what an invoice is. Or a GL account. These concepts are textbook. Give the AI access and it infers relationships. If it can handle yeast genomics, it can handle your finance tables.

Stop treating AI like glass. It’s ready.

The Dream: MCP‑Compliant OData Servers

Imagine every system—ERP, CRM, LIMS, SharePoint—exposing itself via an AI‑readable surface. No ETLs, no integration middleware, no months of project time.

Combine OData’s self‑describing endpoints with MCP’s agent capabilities. You don’t write connectors. You don’t centralize everything first. The AI layer becomes the system‑of‑systems—a perpetual integrator, analyst, translator.

Integration disappears. Master data becomes a footnote.
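An MCP-compliant OData server might be a thin adapter. Each OData entity set becomes one MCP-style tool whose inputs map onto OData's standard query options ($filter, $select, $top). This is a hypothetical design, not an existing product or library. The service URL and entity names are made up, and the sketch only builds the OData request URL rather than sending it.

```typescript
// Hypothetical adapter: expose OData entity sets as MCP-style tools.
// $filter, $select and $top are standard OData system query options;
// everything else (URLs, names, tool shape) is illustrative.

interface EntitySet {
  setName: string;
  description: string;
}

interface ToolDescriptor {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string; description: string }>;
  };
}

// One tool per entity set; the agent supplies OData query options as arguments.
function toTool(set: EntitySet): ToolDescriptor {
  return {
    name: `query_${set.setName.toLowerCase()}`,
    description: `Query the ${set.setName} entity set. ${set.description}`,
    inputSchema: {
      type: "object",
      properties: {
        filter: { type: "string", description: "OData $filter expression, e.g. Amount gt 1000" },
        select: { type: "string", description: "Comma-separated fields for $select" },
        top: { type: "number", description: "Maximum number of rows ($top)" },
      },
    },
  };
}

// Translate tool arguments into an OData query URL against the service root.
function toODataUrl(
  serviceRoot: string,
  set: EntitySet,
  args: { filter?: string; select?: string; top?: number },
): string {
  const options: string[] = [];
  if (args.filter) options.push(`$filter=${encodeURIComponent(args.filter)}`);
  if (args.select) options.push(`$select=${encodeURIComponent(args.select)}`);
  if (args.top !== undefined) options.push(`$top=${args.top}`);
  const query = options.length ? `?${options.join("&")}` : "";
  return `${serviceRoot}/${set.setName}${query}`;
}

const invoicesSet: EntitySet = { setName: "Invoices", description: "Customer invoices from the ERP." };
console.log(toTool(invoicesSet).name);
console.log(toODataUrl("https://erp.example.com/odata", invoicesSet, { filter: "Amount gt 1000", top: 5 }));
```

A real bridge would more likely generate the EntitySet list from the service's $metadata document than hard-code it. That self-description is what makes the combination attractive.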

When Do You Still Need Clean Data?

Let’s address the elephant in the room: there are still scenarios where data quality matters deeply.

Regulatory reporting. Financial reconciliation. Mission-critical operations where a mistake could be costly. In these domains, AI is a complement to—not a replacement for—rigorous data governance.

But here’s the key insight: you can pursue both paths simultaneously. Critical systems maintain their rigor, while the vast majority of your data landscape becomes accessible through AI-powered approaches.

AI for Data: The Flip That Changes Everything

You don’t need perfect data to start using AI. That’s Data and AI thinking.

AI for Data starts with intelligence and lets structure emerge. Let your AI discover, join, and reason across your real‑world mess—not just your sanitized warehouse.

It’s a shift from enforcing models to exposing capabilities. From building integrations to unleashing agents. From waiting to acting while you learn.

If your organization is still waiting to “get the data right,” here’s your wake‑up call: you’re waiting for something AI no longer needs.

AI is ready. Your data is ready enough.

The only question left: Are you ready to flip the model?

Four Categories of AI Solutions

2024-04-22 / Peter Birkholm-Buch

Introduction

When driving value from generative AI (GenAI), it’s important to choose the right approach in order to get a return on investment. This page attempts to explain the possible approaches and the resources each one requires.

Takers, Shapers and Makers

According to McKinsey and Gartner, there seem to be three major categories of GenAI adopters:

McKinsey	Gartner	Description
Takers	Quick Wins	Focus on utilizing existing GenAI tools and models for productivity improvements with minimal customization. These initiatives typically have short time to value and are task-specific, aiming for immediate efficiency gains in routine tasks.
Shapers	Differentiating Use Cases	Engage in integrating GenAI tools with proprietary data or adapting them for specific applications. These initiatives aim to achieve competitive advantages, involving medium time to value with higher costs and risks than quick wins. They leverage GenAI to extend current processes and create unique value propositions.
Makers	Transformative Initiatives	Concentrate on developing new GenAI models or tools for specialized applications, with the potential to transform business models and markets. These are the most ambitious initiatives, characterized by high cost, complexity, and risk, and a long time to value. They aim for strategic benefits that may be difficult to quantify initially.

TCO/ROI

The Total Cost of Ownership (TCO) and Return on Investment (ROI) for GenAI adoption across the taker, shaper, and maker categories involve several considerations, including hidden costs, strategic implications, and potential benefits.

Gartner offers insights on measuring GenAI ROI, advocating for a business case approach that simulates potential cost and value realization across GenAI activities. This approach categorizes investments into quick wins, differentiating use cases, and transformational initiatives. Quick wins focus on immediate productivity improvements with short time to value, differentiating use cases aim at competitive advantage with medium time to value, and transformative initiatives have the potential to upend business models with longer time to value but higher costs and complexity. The guide emphasizes the importance of balancing financial returns with strategic benefits, which might be difficult to quantify initially.

Source: https://www.gartner.com/en/articles/take-this-view-to-assess-roi-for-generative-ai.
The red box was added by me; see the conclusion below.
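Once you put numbers on cost and value over a time horizon, the business-case idea comes down to simple arithmetic. This minimal sketch assumes the textbook definition ROI = (total value - total cost) / total cost and uses made-up yearly figures for a quick win versus a transformative initiative. It is not Gartner's model, just the basic formula plus a payback check.

```typescript
// Basic ROI over a multi-year horizon: (total value - total cost) / total cost.
// All figures are hypothetical, chosen only to contrast time to value.

interface YearlyFigures {
  cost: number; // spend in that year (licences, people, compute, ...)
  value: number; // value realized in that year (savings, revenue, ...)
}

function roi(years: YearlyFigures[]): number {
  const totalCost = years.reduce((sum, y) => sum + y.cost, 0);
  const totalValue = years.reduce((sum, y) => sum + y.value, 0);
  return (totalValue - totalCost) / totalCost;
}

// First year (1-based) in which cumulative value covers cumulative cost, or null.
function paybackYear(years: YearlyFigures[]): number | null {
  let cumulative = 0;
  for (let i = 0; i < years.length; i++) {
    cumulative += years[i].value - years[i].cost;
    if (cumulative >= 0) return i + 1;
  }
  return null;
}

// Hypothetical quick win: cheap, pays back fast, modest upside.
const quickWin: YearlyFigures[] = [
  { cost: 100, value: 150 },
  { cost: 20, value: 150 },
  { cost: 20, value: 150 },
];
// Hypothetical transformative initiative: expensive, slow start, large upside.
const transformative: YearlyFigures[] = [
  { cost: 1000, value: 0 },
  { cost: 500, value: 600 },
  { cost: 500, value: 2400 },
];

console.log(`quick win: ROI ${(roi(quickWin) * 100).toFixed(0)}%, payback in year ${paybackYear(quickWin)}`);
console.log(`transformative: ROI ${(roi(transformative) * 100).toFixed(0)}%, payback in year ${paybackYear(transformative)}`);
```

Headline ROI alone hides the time-to-value difference, so looking at the payback year alongside it matters.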

Builders

I’m introducing an extra “Builders” category into the GenAI adoption landscape. Going beyond merely adopting or adapting, Builders take a step further by crafting bespoke extensions and plugins for GenAI platforms. This initiative is driven by the ambition to tackle intricate, multi-step workflows that typically demand considerable human intervention. The essence of being a Builder lies in the ability not just to work with GenAI but to enhance its core capabilities, enabling solutions that seamlessly bridge various systems and processes. This approach demands a blend of creativity, technical prowess, and a deep understanding of both the technology and the problem domain.

Category	Description	Required People Resources/Skills	Tools
Takers	Utilize existing GenAI tools for productivity improvements with minimal customization. Aimed at immediate efficiency gains in routine tasks with short time to value.	Basic understanding of AI/ML concepts; skills in integrating and configuring APIs; ability to adapt third-party GenAI tools to existing workflows	Microsoft Copilot; Microsoft Copilot Plugins; Enterprise “Chat”-GPTs
Shapers	Integrate GenAI tools with proprietary data or adapt them for specific applications to achieve competitive advantages, involving medium time to value with higher costs and risks.	Low/no-code developers; domain experts for data interpretation; project managers with a technical background	Retrieval Augmented Generation (RAG); Microsoft Copilot Studio; Microsoft Azure AI Studio
Builders	Develop custom solutions or extensions to GenAI platforms to solve complex, multi-step processes that usually require significant human effort.	Advanced programming skills in relevant languages; data scientists for model tuning; experience with GenAI frameworks; systems integration expertise; creative problem-solving abilities	Microsoft Copilot Extensions; Microsoft PromptFlow; LangChain; LangGraph; LlamaIndex; AutoGen; CrewAI (OpenAI Swarm); LLM Function Calling; LLM Routing; LLM Threat Modelling; LLM Security
Makers	Develop new GenAI models or tools for specialized applications with the potential to transform business models and markets. Characterized by high cost, complexity, and risk, with a long time to value.	Expertise in deep learning and neural networks; experience in building and training large-scale AI models; strong research and development background; ability to work with high-performance computing resources	LLM Models; LLM Frameworks; LLM Fine-Tuning (LLM Creation and Training)

The “Builders” category fills the gap between “Shapers,” who mainly adapt existing models for their unique needs, and “Makers,” who create new GenAI models from scratch. Builders leverage powerful frameworks and platforms to create bespoke solutions that automate complex workflows, potentially revolutionizing how businesses approach process automation and efficiency. This distinction underscores the evolving landscape of GenAI adoption, highlighting the increasing sophistication and customization capabilities available to organizations.
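Most Builder solutions rest on LLM function calling. The model does not execute anything itself. It returns the name of a function plus JSON arguments, and your code validates and runs the call, then feeds the result back. This framework-free sketch assumes the model's reply has already been parsed into a name and an arguments object. The two business functions are hypothetical stand-ins for real system integrations.

```typescript
// Framework-free sketch of the function-calling loop a Builder solution relies on.
// The model proposes a call; our code looks it up, checks arguments, and executes it.

interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

type Handler = (args: Record<string, unknown>) => string;

// Hypothetical integrations a multi-step workflow might chain together.
const registry: Record<string, Handler> = {
  get_order_status: (args) => {
    if (typeof args.orderId !== "string") throw new Error("orderId must be a string");
    return `Order ${args.orderId} is shipped`; // stand-in for an ERP lookup
  },
  create_ticket: (args) => {
    if (typeof args.summary !== "string") throw new Error("summary must be a string");
    return `Ticket created: ${args.summary}`; // stand-in for a service-desk API
  },
};

// Errors are returned as text so they can be reported back to the model.
function dispatch(call: ToolCall): string {
  const handler = registry[call.name];
  if (!handler) return `Error: unknown function ${call.name}`;
  try {
    return handler(call.arguments);
  } catch (err) {
    return `Error: ${(err as Error).message}`;
  }
}

// Calls as they might come back from a model, one per workflow step.
const steps: ToolCall[] = [
  { name: "get_order_status", arguments: { orderId: "SO-1001" } },
  { name: "create_ticket", arguments: { summary: "Customer asks about SO-1001" } },
];
for (const step of steps) console.log(dispatch(step));
```

Frameworks such as LangChain or AutoGen wrap this loop with schemas, retries, and routing, but the core contract stays the same.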

Conclusion

The red box on the image above indicates that solutions in the Takers and lower Shapers categories are likely to be overtaken by standard solutions from vendors and the plethora of SaaS AI offerings appearing daily. Caution should be used when investing in solutions in this area, unless quick wins are important.

Clearly, it’s important to have a strategic, well-planned approach to integrating GenAI, with emphasis on organizational readiness, skill development, and a focus on applications that offer a competitive advantage. Otherwise, GenAI just becomes a technology looking for a problem, like blockchain.


ChatGPT Vision could be used to destroy incumbent image recognition solutions in CPG

2023-10-14 / Peter Birkholm-Buch

I finally got access to ChatGPT Vision and I uploaded a picture of some beers I had at home:

This is the response from ChatGPT:

Nice selection of beers you got there! I see Perlenbacher, which is often associated with Lidl supermarkets, and of course, the iconic Carlsberg. Carlsberg, founded in 1847, is one of the most well-known beer brands globally and is based out of Copenhagen, Denmark. “Birra Messina” seems to be a Sicilian beer. Each of these beers brings a different taste profile to the table. Enjoy responsibly! Would you like to know more about any specific beer?

I then prompted it in the style of TypeChat, describing the response I wanted as a TypeScript type:

Please attempt to recognize the cans in the picture and fill out the following response:
interface Response { data: Array<{ name: string, brand: string, count: number}>; }

And I got this from ChatGPT:

Sure, here’s the response based on the image:

interface Response { data: Array<{ name: string, brand: string, count: number}>; }

const beerData: Response = {
  data: [
    { name: "Premium Pilsner", brand: "Perlenbacher", count: 2 },
    { name: "Pilsner", brand: "Carlsberg", count: 2 },
    { name: "Classic", brand: "Birra Messina", count: 1 }
  ]
};

This is based on the visible labels in the image.

No training, no nothing. I know the internet is full of examples of people having ChatGPT turn Figma drawings into websites and apps, but this is far outside ChatGPT’s comfort zone. I have previous experience trying to teach models to recognize packaged goods in images, and it’s not that easy.

Solutions based on this could destroy the incumbent vendors in image recognition in the CPG market.
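A reply like the one above only becomes useful to a program once it has been checked. TypeChat validates the model's output against the declared type and can send validation errors back to the model so it can repair its answer. Here is a minimal, library-free sketch of the validation half. It assumes the model is asked to return plain JSON rather than a const declaration. The type is renamed BeerResponse to avoid clashing with the built-in Response type in browser and Node typings.

```typescript
// Library-free sketch of TypeChat-style validation: parse the model's JSON
// and check it against the expected shape before any program relies on it.

interface BeerResponse {
  data: Array<{ name: string; brand: string; count: number }>;
}

function isBeerResponse(value: unknown): value is BeerResponse {
  if (typeof value !== "object" || value === null) return false;
  const data = (value as { data?: unknown }).data;
  return (
    Array.isArray(data) &&
    data.every(
      (item) =>
        typeof item === "object" &&
        item !== null &&
        typeof item.name === "string" &&
        typeof item.brand === "string" &&
        Number.isInteger(item.count) &&
        item.count >= 0,
    )
  );
}

// Parse a raw model reply; errors come back as text that could be sent to the model.
function parseReply(raw: string): { ok: true; value: BeerResponse } | { ok: false; error: string } {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, error: "Reply is not valid JSON" };
  }
  return isBeerResponse(parsed)
    ? { ok: true, value: parsed }
    : { ok: false, error: "Reply does not match the BeerResponse type" };
}

// The recognition result from the example above, as plain JSON.
const reply = `{"data":[{"name":"Premium Pilsner","brand":"Perlenbacher","count":2},
{"name":"Pilsner","brand":"Carlsberg","count":2},{"name":"Classic","brand":"Birra Messina","count":1}]}`;
const result = parseReply(reply);
if (result.ok) {
  const total = result.value.data.reduce((sum, beer) => sum + beer.count, 0);
  console.log(`${total} cans recognized`);
} else {
  console.log(result.error);
}
```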

Clippy talking to Clippy talking to Clippy…

2023-10-04 / Peter Birkholm-Buch

I listened to a podcast interview with Christian Lanng (Christian Lanng: “How Being a Founder Almost Killed Me” | E1065) about his next venture, Beyondwork, and their attempt to find out what that means.

Christian says that he doesn’t want a future where a Clippy has to talk to a Clippy, which talks to another Clippy, and so on. In this context, Clippy is some sort of “AI” that can provide an answer.

I don’t know if that is true or not or if it’s the future we want or not – but right now, that seems to be where we’re headed.

LangChain is about chaining prompts and answers together so that we can get better answers
Microsoft Copilot Plugins is about enabling Microsoft Copilot to extend your prompts to other systems and consolidate the answers and perform tasks on your behalf
Microsoft TypeChat is about providing types and structure to the data we want to pass between systems to make interoperability even easier and more precise
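"Clippy talking to Clippy" is easy to picture in code: each step's answer becomes part of the next step's prompt. Below is a minimal sketch of prompt chaining. It assumes a hypothetical askModel function, implemented here as an offline stub that just echoes, so the example runs without a network. A real chain would call an LLM API or use a framework such as LangChain, whose API this does not reproduce.

```typescript
// Prompt chaining sketch: the output of one model call is fed into the next.
// askModel is a hypothetical stand-in for a real LLM call.

type Model = (prompt: string) => Promise<string>;

// Offline stub so the example runs; it just tags and truncates the prompt.
const askModel: Model = async (prompt) => `answer(${prompt.slice(0, 40)})`;

// Each step is a template that receives the previous step's answer.
type Step = (previous: string) => string;

async function runChain(model: Model, input: string, steps: Step[]): Promise<string[]> {
  const transcript: string[] = [];
  let current = input;
  for (const step of steps) {
    current = await model(step(current)); // one Clippy hands its answer to the next
    transcript.push(current);
  }
  return transcript;
}

// Hypothetical three-step chain spanning several business domains.
const steps: Step[] = [
  (q) => `Find the customer's open orders: ${q}`,
  (orders) => `Summarise delivery risks in: ${orders}`,
  (risks) => `Draft an email to the customer about: ${risks}`,
];

runChain(askModel, "Acme A/S", steps).then((transcript) => {
  transcript.forEach((line, i) => console.log(`step ${i + 1}: ${line}`));
});
```

Every hop is another place where data has to be found, translated, and trusted, which is exactly the integration problem described below.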

I spent the first 20 years of my career as a developer and the last 20 or so as an “architect” in many different contexts, and it almost always boils down to solving integration between business domains: figuring out how data can be accessed, translated, and acted upon across the different domains, either in real time or before a given date and time.

I totally agree with the sentiment that most, if not all, business systems are really just a database with a UI on top, and that actual business value comes from combining the data across all systems and extracting answers from it. However, as long as the “business” continues to lock data in silos, refuses to adapt business processes to standard systems, and demands customization to accommodate the last 20% of functionality in a way that makes integration almost impossible, I think the Clippies are going to be around for some time to come.

In the short to medium term, I think scenarios with plugins to Microsoft Copilot (and similar) will prove valuable to most businesses. Longer term, we still need to fix copyright (AIs will be stopped from scraping), privacy (businesses are scared of leaks), and authentication/authorization (who are you, and what do you have access to across boundaries?). For instance, how can we safely solve problems where data is scattered across private and public domains?

It’s going to be interesting and some new rules/processes are going to have to be worked out.
