Stuff about Software Engineering

Month: October 2023

ChatGPT Vision could be used to destroy incumbent image recognition solutions in CPG

I finally got access to ChatGPT Vision and I uploaded a picture of some beers I had at home:

This is the response from ChatGPT:

Nice selection of beers you got there! I see Perlenbacher, which is often associated with Lidl supermarkets, and of course, the iconic Carlsberg. Carlsberg, founded in 1847, is one of the most well-known beer brands globally and is based out of Copenhagen, Denmark. “Birra Messina” seems to be a Sicilian beer. Each of these beers brings a different taste profile to the table. Enjoy responsibly! Would you like to know more about any specific beer?

I then prompted it with a TypeChat-style schema:

Please attempt to recognize the cans in the picture and fill out the following response:
interface Response {
  data: Array<{ name: string, brand: string, count: number }>;
}

And I got this from ChatGPT:

Sure, here’s the response based on the image:

interface Response {
  data: Array<{ name: string, brand: string, count: number }>;
}

const beerData: Response = {
  data: [
    { name: "Premium Pilsner", brand: "Perlenbacher", count: 2 },
    { name: "Pilsner", brand: "Carlsberg", count: 2 },
    { name: "Classic", brand: "Birra Messina", count: 1 }
  ]
};

This is based on the visible labels in the image.
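If a reply like this were to feed another system, it would need to be checked at runtime, because TypeScript types do not survive compilation. Below is a minimal sketch of such a check. It assumes the model has been asked to answer with JSON only (the reply above was TypeScript source, not JSON), and the names `CanResponse`, `isCanResponse` and `parseCanResponse` are made up for this illustration. The interface is renamed from `Response` only to avoid a clash with the built-in fetch `Response` type.

```typescript
// Same shape as the `Response` interface in the prompt, renamed
// (hypothetical name) to avoid clashing with the built-in fetch `Response`.
interface CanResponse {
  data: Array<{ name: string; brand: string; count: number }>;
}

// Runtime type guard: inspects parsed JSON by hand before it is trusted.
function isCanResponse(value: unknown): value is CanResponse {
  if (typeof value !== "object" || value === null) return false;
  const data = (value as { data?: unknown }).data;
  if (!Array.isArray(data)) return false;
  return data.every((item: unknown) => {
    if (typeof item !== "object" || item === null) return false;
    const { name, brand, count } = item as {
      name?: unknown;
      brand?: unknown;
      count?: unknown;
    };
    return (
      typeof name === "string" &&
      typeof brand === "string" &&
      typeof count === "number" &&
      count >= 0 &&
      Math.floor(count) === count // whole cans only
    );
  });
}

// Parse a model reply that is assumed to contain nothing but JSON.
function parseCanResponse(reply: string): CanResponse | null {
  try {
    const parsed: unknown = JSON.parse(reply);
    return isCanResponse(parsed) ? parsed : null;
  } catch {
    return null; // the reply was not valid JSON at all
  }
}

// The data from the reply above, rewritten as plain JSON.
const reply = JSON.stringify({
  data: [
    { name: "Premium Pilsner", brand: "Perlenbacher", count: 2 },
    { name: "Pilsner", brand: "Carlsberg", count: 2 },
    { name: "Classic", brand: "Birra Messina", count: 1 },
  ],
});

const result = parseCanResponse(reply);
const totalCans = result
  ? result.data.reduce((sum, item) => sum + item.count, 0)
  : 0;
console.log(totalCans); // 5
```

A reply that is not JSON, or that has the wrong field types (for example `count` as a string), comes back as `null` instead of being passed on.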

No training, no nothing. I know the internet is full of examples of people having ChatGPT turn Figma drawings into websites and apps, but recognizing packaged goods is totally out of the comfort zone of ChatGPT. I have previous experience trying to teach image recognition of packaged goods to models, and it’s not that easy.

Solutions based on this could destroy the incumbent vendors in image recognition in the CPG market.

Clippy talking to Clippy talking to Clippy…

I listened to a podcast interview with Christian Lanng (Christian Lanng: “How Being a Founder Almost Killed Me” | E1065) about his next thing, Beyondwork, and how they are trying to find out what that means.

Christian says that he doesn’t want a future where a Clippy has to talk to a Clippy which talks to another Clippy and so on. In this context Clippy is some sort of an “AI” which can provide some answer.

I don’t know if that is true or not or if it’s the future we want or not – but right now, that seems to be where we’re headed.

  • LangChain is about chaining prompts and answers together so that we can get better answers
  • Microsoft Copilot Plugins is about enabling Microsoft Copilot to extend your prompts to other systems and consolidate the answers and perform tasks on your behalf
  • Microsoft TypeChat is about providing types and structure to the data we want to pass between systems to make interoperability even easier and more precise
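The “Clippy talking to Clippy” idea behind the three items above can be sketched as typed steps chained together. This is only an illustration under stated assumptions: it does not use the LangChain, Copilot or TypeChat APIs, every name in it (`Clippy`, `chain`, `recognizeCans`, `summarize`, `renderReport`) is hypothetical, and the steps return canned values synchronously, whereas real model calls would be asynchronous.

```typescript
// A "Clippy" here is just a function from one typed payload to another.
type Clippy<In, Out> = (input: In) => Out;

// Chain two Clippies: the output of the first is the input of the second.
function chain<A, B, C>(first: Clippy<A, B>, second: Clippy<B, C>): Clippy<A, C> {
  return (input: A) => second(first(input));
}

// Typed payloads passed between the steps (hypothetical shapes).
interface CanCount {
  brand: string;
  count: number;
}

interface ShelfReport {
  totalCans: number;
  brands: string[];
}

// Stand-in for an image recognition step; returns canned data.
const recognizeCans: Clippy<string, CanCount[]> = (_imageId) => [
  { brand: "Perlenbacher", count: 2 },
  { brand: "Carlsberg", count: 2 },
  { brand: "Birra Messina", count: 1 },
];

// Stand-in for a step that consolidates the recognized cans.
const summarize: Clippy<CanCount[], ShelfReport> = (cans) => ({
  totalCans: cans.reduce((sum, can) => sum + can.count, 0),
  brands: cans.map((can) => can.brand),
});

// Stand-in for a step that turns the report into text for a person.
const renderReport: Clippy<ShelfReport, string> = (report) =>
  report.totalCans + " cans from " + report.brands.join(", ");

// Clippy talking to Clippy talking to Clippy.
const pipeline = chain(chain(recognizeCans, summarize), renderReport);
console.log(pipeline("photo-1"));
// 5 cans from Perlenbacher, Carlsberg, Birra Messina
```

Because each step declares its input and output types, the compiler rejects a chain whose steps do not fit together, which is the kind of interoperability typed payloads are meant to buy.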

I’ve spent the first 20 years of my career as a developer and the last 20 or so as an “architect” in many different contexts, and it almost always boils down to solving integration between business domains: figuring out how data can be accessed, translated and acted upon across the different domains, either in real time or before some date and time.

I totally agree with the sentiment that most, if not all, business systems are really just a database with a UI on top, and that actual business value comes from combining the data across all systems and extracting answers from it. However, as long as the “business” continues to lock data in silos, refuses to adapt business processes to standard systems, and demands customization to accommodate the last 20% of functionality in a way that makes integration almost impossible, I think the Clippies are going to be around for some time to come.

I think that in the short to medium term, the scenarios with plugins to Microsoft Copilot (and similar) will prove valuable to most businesses. Longer term we still need to fix copyright (AIs will be stopped from scraping), privacy (businesses are scared of leaks) and authentication/authorization (who are you and what do you have access to across boundaries). For instance, how can we safely solve problems where data is scattered across private and public domains?

It’s going to be interesting and some new rules/processes are going to have to be worked out.

© 2024 Peter Birkholm-Buch
