Mihai Anton

There has been no shortage of buzz around LLMs over the last year or more, ever since OpenAI launched GPT-3 and the world went crazy. People started getting creative about what they could do with it, more and more companies and products began taking advantage of the generic nature of the model, and the industry's focus shifted from other areas towards building bigger and bigger language models.

All this is great: the industry advances, more productivity tools are created, and the average day gets a little better. Productivity goes up, valuations of AI companies go up, GPU manufacturers take advantage of it, big advances are made in research, and ultimately more and more people, with or without experience in AI, claim to be experts in the field. However, one thing is still lacking: an understanding of what an LLM actually is.

Our goals in this article are to:

  1. explain what an LLM is,
  2. discourage you from AI-first thinking, and
  3. raise awareness of how simple it is to go from 0 to good, but how complex it is to go from good to great when building data-driven/ML products.

Why would you care about what we tell you? Don’t trust us, trust our background. We’re Mihai and Tomas, tech founders with extensive expertise in AI, Data Engineering, and Product development, with over 7 years of experience each.

Mihai has worked both in big tech, at Google and Bloomberg, and with startups in various industries, and has built an AI startup of his own. He holds a Data Science MSc and is currently an ML Lead Engineer, while also running his AI-focused software agency. Passionate about AI from the early days, he understands the smallest details that go into big ML architectures and is able to build them from scratch.

Tomas loves the cloud and big data architectures. Since 2018 he has worked on data-intensive engineering projects both in academia, at the Institute of Intelligent Systems and Numeric Applications in Engineering, and in the private sector, ranging from big consulting firms to acquired startups such as Factmata and big telecoms like Telia.

AI is nothing new. It has been around since the 20th century, with the development of algorithms that are the cornerstone of today's important "AI" processes. To give you an idea, the Attention Is All You Need paper published in 2017 lays out the foundations of ChatGPT, Claude, Gemini… transformers.

Everybody and their mom are talking about AI nowadays. Nothing wrong with that. In fact, the more people experiment with AI, the more the industry advances and the better the tools that get created.

The challenge, though, is understanding who knows what, and who to believe when they tell you about the latest RAG system. It all looks good on a nice presentation or a high level chart. But having a nice looking graphic does not take you that far. Ideally, you want to integrate it in your business, feed it some relevant data, potentially fine-tune it, understand trade-offs, deploy it at scale and improve your customer’s journey as a result.

There are many moving pieces when it comes to using ML models, and LLMs specifically, in production. It's fairly easy to go from 0 to good. But good is sometimes just mediocre. The real challenge is to go from good to great. And not everyone can do it.

We've seen the following many times: "I have this data on my drive, can we throw it into an LLM and use it to classify some data in my business with an accuracy of 95%?". Many things can go wrong when you assume this is easily achievable.

  • Yes, we can feed the data. But for what purpose? What about personal information? What formats does this data come in? Is it spread across PDFs, text documents, images and databases? Does it have any structure? Do we need to keep the model in sync as the data changes? There are many variables, and we haven't even gotten to the LLM part.
  • Can we train your LLM to classify data? Maybe. LLMs are not classification algorithms; they do text completion. Because they are so flexible and large, and because they have seen enormous amounts of data, they can adapt to many use cases at a decent level. But keep in mind, they were mainly built to predict the next likely sequence of words. The flashy demos might tell you otherwise, so pay extra attention.
  • Can we promise you a certain accuracy before we even start the project? Most likely not. Many clients want to know the risk before experimenting with ML. Would the deliverable be 60% accurate, or 99%? Hard to say, especially in very niche domains and with custom, proprietary datasets. The only feasible way to test quality is to make a bet and observe the results. Short, repeated R&D cycles will tell you a lot about where the quality is going, and it would be impossible to predict that a priori.
  • Can your project be implemented in a week? A short experiment, yes. A full product, certainly not. In the following chapters, we'll go deeper into what it actually means to use LLMs in production, and we'll make it easier to understand why it's not always straightforward.

It's time to ask the real question: what actually is an LLM? Many people see the hype, open ChatGPT and think they have seen it all, and many let themselves be guided by toxic marketing into thinking it's a tool that solves every problem.

The main risk worth talking about is the perception that LLMs are a magic box into which data can be thrown and exceptional results will come out, regardless of the problem a business is trying to solve. We've seen this belief with clients many times, and while ML, and language models in particular, can achieve spectacular performance, that belief leads to false expectations.

So what's an LLM? Numbers. A huge chain of mathematical operations that, when exposed to the right data, converges to a state where, as a whole, it is able to generate meaningful results. LLMs are neural networks with a single task: to predict the next word, or a series of next words, given some input. And trust us when we tell you, it's all numbers.

When you ask a question, the model has no clue what it means. Before your text even reaches the model, it gets encoded into a series of numbers the model has seen before. We call those "tokens". Given those tokens, the model predicts the most likely next series of tokens, which in turn get converted back into words or sequences. That's it; this is how an LLM works at a high level.
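
To make this concrete, here is a minimal sketch of the "words → tokens → words" loop described above, using the small open GPT-2 model from the Hugging Face transformers library as a stand-in for any LLM; the model choice and decoding settings are just illustrative assumptions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# A small, openly available model used here purely as a stand-in for "an LLM".
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "A large language model is"
token_ids = tokenizer.encode(prompt, return_tensors="pt")
print(token_ids)  # a tensor of integers: the "tokens" the model actually sees, not words

# The model predicts the most likely next tokens, which we decode back into text.
output_ids = model.generate(token_ids, max_new_tokens=12, do_sample=False)
print(tokenizer.decode(output_ids[0]))
```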

The purpose of this introductory article is to stay high level; we'll go into the depths of how LLMs actually work later in the series. However, it's worth noting that one of the most important pieces of research behind LLMs is the transformer, a machine learning architecture proposed in 2017 in the paper Attention Is All You Need. What happened afterwards is more or less a variation on that paper, plus massive scaling of both architectures and training capabilities, which led to the useful models we know today.
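
For the curious, here is a minimal sketch of scaled dot-product attention, the core operation that paper introduces; the toy shapes and random inputs are only illustrative.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as defined in the 2017 paper."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # how strongly each token attends to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V  # each output is a weighted mix of the value vectors

# Toy example: a "sentence" of 4 tokens, each represented by an 8-dimensional vector.
x = np.random.default_rng(0).normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)  # (4, 8)
```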

"AI" and "LLMs" are nice buzzwords that certainly look good on a startup pitch deck. But do you actually need an LLM to take your business to the next level?

There's nothing wrong with experimenting with AI in many situations. But keeping things simple is key to getting good products off the ground. LLMs are clearly a breakthrough of the last 2 years. But overusing them for the sake of displaying "AI" on your landing page is a poor decision.

Let's craft a checklist of when it might make sense to use an LLM. It's by no means complete, and everyone has a different view on the subject.

  • You can simplify a process by estimating and doing the work a user is likely to do, ahead of them. (Example: email text completion)
  • You rely on large amounts of text data. (Example: law, medicine, research, content creation)
  • Your use case fits the purpose of an LLM. (Example: you need to analyze large pieces of written content, you have a restaurant and you need a chatbot to help your clients, or you are an airline and want to do over-the-chat bookings).
  • You have the budget to use, augment, and potentially fine-tune LLMs to your needs.
  • Paying the price for an LLM (and its infrastructure) makes sense from an RoI standpoint.
  • Personal and sensitive data is not a concern, or you have a solid plan on how to deal with that.
  • Probably many more, that can vary from business to business.

Anyone can get access to a good language model. Just go on ChatGPT and start using it. No tech skills required, no code, not much work, no fine-tuning. And you end up with a good language model. But good is far from great, and in business, the highest level of performance is what you should be aiming for.

In this section, we'll go through what's needed to get far more performance than a standard, one-size-fits-all language model can deliver, and hopefully dispel some misconceptions about the "magic button" that makes an LLM boost your business significantly.


Going from good to great is significantly slower than going from 0 to good. It requires constant R&D cycles, good decisions and, most importantly, deep expertise in what to do when things don't go your way. Each cycle might improve your product by 1% or less, but staying committed is necessary to get to a point where you can say: "it's not yet a great product, but it's getting closer".

MORE THAN PROMPT ENGINEERING

You might have heard the notion of prompt engineering. This is not something LLM-specific, but rather linked to communication. Imagine two people going to the same restaurant. The first one says "I'm hungry"; the other says "I'd like a 3-course menu, centered around Italian cuisine". Both of them want the same outcome, but one knows how to communicate it better than the other. In LLMs, we call this prompt engineering: how you frame your question in a way that makes the mathematical model behind it understand you better and give you something relevant in return. Going from good to great requires you to follow some rules when asking a language model to solve a problem or answer a question.
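
As a toy illustration of the restaurant analogy, here is what a vague prompt versus a more deliberately engineered one might look like; the wording and the use case (contract review, one of the text-heavy domains mentioned earlier) are assumptions, not a recipe.

```python
vague_prompt = "Summarize this contract."

engineered_prompt = """You are a legal assistant helping a small business owner.
Summarize the contract below in plain English.
- List the three biggest obligations for the buyer.
- Flag any clause about early termination or penalties.
- Keep the summary under 150 words.

Contract:
{contract_text}
"""
```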

ACCESS TO DOMAIN SPECIFIC DATA

The average general-purpose language model (take GPT-4 as an example) is very generic. It knows a bit of everything, and is able to give good answers in many areas. The more access you have to domain-specific, possibly proprietary information, the higher the chances are that your model will know more about what you ask for. Having this data stored, processed, curated and fed to the model in an efficient and relevant way is key here. Designing such a data pipeline and using it effectively takes more than just calling an API.

RETRIEVAL AUGMENTED GENERATION

You have your proprietary data, nicely stored and structured. Now what? You need to feed it into the model somehow. RAG is one of the most popular approaches to do this. Instead of asking the model a question without any context, RAG systems augment your question by prepending it with relevant information from your data store.

It's like answering an exam question while having instant access to the entire curriculum: you'll be able to give a more informed, more likely correct answer. RAG is extremely useful and can easily take a model from good to great if implemented and used properly. However, there are complexities that need to be dealt with, from data pipelines to data storage and live inference alongside a language model.
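
Here is a deliberately simplified sketch of that flow, assuming your documents are already split into text chunks; the toy embed() function stands in for a real embedding model, and production systems add vector databases, chunking strategies and re-ranking on top.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy stand-in for a real embedding model: a hashed bag-of-words vector.
    vec = np.zeros(256)
    for word in text.lower().split():
        vec[hash(word) % 256] += 1.0
    return vec

def retrieve(question: str, chunks: list[str], top_k: int = 3) -> list[str]:
    # Rank stored chunks by cosine similarity to the question and keep the best ones.
    q = embed(question)
    ranked = sorted(
        chunks,
        key=lambda c: float(q @ embed(c)) / (np.linalg.norm(q) * np.linalg.norm(embed(c)) + 1e-9),
        reverse=True,
    )
    return ranked[:top_k]

def build_rag_prompt(question: str, chunks: list[str]) -> str:
    # Prepend the most relevant context to the user's question before calling the LLM.
    context = "\n\n".join(retrieve(question, chunks))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
```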

SMALL LANGUAGE MODELS

Much of the recent hype has been about going bigger and bigger with models, with multiple companies competing over who has the largest number of parameters and what they can achieve with them. Those models are highly capable, but they are heavy on memory, need very expensive and capable infrastructure to run on, and might not be affordable for the average startup to run and maintain. There's another side to the story.

The other side of the story is companies going smaller and smaller with language models. Instead of building a single monolith to rule them all, they build a series of smaller, more specialized models. You might wonder how they decide which one to use at runtime. Alongside those many small, specialized models, there is a master node with a single task: take a query and distribute it to the model that most likely has the knowledge to answer it properly (this is roughly the idea behind Mistral's mixture-of-experts models).

By doing this, you not only reduce cost and runtime, but you also make it more accessible to run language models on smaller infrastructure, while opening the door for people to build even more custom LLMs that could, at some point, become part of the bigger system.
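
To make the idea tangible, here is a minimal sketch of such a "master node", with hypothetical specialist models and a deliberately naive keyword-based router; a real system would use a learned classifier, or another small language model, to make the routing decision.

```python
# Hypothetical specialists; in practice each would be a small, fine-tuned language model.
def legal_model(query: str) -> str:
    return f"[legal specialist] answer to: {query}"

def support_model(query: str) -> str:
    return f"[customer-support specialist] answer to: {query}"

def general_model(query: str) -> str:
    return f"[general model] answer to: {query}"

def route(query: str) -> str:
    # Naive keyword routing, for illustration only.
    q = query.lower()
    if any(word in q for word in ("contract", "clause", "liability")):
        return legal_model(query)
    if any(word in q for word in ("refund", "order", "shipping")):
        return support_model(query)
    return general_model(query)

print(route("Where is my order? I want a refund."))
```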

With all the hype around LLMs, it's easy to fall into the trap of wanting to use them in every product. What's worse, listening to overnight experts might get you stuck in a place where the deliverable is good, but not great.

Our goal with this series of articles is to help entrepreneurs find the right tools, and the right people to implement them, so they can go from good to great with language models and actually benefit.

If someone sells you “magic AI tools”, stay away. Don’t believe people when they tell you your product must use AI. It might not need to for the first few iterations. However, laying the foundation of your product with an AI future in mind is crucial, so when the time comes, you can seamlessly make use of LLMs, the right way.

Stay tuned to our series to learn more about ML and data, and how it can actually make an impact on your business. Don’t just ride the wave, focus on going from good to great in your AI journey.
