Building a Technical Moat Around LLMs
Four strategies for engineering defensible LLM-based products
Large Language Models (LLMs) are magical—they make tasks that were impossible a few years ago seem simple today. Instead of spending weeks or months training a custom model for a specific Natural Language Processing task, you can just ask ChatGPT to do something in English, and the model will actually just go and do it. People have used ChatGPT to improve their resumes, invent a language, or even simulate a virtual machine.
Since the "programming" of LLMs can be done via natural language, anyone can use it without much previous experience. You don't need a PhD in machine learning to train a specific model for your needs, and in fact, you don't even need to know how the model works at all. In the future, you'll even have flexibility in picking which model you want to use. Besides ChatGPT, Anthropic recently announced Claude, their LLM targeted for chat applications. Google also announced Bard, which will let them bring conversational AI into their whole suite of products.
If anyone can use LLMs without much experience, then won't someone be able to copy your product very quickly?
It turns out, it's not necessarily that simple. There's still plenty of hard work to do to make LLMs truly useful in a business context. While these models are great at synthesis, there are many other technical problems to solve, spanning data ingestion, pre- and post-processing, fine-tuning, building in natural affordances for human-in-the-loop reinforcement, and even cost management. If you're smart, you can turn these hard problems into a deep technical moat that keeps the competition away. Let's dive deeper into a few of these.
Data Ingestion and Processing
There are some simple uses of LLMs where the user's text is directly passed into the model along with a prompt, and the output is returned immediately back to the user. Most real use cases, however, are not that straightforward. Depending on your specific use case, you may need to build a data ingestion pipeline and pre-process the information so that the LLM can give you the best results. Doing this in an efficient and effective manner can be a part of your technical moat.
For example, let's say that you are trying to create a CRM that takes information from two different systems and synthesizes it (for summaries, search, etc.). First, you will need to ensure that your data pipeline between the two systems has a consistent schema. If one system has text that mentions a person, you'll need to make sure that the other system refers to that person in a similar manner, or else your LLM may assume that two different people are being mentioned. This requires pre-processing your data so that important entities match, which could require you to use another model altogether.
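As a rough illustration, here is a minimal sketch of that kind of entity reconciliation in Python. It assumes each system exposes contact records as simple dictionaries with name and email fields; the field names and matching rules are illustrative, and a real pipeline might need fuzzy matching or a dedicated entity-resolution model.

```python
# Minimal sketch of reconciling person records across two CRM-like systems.
# Assumes each record is a dict with "name" and (optionally) "email" fields.

def normalize_name(name: str) -> str:
    """Lowercase, strip punctuation and honorifics, e.g. 'Dr. Jane Smith' -> 'jane smith'."""
    tokens = name.lower().replace(".", "").replace(",", "").split()
    honorifics = {"mr", "mrs", "ms", "dr", "prof"}
    return " ".join(t for t in tokens if t not in honorifics)

def merge_records(system_a: list[dict], system_b: list[dict]) -> dict[str, dict]:
    """Key records by email when available, otherwise by normalized name."""
    merged: dict[str, dict] = {}
    for record in system_a + system_b:
        key = record.get("email") or normalize_name(record["name"])
        merged.setdefault(key, {}).update(record)
    return merged

people = merge_records(
    [{"name": "Dr. Jane Smith", "email": "jane@example.com", "account": "Acme"}],
    [{"name": "jane smith", "email": "jane@example.com", "last_call": "2023-02-01"}],
)
# Both mentions now resolve to a single entity the LLM can reason about consistently.
```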
In addition, most LLMs have token limits because of their specific implementations, so you'll need workarounds to handle larger requests. For example, GPT-3's context window is around 4,000 tokens, which is roughly 3,000 words. If you want to run a task across a 50-page document using GPT-3, you'll need to develop a clever chunking algorithm or some other approach that works around the limit. You may need to pre-process the document using other models like SBERT in order to figure out which chunks of data to send into the Large Language Model. All of these solutions require technical expertise and are not easily copied.
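Here is a hedged sketch of that chunk-and-select step using the sentence-transformers library. The chunk size, model name, and the idea of ranking chunks against a query are all illustrative choices, not the only way to do it.

```python
# Sketch: chunk a long document, embed the chunks with SBERT, and keep only the
# chunks most relevant to the task before building the LLM prompt.
from sentence_transformers import SentenceTransformer, util

def chunk_text(text: str, max_words: int = 300) -> list[str]:
    """Naive fixed-size chunking by word count; real code might split on sections instead."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def top_chunks(document: str, query: str, k: int = 5) -> list[str]:
    """Embed every chunk and the query, then keep the k most similar chunks."""
    model = SentenceTransformer("all-MiniLM-L6-v2")
    chunks = chunk_text(document)
    chunk_emb = model.encode(chunks, convert_to_tensor=True)
    query_emb = model.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, chunk_emb)[0]
    ranked = scores.argsort(descending=True)[:k]
    return [chunks[int(i)] for i in ranked]

# The selected chunks are concatenated into the prompt instead of the full 50 pages,
# keeping the request under the model's token limit.
```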
Output Processing
Once you have the output from the LLM, it's not always immediately useful. LLMs generate text in an unstructured way, but most software systems require structured data if you want to store it in a database or other data store. How do you transform the unstructured text into something structured? You could call the LLM again and ask it to structure your text, but this doesn't always work. Often, it requires some other model (possibly custom-trained) to extract the information you need.
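One common pattern is to ask the model to respond in JSON and then validate whatever comes back, retrying or falling back to another extractor when it doesn't parse. Here is a minimal sketch; call_llm() is a stand-in for whichever client you use, and the field names are purely illustrative.

```python
# Sketch of coercing free-form LLM output into structured data with validation.
import json

EXTRACTION_PROMPT = (
    "Extract the customer name, sentiment (positive/negative/neutral), and any "
    "requested follow-up from the note below. Respond with JSON only, using the "
    'keys "customer", "sentiment", and "follow_up".\n\nNote:\n{note}'
)

REQUIRED_KEYS = {"customer", "sentiment", "follow_up"}

def extract_structured(note: str, call_llm) -> dict | None:
    """Return a validated dict, or None so the caller can retry or fall back."""
    raw = call_llm(EXTRACTION_PROMPT.format(note=note))
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # the model returned prose instead of JSON
    if not REQUIRED_KEYS.issubset(data):
        return None  # the model dropped a field; trigger a retry or a fallback extractor
    return data
```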
Also, large language models get things wrong some of the time. They may hallucinate and make up content, for example. You will need safeguards in place to detect when the content is incorrect, or give the user affordances to tell you when it doesn't make sense. You also need to figure out which types of input data and prompts produce the best results. This is part of the technical differentiation that you need to build.
Personalization and Fine-tuning
In many cases, you'll need to personalize the content that you output to your users. If two of your users make the same request to an LLM, you may not want to give them the same result. For example, let's say that you're building a chatbot. An intelligent chatbot should remember the history of your conversations and bring old topics up when appropriate. This requires that the same text sent by two different users elicit different outputs. How do you accomplish this? Maybe you inject information into the prompt based on historical data. That means indexing and searching past conversations efficiently, then using those messages to personalize the request to the LLM. This process is difficult to get right and involves substantial engineering effort, making it hard to copy.
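To make that concrete, here is a simplified sketch of prompt-level personalization. The retrieval is naive word-overlap scoring and the prompt template is made up; a production system would use an embedding index over past conversations.

```python
# Sketch of personalizing a chatbot prompt by injecting relevant past messages.

def relevant_history(history: list[str], new_message: str, k: int = 3) -> list[str]:
    """Score stored messages by word overlap with the new message and keep the top k."""
    query_words = set(new_message.lower().split())
    scored = sorted(
        history,
        key=lambda m: len(query_words & set(m.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(history: list[str], new_message: str) -> str:
    """Prepend the most relevant earlier messages to the user's new message."""
    context = "\n".join(relevant_history(history, new_message))
    return (
        "You are a helpful assistant. Relevant earlier conversation:\n"
        f"{context}\n\n"
        f"User: {new_message}\nAssistant:"
    )

# Two users sending an identical message get different prompts, and therefore
# different answers, because their stored histories differ.
```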
Depending on the use case, you may need to fine-tune the LLM in addition to changing the prompt. Before you can fine-tune the model, you'll need a whole process of data collection, data labeling, removing personally identifiable information, and so on. You end up with many of the same problems you'd face training a conventional model. This requires expertise and technical depth.
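As a rough sketch of just the data-preparation step, here is a small script that scrubs obvious PII and writes prompt/completion pairs in the older GPT-3-style JSONL fine-tuning layout. The regexes and file format are illustrative, and a real pipeline needs labeling and review steps on top of this.

```python
# Sketch: scrub obvious PII and write fine-tuning examples as JSONL.
import json
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def scrub_pii(text: str) -> str:
    """Replace emails and phone-number-like strings with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

def write_training_file(examples: list[tuple[str, str]], path: str) -> None:
    """examples is a list of (prompt, completion) pairs collected and labeled upstream."""
    with open(path, "w") as f:
        for prompt, completion in examples:
            row = {"prompt": scrub_pii(prompt), "completion": scrub_pii(completion)}
            f.write(json.dumps(row) + "\n")
```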
Optimizing for Computation Cost
Besides all of these other hard problems, there is also the issue of cost. The best models are not cheap, and that is unlikely to change quickly. Large Language Models have hundreds of billions of parameters (GPT-3 has 175 billion), which makes compute cost a very real concern for the LLM providers. These providers pass their costs on to the users of their APIs. The most successful products and companies will deliver their solutions with a minimal amount of LLM usage.
Optimizing for cost may mean finding alternate ways of accomplishing what can be done with LLMs. For example, if you are doing zero-shot classification with an LLM, consider fine-tuning or training a cheaper model to deliver comparable results. Since LLM providers generally charge per token, you are also incentivized to reduce your token count, which could push you to pre-process your data further before sending it into the model. All of these techniques make your product and company more cost-efficient, giving you room to add more functionality to your app compared to your competitors.
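A simple starting point is measuring what each request will cost before you send it. Here is a sketch using the tiktoken tokenizer; the price constant is a placeholder, since rates vary by model and provider.

```python
# Sketch: estimate per-request cost by counting prompt tokens before calling the API.
import tiktoken

PRICE_PER_1K_TOKENS = 0.02  # placeholder; check your provider's current rates

def estimate_cost(prompt: str, model: str = "gpt-3.5-turbo") -> float:
    """Count tokens for the given model's tokenizer and convert to dollars."""
    encoding = tiktoken.encoding_for_model(model)
    n_tokens = len(encoding.encode(prompt))
    return n_tokens / 1000 * PRICE_PER_1K_TOKENS

# Trimming boilerplate or pre-summarizing input before the call shows up directly
# in this number, which makes the savings easy to track per feature.
```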
Closing
While LLMs are incredibly powerful and transformative, they are also not as straightforward as they seem. This gives you an opportunity to build a technical moat around the usage of LLMs, even as they become a commodity. You can build a more efficient data processing pipeline, a better way to pre-process your text, a more clever way to personalize your output, or a cheaper way of using state-of-the-art models. And, if you combine these technical moats with a unique product that solves a real problem, you're sure to have a company that stands out and remains differentiated over time.
William Cheng is an engineering and product leader, and co-founder of Maestro AI, an AI-powered chief of staff that automates knowledge management for dev teams. Connect with him on LinkedIn.