
How Engineering AI Products Is Different

Artificial Intelligence
Nov 21, 2024

Today’s AI-powered product advancements, like Luzmo IQ, represent a new wave of innovation in SaaS tools. However, engineering artificial intelligence into your software products fundamentally differs from traditional software engineering. 

This impacts not only the development process itself but also how to manage projects and even how to run AI models in production, giving rise to the field of “MLOps” (Machine Learning Operations).

Here are six core differences to take into account when building AI products:

Design AI applications holistically

Software engineering is generally an additive process: adding a feature or fixing a bug (in most cases) affects the system in an explainable way, local to where the change is made.

For example, adding a branch at the end of an “if” statement does not affect any earlier branches.

However, when software is developed mainly through adapting system prompts, changes can have unpredictable and unintended consequences.

A typical agile software methodology tends to reduce overall AI system accuracy over time: handling more and more cases makes the system prompt grow, until the foundational LLM struggles to follow all the instructions simultaneously.

Instead, think upfront about the full scope of your AI feature and try to reduce it. For example, let your LLM figure out which language to reply in rather than trying to force it. Or, rather than adding specific instructions to avoid a common error by the LLM, correct it in a post-processing step on the output.
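For instance, suppose the model habitually wraps JSON answers in markdown fences (a hypothetical common error). A small post-processing step fixes this more reliably than piling "do not wrap in markdown" instructions onto the prompt:

```python
import json
import re

def postprocess_json_reply(raw: str) -> dict:
    """Correct a common LLM formatting error after the fact,
    instead of growing the system prompt to forbid it."""
    # Strip markdown code fences the model sometimes adds around JSON.
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    return json.loads(cleaned)

reply = '```json\n{"language": "en", "answer": "42"}\n```'
print(postprocess_json_reply(reply)["answer"])  # prints "42"
```

The same idea applies to any systematic output quirk: deterministic code is cheaper and more reliable at fixing it than extra prompt instructions.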

Don’t overwhelm the LLM

As an app designer, you can tightly control the information flow and instructions to the language model to keep it from choking on too many instructions. Refrain from trying to solve everything in one mega-prompt.

Instead, apply a state machine approach where you keep track of the current state of a conversation. In each state, you present only a limited system prompt and list of available function calls (tools) to the LLM. You can advance the state of the conversation based on business rules or let the LLM do a function call to advance it.

[Figure: state machine diagram for an AI-driven conversation flow]

In this example, rather than asking the LLM to guide a customer service agent dialogue at the start (state “A”), you only give it a short system prompt on handling the diagnosis part of the conversation. The LLM can ask for clarification on the question or make a diagnosis without thinking about a solution.
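The flow above can be sketched as a small state table. The state names, prompts, and tool names here are illustrative, not part of any real framework:

```python
# A minimal conversation state machine: each state exposes only a short
# system prompt and the tools relevant to that state.
STATES = {
    "diagnose": {
        "system_prompt": "Ask clarifying questions until you can name the customer's problem.",
        "tools": ["ask_clarification", "make_diagnosis"],
    },
    "resolve": {
        "system_prompt": "Propose a solution for the diagnosed problem.",
        "tools": ["propose_solution", "escalate_to_human"],
    },
}

# Tool calls that advance the conversation to the next state.
TRANSITIONS = {("diagnose", "make_diagnosis"): "resolve"}

def advance(state: str, tool_call: str) -> str:
    """Move to the next state if the tool call triggers a transition."""
    return TRANSITIONS.get((state, tool_call), state)

state = "diagnose"
state = advance(state, "ask_clarification")  # stays in "diagnose"
state = advance(state, "make_diagnosis")     # advances to "resolve"
print(state, STATES[state]["tools"])
```

Because each state only ever sees its own short prompt and tool list, the LLM never has to juggle the instructions for the entire dialogue at once.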

Prefer model determinism

LLMs (large language models) are probabilistic models: at their core, they predict the most likely next token (a small unit of text), given the input and the text already generated. By default, they are tuned to be nondeterministic. Luckily, providers like OpenAI and Anthropic provide knobs to make the model more predictable:

  • The temperature controls how likely the model is to pick a next token that is not the top choice but still a plausible one. Setting it to 0 makes it always pick the most probable next token.
  • The seed initializes an internal random number generator. By setting a deterministic seed per client or interaction and logging it, you can recreate the same conditions to test prompt changes.
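To see what these two knobs do mechanically, here is a toy next-token sampler, a stand-in for the real decoding loop inside a provider's model:

```python
import math
import random

def sample_token(logits, temperature, seed):
    """Toy next-token sampler illustrating the temperature and seed knobs.
    Real providers implement this inside the model; the mechanics are similar."""
    if temperature == 0:
        # Temperature 0: always pick the most probable token (greedy decoding).
        return max(logits, key=logits.get)
    # Softmax with temperature: higher temperature flattens the distribution.
    scaled = {tok: math.exp(score / temperature) for tok, score in logits.items()}
    total = sum(scaled.values())
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    r = rng.random() * total
    for tok, weight in scaled.items():
        r -= weight
        if r <= 0:
            return tok
    return tok

logits = {"Paris": 3.0, "London": 1.5, "Rome": 0.5}
print(sample_token(logits, temperature=0, seed=1))      # always "Paris"
print(sample_token(logits, temperature=1.0, seed=42) ==
      sample_token(logits, temperature=1.0, seed=42))   # same seed, same pick: True
```

With temperature 0 the output is fully determined by the input; with a nonzero temperature, logging the seed lets you reproduce the exact same sample later.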

Have a data management plan from the start

Log every interaction with your AI – both the raw output and the explicit feedback (👍/👎) or implicit feedback (did the user engage with the content?). This helps you in several ways:

  • You can easily replay and debug conversations, aiding your development process.
  • You build a dataset that can be used to fine-tune a model. You can do this to boost the accuracy of your system or fine-tune a smaller and cheaper model to give you similar accuracy as a more advanced LLM.
  • Caching responses can help you cut your LLM bill.

Don’t forget to log the prompt and prompt versions, as they will likely vary over time.
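A minimal sketch of such a log record; the field names are assumptions to adapt to your own logging pipeline:

```python
import time
import uuid

def log_interaction(prompt_version, system_prompt, user_input,
                    raw_output, feedback=None):
    """Build one log record per LLM interaction, including the prompt
    version so replays use the prompt that was actually in effect."""
    return {
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "prompt_version": prompt_version,  # prompts change over time: version them
        "system_prompt": system_prompt,
        "user_input": user_input,
        "raw_output": raw_output,
        "feedback": feedback,              # explicit (👍/👎) or implicit engagement
    }

record = log_interaction("v3", "You are a support agent.",
                         "My invoice is wrong.",
                         "Let me check that for you.", feedback="👍")
print(record["prompt_version"], record["feedback"])
```

Appending these records to a store you can query (even a simple JSONL file) gives you replayable conversations for debugging, a fine-tuning dataset, and a cache key for repeated requests.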

Use a testing framework

To maintain system accuracy as development continues, you should test your system methodically at every change and over time. For safety and alignment reasons, LLMs can drift slightly even within identical model versions.

Test your application by replaying the conversations you captured (see the data management point above) using deterministic LLM settings and evaluating the outputs. This can be done manually or via a testing framework that uses evaluator functions to compare the actual output vs. the expected output.

For example, the cosine similarity of the answers could be compared, or you can write a custom evaluation function that checks if certain words or data points are present in the reply.
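Both evaluator styles fit in a few lines; the embedding vectors below are toy values standing in for a real embedding model's output:

```python
import math

def cosine_similarity(a, b):
    """Compare two embedding vectors, e.g. embeddings of the
    actual vs. expected answer."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def contains_required(reply, required):
    """Custom evaluator: check that key words or data points
    are present in the reply."""
    return all(term.lower() in reply.lower() for term in required)

# Toy embeddings standing in for a real embedding model's output.
actual, expected = [0.9, 0.1, 0.3], [0.8, 0.2, 0.3]
print(round(cosine_similarity(actual, expected), 3))
print(contains_required("Revenue grew 12% in Q3.", ["12%", "Q3"]))  # True
```

In practice you would embed both replies with the same embedding model before comparing, and set a similarity threshold below which the test fails.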

Don’t forget to compare costs and model performance (latency, error rate) alongside accuracy!

Platforms like Orq.ai and Baseten automate this process.

Use continuous monitoring

The work does not stop when development is finished. Users may interact with your AI-powered applications in production unexpectedly or maliciously. Apply continuous monitoring techniques to:

  • Keep track of your system's running costs and real-time accuracy and be alerted of deviations quickly.
  • Have a “watchdog” application evaluate ongoing conversations and even block them if they become too expensive or deviate from the application's purpose (e.g., by users pen-testing your AI application).
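The cost side of such a watchdog can be as simple as a running total per conversation; the price and budget below are made-up examples:

```python
# A minimal "watchdog" sketch: track running cost per conversation and
# block it when it exceeds a budget. Numbers are hypothetical.
COST_PER_1K_TOKENS = 0.01        # made-up model price in USD
BUDGET_PER_CONVERSATION = 0.50   # made-up per-conversation budget in USD

class Watchdog:
    def __init__(self):
        self.spent = 0.0

    def record_turn(self, tokens_used):
        """Add a turn's cost; return False if the conversation
        should be blocked because it became too expensive."""
        self.spent += tokens_used / 1000 * COST_PER_1K_TOKENS
        return self.spent <= BUDGET_PER_CONVERSATION

dog = Watchdog()
print(dog.record_turn(2_000))    # True: well within budget
print(dog.record_turn(60_000))   # False: cumulative cost exceeds the budget
```

A production watchdog would add the second check from the list above: scoring each turn against the application's purpose (for instance, with a cheap classifier model) and cutting off conversations that drift into abuse or pen-testing.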

Conclusion

Building AI features that deliver reliable, meaningful outputs requires a fresh approach to engineering. Developers can better manage the complexity inherent in AI-powered tools by adopting a holistic development model and implementing continuous testing and monitoring.

At Luzmo, this approach to AI is exemplified in Luzmo IQ, which transforms raw data into actionable insights while maintaining coherence and reliability. As AI capabilities evolve, these adapted engineering practices will become the backbone of effective, user-friendly, and impactful AI-driven products.

If you want to learn what Luzmo IQ can do for you and your end-users, book a free demo with our team.

Haroen Vermylen

CTO and Founder

Haroen Vermylen is CTO and Founder at Luzmo. With over 12 years of experience and a degree in artificial intelligence, he's a rooted expert in data visualization and business intelligence. Passionate about technology and data, he loves building new data products and writes about them on the Luzmo blog.

