Build your first embedded data product now. Talk to our product experts for a guided demo or get your hands dirty with a free 10-day trial.
AI-powered products such as Luzmo IQ represent a new wave of innovation in SaaS tools. However, engineering artificial intelligence into your software products differs fundamentally from traditional software engineering.
This affects not only the development process itself but also how you manage projects and even how you run AI models in production, which has given rise to the field of “MLOps” (Machine Learning Operations).
Here are six core differences to take into account when building AI products:
Software engineering is generally an additive process: adding a feature or fixing a bug (in most cases) affects the system in an explainable way, and only locally, close to where the change is made.
For example, adding a branch at the end of an “if” statement does not affect any earlier branches.
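To make the contrast concrete, here is a minimal Python sketch (the function and its ticket categories are hypothetical). Appending a new `elif` branch at the end of the chain cannot change the result for any input that an earlier branch already handles.

```python
def route_ticket(subject: str) -> str:
    """Route a support ticket by keyword; branches are checked top to bottom."""
    text = subject.lower()
    if "invoice" in text:
        return "billing"
    elif "password" in text:
        return "account"
    # Newly appended branch: only inputs that fell through the branches
    # above can reach it, so their behavior is unchanged.
    elif "dashboard" in text:
        return "product"
    else:
        return "general"
```

A system prompt offers no such guarantee: an instruction appended at the end can change how the model handles cases that earlier instructions already covered.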
However, when software is developed mainly by adapting system prompts, changes can have unpredictable and unintended consequences.
A typical agile methodology therefore tends to erode the overall accuracy of an AI system over time: each new case adds instructions to the system prompt, and the underlying foundation model increasingly struggles to follow all of them at once.
Instead, think upfront about the full scope of your AI feature and try to reduce it. For example, let your LLM figure out which language to reply in rather than trying to force it. Or, rather than adding specific instructions to prevent a common LLM error, correct that error in a post-processing step on the output.
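As an illustration of fixing an error in post-processing, the sketch below assumes a hypothetical case: the model is asked to return a JSON object but sometimes surrounds it with a friendly preamble or a code fence. Instead of adding yet another prompt instruction, a small Python function (its name is illustrative) extracts and parses the object.

```python
import json


def extract_json_object(raw_output: str) -> dict:
    """Recover a JSON object from LLM output that may contain extra text.

    Assumes the reply contains exactly one top-level JSON object; any text
    before the first "{" or after the last "}" (preambles, code fences) is dropped.
    Raises ValueError (json.JSONDecodeError is a subclass) if nothing parses.
    """
    start = raw_output.find("{")
    end = raw_output.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return json.loads(raw_output[start:end + 1])
```

This keeps the system prompt short, and the correction is deterministic and unit-testable, unlike an extra instruction the model may or may not follow.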
As an app designer, you can tightly control the information and instructions you pass to the language model so that it doesn't choke on too many instructions. Avoid trying to solve everything in one mega-prompt.
Instead, apply a state machine approach in which you keep track of the current state of a conversation. In each state, you present the LLM with only a limited system prompt and a limited list of available function calls (tools). You can advance the state of the conversation based on business rules or let the LLM make a function call to advance it.
In this example, rather than asking the LLM to guide the entire customer service dialogue from the start (state “A”), you give it only a short system prompt covering the diagnosis part of the conversation. The LLM can then ask clarifying questions or make a diagnosis without having to think about a solution yet.
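The state machine described above can be sketched in Python as follows. Everything here is an assumption for illustration: the three states, the prompt texts, the tool names (`submit_diagnosis`, `mark_resolved`, `escalate_to_human`), and the turn-limit business rule. The actual LLM call is left out; in each turn you would pass `STATE_CONFIG[state].system_prompt` and its `tools` to your provider's API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    DIAGNOSIS = "diagnosis"  # state "A": understand the customer's problem
    SOLUTION = "solution"    # propose a fix for the diagnosed problem
    CLOSED = "closed"        # conversation finished


@dataclass(frozen=True)
class StateConfig:
    system_prompt: str
    tools: tuple[str, ...]  # names of the function calls exposed in this state


# Each state exposes only a short prompt and a small set of tools.
STATE_CONFIG = {
    State.DIAGNOSIS: StateConfig(
        "Ask clarifying questions until you can diagnose the customer's issue. "
        "Do not propose solutions.",
        ("submit_diagnosis",),
    ),
    State.SOLUTION: StateConfig(
        "Propose a solution for the given diagnosis.",
        ("mark_resolved", "escalate_to_human"),
    ),
    State.CLOSED: StateConfig("The conversation is closed.", ()),
}

# Transitions triggered by the LLM calling one of the tools above.
TOOL_TRANSITIONS = {
    "submit_diagnosis": State.SOLUTION,
    "mark_resolved": State.CLOSED,
    "escalate_to_human": State.CLOSED,
}


def next_state(current: State, tool_call: Optional[str], user_turns: int,
               max_turns: int = 10) -> State:
    """Advance the conversation via a business rule or an LLM tool call."""
    # Business rule (hypothetical): close the conversation after too many turns.
    if user_turns >= max_turns:
        return State.CLOSED
    # Only tools exposed in the current state may advance it.
    if tool_call is not None and tool_call in STATE_CONFIG[current].tools:
        return TOOL_TRANSITIONS[tool_call]
    return current
```

Because the diagnosis state never sees the solution tools, a premature `mark_resolved` call simply leaves the conversation where it is.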
LLMs (large language models) are probabilistic models: at their core, they predict the most likely next token (a small unit of text), given the input and the text generated so far. More precisely, they produce a probability distribution over possible next tokens, and by default the next token is sampled from that distribution, so the same input can yield different outputs. Luckily, providers like OpenAI and Anthropic offer knobs to make the model more predictable:
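The mechanism behind the most common of these knobs, `temperature`, is easy to show. The Python sketch below applies a temperature-scaled softmax to made-up next-token scores (logits). Lower temperatures concentrate probability on the top token, and a temperature of 0 is treated here as greedy decoding (always picking the most likely token). The function name and the scores are hypothetical; no provider SDK is used.

```python
import math


def next_token_distribution(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Turn raw next-token scores into probabilities using a temperature-scaled softmax."""
    if temperature <= 0:
        # Greedy decoding: all probability mass on the highest-scoring token.
        best = max(logits, key=logits.get)
        return {tok: (1.0 if tok == best else 0.0) for tok in logits}
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - top) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


logits = {"bar": 2.0, "line": 1.0, "pie": 0.5}  # hypothetical scores for a chart type
print(next_token_distribution(logits, 1.0))  # "bar" gets roughly 63% of the mass
print(next_token_distribution(logits, 0.1))  # "bar" gets almost all of it
```

Setting a low temperature in OpenAI's or Anthropic's API pushes the model toward this greedy behavior and makes outputs much more repeatable, although neither provider guarantees identical outputs for identical requests.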
Log every interaction with your AI: the raw output along with explicit feedback (👍/👎) and implicit feedback (did the user engage with the content?). This helps you in several ways:
Don’t forget to log the prompt and its version, since prompts will likely change over time.
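A minimal sketch of such a log record in Python, assuming you store one JSON object per line (JSON Lines) so conversations can be replayed later. The field names and the `append_log` helper are hypothetical; adapt them to your own logging or analytics stack.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class InteractionLog:
    prompt_version: str          # version tag of the system prompt used
    system_prompt: str           # the full prompt text, since versions drift
    user_input: str
    raw_output: str              # unprocessed model output
    model: str                   # model name/version reported by the provider
    explicit_feedback: Optional[str] = None   # "up", "down", or None
    implicit_feedback: Optional[bool] = None  # e.g. did the user open the result?
    interaction_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: float = field(default_factory=time.time)


def to_json_line(entry: InteractionLog) -> str:
    """Serialize one interaction as a single JSON line."""
    return json.dumps(asdict(entry), ensure_ascii=False)


def append_log(path: str, entry: InteractionLog) -> None:
    """Append one interaction to a JSON Lines file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(to_json_line(entry) + "\n")
```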
To maintain system accuracy as development continues, test your system methodically after every change and periodically over time: for safety and alignment reasons, LLMs can evolve slightly even within the same model version.
Test your application by replaying the conversations you captured (4.) with deterministic LLM settings (3.) and evaluating the outputs. You can do this manually or with a testing framework that uses evaluator functions to compare the actual output against the expected output.
For example, you could compare the cosine similarity of the actual and expected answers, or write a custom evaluation function that checks whether certain words or data points appear in the reply.
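The sketch below shows both kinds of evaluator and a small replay loop in Python. It makes several assumptions: test cases are dicts with hypothetical `input`, `expected`, and `required` keys; `model` is any function wrapping your LLM call with deterministic settings; the 0.8 threshold is arbitrary; and, to stay self-contained, cosine similarity is computed on bag-of-words counts rather than on embedding vectors (the formula is the same).

```python
import math
import re
from collections import Counter
from typing import Callable


def _word_counts(text: str) -> Counter:
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words vectors (stand-in for embedding vectors)."""
    va, vb = _word_counts(a), _word_counts(b)
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def contains_all(reply: str, required: list[str]) -> bool:
    """Custom evaluator: check that required words or data points appear in the reply."""
    lowered = reply.lower()
    return all(item.lower() in lowered for item in required)


def replay(cases: list[dict], model: Callable[[str], str], threshold: float = 0.8) -> float:
    """Replay logged inputs through `model` and return the share of passing cases."""
    passed = 0
    for case in cases:
        actual = model(case["input"])
        ok = cosine_similarity(actual, case["expected"]) >= threshold
        ok = ok and contains_all(actual, case.get("required", []))
        passed += ok
    return passed / len(cases) if cases else 0.0
```

Running `replay` before and after every prompt or model change gives you a single number to compare; the per-case results show which conversations regressed.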
Don’t forget to compare costs and model performance (latency, error rate) alongside accuracy!
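One way to keep those numbers side by side is to record them per test run and aggregate them per model or prompt variant. A minimal Python sketch; the `RunResult` fields are hypothetical, and `cost_usd` assumes you compute cost from token counts and your provider's pricing.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunResult:
    passed: bool        # did the evaluator accept the output?
    latency_s: float    # wall-clock time of the model call
    errored: bool       # did the call fail (timeout, API error, unparseable output)?
    cost_usd: float     # cost of the call


def summarize(results: list[RunResult]) -> dict[str, float]:
    """Aggregate accuracy, latency, error rate and cost for one model/prompt variant."""
    if not results:
        raise ValueError("no results to summarize")
    return {
        "accuracy": mean(1.0 if r.passed else 0.0 for r in results),
        "mean_latency_s": mean(r.latency_s for r in results),
        "error_rate": mean(1.0 if r.errored else 0.0 for r in results),
        "total_cost_usd": sum(r.cost_usd for r in results),
    }
```

Comparing these summaries across variants makes trade-offs visible, for example a cheaper model that is slightly less accurate but much faster.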
Platforms like Orq.ai and Baseten can automate this process.
The work does not stop when development is finished. In production, users may interact with your AI-powered application in unexpected or even malicious ways. Apply continuous monitoring techniques to:
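One simple monitoring signal, building on the feedback you already log, is the share of 👎 ratings over recent interactions. The Python sketch below raises an alert when that share exceeds a threshold within a rolling window. The class name, window size, threshold, and minimum sample count are all hypothetical defaults to tune for your own traffic.

```python
from collections import deque


class FeedbackMonitor:
    """Alert when the share of thumbs-down ratings in a rolling window gets too high."""

    def __init__(self, window: int = 100, max_negative_rate: float = 0.2,
                 min_samples: int = 20):
        self.ratings = deque(maxlen=window)  # True = thumbs down, False = thumbs up
        self.max_negative_rate = max_negative_rate
        self.min_samples = min_samples

    def record(self, thumbs_down: bool) -> bool:
        """Record one rating; return True if an alert should fire."""
        self.ratings.append(thumbs_down)
        if len(self.ratings) < self.min_samples:
            return False  # too few samples to judge
        negative_rate = sum(self.ratings) / len(self.ratings)
        return negative_rate > self.max_negative_rate
```

Because the window only holds the most recent ratings, a sudden drop in quality (for example after a silent model update) shows up quickly instead of being diluted by months of history.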
Building AI features that deliver reliable, meaningful outputs requires a fresh approach to engineering. By adopting a holistic development model and implementing continuous testing and monitoring, developers can better control the complexity inherent in AI-powered tools.
At Luzmo, this approach to AI is exemplified in Luzmo IQ, which transforms raw data into actionable insights while maintaining coherence and reliability. As AI capabilities evolve, these adapted engineering practices will become the backbone of effective, user-friendly, and impactful AI-driven products.
If you want to learn what Luzmo IQ can do for you and your end-users, book a free demo with our team.