
How should we approach building LLM-powered software? What methods lead to success, and what can cause disasters down the line?
In this series of blog posts, I'll explore principles for building AI-powered applications developed through three years of professional experience in this field. I hope this series aids you in creating the next generation of AI-powered software.
Prompts used in a product should be treated as code rather than mere text. They require version control, testing, debugging, and performance monitoring. Although written in natural language rather than traditional programming syntax, prompts share essential characteristics with code. Treating prompts as code significantly facilitates the evolution and maturity of AI-powered products. This principle underpins all other techniques described in this series, hence its placement as number one.
LLMs first gained popularity through chatbot windows, where a prompt works well as a one-off query. While that's fine for casual questions—like what to do in Amsterdam on a Tuesday night—it isn't a solid foundation for a reliable software product. High-reliability applications demand a disciplined approach to every aspect of LLM development, starting with prompts. Don't view prompts merely as text queries; respect them as code.
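To make this concrete, here is a minimal sketch of the mindset shift. In a real project the template would live in its own file under version control (a hypothetical `prompts/summarize.txt`); it is inlined below only to keep the sketch self-contained, and all names and wording are illustrative, not a prescribed structure.

```python
# In a real project this template would live in its own file under version
# control (e.g. prompts/summarize.txt); it is inlined here only to keep the
# sketch self-contained. File name and wording are illustrative.
SUMMARIZE_TEMPLATE = """\
You are a careful technical summarizer.
Summarize the following document in three sentences.

Document:
{document}
"""

def build_summary_prompt(document: str) -> str:
    # Filling the placeholder is the only runtime step; the template itself
    # is reviewed, diffed, and rolled back like any other source file.
    return SUMMARIZE_TEMPLATE.format(document=document)
```

The point is that the prompt has a home in the codebase: it can be reviewed in a pull request, diffed between releases, and rolled back, exactly like a function.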
How are prompts similar to traditional code?
Version Control: Prompts must be version-controlled to enable rollbacks, reviews, and detailed examinations of changes.
Modularity: AI-powered applications typically require multiple distinct functionalities. It is advisable to use smaller, separate prompts rather than one large prompt.
Debugging: Prompts require debugging both before entering production and whenever issues arise later.
Automated Testing: Similar to unit testing for traditional code, prompts should undergo rigorous automated testing. Several future posts will explore specific techniques for this.
Performance Monitoring: Once in production, prompt performance must be continuously monitored, assessing strengths and weaknesses.
Commenting: Like traditional code, prompts benefit from well-written comments to aid future maintenance.
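The modularity and commenting points above can be sketched together. In this illustration (all template names, labels, and wording are hypothetical), each product capability gets its own small, separately testable prompt instead of one monolithic prompt, and each template carries a comment explaining a design decision for future maintainers:

```python
# Modularity: each capability gets its own small, separately testable prompt
# instead of one monolithic prompt. All names and wording are illustrative.

# Prompt comment: list the labels explicitly in the prompt; the model is not
# expected to infer them from context.
CLASSIFY_TEMPLATE = (
    "Classify the user request into exactly one of: {labels}.\n"
    "Request: {request}\n"
    "Label:"
)

# Prompt comment: restricting answers to the supplied context reduces
# hallucinated facts; keep this constraint near the top of the prompt.
ANSWER_TEMPLATE = (
    "Answer the question using only the context below.\n"
    "Context:\n{context}\n"
    "Question: {question}\n"
    "Answer:"
)

def build_classify_prompt(request: str, labels: list[str]) -> str:
    return CLASSIFY_TEMPLATE.format(labels=", ".join(labels), request=request)

def build_answer_prompt(question: str, context: str) -> str:
    return ANSWER_TEMPLATE.format(question=question, context=context)
```

Because each template is small and isolated, a change to classification cannot silently alter answering behavior, and each can be tested on its own.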
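Likewise, the performance-monitoring point can start as simply as a thin wrapper around each model call. This is a sketch under assumed names: `call_llm` stands in for your model client and `emit_metric` for your metrics sink, neither of which is a real library API here:

```python
import time
from typing import Callable

def monitored_call(
    prompt_name: str,
    prompt: str,
    call_llm: Callable[[str], str],
    emit_metric: Callable[[dict], None],
) -> str:
    # Wrap every production model call so each prompt's latency and output
    # size are tracked under a stable name; prompt_name ties the metric back
    # to the version-controlled prompt it came from.
    start = time.perf_counter()
    response = call_llm(prompt)
    emit_metric({
        "prompt_name": prompt_name,
        "latency_s": time.perf_counter() - start,
        "response_chars": len(response),
    })
    return response
```

Keying metrics by prompt name means that when a prompt is revised, its before-and-after behavior in production can be compared directly.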
Each of these points will be discussed in greater detail in future posts.
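As a first taste of the testing point, prompt tests can begin as ordinary unit tests. In this minimal sketch the hypothetical `call_llm` stub stands in for a real model client so the example is runnable; in practice you would test against a pinned model version or recorded responses, and assert on properties of the output rather than exact wording:

```python
ROUTER_TEMPLATE = (
    "Route the message to exactly one label from: REFUND, OTHER.\n"
    "Message: {message}\n"
    "Label:"
)

def call_llm(prompt: str) -> str:
    # Stub standing in for a real model client, so the sketch is runnable.
    # It inspects only the message line of the prompt.
    message = prompt.split("Message: ")[1].split("\n")[0]
    return "REFUND" if "refund" in message.lower() else "OTHER"

def route(message: str) -> str:
    return call_llm(ROUTER_TEMPLATE.format(message=message)).strip()

def test_refund_requests_get_refund_label() -> None:
    # Assert on the property we care about, not exact free-text wording.
    assert route("I want a refund for order 123") == "REFUND"

def test_unrelated_messages_get_other_label() -> None:
    assert route("What's the weather in Amsterdam?") == "OTHER"
```

A small suite like this turns every prompt edit and model upgrade from a leap of faith into a checked change.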
While prompts share many characteristics with traditional code, one key difference is scope: treat prompts as code only when they are integral to your product architecture and are executed repeatedly. Casual, one-time queries to LLMs usually don't require such rigor.
What if prompts aren’t treated as code?
The same issues arise as when traditional coding best practices are ignored. Short-term productivity may seem high at first, but within months prompts become large, unwieldy, and intimidating to modify. How can complex application logic be updated safely without automated testing or performance monitoring? Changes and model upgrades become progressively more painful, and feature development slows dramatically. Knowledge concentrates in a small group of long-term employees whose departure could significantly harm the organization.
All these pitfalls can be avoided by applying sound AI development principles. A high velocity of feature development becomes sustainable, modifications remain frequent and safe, and the work becomes more enjoyable. Achieving this requires only knowledge, common sense, and discipline.
If you enjoyed this post, follow the series for more insights.