The Case for Preprocessing: Using LLMs Before Your Users Do
Learn how to use LLMs efficiently by preprocessing and storing AI-generated content on the backend instead of generating it on demand, saving costs, reducing latency, and improving scalability.
Most of us interact with LLMs today through chat interfaces. We type in whatever’s on our mind (e.g. random questions, half-formed thoughts) and the AI responds with something uniquely tailored, almost instantly. It’s beyond impressive. But this immediacy also shapes how we think about using LLMs: as tools that only operate in real time, responding to a user’s request on the spot. That mindset can be limiting.
In reality, most apps deal with finite datasets. Sure, they might contain millions of entries, but they’re still bounded. Take Airbnb as an example. While listings are frequently added or updated, the vast majority remain unchanged. I’m currently working on a Sanskrit Dictionary app, which also has a fixed set of words. And this is true for most apps - they rely on structured, relatively stable databases behind the scenes.
Let’s say you want to sort or enrich this data. In my case, each Sanskrit word might have over a hundred definitions across multiple dictionaries, and I want to group similar meanings together to make them more digestible. One approach is to do this dynamically: when a user taps on a word, send its definitions to an LLM, get the grouped result, and display it in the app.
But that creates a few issues:
Cost: You pay to reprocess the same word every time, across all users.
Scalability: As usage grows, you may hit rate limits or API quotas.
Latency: Real-time grouping adds wait time (especially noticeable on mobile!), and some “reasoning” LLMs now take minutes to respond.
So instead of doing this work on demand, a better approach is to run a one-time script on your backend. Use it to process all your data - group it, augment it, enrich it - and then store the results directly in your database. Now, your app just needs to fetch and display preprocessed data like it always has. You’ve eliminated the need to call an LLM at runtime, while still getting the benefits of LLM-enhanced content.
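Here’s a minimal sketch of what that one-time script could look like for the dictionary example. It assumes a SQLite database with hypothetical `definitions` and `words` tables, the official OpenAI Python SDK, and a placeholder model name - all of which you’d swap for your own setup:

```python
# One-time preprocessing sketch: group each word's definitions with an LLM
# and store the result back in the database. Schema and model are placeholders.
import sqlite3
from openai import OpenAI

client = OpenAI()                      # reads OPENAI_API_KEY from the environment
db = sqlite3.connect("dictionary.db")  # hypothetical database file

def group_definitions(word: str, definitions: list[str]) -> str:
    """Ask the LLM to cluster similar definitions under shared headings."""
    prompt = (
        f"Group these definitions of the Sanskrit word '{word}' into clusters "
        "of similar meaning. Return short group headings with the matching "
        "definitions listed under each:\n\n" + "\n".join(definitions)
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable model will do
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Fetch the word list up front, then process each word exactly once.
words = [row[0] for row in db.execute("SELECT DISTINCT word FROM definitions")]
for word in words:
    defs = [row[0] for row in db.execute(
        "SELECT text FROM definitions WHERE word = ?", (word,))]
    grouped = group_definitions(word, defs)
    db.execute("UPDATE words SET grouped_definitions = ? WHERE word = ?",
               (grouped, word))
    db.commit()  # commit per word so an interrupted run keeps its progress
```

Run it once (and re-run it only for new or changed entries), and the app never needs to know an LLM was involved.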
This approach also gives you full control over the process. You can refine your agents, iterate at your own pace, and even parallelize complex workflows. For those of you who took my Advanced Agents workshop, where we built a PDF-to-Podcast workflow, imagine applying that to a database of philosophy PDFs for a college course. You could pre-generate podcasts for each syllabus topic, store them once, and let users stream them immediately in the app.
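Because the whole batch runs offline, parallelizing it is straightforward. Here’s a rough sketch using a bounded worker pool, reusing `group_definitions` from above; the worker count is just an assumption you’d tune against your provider’s rate limits:

```python
# Sketch: run the batch job concurrently with a capped number of workers.
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_word(word: str, definitions: list[str]) -> tuple[str, str]:
    # Each worker makes its own LLM call; database writes happen afterwards
    # on the main thread, since SQLite connections aren't thread-safe by default.
    return word, group_definitions(word, definitions)

def run_batch(items: dict[str, list[str]], max_workers: int = 8) -> dict[str, str]:
    """Group definitions for many words concurrently, bounded by max_workers."""
    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(process_word, w, d) for w, d in items.items()]
        for future in as_completed(futures):
            word, grouped = future.result()
            results[word] = grouped
    return results
```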
This model is much cheaper in the long run. You pay once, and avoid the unpredictability of usage-based costs. It’s also more reliable - no surprises from LLM downtime or API failures. You can confidently offer flat-rate pricing without worrying about a single power user driving up your OpenAI bill.
Even if your app requires some on-demand generation, there are often “prep” steps you can handle ahead of time. For example, in the PDF-to-Podcast agent, the first step is creating a metadata summary of the PDF. That summary rarely changes and can be stored in your database. When it’s time to generate the podcast, you simply include that context in the LLM prompt - saving time, cost, and compute.
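As a rough sketch of that caching pattern, the snippet below stores the summary in a hypothetical `pdf_summaries` table via a `get_or_create_summary` helper (neither name comes from the workshop code), then folds it into the on-demand prompt:

```python
# Sketch: cache a PDF's metadata summary once, reuse it in later prompts.
import sqlite3
from openai import OpenAI

client = OpenAI()
db = sqlite3.connect("podcasts.db")  # hypothetical database file

def get_or_create_summary(pdf_id: str, pdf_text: str) -> str:
    """Return the cached metadata summary for a PDF, generating it only once."""
    row = db.execute(
        "SELECT summary FROM pdf_summaries WHERE pdf_id = ?", (pdf_id,)
    ).fetchone()
    if row:
        return row[0]  # already prepared: no extra LLM call at podcast time
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable model will do
        messages=[{"role": "user", "content":
                   "Summarize this document's topic, structure, and key terms:\n\n"
                   + pdf_text}],
    )
    summary = response.choices[0].message.content
    db.execute("INSERT INTO pdf_summaries (pdf_id, summary) VALUES (?, ?)",
               (pdf_id, summary))
    db.commit()
    return summary

def podcast_prompt(pdf_id: str, pdf_text: str) -> str:
    # At generation time, the stored summary becomes cheap, ready-made context.
    summary = get_or_create_summary(pdf_id, pdf_text)
    return ("Using this summary as context:\n" + summary +
            "\n\nWrite a two-host podcast script covering the document.")
```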
Final Thoughts
By rethinking how and where we use LLMs, we can build smarter, faster, and more scalable apps. Real-time AI is great, but it’s not always necessary. Offloading LLM work to backend scripts allows you to preprocess and cache data intelligently, giving users a seamless experience while keeping your infrastructure predictable and affordable. Whether you’re grouping dictionary definitions, generating educational content, or just prepping context in advance, the key is to plan ahead and let the AI work before the user ever taps a button.