Let's build a PDF Query App using an AI Agents Parallelization Workflow in Swift
In this app, the user uploads a PDF document and asks questions about it. Several AI Agents are deployed in parallel, each searching a chunk of the PDF for the answer.
Google’s latest Gemini models have one-million-token context windows and can answer questions about large PDFs directly. However, as the context window gets bigger and bigger, LLMs struggle to give the right (read: non-hallucinated) answer, especially for content in the middle of the PDF. And if you’d like to use models other than Google’s Gemini, most have a much smaller context window (around 128K tokens) and can’t handle bigger PDFs at all.
In addition to LLMs getting more confused as the context window grows - we’ve all experienced this during long conversations with ChatGPT, for example - the other issue is latency. It takes the LLM much longer to process a full PDF than to process smaller chunks of it, which makes for a poor user experience.
So instead, our strategy in the PDFQuery app is to chunk the PDF into smaller parts and deploy several agents in parallel, each gathering the information relevant to the user’s query:
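As a rough sketch of that strategy, here is how the chunking plus parallel fan-out could look in Swift, using PDFKit to extract the document’s text and a task group to run one agent per chunk concurrently. The `queryLLM` function and the 8,000-character chunk size are placeholders for illustration, not the actual implementation we’ll build later in this post.

```swift
import Foundation
import PDFKit

// Split the PDF's extracted text into roughly equal chunks.
// `maxCharacters` is an arbitrary chunk size chosen for illustration.
func chunkText(from document: PDFDocument, maxCharacters: Int = 8_000) -> [String] {
    var chunks: [String] = []
    var current = ""
    for index in 0..<document.pageCount {
        guard let pageText = document.page(at: index)?.string else { continue }
        if current.count + pageText.count > maxCharacters, !current.isEmpty {
            chunks.append(current)
            current = ""
        }
        current += pageText
    }
    if !current.isEmpty { chunks.append(current) }
    return chunks
}

// Hypothetical stand-in for whatever model client you use:
// it answers the user's question against a single chunk.
func queryLLM(question: String, chunk: String) async throws -> String {
    // ... call your LLM of choice here ...
    return "partial answer"
}

// Fan out one agent per chunk and collect their partial answers in parallel.
func parallelQuery(question: String, document: PDFDocument) async throws -> [String] {
    let chunks = chunkText(from: document)
    return try await withThrowingTaskGroup(of: String.self) { group in
        for chunk in chunks {
            group.addTask {
                try await queryLLM(question: question, chunk: chunk)
            }
        }
        var answers: [String] = []
        for try await answer in group {
            answers.append(answer)
        }
        return answers
    }
}
```

The partial answers would then be merged into a single response, typically with one more LLM call acting as an aggregator.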
Note that this Agents workflow can be used for many other types of apps, which I’ll go over at the end of this blog post. So let’s get started 🚀