LLM Evaluations in Swift: Test, Compare, Swap (Recording Only)
Learn how to evaluate models specifically for your Swift app
Please note that the workshop has already taken place. You can purchase access to the session recording, which also includes exclusive access to the GitHub repositories referenced during the event. Upon purchase, the recording will be delivered to your email within 24 hours.
Building apps with LLMs is challenging because we, as developers, can't predict exactly what each user will see, and we must guard against factually incorrect hallucinations that erode trust. To tackle this, we can set up evaluations: quantitative and qualitative tests that measure “correctness” or other desired outcomes of model outputs. By running these evaluations as a pipeline across multiple LLMs, we can compare metrics like accuracy, latency, and instruction-following to pick the best model for our app. In this workshop, you'll learn concrete techniques in Swift for building reusable, automated evaluation pipelines so you can quickly test and swap out models as your needs evolve.
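To give a feel for the kind of pipeline the workshop builds, here is a minimal Swift sketch. The names here (LanguageModel, EvalCase, EvalReport, runEvals) are illustrative assumptions, not the APIs used in the workshop repositories; the idea is simply to abstract over model backends, run the same test cases against each one, and collect accuracy and latency so models can be compared and swapped.

```swift
import Foundation

// A hypothetical abstraction over any LLM backend (on-device, OpenAI, etc.).
protocol LanguageModel {
    var name: String { get }
    func complete(prompt: String) async throws -> String
}

// One evaluation case: a prompt plus a check that decides whether the output is "correct".
struct EvalCase {
    let prompt: String
    let isCorrect: (String) -> Bool
}

// Aggregated metrics for a single model across all cases.
struct EvalReport {
    let model: String
    let accuracy: Double          // fraction of cases whose output passed the check
    let meanLatency: TimeInterval // average seconds per completion
}

// Runs every case against every model, measuring correctness and latency.
func runEvals(models: [LanguageModel], cases: [EvalCase]) async -> [EvalReport] {
    var reports: [EvalReport] = []
    for model in models {
        var passed = 0
        var totalLatency: TimeInterval = 0
        for evalCase in cases {
            let start = Date()
            // A failed request counts as an incorrect answer here; a real pipeline
            // might track errors separately.
            let output = (try? await model.complete(prompt: evalCase.prompt)) ?? ""
            totalLatency += Date().timeIntervalSince(start)
            if evalCase.isCorrect(output) { passed += 1 }
        }
        reports.append(EvalReport(
            model: model.name,
            accuracy: Double(passed) / Double(cases.count),
            meanLatency: totalLatency / Double(cases.count)
        ))
    }
    return reports
}
```

With a structure like this, swapping models is just a matter of passing a different array of LanguageModel implementations and re-reading the reports; the workshop goes into concrete, production-ready versions of these ideas.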