Dear subscribers,
Today, I want to share a brand new episode with Hamel Husain.
Hamel has trained 2,000+ PMs and engineers from companies like OpenAI, Anthropic, and Google on how to run AI evals. In my new episode, he shares a free master class on how to build evals for a real AI agent in just 50 minutes using a simple spreadsheet. I learned a lot from Hamel and I think you will too!
Watch now on YouTube, Apple, and Spotify.
If you enjoyed this tutorial, Hamel is also offering $1,330 off his AI evaluations course to readers of this newsletter. Join his last cohort of the year by 10/6.
Hamel and I talked about:
- (00:00) What the most valuable part of evals is
- (01:25) Live walkthrough: Analyzing 100 real production traces
- (09:50) Creating the eval criteria using a simple spreadsheet
- (24:44) Why binary pass/fail scores beat 1-5 scores every time
- (28:52) The agreement metric trap that fools most PMs
- (30:08) True positive and negative rates explained
- (36:00) How to set up continuous evals in production
- Skip generic eval criteria and evaluate specific product problems instead. “Generic evals don’t measure the most important issues with your AI product.” Instead of “helpfulness” or “correctness”, create evals for specific product issues like “human handoff failure” or “tour scheduling issue.”
- Follow Hamel’s 4-step eval process:
  - Start by manually labeling 100+ AI conversations (traces). Paste your manual labels into a spreadsheet and categorize them with AI.
  - Use data analysis to identify and count the most common issues. Create a simple pivot table to count issues by category.
  - Create LLM judges with binary pass/fail labels. Validate judges using true positive and true negative rates instead of only alignment. More below.
  - Deploy LLM judges to production and do manual data labeling periodically.
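For readers who prefer code to a spreadsheet, the counting step can be sketched in a few lines of Python. This is only an illustration of what a pivot table does, not Hamel's actual sheet: the trace IDs and labels below are made-up sample data, and `count_issues` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical sample data: (trace_id, issue_category).
# None means the reviewer found no issue in that trace.
labeled_traces = [
    ("t1", "human handoff failure"),
    ("t2", "tour scheduling issue"),
    ("t3", None),
    ("t4", "human handoff failure"),
    ("t5", "human handoff failure"),
    ("t6", "tour scheduling issue"),
]


def count_issues(traces):
    """Count traces per issue category, most common first (a pivot-table equivalent)."""
    counts = Counter(category for _, category in traces if category is not None)
    return counts.most_common()


issue_counts = count_issues(labeled_traces)
for category, n in issue_counts:
    print(f"{category}: {n}")
```

The categories at the top of the list are the natural candidates for a dedicated LLM judge.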
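A small sketch can also show why true positive and true negative rates say more about a judge than a single agreement score. It assumes that "positive" means a human labeled the trace as a pass; the labels are made-up sample data and `judge_metrics` is a hypothetical helper, not part of any library.

```python
def judge_metrics(human, judge):
    """Compare LLM-judge labels with human labels (True = pass, False = fail)."""
    if len(human) != len(judge):
        raise ValueError("human and judge labels must have the same length")
    pairs = list(zip(human, judge))
    true_pos = sum(h and j for h, j in pairs)              # both say pass
    true_neg = sum((not h) and (not j) for h, j in pairs)  # both say fail
    positives = sum(human)
    negatives = len(human) - positives
    return {
        "agreement": (true_pos + true_neg) / len(pairs),
        # Share of human-labeled passes that the judge also passes.
        "tpr": true_pos / positives if positives else None,
        # Share of human-labeled failures that the judge also fails.
        "tnr": true_neg / negatives if negatives else None,
    }


# Hypothetical imbalanced sample: 9 passing traces, 1 failing trace,
# scored by a lazy judge that passes everything.
human_labels = [True] * 9 + [False]
judge_labels = [True] * 10

metrics = judge_metrics(human_labels, judge_labels)
print(metrics)
```

In this sample the judge agrees with the human on 9 of 10 traces, yet its true negative rate is zero: it never catches the one failure, which is the case the judge exists to find.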
- You can do all of the above using a simple spreadsheet. Here’s a link to Hamel’s sheet to evaluate a real AI agent using the steps above:
