AI Safety Institute’s Inspect hello world example for AI evals
Link post
Sharing my detailed walk-through of using the UK AI Safety Institute's new open-source package Inspect for AI evals.
Main points:
The package, released in early May 2024, is here: https://github.com/UKGovernmentBEIS/inspect_ai
It seems easy to use and removes boilerplate code. I am new to evals, so I do not know what experienced researchers would look for in such a tool. I am curious to know what others think of it!
There is one unusual behaviour around whether what they call a ‘scorer’ should be independent of what they call a ‘plan’. I raised an issue about this on GitHub and would be very interested to know what others in the AI safety community think of this detail; a minimal example of where the two sit in a task is sketched below.
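For readers who have not seen Inspect, here is a minimal sketch of a ‘hello world’ task, based on my understanding of the early-May 2024 API. The `plan` and `scorer` parameter names and the `includes()` scorer are from that version and are illustrative; they may differ in later releases.

```python
# Minimal Inspect "hello world" task (a sketch, assuming the early-May 2024 API;
# parameter names such as `plan` may have changed in later releases).
from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import includes
from inspect_ai.solver import generate


@task
def hello_world():
    return Task(
        # One-sample dataset: the model is asked to reply with a fixed phrase.
        dataset=[
            Sample(
                input="Just reply with Hello World",
                target="Hello World",
            )
        ],
        # The 'plan' is the sequence of solvers run on each sample;
        # here it is a single call to the model.
        plan=[generate()],
        # The 'scorer' grades the completion; includes() checks that the
        # target string appears in the model's output.
        scorer=includes(),
    )
```

If I remember the CLI correctly, this can be run with something like `inspect eval hello_world.py --model openai/gpt-4` (the exact model string depends on your provider setup). The question I raised on GitHub is essentially whether the scorer's result should depend only on the final output, independently of which plan produced it.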