“A lot of expectations and optimism people have for these systems were anchored to these medical exam test benchmarks,” says Raji, who studies AI auditing and evaluation at the University of ...