applicable models

reinforcement

Models that learn through interaction with an environment

2 techniques

Technique	Goals		Models	Data Types	Description
Agent Goal Misalignment Testing		Testing	Architecture/model Agnostic, Architecture/neural Networks/transformer, +3	Any	Agent goal misalignment testing identifies scenarios where AI agents pursue objectives in unintended ways or develop...
Reward Hacking Detection		Testing	Architecture/model Agnostic, Paradigm/reinforcement, +1	Any	Reward hacking detection identifies when AI systems achieve stated objectives through unintended shortcuts or loopholes...
