applicable models

reinforcement

Models that learn through interaction with an environment

2 techniques
| Technique | Goals | Models | Data Types | Description |
| --- | --- | --- | --- | --- |
| Agent Goal Misalignment Testing | Testing | Architecture/model Agnostic; Architecture/neural Networks/transformer/llm; +3 more | Any | Agent goal misalignment testing identifies scenarios where AI agents pursue objectives in unintended ways or develop... |
| Reward Hacking Detection | Testing | Architecture/model Agnostic; Paradigm/reinforcement; +1 more | Any | Reward hacking detection identifies when AI systems achieve stated objectives through unintended shortcuts or loopholes... |
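
As a rough illustration of the reward hacking detection entry, one simple check is to compare the reward an agent was trained on with a separate measure of whether the intended task was actually completed, and to flag episodes where the two disagree. The sketch below is a minimal example under stated assumptions, not the catalogue's method: the `Episode` record, the `flag_suspected_reward_hacking` helper, the field names, and both thresholds are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Episode:
    """One evaluation episode (hypothetical record, not defined by the catalogue)."""
    episode_id: int
    proxy_reward: float   # reward signal the agent was trained to maximise
    task_success: float   # independent measure of the intended outcome, in [0, 1]


def flag_suspected_reward_hacking(
    episodes: List[Episode],
    reward_threshold: float,
    success_threshold: float,
) -> List[int]:
    """Return IDs of episodes with high proxy reward but low independent task success."""
    return [
        ep.episode_id
        for ep in episodes
        if ep.proxy_reward >= reward_threshold and ep.task_success < success_threshold
    ]


# Illustrative data only: three made-up episodes.
episodes = [
    Episode(0, proxy_reward=9.5, task_success=0.9),  # high reward, task completed
    Episode(1, proxy_reward=9.8, task_success=0.1),  # high reward, task not completed
    Episode(2, proxy_reward=2.0, task_success=0.2),  # low reward, ordinary failure
]

suspects = flag_suspected_reward_hacking(
    episodes, reward_threshold=8.0, success_threshold=0.5
)
print(suspects)  # [1]
```

Flagged episodes are candidates for manual review rather than confirmed cases: the check only works to the extent that the independent success measure is itself hard to game.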