Value Alignment Evaluation

One-way AI alignment no longer works in generative AI world: Here's why

The authors argue that generative AI introduces a new class of alignment risks because interaction itself becomes a mechanism of influence. Humans adapt their behavior in response to AI outputs, ...

Anthropic and OpenAI release joint model alignment evaluation findings

Investing.com -- Anthropic and OpenAI have published results from their first joint alignment evaluation exercise, revealing strengths and weaknesses in both companies’ AI models when tested in ...
