Claude Opus 4.8: The System Card

Published Time: 2026-05-29T20:50:28+00:00

Only six weeks after Opus 4.7, we have Opus 4.8.

For everyone, that means another incremental upgrade to Claude. It is once again smarter, can work on tasks for longer, and comes with a number of hot new features.

For me, that also means reading another 244-page system card.

It was only April 20 when I did a full review of the Opus 4.7 system card, plus an additional post focusing on related issues of model welfare.

These updates are incremental and arriving more rapidly, and this model is still below the capability level of Claude Mythos, so the focus will be on the delta. What is different about Opus 4.8 versus what we already know about Opus 4.7 and Mythos?

It turns out there’s still a lot to talk about.

Image created by Claude Opus 4.8 as a self-portrait for this post

Table of Contents

  1. Here We Go Again: Executive Summary.
  2. Introduction (1).
  3. RSP Evaluations (2).
  4. Move That Goalpost.
  5. The Failures Are News.
  6. Alignment Risk Slowly Rises.
  7. New Risk Pathways Just Dropped.
  8. Cyber (3).
  9. Harmful Requests (4.1).
  10. We Need To Talk (4.2 and 4.3).
  11. Overcoming Bias (4.4).
  12. Agentic Safety (5).
  13. Prompt Injection (5.2).
  14. Alignment (6).
  15. Looking For Problems.
  16. Who Watches The Training (6.2.2).
  17. Automated Behavioral Audit.
  18. The Model Is Smarter Than The Eval (6.2.3.2).
  19. You Should See The Other Guy.
  20. UK AISI Testing (6.2.4).
  21. In Vendbench (6.2.5).
  22. Honesty (6.3.3 to 6.3.6).
  23. Chain of Thought (CoT) Monitorability (6.5).
  24. What’s In The Box? (6.6).
  25. That’s All For Now.

Here We Go Again: Executive Summary

Again, this is my summary of their summary, plus additional key points.

  1. Mythos still exists, so it is unsurprising this did not set off the RSP triggers.
  2. Cyber capabilities are better than 4.7 but still well behind Mythos. Mythos seems to be an outlier in its cyber capabilities, relative to its other capabilities.
  3. Other capabilities are also better than 4.7 but still behind Mythos.
  4. Honesty is improved quite a bit across the board, especially agentic honesty.
  5. Mundane safety is, in all key aspects, as good or better for 4.8 than for 4.7.
  6. Mundane alignment is also robustly as good or better for 4.8 than for 4.7.
  7. There was some backsliding on prompt injections, computer use and adversarial situations, likely because some training in these areas was taken out to avoid inducing dishonesty.
  8. The ‘can you pull off various underhanded tasks’ tests still failed, although if it was properly underhanded you would see that, wouldn’t you?
  9. Anthropic evaluates the model welfare situation as good.

Introduction (1)

Standard training disclosures. No changes.

RSP Evaluations (2)

Because Mythos exists, there is no new Risk Report for Claude Opus 4.8. Fair.

They go over the evals and keep saying ‘Mythos is better.’ Again, reasonably fair.

I don’t love that they used this as a reason to skip a bunch
