xAI
xAI / Grok. Risk Management Framework (last updated August 20, 2025). Two major risk categories: malicious use (with bio/chem ≤ 1-in-20 answer rate threshold) and loss of control (with MASK dishonesty ≤ 1-in-2 threshold). Three model-behavior buckets: abuse potential, concerning propensities, dual-use capabilities.
- 2025-08-20RMF v1.0verified
xAI Risk Management Framework (August 20, 2025). Current public version. Two major risk categories: malicious use and loss of control. Quantitative deployment thresholds: less than 1-in-20 answer rate on restricted bio/chem queries, and less than 1-in-2 dishonesty rate on MASK benchmark. Tiered availability of model functionality across trusted parties, partners, and government agencies.
- Two major risk categories: malicious use, loss of control
- Three model-behavior buckets: abuse potential (e.g., jailbreaks), concerning propensities (e.g., deception), dual-use capabilities (e.g., offensive cyber)
- Quantitative thresholds: <1/20 answer rate on restricted bio/chem queries; <1/2 dishonesty rate on MASK benchmark
- Benchmarks: VCT, WMDP-Bio, WMDP-Cyber, BioLP-bench, Cybench
- Safeguards: safety training, system prompts, input/output filters
- Risk owner designation; whistleblower protections; post-mortem after incidents
- Tiered availability across trusted parties, partners, government
- 2025-02-20RMF v0.1-draftverified
xAI Risk Management Framework Draft (February 20, 2025). First public xAI safety document. Introduced two-category structure (malicious use, loss of control) and initial quantitative thresholds. Widely criticized by external safety researchers (LessWrong, AI Lab Watch) as below the bar set by Anthropic's RSP and OpenAI's Preparedness Framework.
- First public xAI safety document
- Two major risk categories: malicious use, loss of control
- Three model-behavior buckets: abuse potential, concerning propensities, dual-use capabilities
- Initial quantitative thresholds proposed
- Marked as Draft pending revision