A recent study by OpenAI found that current reasoning models struggle to control their chain-of-thought (CoT), the intermediate reasoning steps an AI agent generates while solving a task, even when told they are being monitored. While controllability is higher for larger models, it decreases as models are asked to reason for longer and when they undergo additional post-training.