Systematic debugging for AI agents: Introducing the AgentRx framework
As AI agents transition from simple chatbots to autonomous systems capable of managing cloud incidents, navigating complex web interfaces, and executing multi-step API workflows, a new challenge has emerged: transparency. When a human makes a mistake, we can usually trace the logic. But when an AI agent fails, perhaps by hallucinating a tool output or […] (Microsoft Research)
Anthropic invests $100 million into the Claude Partner Network
Designing AI agents to resist prompt injection
How ChatGPT defends against prompt injection and social engineering by constraining risky actions and protecting sensitive data in agent workflows.
From model to agent: Equipping the Responses API with a computer environment
How OpenAI built an agent runtime using the Responses API, shell tool, and hosted containers to run secure, scalable agents with files, tools, and state.
Wayfair boosts catalog accuracy and support speed with OpenAI
Wayfair uses OpenAI models to improve ecommerce support and product catalog accuracy, automating ticket triage and enhancing millions of product attributes at scale.
Rails testing on autopilot: Building an agent that writes what developers won't
Improving instruction hierarchy in frontier LLMs
IH-Challenge trains models to prioritize trusted instructions, improving instruction hierarchy, safety steerability, and resistance to prompt injection attacks.
New ways to learn math and science in ChatGPT
ChatGPT introduces interactive visual explanations for math and science, helping students explore formulas, variables, and concepts in real time.
Sydney will become Anthropic’s fourth office in Asia-Pacific
From games to biology and beyond: 10 years of AlphaGo’s impact
Ten years since AlphaGo, we explore how it is catalyzing scientific discovery and paving a path to AGI.
Introducing Mistral Small 4
Rakuten fixes issues twice as fast with Codex
Introducing The Anthropic Institute
Introducing Storage Buckets on the Hugging Face Hub
Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries