Microsoft Research

23 articles, RSS feed
AI
New Future of Work: AI is driving rapid change, uneven benefits

For the past five years, the New Future of Work report has captured how work is changing. This year, the shift feels especially sharp. Previous editions have focused on technology’s role in increasing productivity by automating tasks, accelerating communication, and expanding access to information, as well as the rise of remote work. Today, generative AI […]

ADeLe: Predicting and explaining AI performance across tasks

AI benchmarks report how large language models (LLMs) perform on specific tasks but provide little insight into the underlying capabilities that drive that performance. They do not explain failures or reliably predict outcomes on new tasks. To address this, Microsoft researchers, in collaboration with Princeton University and Universitat Politècnica de València, introduce ADeLe (AI […]

AsgardBench: A benchmark for visually grounded interactive planning

Imagine a robot tasked with cleaning a kitchen. It needs to observe its environment, decide what to do, and adjust when things don’t go as expected, for example, when the mug it was tasked to wash is already clean, or the sink is full of other items. This is the domain of embodied AI: systems […]

GroundedPlanBench: Spatially grounded long-horizon task planning for robot manipulation

Vision-language models (VLMs) use images and text to plan robot actions, but they still struggle to decide what actions to take and where to take them. Most systems split these decisions into two steps: a VLM generates a plan in natural language, and a separate model translates it into executable actions. This approach often breaks […]

Systematic debugging for AI agents: Introducing the AgentRx framework

As AI agents transition from simple chatbots to autonomous systems capable of managing cloud incidents, navigating complex web interfaces, and executing multi-step API workflows, a new challenge has emerged: transparency. When a human makes a mistake, we can usually trace the logic. But when an AI agent fails, perhaps by hallucinating a tool output or […]
