Tutorials: Learn more from our detailed guides
ReActXen IoT Agent (EMNLP 2025) |
FailureSensorIQ (NeurIPS 2025) |
AssetOpsBench Lab (AAAI 2026) |
Spiral (AAAI 2026) |
AssetOpsBench Technical Material
Paper | HF-Dataset | IBM Blog | HF Blog | Contributors
- Video Overview: AssetOpsBench - AI Agents for Industrial Asset Operations & Maintenance by Reliability Odyssey.
- Announcements
- Introduction
- Datasets
- AI Agents
- Multi-Agent Frameworks
- Leaderboards
- Talks & Events
- External Resources
- University Projects & Extensions
- Call for Scenario Contribution
- Contributors
- Dataset Update: AssetOpsBench now covers nine asset classes (Chiller, AHU, Pump, Motor, Bearing, Engine, Rotors, Boilers, Turbine) and a wider range of tasks (Remaining Useful Life, Fault Classification, Rule Monitoring, etc.).
  Special thanks to the primary contributors: @DeveloperMindset123, @ChathurangiShyalika, @Fabio-Lorenzi1
- AAAI-2026: SPIRAL: Symbolic LLM Planning via Grounded and Reflective Search
- AAAI-2026 Lab: From Inception to Productization: Hands-on Lab for the Lifecycle of Multimodal Agentic AI in Industry 4.0
- AABA4ET/AAAI-2026: Agentic Code Generation for Heuristic Rules in Equipment Monitoring
- IAAI/AAAI-2026: Diversity Meets Relevancy: Multi-Agent Knowledge Probing for Industry 4.0 Applications
- IAAI/AAAI-2026: Deployed AI Agents for Industrial Asset Management: CodeReAct Framework for Event Analysis and Work Order Automation
- AAAI-2026 Demo: AssetOpsBench-Live: Privacy-Aware Online Evaluation of Multi-Agent Performance in Industrial Operations
- NeurIPS-2025 Social: Evaluating Agentic Systems
  Talk: Building Reliable Agentic Benchmarks: Insights from AssetOpsBench. Total registered users: 2000+
- Past Event: 2025-10-03: 2-Hour Workshop: AI Agents and Their Role in Industry 4.0 Applications
- Accepted Papers: Parts of this work have been accepted at NeurIPS 2025, EMNLP 2025 Research Track, and EMNLP 2025 Industry Track.
- 2025-09-01: CODS 2025 Competition launched. Access the AI Agentic Challenge AssetOpsBench-Live.
- 2025-06-01: AssetOpsBench v1.0 released with 141 industrial scenarios.

Stay tuned for new tracks, competitions, and community events.
AssetOpsBench is a unified framework for developing, orchestrating, and evaluating domain-specific AI agents in industrial asset operations and maintenance.
It provides:
- 4 domain-specific agents
- 2 multi-agent orchestration frameworks
Designed for maintenance engineers, reliability specialists, and facility planners, it allows reproducible evaluation of multi-step workflows in simulated industrial environments.
AssetOpsBench scenarios span multiple domains:
| Domain | Example Task |
|---|---|
| IoT | "List all sensors of Chiller 6 in MAIN site" |
| FMSR | "Identify failure modes detected by Chiller 6 Supply Temperature" |
| TSFM | "Forecast 'Chiller 9 Condenser Water Flow' for the week of 2020-04-27" |
| WO | "Generate a work order for Chiller 6 anomaly detection" |
Some tasks focus on a single domain; others are multi-step, end-to-end workflows.
Explore all scenarios in the HF-Dataset.
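As a rough illustration of how the example tasks above could be organized by domain, here is a minimal Python sketch. The `Scenario` class and its field names (`domain`, `utterance`) are assumptions made for this example and are not the published dataset schema. The domain label `FMSR` follows the spelling used for the FMSR Agent.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical in-memory record; the real dataset schema may differ."""
    domain: str      # domain label, e.g. "IoT"
    utterance: str   # natural-language task handed to the agent


# The four example tasks listed in the table above.
SCENARIOS = [
    Scenario("IoT", "List all sensors of Chiller 6 in MAIN site"),
    Scenario("FMSR", "Identify failure modes detected by Chiller 6 Supply Temperature"),
    Scenario("TSFM", "Forecast 'Chiller 9 Condenser Water Flow' for the week of 2020-04-27"),
    Scenario("WO", "Generate a work order for Chiller 6 anomaly detection"),
]


def group_by_domain(scenarios):
    """Return a dict mapping each domain label to its list of utterances."""
    grouped = defaultdict(list)
    for scenario in scenarios:
        grouped[scenario.domain].append(scenario.utterance)
    return dict(grouped)
```

Calling `group_by_domain(SCENARIOS)` yields one utterance per domain here; a multi-step scenario would need a richer record than this sketch provides.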
- IoT Agent: `get_sites`, `get_history`, `get_assets`, `get_sensors`
- FMSR Agent: `get_sensors`, `get_failure_modes`, `get_failure_sensor_mapping`
- TSFM Agent: `forecasting`, `timeseries_anomaly_detection`
- WO Agent: `generate_work_order`
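The agent-to-tool listing above can be written down as a small registry. The sketch below is only an illustration: the dictionary layout and the helper names `agents_for_tool` and `qualified_tool_names` are hypothetical and not part of the AssetOpsBench code. It highlights one detail of the listing: `get_sensors` appears under both the IoT and FMSR agents, so a router may want agent-qualified names.

```python
# Tool names copied from the agent list above; the structure is an assumption.
AGENT_TOOLS = {
    "IoT": ["get_sites", "get_history", "get_assets", "get_sensors"],
    "FMSR": ["get_sensors", "get_failure_modes", "get_failure_sensor_mapping"],
    "TSFM": ["forecasting", "timeseries_anomaly_detection"],
    "WO": ["generate_work_order"],
}


def agents_for_tool(tool_name):
    """Return the sorted names of all agents that list the given tool."""
    return sorted(
        agent for agent, tools in AGENT_TOOLS.items() if tool_name in tools
    )


def qualified_tool_names():
    """Return unambiguous 'Agent.tool' names for every listed tool."""
    return [
        f"{agent}.{tool}"
        for agent, tools in AGENT_TOOLS.items()
        for tool in tools
    ]
```

For example, `agents_for_tool("get_sensors")` returns both `"FMSR"` and `"IoT"`, while `qualified_tool_names()` keeps the two entries distinct as `IoT.get_sensors` and `FMSR.get_sensors`.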
- MetaAgent: ReAct-based single-agent-as-tool orchestration
- AgentHive: plan-and-execute sequential workflow
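To make the difference between the two orchestration styles concrete, here is a minimal sketch of the general ReAct and plan-and-execute patterns. It is not the MetaAgent or AgentHive implementation: the stub `AGENTS`, the `scripted_policy` function, and the `SCRIPT` plan are hypothetical stand-ins for real agents and LLM calls.

```python
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, str]  # (agent name, query)

# Stub agents standing in for real domain agents (hypothetical behavior).
AGENTS: Dict[str, Callable[[str], str]] = {
    "IoT": lambda query: f"IoT result for: {query}",
    "WO": lambda query: f"WO result for: {query}",
}

# A fixed two-step plan used by both styles in this sketch.
SCRIPT: List[Step] = [
    ("IoT", "sensors of Chiller 6"),
    ("WO", "work order for Chiller 6"),
]


def react_loop(
    choose_next: Callable[[List[str]], Optional[Step]], max_steps: int = 5
) -> List[str]:
    """ReAct style: choose one action at a time, given observations so far."""
    observations: List[str] = []
    for _ in range(max_steps):
        step = choose_next(observations)  # an LLM call in a real system
        if step is None:                  # the policy signals it is done
            break
        agent, query = step
        observations.append(AGENTS[agent](query))
    return observations


def plan_and_execute(plan: List[Step]) -> List[str]:
    """Plan-and-execute style: the plan is fixed up front, then run in order."""
    return [AGENTS[agent](query) for agent, query in plan]


def scripted_policy(observations: List[str]) -> Optional[Step]:
    """Toy policy that replays SCRIPT one step per call, then stops."""
    if len(observations) < len(SCRIPT):
        return SCRIPT[len(observations)]
    return None
```

With this toy policy both styles produce the same observations. The practical difference is that `react_loop` can change course after each observation, whereas `plan_and_execute` commits to the whole plan before any agent runs.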
The src/ directory contains MCP servers and a plan-execute runner built on the Model Context Protocol.
See INSTRUCTIONS.md for setup, usage, and testing.
- Evaluated with 7 Large Language Models
- Trajectories scored by an LLM judge (Llama-4-Maverick-17B)
- 6-dimensional criteria measure reasoning, execution, and data handling
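One way to aggregate judge output over six criteria is a per-criterion pass rate, sketched below. This is an assumption about the aggregation, not the benchmark's scoring code. The criterion names `criterion_1` to `criterion_6` are placeholders, and treating each verdict as a boolean is also an assumption.

```python
# Placeholder names; the actual six criteria are not listed in this README.
CRITERIA = [f"criterion_{i}" for i in range(1, 7)]


def pass_rates(verdicts):
    """Compute the fraction of trajectories passing each criterion.

    `verdicts` is a list of dicts, one per trajectory, mapping every
    criterion name to True or False (assumed judge output format).
    """
    if not verdicts:
        raise ValueError("at least one trajectory verdict is required")
    total = len(verdicts)
    return {
        criterion: sum(bool(v[criterion]) for v in verdicts) / total
        for criterion in CRITERIA
    }
```

Reporting each criterion separately keeps failures in, say, data handling visible instead of averaging them into a single score.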
Example: MetaAgent leaderboard
- 2026-05-10: NUS Seminar: AssetOpsBench Applications
- 2025-10-03: 2-Hour Workshop: AI Agents and Their Role in Industry 4.0 Applications (Host: NJIT ACM)
- 2025-09-01: CODS 2025 Competition Launch: AssetOpsBench-Live
- 2025-06-01: AssetOpsBench v1.0 released with 141 industrial scenarios
- 2025: Multiple invited talks and accepted papers at NeurIPS, EMNLP, AAAI (see Announcements above)
- Paper: AssetOpsBench: Benchmarking AI Agents for Industrial Asset Operations
- HuggingFace: Scenario & Model Hub
- Blog: Insights, Tutorials, and Updates
- Recorded Talks: Link coming soon.
AssetOpsBench is being extended by university research groups exploring new asset classes, evaluation paradigms, and agentic architectures. To list your project, open a PR.
- Project Name - short description (such as paper title). Contributor Name, University · repo
- Project Name - Prognostics & Health Management benchmark for MCP agents (rotating equipment, aero-engines, lithium-ion cells). Contributor Name, University of XYZ · repo
- Internalizing MCP Tool Knowledge in Small LLMs via QLoRA Fine-Tuning - HPML project using AssetOpsBench to fine-tune ~4B models to internalize MCP tool knowledge and reduce prompt schema overhead. Ayal Yakobe, Columbia University · repo
- SPIN - Structural LLM Planning via Iterative Navigation for Industrial Tasks. Yusuke Ozaki, University at Albany · paper · repo
- Synthetic Scenario Generation for Evaluation of Industry 4.0 Agents - Automated Scenario Generation, Transformer Asset Integration, and Scenario Quality Evaluation. [Rohith Kanathur](https://github.com/Rohith-Kanathur), [Sagar Chethan Kumar](https://github.com/Sagar-CK), Columbia University · repo
- AgentOpsBench - High-throughput battery analytics MCP server with DNN prognostics (RUL prediction) and 3.3× latency optimization via parallel fetch, batching, and disk caching. Siddharth Gowda, Rushin Bhatt, Aryaman Agrawal, Winston Li, Columbia University · repo
- Skill-Knowledge-Augmented Agents on AssetOpsBench - Confidence-gated skill execution with scoped knowledge plugins for industrial fault diagnosis on AssetOpsBench. [Vera Mazeeva](https://github.com/verammaz), [Sanskruti Shejwal](https://github.com/Sans-Shej), [Shrey Arora](https://github.com/shreyarora2198), [Mana Abbaszadeh](https://github.com/Manazd), Columbia University · repo
- Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines. Krish Veera, Alimurtaza Mustafa Merchant, Sajal Kumar Goyla, Shambhawi Bhure, Columbia University · repo
We are expanding AssetOpsBench to cover a broader range of industrial challenges. We invite researchers and practitioners to contribute new scenarios, particularly in the following areas:
- Asset Classes: Turbines, HVAC Systems, Pumps, Transformers, CNC Machines, Robotics, Engines, and more.
- Task Domains: Prognostics and Health Management, Remaining Useful Life (RUL) estimation, Root Cause Analysis (RCA), Diagnostic Analysis, and Predictive Maintenance.
How to contribute:
- Define your scenario following our Utterance Guideline and Ground Truth Guideline.
- Explore the Hugging Face dataset for examples.
- Submit a Pull Request or open an Issue with the tag `new-scenario`.
- Contact us via email if you have any questions:
- Dhaval Patel (pateldha@us.ibm.com)
- Nianjun Zhou (jzhou@us.ibm.com)
Thanks go to these wonderful people