AI Agents for Industrial Asset Operations & Maintenance

📘 Tutorials: Learn more from our detailed guides —
ReActXen IoT Agent (EMNLP 2025) | FailureSensorIQ (NeurIPS 2025) | AssetOpsBench Lab (AAAI 2026) | Spiral (AAAI 2026) | AssetOpsBench Technical Material

📄 Paper | 🤗 HF-Dataset | 📢 IBM Blog | 🤗 HF Blog | Contributors

Resources

Video Overview: AssetOpsBench - AI Agents for Industrial Asset Operations & Maintenance by Reliability Odyssey.

Announcements

📊 Dataset Update: AssetOpsBench has expanded to cover 9 asset classes (Chiller, AHU, Pump, Motor, Bearing, Engine, Rotor, Boiler, Turbine) and a wider variety of tasks (Remaining Useful Life, Fault Classification, Rule Monitoring, etc.).

Special thanks to our primary contributors: 👥 @DeveloperMindset123, @ChathurangiShyalika, @Fabio-Lorenzi1
📰 AAAI-2026: SPIRAL: Symbolic LLM Planning via Grounded and Reflective Search
🎯 AAAI-2026 Lab: From Inception to Productization: Hands-on Lab for the Lifecycle of Multimodal Agentic AI in Industry 4.0
📰 AABA4ET/AAAI-2026: Agentic Code Generation for Heuristic Rules in Equipment Monitoring
📰 IAAI/AAAI-2026: Diversity Meets Relevancy: Multi-Agent Knowledge Probing for Industry 4.0 Applications
📰 IAAI/AAAI-2026: Deployed AI Agents for Industrial Asset Management: CodeReAct Framework for Event Analysis and Work Order Automation
📰 AAAI-2026 Demo: AssetOpsBench-Live: Privacy-Aware Online Evaluation of Multi-Agent Performance in Industrial Operations
📰 NeurIPS-2025 Social — Evaluating Agentic Systems
Talk: Building Reliable Agentic Benchmarks: Insights from AssetOpsBench
Total Registered Users: 2000+
🕓 Past Event: 2025-10-03 – 2-Hour Workshop: AI Agents and Their Role in Industry 4.0 Applications
🏆 Accepted Papers: Parts of this work have been accepted at NeurIPS 2025, the EMNLP 2025 Research Track, and the EMNLP 2025 Industry Track.
🚀 2025-09-01: CODS 2025 Competition launched: access the AI Agentic Challenge on AssetOpsBench-Live.
📦 2025-06-01: AssetOpsBench v1.0 released with 141 industrial scenarios.

✨ Stay tuned for new tracks, competitions, and community events.

Introduction

AssetOpsBench is a unified framework for developing, orchestrating, and evaluating domain-specific AI agents in industrial asset operations and maintenance.

It provides:

4 domain-specific agents
2 multi-agent orchestration frameworks

Designed for maintenance engineers, reliability specialists, and facility planners, it enables reproducible evaluation of multi-step workflows in simulated industrial environments.

Datasets

AssetOpsBench scenarios span multiple domains:

Domain	Example Task
IoT	"List all sensors of Chiller 6 in MAIN site"
FMSR	"Identify failure modes detected by Chiller 6 Supply Temperature"
TSFM	"Forecast 'Chiller 9 Condenser Water Flow' for the week of 2020-04-27"
WO	"Generate a work order for Chiller 6 anomaly detection"

Some tasks focus on a single domain; others are multi-step, end-to-end workflows.
Explore all scenarios in the HF-Dataset.
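Because some scenarios touch one domain and others chain several, it can help to tag each scenario with the domains it needs and count them. This is a minimal sketch under assumed field names (`id`, `utterance`, `domains`) that are not the published HF-Dataset schema; the domain tags on the third record are illustrative and are not taken from the dataset.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical records: field names and domain tags are illustrative,
# not the published HF-Dataset schema.
EXAMPLES = [
    {"id": "s1", "utterance": "List all sensors of Chiller 6 in MAIN site",
     "domains": ["IoT"]},
    {"id": "s2", "utterance": "Identify failure modes detected by Chiller 6 Supply Temperature",
     "domains": ["FMSR"]},
    {"id": "s3", "utterance": "Generate a work order for Chiller 6 anomaly detection",
     "domains": ["IoT", "TSFM", "WO"]},  # assumed multi-domain tagging
]


def is_multi_step(scenario: Dict) -> bool:
    """Treat any scenario spanning more than one domain as an end-to-end workflow."""
    return len(set(scenario.get("domains", []))) > 1


def summarize(scenarios: Iterable[Dict]) -> Counter:
    """Count single-domain scenarios per domain, plus multi-step workflows."""
    counts: Counter = Counter()
    for s in scenarios:
        if is_multi_step(s):
            counts["multi-step"] += 1
        elif s.get("domains"):
            counts[s["domains"][0]] += 1
        else:
            counts["untagged"] += 1  # no domain tag recorded
    return counts


if __name__ == "__main__":
    print(summarize(EXAMPLES))
```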

AI Agents

Domain-Specific Agents (Key tools)

IoT Agent: get_sites, get_history, get_assets, get_sensors
FMSR Agent: get_sensors, get_failure_modes, get_failure_sensor_mapping
TSFM Agent: forecasting, timeseries_anomaly_detection
WO Agent: generate_work_order
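The tool lists above suggest a simple dispatch structure: each domain agent owns a named set of tools, and a call is routed by agent and tool name. The sketch below assumes that structure; the stub functions, their parameters (`site`, `asset`), and their return values are hypothetical placeholders rather than the repository's implementation, and only the IoT agent is filled in.

```python
from typing import Any, Callable, Dict, List


# Hypothetical stubs standing in for real IoT backend queries.
def get_sites() -> List[str]:
    return ["MAIN"]


def get_assets(site: str) -> List[str]:
    return ["Chiller 6", "Chiller 9"] if site == "MAIN" else []


def get_sensors(site: str, asset: str) -> List[str]:
    if asset not in get_assets(site):
        return []
    return [f"{asset} Supply Temperature", f"{asset} Condenser Water Flow"]


# Registry keyed by agent first, then tool name, so tools that share a name
# across agents (e.g. get_sensors in IoT and FMSR) do not collide.
AGENT_TOOLS: Dict[str, Dict[str, Callable[..., Any]]] = {
    "IoT": {
        "get_sites": get_sites,
        "get_assets": get_assets,
        "get_sensors": get_sensors,
    },
    # "FMSR", "TSFM", and "WO" entries would be registered the same way.
}


def call_tool(agent: str, tool: str, **kwargs: Any) -> Any:
    """Route a tool call to the named agent's tool, failing loudly if unknown."""
    try:
        fn = AGENT_TOOLS[agent][tool]
    except KeyError:
        raise ValueError(f"agent {agent!r} has no tool {tool!r}") from None
    return fn(**kwargs)


if __name__ == "__main__":
    print(call_tool("IoT", "get_sensors", site="MAIN", asset="Chiller 6"))
```

Keying the registry by agent first is a deliberate choice here: `get_sensors` appears in both the IoT and FMSR tool lists, and a flat registry would force one to shadow the other.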

Multi-Agent Frameworks

MetaAgent: ReAct-based orchestration that invokes each single-domain agent as a tool
AgentHive: plan-and-execute sequential workflow
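A plan-and-execute workflow separates planning from execution: a plan is fixed first, then each step is handed to a domain agent in order. The following is a minimal sketch under that general reading, not AgentHive's code; the `Step` type, the stub agents, and the hard-coded plan are hypothetical, and a real planner would typically generate the steps with an LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# An agent takes a sub-task plus the outputs of earlier steps and returns text.
Agent = Callable[[str, List[str]], str]


@dataclass
class Step:
    agent: str  # name of the domain agent that handles this step
    task: str   # natural-language sub-task for that agent


def execute(steps: List[Step], agents: Dict[str, Agent]) -> List[str]:
    """Run a fixed plan in order, passing every earlier output to each step."""
    outputs: List[str] = []
    for step in steps:
        if step.agent not in agents:
            raise KeyError(f"no agent registered for {step.agent!r}")
        outputs.append(agents[step.agent](step.task, list(outputs)))
    return outputs


def make_stub(name: str) -> Agent:
    """Hypothetical stand-in for a real domain agent."""
    def run(task: str, context: List[str]) -> str:
        return f"[{name}] {task} (saw {len(context)} prior results)"
    return run


AGENTS = {name: make_stub(name) for name in ("IoT", "TSFM", "WO")}

# Hypothetical hard-coded plan; a planner would normally produce this list.
PLAN = [
    Step("IoT", "List sensors of Chiller 6 in MAIN site"),
    Step("TSFM", "Run anomaly detection on those sensors"),
    Step("WO", "Draft a work order for any detected anomaly"),
]

if __name__ == "__main__":
    for line in execute(PLAN, AGENTS):
        print(line)
```

By contrast, a ReAct-style orchestrator such as MetaAgent decides which agent to call next after each observation instead of committing to the whole sequence up front.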

MCP Environment

The src/ directory contains MCP servers and a plan-execute runner built on the Model Context Protocol. See INSTRUCTIONS.md for setup, usage, and testing.
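MCP servers are driven by JSON-RPC 2.0 messages; a client invokes a server tool with a `tools/call` request whose `params` carry the tool `name` and its `arguments`, and the result returns a `content` list plus an optional `isError` flag. The sketch below only builds and parses those messages with the standard library. It is not the runner in src/, it omits the transport (stdio or HTTP) and the initialization handshake, and the tool name and argument keys used in the example are hypothetical.

```python
import json
from itertools import count
from typing import Any, Dict

_request_ids = count(1)  # request ids must not be reused within a session


def tools_call_request(name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def text_from_response(raw: str) -> str:
    """Extract the text content from a `tools/call` response, raising on errors."""
    msg = json.loads(raw)
    if "error" in msg:  # protocol-level failure (e.g. unknown tool)
        raise RuntimeError(msg["error"].get("message", "unknown error"))
    result = msg["result"]
    texts = [c["text"] for c in result.get("content", []) if c.get("type") == "text"]
    if result.get("isError"):  # the tool ran but reported a failure
        raise RuntimeError("\n".join(texts) or "tool reported an error")
    return "\n".join(texts)


if __name__ == "__main__":
    # Hypothetical tool arguments for an IoT-style get_sensors tool.
    print(tools_call_request("get_sensors", {"site": "MAIN", "asset": "Chiller 6"}))
```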

Leaderboards

Evaluated with 7 large language models
Trajectories scored by an LLM judge (Llama-4-Maverick-17B)
6-dimensional criteria measure reasoning, execution, and data handling

Example: MetaAgent leaderboard
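One way to turn per-trajectory judge verdicts into a leaderboard is to compute a pass rate per criterion and average those rates per model. This sketch assumes the judge emits a pass/fail verdict on each of six criteria; the criterion labels and model names are hypothetical placeholders, and the actual AssetOpsBench scoring may weight or combine criteria differently.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical labels for the six judge criteria; substitute the real rubric names.
CRITERIA = (
    "task_completion",
    "data_retrieval",
    "result_verification",
    "agent_sequence",
    "clarity",
    "no_hallucination",
)

Verdict = Dict[str, bool]  # criterion -> passed?
Row = Tuple[str, float, Dict[str, float]]


def leaderboard(judgments: Dict[str, List[Verdict]]) -> List[Row]:
    """Rank models by the mean of their per-criterion pass rates."""
    rows: List[Row] = []
    for model, verdicts in judgments.items():
        if not verdicts:
            continue  # no scored trajectories for this model
        # A criterion missing from a verdict counts as a failure.
        rates = {c: sum(v.get(c, False) for v in verdicts) / len(verdicts) for c in CRITERIA}
        rows.append((model, mean(list(rates.values())), rates))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical verdicts for two placeholder models.
ALL_PASS = {c: True for c in CRITERIA}
ALL_FAIL = {c: False for c in CRITERIA}
EXAMPLE = {"model-a": [ALL_PASS, ALL_PASS], "model-b": [ALL_PASS, ALL_FAIL]}

if __name__ == "__main__":
    for model, score, _ in leaderboard(EXAMPLE):
        print(f"{model}: {score:.2f}")
```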

Talks & Events

2026-05-10: NUS Seminar: AssetOpsBench Applications
2025-10-03: 2-Hour Workshop: AI Agents and Their Role in Industry 4.0 Applications (Host: NJIT ACM)
2025-09-01: CODS 2025 Competition Launch – AssetOpsBench-Live
2025-06-01: AssetOpsBench v1.0 released with 141 industrial scenarios
2025: Multiple invited talks and accepted papers at NeurIPS, EMNLP, AAAI (see Announcements above)

External Resources

📄 Paper: AssetOpsBench: Benchmarking AI Agents for Industrial Asset Operations
🤗 HuggingFace: Scenario & Model Hub
📢 Blog: Insights, Tutorials, and Updates
🎥 Recorded Talks: Link coming soon.

University Projects & Extensions

AssetOpsBench is being extended by university research groups exploring new asset classes, evaluation paradigms, and agentic architectures. To list your project, open a PR.

Project Name: short description (such as paper title). Contributor Name, University · repo
Project Name: Prognostics & Health Management benchmark for MCP agents (rotating equipment, aero-engines, lithium-ion cells). Contributor Name, University of XYZ · repo
Internalizing MCP Tool Knowledge in Small LLMs via QLoRA Fine-Tuning: HPML project using AssetOpsBench to fine-tune ~4B models to internalize MCP tool knowledge and reduce prompt schema overhead. Ayal Yakobe, Columbia University · repo
SPIN: Structural LLM Planning via Iterative Navigation for Industrial Tasks. Yusuke Ozaki, University at Albany · paper · repo
Synthetic Scenario Generation for Evaluation of Industry 4.0 Agents: Automated Scenario Generation, Transformer Asset Integration, and Scenario Quality Evaluation. Rohith Kanathur (https://github.com/Rohith-Kanathur), Sagar Chethan Kumar (https://github.com/Sagar-CK), Columbia University · repo
AgentOpsBench: High-throughput battery analytics MCP server with DNN prognostics (RUL prediction) and 3.3× latency optimization via parallel fetch, batching, and disk caching. Siddharth Gowda, Rushin Bhatt, Aryaman Agrawal, Winston Li, Columbia University · repo
Skill-Knowledge-Augmented Agents on AssetOpsBench: Confidence-gated skill execution with scoped knowledge plugins for industrial fault diagnosis on AssetOpsBench. Vera Mazeeva (https://github.com/verammaz), Sanskruti Shejwal (https://github.com/Sans-Shej), Shrey Arora (https://github.com/shreyarora2198), Mana Abbaszadeh (https://github.com/Manazd), Columbia University · repo
Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines. Krish Veera, Alimurtaza Mustafa Merchant, Sajal Kumar Goyla, Shambhawi Bhure, Columbia University · repo

Call for Scenario Contribution

We are expanding AssetOpsBench to cover a broader range of industrial challenges. We invite researchers and practitioners to contribute new scenarios, particularly in the following areas:

Asset Classes: Turbines, HVAC Systems, Pumps, Transformers, CNC Machines, Robotics, Engines, and so on.
Task Domains: Prognostics and Health Management, Remaining Useful Life (RUL) estimation, Root Cause Analysis (RCA), Diagnostic Analysis, and Predictive Maintenance.

How to contribute:

Define your scenario following our Utterance Guideline and Ground Truth Guideline.
Explore the Hugging Face dataset for examples.
Submit a Pull Request or open an Issue with the tag new-scenario.
If you have any questions, contact us via email:
- Dhaval Patel (pateldha@us.ibm.com)
- Nianjun Zhou (jzhou@us.ibm.com)
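Before opening a PR, a quick structural check can catch incomplete scenarios. The sketch below assumes a scenario is a dict with `id`, `utterance`, `domains`, and `ground_truth` fields; these names and the allowed domain tags are hypothetical and should be replaced with whatever the Utterance Guideline and Ground Truth Guideline actually specify.

```python
from typing import Any, Dict, List

# Hypothetical schema: replace with the fields the Utterance and
# Ground Truth Guidelines require.
REQUIRED_FIELDS = {"id": str, "utterance": str, "domains": list, "ground_truth": str}
# Extend this set when contributing a scenario for a new task domain.
KNOWN_DOMAINS = {"IoT", "FMSR", "TSFM", "WO"}


def validate_scenario(scenario: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record looks complete."""
    problems: List[str] = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in scenario:
            problems.append(f"missing field: {field}")
        elif not isinstance(scenario[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    for field in ("utterance", "ground_truth"):
        value = scenario.get(field)
        if isinstance(value, str) and not value.strip():
            problems.append(f"{field} is empty")
    domains = scenario.get("domains")
    if isinstance(domains, list):
        if not domains:
            problems.append("domains is empty")
        unknown = [d for d in domains if not isinstance(d, str) or d not in KNOWN_DOMAINS]
        if unknown:
            problems.append(f"unknown domain tags: {unknown}")
    return problems


if __name__ == "__main__":
    draft = {"id": "new-001", "utterance": "List all sensors of Chiller 6 in MAIN site",
             "domains": ["IoT"]}
    print(validate_scenario(draft))
```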

Contributors

Thanks go to these wonderful people ✨

ShuxinLin 💻	DhavalRepo18 💻	ChathurangiShyalika 💻	Dev-Scodes5 💻	DeveloperMindset123 💻	LGDiMaggio 💻	PUSHPAK-JAISWAL 💻
bradleyjeck 💻	florenzi002 💻	jack-pfeifer 💻	jdsheehan 💻	jtrayfield 💻	kushwaha001 💻	nianjunz 💻
sandeepkunkunuru 💻	srutanik 💻	thedgarg31 💻