AgentWatch

Chat interfaces such as ChatGPT or Claude Code rely on end users to invoke them before any task gets done. For example, a user who wants to perform deep research on "best ways of evaluating AI agents" has to type a prompt into the user interface and then engage in a multi-turn conversation until the chatbot (usually an agent under the hood) has enough context to kick off the job. Once the underlying agent has the required information, it calls tools synchronously or asynchronously, refers to memory, and engages with external sources to perform the task (in this case, deep research). This has two disadvantages:

User management: The user is responsible for crafting a prompt and supplying it to an agent. This leaves most of the work with the user, and there are usually prerequisites the user has to complete before the agent can start at all. The problem compounds for event-driven use cases, where the environment changes dynamically and is outside the user's control: the agent has to keep up with the changing environment while still waiting on guidance from a user who may fall out of rhythm with it.
Lack of parallel processing: Organizations want agents to accomplish several tasks in parallel, not just one. When every task has to be provided upfront by a user, humans become the bottleneck, and an agent can only work on one task for us at a time.

From a user experience perspective, two characteristics help mitigate the pain points above:

Triggered dynamically, in an event-driven way: The agent should not be triggered only by a human. Some tracking mechanism should spin up an agent in response to events, and the agent should be able to complete or resume a task based on continuously provided memory and context.
Parallel processing: Multiple agents should be able to run in parallel to accomplish tasks, with human-in-the-loop capability.

The characteristics above are what define an ambient agent, as described by LangChain:

Ambient agents listen to an event stream and act on it accordingly, potentially acting on multiple events at a time.
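To make the definition concrete, here is a minimal sketch of an event-driven loop that starts one agent run per event and lets the runs proceed concurrently. It is not taken from the AgentWatch code base: `Event`, `handle_event`, and `run_ambient_loop` are hypothetical names, and the agent work is simulated with a short sleep.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Event:
    """A hypothetical event, for example an alarm state change."""
    source: str
    detail: str


async def handle_event(event: Event) -> str:
    """Stand-in for one agent run; a real agent would call a model and tools."""
    await asyncio.sleep(0.01)  # simulate I/O-bound agent work
    return f"{event.source}: handled {event.detail}"


async def run_ambient_loop(events: list[Event]) -> list[str]:
    """Start one task per event and wait for all of them concurrently."""
    tasks = [asyncio.create_task(handle_event(e)) for e in events]
    # gather() returns results in the same order as the tasks were passed in.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    sample = [Event("cloudwatch", "alarm A"), Event("cloudwatch", "alarm B")]
    for line in asyncio.run(run_ambient_loop(sample)):
        print(line)
```

No user prompt appears anywhere in the loop: the events themselves are the trigger, and the number of concurrent runs scales with the number of events, not the number of users.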

Ambient agents are not the solution to everything. Adopting them requires careful thought about when and how these agents interact with humans, and about how much control humans have over the agents' workflow as they execute and notify the end user.

AgentWatch is a sample implementation of a hybrid ambient agent. AgentWatch performs tasks that are fully autonomous and low risk, such as checking AWS accounts for CloudWatch alarms and recent logs and then posting monitoring summaries. It also supports user-initiated investigation through Slack, where users can ask follow-up questions about alarms, dashboards, and log data in the monitored AWS environment.

Organizations use different platforms for communication. I recently attended an Anthropic event where someone mentioned: "AI is going to catch up to pace faster than we think it is". The implication is that organizations are going to be structured differently, and that you are going to work with autonomous workers (or agents) over platforms such as Slack that can accomplish tasks faster and more efficiently, with a much tighter loop with end users. In this solution, Slack is the end-user interface: the ambient agent posts messages to it, and end users interact with the agent there on demand.

Human in the Loop in Ambient Agents

Human-in-the-loop (HITL) is a fundamental component for building trustworthy ambient agents. While ambient agents operate autonomously and respond to event streams, they must know when to involve humans in their decision-making process. AgentWatch is designed around three core HITL patterns, although the current repository fully implements scheduled notifications and user-initiated investigation, while review-style approval flows remain an extension point.

The Three HITL Patterns

1. Notify Pattern

The notify pattern alerts users about important events without taking any action. This is useful for flagging events that users should be aware of but where the agent is not empowered to act on them. In AgentWatch, this pattern is implemented through scheduled monitoring reports. Every 15 minutes, the agent generates a comprehensive monitoring report covering CloudWatch alarms, critical issues, and resource health across AWS services. The agent posts these reports to a Slack channel, keeping the team informed without requiring immediate action or approval. This allows users to maintain situational awareness while the agent handles the routine work of aggregating and summarizing monitoring data.
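As an illustration of the notify pattern, the sketch below builds the HTTP request for a Slack incoming webhook, which accepts a JSON body with a `text` field. The function names are hypothetical and the repository's Lambda code may format its reports differently.

```python
import json
import urllib.request


def build_webhook_request(webhook_url: str, report: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    payload = json.dumps({"text": report}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_report(webhook_url: str, report: str, timeout: float = 10.0) -> int:
    """Send the report to Slack and return the HTTP status code."""
    request = build_webhook_request(webhook_url, report)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping request construction separate from sending makes the formatting logic easy to test without touching the network.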

2. Question Pattern

The question pattern enables users to investigate issues interactively through Slack when scheduled monitoring surfaces something that needs more detail. In this sample, users ask follow-up questions with a Slack slash command and the agent responds using current CloudWatch dashboards, alarms, and logs. A fully ambient clarification loop, where the agent proactively asks the user for guidance before taking a next step, is not wired into this repository.

3. Review Pattern

The review pattern allows users to approve, reject, or edit actions before the agent executes them. This is particularly important for sensitive operations where human judgment is required. AgentWatch does not execute write operations in this repository, so the review pattern is described here as a natural next step for teams that want to extend the sample into remediation workflows such as adjusting alarm thresholds or changing scaling policies.

These HITL patterns lower implementation risk by ensuring appropriate human oversight, mirror the communication patterns already found in engineering teams, and create feedback that can be used over time to better align the agent with organizational preferences and policies.
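One way to think about the three patterns is as a routing decision made before the agent does anything. The sketch below is illustrative only: `HitlPattern`, `ProposedStep`, and `choose_pattern` are hypothetical names, and the policy (writes need review, user requests are questions, everything else is a notification) is an assumption and does not come from the repository.

```python
from dataclasses import dataclass
from enum import Enum


class HitlPattern(Enum):
    NOTIFY = "notify"      # inform the user, take no action
    QUESTION = "question"  # interactive investigation with the user
    REVIEW = "review"      # wait for approval before acting


@dataclass
class ProposedStep:
    description: str
    is_write: bool = False        # would this step change AWS resources?
    user_initiated: bool = False  # did a user ask for it in Slack?


def choose_pattern(step: ProposedStep) -> HitlPattern:
    """Pick the HITL pattern for a step (illustrative policy only)."""
    if step.is_write:
        return HitlPattern.REVIEW
    if step.user_initiated:
        return HitlPattern.QUESTION
    return HitlPattern.NOTIFY
```

Under this policy a scheduled report maps to `NOTIFY`, a slash-command question maps to `QUESTION`, and a remediation such as adjusting an alarm threshold would map to `REVIEW`, which is the extension point described above.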

How AgentWatch Works

AgentWatch is built as a LangChain agent with access to seven specialized monitoring tools for AWS infrastructure. The agent uses Amazon Bedrock's Claude model for natural language understanding and can analyze CloudWatch dashboards, fetch logs, examine alarms, and perform cross-account monitoring. The architecture follows a hybrid ambient model with both scheduled monitoring and on-demand interaction capabilities.

The agent is deployed on AgentCore Runtime, which provides a secure, serverless, and purpose-built hosting environment for running AI agents at scale regardless of the agent framework or model provider. Once deployed, the agent is available as an HTTP endpoint that can be invoked programmatically. Authentication is handled through AgentCore Identity using OAuth 2.0 with Cognito as the identity provider, though any OIDC-compliant IdP can be used.
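For readers unfamiliar with the OAuth 2.0 client credentials flow, the sketch below shows how a bearer token can be requested from a Cognito domain's `/oauth2/token` endpoint using HTTP Basic authentication. The function names are hypothetical and this is not the repository's Lambda code; the optional `scope` argument only applies if your resource server defines custom scopes.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_token_request(domain_url, client_id, client_secret, scope=None):
    """Build a client-credentials token request for a Cognito domain."""
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    form = {"grant_type": "client_credentials"}
    if scope:
        form["scope"] = scope
    return urllib.request.Request(
        f"{domain_url.rstrip('/')}/oauth2/token",
        data=urllib.parse.urlencode(form).encode("utf-8"),
        headers={
            # Client ID and secret are sent as HTTP Basic credentials.
            "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def fetch_access_token(domain_url, client_id, client_secret, scope=None):
    """Send the request and return the access token string."""
    request = build_token_request(domain_url, client_id, client_secret, scope)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read())["access_token"]
```

The returned token would then be sent as an `Authorization: Bearer ...` header when invoking the runtime endpoint.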

The deployment infrastructure consists of three main components working together. First, an AWS Lambda function serves as the orchestration layer, responsible for authenticating with Cognito to obtain bearer tokens, invoking the AgentCore Runtime endpoint with appropriate prompts, and formatting responses for Slack. Second, Amazon EventBridge provides scheduled invocation capability through a rule configured to trigger every 15 minutes. When triggered, the Lambda function uses a pre-configured monitoring prompt that asks the agent to provide summaries of CloudWatch alarms, critical issues, and resource health. Third, an API Gateway exposes the Lambda function as an HTTP endpoint that integrates with a Slack app through slash commands. When users type a question in Slack using the configured slash command, the request routes to API Gateway, which invokes the Lambda function with the user's question as the prompt.
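The two triggers reach the same Lambda function with differently shaped events, so the handler first has to work out which mode it is in. The sketch below assumes the standard shapes: EventBridge scheduled events carry `"source": "aws.events"`, and API Gateway proxy events carry the form-encoded Slack payload in `body`, with the question in the `text` field. `SCHEDULED_PROMPT` and `extract_prompt` are hypothetical names, and the prompt wording is a placeholder; the repository's actual monitoring prompt is different.

```python
import base64
import urllib.parse

# Placeholder wording; the real pre-configured prompt lives in the repository.
SCHEDULED_PROMPT = "Summarize CloudWatch alarms, critical issues, and resource health."


def extract_prompt(event: dict) -> tuple[str, str]:
    """Return (mode, prompt) for an incoming Lambda event."""
    # EventBridge scheduled rules deliver events with source "aws.events".
    if event.get("source") == "aws.events":
        return "scheduled", SCHEDULED_PROMPT

    # API Gateway proxy events carry the HTTP body as a string,
    # optionally base64-encoded.
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")

    # Slack slash commands are form-encoded; the question is in "text".
    form = urllib.parse.parse_qs(body)
    question = form.get("text", [""])[0].strip()
    return "on_demand", question
```

Both branches end with a plain prompt string, which is what allows the scheduled and on-demand paths to share the same agent invocation code.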

This dual-trigger architecture enables AgentWatch to operate in two modes. In scheduled mode, the agent runs autonomously every 15 minutes, proactively monitoring AWS infrastructure and posting reports to keep teams informed without manual intervention. In on-demand mode, users can ask specific questions through Slack and receive immediate responses, allowing for interactive troubleshooting and investigation when needed. Both modes leverage the same underlying agent and tools, providing consistent monitoring capabilities whether operating autonomously or responding to user queries.

AgentWatch in Action

The following screenshots demonstrate both operational modes of AgentWatch.

Scheduled Monitoring Reports

Every 15 minutes, AgentWatch automatically generates and posts comprehensive monitoring reports to Slack, providing the team with continuous visibility into AWS infrastructure health.

On-Demand Interaction

Users can ask specific questions through Slack slash commands to investigate issues or get real-time information. The agent processes the question and provides detailed, context-aware responses based on current AWS infrastructure state.

Getting Started

This section walks you through deploying AgentWatch from initial setup to production deployment.

Prerequisites

Before deploying AgentWatch, ensure you have the following:

AWS Account with appropriate permissions to create Lambda functions, IAM roles, EventBridge rules, and API Gateway resources
AWS CLI installed and configured with credentials
Python 3.12+ for local development
Node.js 20+ for the latest AgentCore CLI
Slack Workspace with permissions to create and configure apps
Cognito User Pool (for manual deployment only) - configured for OAuth 2.0 client credentials (M2M) authentication. Note: CloudFormation deployment creates this automatically.

Deployment Options

AgentWatch offers two deployment paths:

| Option | Best For | What It Does |
| --- | --- | --- |
| CloudFormation (Recommended) | Quick setup, production deployments | One-click deployment of the supporting AWS resources, followed by AgentCore deployment |
| Manual Deployment | Learning, customization | Step-by-step setup with full control |

Option A: CloudFormation Deployment (Recommended)

The CloudFormation deployment creates all required AWS resources with a single command.

A1. Create Slack App

Go to https://api.slack.com/apps and click Create New App
Choose From scratch, name it "AgentWatch", select your workspace
Enable Incoming Webhooks and add a webhook to your channel
Copy the Webhook URL
Go to Basic Information and copy the Signing Secret

A2. Deploy CloudFormation Stack

cd deployment/cloudformation
./deploy-stack.sh

The script will prompt for:

Stack name [agentwatch]: Leave the default
Slack Webhook URL: Paste the webhook URL from step A1
Slack Signing Secret: Paste the signing secret from step A1
Cognito Domain Prefix (unique identifier): Leave the default
AgentCore Runtime URL: Leave blank to configure later

Note: The script creates all supporting infrastructure including a Cognito User Pool, M2M client, Lambda function, EventBridge rule, and API Gateway.

A3. Deploy AgentCore Runtime

# From project root
# Use python3 (or python depending on your system)
python3 deployment/sync_agentcore_config.py --stack-name agentwatch
./deployment/deploy_agentcore.sh --stack-name agentwatch --wait

The sync_agentcore_config.py script reads the CloudFormation stack outputs and configures agentcore/agentcore.json with the correct OIDC discovery URL and Cognito client settings. The deploy_agentcore.sh script then validates, builds, and deploys the agent to AgentCore Runtime. The --wait flag blocks until the runtime reports READY.

A4. Update Stack with AgentCore URL

After AgentCore deployment, retrieve the runtime URL and update the stack:

uv run python3 get_agent_url.py

Copy the Invocation URL from the output and use it in the update command:

aws cloudformation update-stack \
  --stack-name agentwatch \
  --use-previous-template \
  --parameters \
    ParameterKey=SlackWebhookUrl,UsePreviousValue=true \
    ParameterKey=SlackSigningSecret,UsePreviousValue=true \
    ParameterKey=CognitoDomainPrefix,UsePreviousValue=true \
    ParameterKey=AgentCoreRuntimeUrl,ParameterValue="YOUR_AGENTCORE_URL" \
  --capabilities CAPABILITY_NAMED_IAM

Wait for the update to complete:

aws cloudformation wait stack-update-complete --stack-name agentwatch

A5. Configure Slack Slash Command

Go to your Slack app → Slash Commands
Create /ask command
Set Request URL to the SlackCommandEndpoint from stack outputs (format: https://XXXXXXXXXX.execute-api.<region>.amazonaws.com/prod/slack-command)
Set Short Description to "Ask the AgentWatch monitoring agent a question"
Save changes and reinstall the app to your workspace if prompted
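If you prefer to read the `SlackCommandEndpoint` value programmatically instead of from the console, the sketch below extracts an output from a CloudFormation `describe_stacks` response. `get_stack_output` and `fetch_slack_endpoint` are hypothetical helpers, and the second one assumes `boto3` is installed and AWS credentials are configured.

```python
def get_stack_output(describe_stacks_response: dict, key: str) -> str:
    """Return the value of a named output from a describe_stacks response."""
    stacks = describe_stacks_response.get("Stacks", [])
    if not stacks:
        raise LookupError("no stacks in response")
    for output in stacks[0].get("Outputs", []):
        if output.get("OutputKey") == key:
            return output["OutputValue"]
    raise KeyError(key)


def fetch_slack_endpoint(stack_name: str = "agentwatch") -> str:
    """Look up the SlackCommandEndpoint output of a deployed stack."""
    import boto3  # imported lazily so the pure helper above has no dependency

    client = boto3.client("cloudformation")
    response = client.describe_stacks(StackName=stack_name)
    return get_stack_output(response, "SlackCommandEndpoint")
```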

A6. AgentCore Permissions

The AgentCore runtime permissions required for CloudWatch, CloudWatch Logs, and STS access are configured as part of the included agentcore/cdk project. No extra manual IAM policy attachment is required for the default deployment flow.

A7. Test the Deployment

Test on-demand questions: In your Slack workspace, run:

/ask What is the status of my CloudWatch alarms?

You should see an immediate acknowledgment ("Processing request from @user...") followed by a detailed response from the agent.
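The two-step behaviour exists because Slack expects a slash command to be acknowledged within about three seconds, while an agent run usually takes longer. A common approach, sketched below, is to return a short acknowledgment right away and post the full answer later to the `response_url` that Slack includes in the slash command payload. The function names are hypothetical, and the repository may deliver the follow-up differently (for example through the incoming webhook).

```python
import json
import urllib.request


def build_ack(user_name: str) -> dict:
    """API Gateway proxy response that acknowledges the slash command."""
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({
            "response_type": "ephemeral",  # visible only to the caller
            "text": f"Processing request from @{user_name}...",
        }),
    }


def build_followup_request(response_url: str, answer: str) -> urllib.request.Request:
    """Build the delayed response that carries the agent's full answer."""
    payload = {"response_type": "in_channel", "text": answer}
    return urllib.request.Request(
        response_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```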

Test scheduled monitoring: The EventBridge rule triggers every 15 minutes. You can also trigger it manually:

aws lambda invoke --function-name agentwatch-scheduled-monitor --payload '{}' /tmp/response.json && cat /tmp/response.json

For more details, see deployment/cloudformation/README.md.

Option B: Manual Deployment

For more control over the deployment process, follow these detailed steps.

Step 1: Create and Configure Slack App

AgentWatch integrates with Slack to deliver monitoring reports and respond to user questions. You need to create a Slack app and configure it with the necessary permissions.

Go to https://api.slack.com/apps and click Create New App
Choose From scratch and provide an app name (e.g., "AgentWatch") and select your workspace
Navigate to Incoming Webhooks in the left sidebar
Toggle Activate Incoming Webhooks to On
Click Add New Webhook to Workspace and select the channel where monitoring reports should be posted
Copy the webhook URL (format: https://hooks.slack.com/services/T.../B.../xxx)
Navigate to Slash Commands in the left sidebar
Click Create New Command and configure:
- Command: /ask (or your preferred command name)
- Request URL: Leave this blank for now - you'll update it after deployment
- Short Description: "Ask the AgentWatch monitoring agent a question"
- Usage Hint: "What is the status of my CloudWatch alarms?"
Navigate to Basic Information in the left sidebar
Under App Credentials, copy the Signing Secret - you'll need this for request verification
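For context on what the signing secret is used for: Slack signs each request by computing an HMAC-SHA256 over `v0:<timestamp>:<raw body>` and sends the result in the `X-Slack-Signature` header, with the timestamp in `X-Slack-Request-Timestamp`. The sketch below shows that check in isolation. `verify_slack_signature` is a hypothetical helper, not the repository's implementation, and the five-minute replay window is a commonly used default.

```python
import hashlib
import hmac
import time


def verify_slack_signature(signing_secret, timestamp, body, signature,
                           now=None, max_age_seconds=300) -> bool:
    """Return True if the request signature matches and is recent enough."""
    try:
        request_time = int(timestamp)
    except (TypeError, ValueError):
        return False

    # Reject old requests to limit replay attacks.
    current = time.time() if now is None else now
    if abs(current - request_time) > max_age_seconds:
        return False

    base_string = f"v0:{timestamp}:{body}".encode("utf-8")
    digest = hmac.new(signing_secret.encode("utf-8"), base_string, hashlib.sha256)
    expected = "v0=" + digest.hexdigest()

    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

The `body` argument must be the raw, unparsed request body exactly as Slack sent it; re-encoding the parsed form fields can change the bytes and break the signature.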

Step 2: Configure Identity Provider for Authentication

AgentWatch uses AgentCore Identity with OAuth 2.0 for secure authentication. You need to configure a Cognito User Pool with appropriate app clients.

For detailed instructions on setting up Cognito for AgentCore Identity, refer to the AgentCore documentation. AgentWatch uses M2M Authentication with the OAuth 2.0 Client Credentials flow for service-to-service authentication.

Save the following values from your Cognito configuration:

Cognito Domain URL
M2M Client ID and Client Secret (for M2M auth)
Resource Server ID (if using custom scopes)

First, install the dependencies:

uv sync

Then run the Cognito setup script:

uv run python idp_setup/setup_cognito.py

The results will be stored in a cognito_config.json file.

Step 3: Test Agent Locally

Before deploying to AgentCore Runtime, test the agent locally to ensure it works correctly with your AWS environment.

Clone this repository and navigate to the project directory
Configure your AWS credentials and ensure you have access to CloudWatch, Lambda, and other services the agent will monitor
Install dependencies and start a Python interactive session from the project root:
```
uv sync
uv run python3
```

Import the handler and invoke it with sample prompts:

>>> from ambient_agent import agent_handler
>>> response = agent_handler({"prompt": "List my CloudWatch dashboards", "session_id": "test"})
>>> print(response)

Try different prompts to verify the agent can access your AWS resources:

>>> response = agent_handler({"prompt": "What is the status of my CloudWatch alarms?", "session_id": "test"})
>>> print(response)

The first invocation lazily initializes the Bedrock client and tools, so it can take a few seconds.

Exit the Python session when done:

>>> exit()

Verify that the agent responds appropriately and can successfully query your AWS environment.

Step 4: Deploy Agent to AgentCore Runtime

Deploy the agent to AgentCore Runtime to make it available as a secure HTTP endpoint.

Install the latest AgentCore CLI:
```
npm install -g @aws/agentcore
```
Update the config.yaml file with your model preferences and tool configurations.
Sync the included agentcore/ project with your AWS account and Cognito configuration:
```
python3 deployment/sync_agentcore_config.py
```
This updates agentcore/aws-targets.json for your current AWS account and, when cognito_config.json exists, adds the Cognito-backed CUSTOM_JWT authorizer to the runtime spec.

Validate and deploy the agent on AgentCore Runtime:

./deployment/agentcore_cli.sh validate
./deployment/deploy_agentcore.sh --wait

After deployment, get the AgentCore Runtime URL for the Lambda configuration with the provided helper:
```
uv run python3 get_agent_url.py
```
Copy the Invocation URL from the output - you'll use it as AGENTCORE_RUNTIME_URL in your .env file.

Step 5: Configure Environment Variables

Create a .env file with all the configuration values you've collected:

Copy the example environment file:
```
cp .env.example .env
```

Edit the .env file with your actual values:

# AWS Configuration
AWS_REGION=us-west-2

# AgentCore Runtime URL (from Step 4)
AGENTCORE_RUNTIME_URL=https://bedrock-agentcore...

# Cognito Configuration - M2M (Recommended)
COGNITO_DOMAIN_URL=https://your-domain.auth.us-west-2.amazoncognito.com
M2M_CLIENT_ID=your_m2m_client_id
M2M_CLIENT_SECRET=your_m2m_client_secret
RESOURCE_SERVER_ID=your_resource_server_id

# Slack Configuration (from Step 1)
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/T.../B.../xxx
SLACK_SIGNING_SECRET=your_slack_signing_secret

All values specified in .env.example should be configured. The deployment script will validate that required variables are present.
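If you want to check the file yourself before running the deployment script, a small pre-flight check along these lines can help. This is a sketch and does not reproduce the script's own validation: `parse_env` and `missing_keys` are hypothetical helpers, and the `REQUIRED_KEYS` list is an assumption based on the variables shown above (`RESOURCE_SERVER_ID` is left out because it only applies when custom scopes are used).

```python
# Assumed set of required variables, taken from the example above.
REQUIRED_KEYS = (
    "AWS_REGION",
    "AGENTCORE_RUNTIME_URL",
    "COGNITO_DOMAIN_URL",
    "M2M_CLIENT_ID",
    "M2M_CLIENT_SECRET",
    "SLACK_WEBHOOK_URL",
    "SLACK_SIGNING_SECRET",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

For example, `missing_keys(parse_env(open(".env").read()))` returns an empty list when every assumed variable has a value.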

Step 6: Deploy Lambda Function and Infrastructure

Run the deployment script to create the Lambda function, EventBridge rule, and API Gateway:

cd deployment
chmod +x deploy.sh
./deploy.sh

The script will:

Create an IAM role for Lambda with necessary permissions
Package and deploy the Lambda function code
Configure environment variables from your .env file
Create an EventBridge rule to trigger the agent every 15 minutes
Create an API Gateway endpoint for Slack slash commands
Set up all necessary permissions and integrations

At the end of deployment, the script will output an API Gateway URL. Copy this URL - you'll need it for the next step.

Step 7: Update Slack App with API Gateway URL

Now that you have the API Gateway endpoint, update your Slack app configuration:

Go back to https://api.slack.com/apps and select your app
Navigate to Slash Commands
Click on the /ask command (or whatever you named it)
Update the Request URL with the API Gateway URL from Step 6
Click Save

Step 8: Test the Deployment

Test On-Demand Questions: Go to your Slack workspace and try the slash command:

/ask What is the status of my CloudWatch alarms?

The agent should respond with current information from your AWS environment.

Redeploying After Code Changes

When you make changes to the agent code, redeploy to AgentCore:

# Using the provided script
./deployment/deploy_agentcore.sh --wait

# Or directly with agentcore CLI
./deployment/agentcore_cli.sh deploy -y

The DEFAULT endpoint automatically points to the latest version.

Troubleshooting

OIDC Discovery Endpoint Not Valid (AgentCore Deployment Failure)

If you see an error like this during AgentCore deployment:

OIDC discovery endpoint is not valid. (Service: AgentCredentialProvider, Status Code: 400)

This means the Cognito OIDC discovery URL in agentcore/agentcore.json is incorrect. Cognito serves its OIDC discovery document at the User Pool IDP URL, not the custom auth domain:

Correct: https://cognito-idp.<region>.amazonaws.com/<user-pool-id>/.well-known/openid-configuration
Incorrect: https://<domain>.auth.<region>.amazoncognito.com/.well-known/openid-configuration

To fix: Re-run python3 deployment/sync_agentcore_config.py --stack-name agentwatch and redeploy with ./deployment/deploy_agentcore.sh --wait.
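To double-check the value before redeploying, the sketch below builds the discovery URL from a region and user pool ID and tests whether an existing URL has the user pool form. Both function names are hypothetical; the sync script remains the supported way to write the configuration.

```python
import urllib.parse


def cognito_discovery_url(region: str, user_pool_id: str) -> str:
    """Build the OIDC discovery URL served by a Cognito User Pool."""
    return (
        f"https://cognito-idp.{region}.amazonaws.com/"
        f"{user_pool_id}/.well-known/openid-configuration"
    )


def looks_like_user_pool_discovery_url(url: str) -> bool:
    """Return True for the cognito-idp form, False for the auth-domain form."""
    parsed = urllib.parse.urlparse(url)
    host = parsed.hostname or ""
    return (
        parsed.scheme == "https"
        and host.startswith("cognito-idp.")
        and host.endswith(".amazonaws.com")
        and parsed.path.endswith("/.well-known/openid-configuration")
    )
```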

Token Request Failed: 400 (Lambda Invocation)

If Slack shows Error: Token request failed: 400, the Lambda function is missing the M2M_CLIENT_SECRET environment variable. Verify it is set:

aws lambda get-function-configuration --function-name agentwatch-scheduled-monitor \
  --query "Environment.Variables.M2M_CLIENT_SECRET" --output text

If it returns None, redeploy the CloudFormation stack. The template wires the secret from the Cognito M2M client into the Lambda automatically.

Architecture Mismatch on Apple Silicon Macs (ARM64)

If you encounter an error like this when running the agent:

ImportError: dlopen(.../_pydantic_core.cpython-312-darwin.so, 0x0002): tried: '...'
(mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))

This means your terminal is running under Rosetta (x86_64 emulation) instead of native ARM64 mode. This causes Python packages to be installed for the wrong architecture.

To verify the issue, run:

arch

If it shows i386 or x86_64 instead of arm64, your terminal is running in emulation mode.

Fix for PyCharm's built-in terminal:

Go to Settings/Preferences > Tools > Terminal
Change the Shell path from /bin/zsh to:
```
/usr/bin/arch -arm64 /bin/zsh
```
Close and reopen the terminal tab
Verify with arch (should now show arm64)

Fix for macOS Terminal app:

Open Finder and navigate to Applications > Utilities
Right-click on Terminal > Get Info
Uncheck "Open using Rosetta"
Restart Terminal

After fixing the terminal, recreate the virtual environment:

rm -rf .venv && uv venv && uv sync

Conclusion

AgentWatch demonstrates how ambient agents can provide continuous, proactive monitoring of infrastructure while maintaining appropriate human oversight through well-designed HITL patterns. By combining scheduled autonomous operation with on-demand interaction capabilities, the system achieves a balance between automation and control that aligns with operational best practices.

The architecture leverages AWS managed services and AgentCore Runtime to provide a scalable, secure foundation for ambient agent deployment. The notify, question, and review patterns ensure that humans remain informed and in control while reducing the operational burden of routine monitoring tasks. This approach can be extended to other domains beyond AWS monitoring, applying the same principles to any scenario where continuous observation and selective human involvement are required.

Organizations implementing ambient agents should carefully consider which tasks are appropriate for full autonomy versus those requiring human approval, design clear communication channels between agents and humans, and establish feedback mechanisms that allow agents to learn from human decisions over time. AgentWatch serves as a practical reference implementation for these concepts.
