Add Cloud Governance AI Agent (MCP-based OpenSearch query interface) #997

Open: pragya811 wants to merge 4 commits into main from cg-mcp

@pragya811 pragya811 commented May 12, 2026

Member

Type of change

Note: Fill x in []

  • bug
  • enhancement
  • documentation
  • dependencies

Summary

  • Adds a Streamlit-based AI agent that provides a natural language interface to query cloud governance data in OpenSearch
  • Uses a custom MCP (Model Context Protocol) server with high-level query tools that abstract away raw OpenSearch Query DSL
  • The AI model (Google Gemini) calls simple tools like search_documents(filters=[...]) and count_by_field(group_by="policy") instead of constructing complex JSON queries
  • The MCP server auto-handles .keyword field resolution, type coercion, and case-insensitive field matching
  • Includes a Dockerfile for containerized deployment and a sidebar index selector dropdown
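
The field handling listed above (case-insensitive matching, `.keyword` resolution, type coercion) could look roughly like the following. This is a minimal sketch, not the code in `mcp_server.py`: the function names and the flat `mapping` dict (field name to its mapping entry, as found under `properties` in an OpenSearch mapping) are assumptions made for illustration.

```python
from typing import Optional

# Numeric field types used in OpenSearch mappings.
INTEGER_TYPES = {"byte", "short", "integer", "long"}
FLOAT_TYPES = {"half_float", "float", "double", "scaled_float"}


def resolve_field(name: str, mapping: dict[str, dict]) -> Optional[str]:
    """Return the mapped field whose name matches `name` ignoring case."""
    by_lowercase = {field.lower(): field for field in mapping}
    return by_lowercase.get(name.lower())


def exact_match_field(name: str, mapping: dict[str, dict]) -> Optional[str]:
    """Pick the field to use in a term filter or terms aggregation."""
    field = resolve_field(name, mapping)
    if field is None:
        return None
    spec = mapping[field]
    # A `text` field is analyzed, so exact matching goes through its
    # `keyword` sub-field when the mapping declares one.
    if spec.get("type") == "text" and "keyword" in spec.get("fields", {}):
        return f"{field}.keyword"
    return field


def coerce_value(value: str, field_type: str):
    """Convert a string filter value to the type implied by the mapping."""
    if field_type in INTEGER_TYPES:
        return int(value)
    if field_type in FLOAT_TYPES:
        return float(value)
    if field_type == "boolean":
        return value.strip().lower() == "true"
    return value
```

With a mapping that declares `Policy` as `text` with a `keyword` sub-field, a request for `policy` would resolve to `Policy.keyword`, so the model never has to know the exact casing or the sub-field name.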

Architecture

Streamlit UI → Gemini AI → MCP stdio → mcp_server.py subprocess → OpenSearch

Tools Available

| Tool | Description |
|---|---|
| list_indices | List all OpenSearch indices with doc counts |
| get_fields | Discover field names, types, and aggregatability |
| search_documents | Filtered search with auto .keyword handling |
| count_by_field | Group-by aggregation (terms) |
| aggregate | Metric aggregation (sum/avg/max/min) grouped by field |
| date_range_search | Time-range search with text/date field support |
| raw_search | Escape hatch for raw Query DSL |
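
To illustrate what the high-level tools hide from the model, here is a hedged sketch of the Query DSL bodies that `count_by_field` and `aggregate` might build. The helper names, the `groups`/`value` aggregation labels, and the `top_n` parameter are assumptions for illustration; only the `terms` and metric aggregation shapes are standard OpenSearch Query DSL.

```python
ALLOWED_METRICS = {"sum", "avg", "max", "min"}


def count_by_field_body(field: str, top_n: int = 10) -> dict:
    """Build a terms aggregation that counts documents per field value."""
    return {
        "size": 0,  # only the buckets are needed, not the matching hits
        "aggs": {"groups": {"terms": {"field": field, "size": top_n}}},
    }


def aggregate_body(group_by: str, metric: str, metric_field: str,
                   top_n: int = 10) -> dict:
    """Build a terms aggregation with one metric computed per bucket."""
    if metric not in ALLOWED_METRICS:
        raise ValueError(f"unsupported metric: {metric}")
    return {
        "size": 0,
        "aggs": {
            "groups": {
                "terms": {"field": group_by, "size": top_n},
                # Nested aggregation: the metric is evaluated inside each bucket.
                "aggs": {"value": {metric: {"field": metric_field}}},
            }
        },
    }
```

The model only supplies a field name and a metric; the nesting and the `size: 0` convention stay inside the server.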

Setup

cd cloud-governance-mcp
cp .env.example .env  # configure GEMINI_API_KEY, OPENSEARCH_HOSTS
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
streamlit run app.py

Test plan

- Configure .env with valid OpenSearch host and Gemini API key
- Run streamlit run app.py and verify 7 tools load
- Test: "What fields are available in this index?"
- Test: "Show me 5 sample documents"
- Test: "Count documents by [field]"
- Test aggregations with filters
- Switch index via sidebar dropdown and verify session resets

## For security reasons, all pull requests need to be approved first before running any automated CI

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented May 12, 2026

📝 Walkthrough

Summary by CodeRabbit

Release Notes

  • New Features

    • Cloud Governance AI Agent with conversational chat interface
    • Conversation history persistence and export capability
    • OpenSearch integration with query tools (search, filtering, aggregations, date ranges)
    • Docker containerization for streamlined deployment
  • Documentation

    • Comprehensive setup and usage guide with troubleshooting steps
  • Chores

    • Environment configuration templates and dependency specifications

Walkthrough

This pull request introduces a complete Cloud Governance AI agent feature. It adds a Streamlit application that discovers OpenSearch indices, runs a Gemini-powered agent loop with tool-calling enabled, and uses a custom MCP server to expose OpenSearch query tools. The PR includes full deployment infrastructure (Docker, shell bootstrap), dependency manifest, configuration templates, comprehensive documentation, and all application code.

Changes

Cloud Governance MCP AI Agent

| Layer / File(s) | Summary |
|---|---|
| Infrastructure & Deployment Setup: cloud-governance-mcp/.dockerignore, cloud-governance-mcp/.gitignore, cloud-governance-mcp/requirements.txt, cloud-governance-mcp/.env.example, cloud-governance-mcp/Dockerfile, cloud-governance-mcp/run_agent.sh | Docker ignore and build config, git ignore for local artifacts, Python dependency pinning, environment variable templates, Python 3.11 Slim container definition with non-root user and Streamlit configuration, plus shell script for bootstrapping virtual environment, port cleanup, dependency installation, and Streamlit startup. |
| MCP Server: OpenSearch Query Tools: cloud-governance-mcp/mcp_server.py | Backend MCP server initialized over stdio that caches OpenSearch field mappings, resolves case-insensitive field names, coerces filter values by mapped type, and implements seven query tools: list_indices, get_fields, search_documents, count_by_field, aggregate, date_range_search, and raw_search. Results are formatted as markdown tables with debug query dumps for zero-result cases. |
| Streamlit App: Config, Persistence & MCP Wrappers: cloud-governance-mcp/app.py (modules: imports, index discovery, config, persistence, schema normalization, async tool wrappers) | Configuration reloading on each Streamlit rerun, cached OpenSearch index discovery for sidebar selector, conversation history load/save/clear to JSON, Gemini schema normalization to fix MCP-style schemas, and async-to-sync wrapper functions for discovering and executing MCP tools with optional debug output. |
| Streamlit App: Gemini Agent Loop & UI: cloud-governance-mcp/app.py (modules: agent loop, main) | Multi-turn Gemini tool-calling loop that forces tool-calling on first turn, executes extracted function calls via MCP tools, appends responses to conversation, iterates up to 10 turns, and returns final answer with tool attribution. Main Streamlit entrypoint initializes session state (index selection, message history, tool cache), renders sidebar controls and chat interface, processes user input, runs the agent loop with prior context, and saves updated conversation to disk. |
| Documentation: cloud-governance-mcp/README.md | Architecture overview, prerequisites, quick-start workflow, catalog of seven MCP query tools with examples, step-by-step agent flow, troubleshooting for startup and connectivity, management commands, expected OpenSearch indices, security notes on credentials and access control, advanced configuration for Gemini model and OpenSearch cluster switching, project structure, and licensing. |

Sequence Diagram(s)

sequenceDiagram
    participant User as User
    participant UI as Streamlit<br/>Chat UI
    participant Agent as Gemini<br/>Agent Loop
    participant MCP as MCP<br/>Tools
    participant OS as OpenSearch
    
    User->>UI: Enter natural<br/>language question
    UI->>Agent: run_agent_loop_gemini<br/>(question, tools, history)
    Agent->>Agent: Load prior<br/>conversation
    Agent->>Agent: Force<br/>tool_choice=ANY
    Agent->>Agent: Call Gemini<br/>with tools
    loop Up to 10 turns (while tool_calls)
        Gemini-->>Agent: Function<br/>call(s)
        Agent->>Agent: Extract function<br/>names & args
        Agent->>MCP: execute_mcp_tool
        MCP->>OS: search_documents /<br/>count_by_field / etc.
        OS-->>MCP: Query results
        MCP-->>Agent: Result text<br/>(markdown)
        Agent->>Agent: Append to<br/>message history
        Agent->>Agent: Call Gemini<br/>again
    end
    Agent-->>Agent: No more<br/>tool calls
    Agent-->>UI: Final answer<br/>(with citations)
    UI->>UI: Append to<br/>chat history
    UI->>UI: Save to<br/>conversations.json
    UI-->>User: Display in<br/>chat interface
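
The loop in the diagram above can be sketched independently of the Gemini SDK. In this minimal sketch, `ModelReply`, `call_model`, and `execute_tool` are hypothetical stand-ins for the model response, the Gemini call, and `execute_mcp_tool`; the real `run_agent_loop_gemini` in `app.py` may be structured differently.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ModelReply:
    """Stand-in for a model response: final text or requested tool calls."""
    text: str = ""
    tool_calls: list[tuple[str, dict]] = field(default_factory=list)


def run_agent_loop(
    question: str,
    call_model: Callable[[list], ModelReply],
    execute_tool: Callable[[str, dict], str],
    max_turns: int = 10,
) -> str:
    """Call the model until it stops requesting tools or the turn cap is hit."""
    history: list = [("user", question)]
    for _ in range(max_turns):
        reply = call_model(history)
        if not reply.tool_calls:
            return reply.text  # no more tool calls: this is the final answer
        history.append(("assistant", reply.tool_calls))
        for name, args in reply.tool_calls:
            # Each tool result goes back into the history for the next turn.
            history.append(("tool", (name, execute_tool(name, args))))
    return "Stopped after reaching the turn limit without a final answer."
```

The turn cap matters because a model that keeps requesting tools would otherwise loop indefinitely; returning an explicit message makes that case visible in the chat.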

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 51.43% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding a Cloud Governance AI Agent based on MCP and OpenSearch, which matches the comprehensive feature additions across the changeset. |
| Description check | ✅ Passed | The description is well-related to the changeset, providing a clear summary of the enhancement, architectural overview, available tools, and setup instructions that align with the files added. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@ebattat ebattat left a comment

Member

Maybe we can limit it only to the indexes that we are using, instead of all the existing indexes. Also, it would be great if you could add a filter to the index list.
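
One possible shape for this suggestion, as a sketch only: restrict the sidebar list to an allowlist of glob patterns and apply a free-text filter on top. The function name, the pattern list, and the index names in the usage note are hypothetical; how the allowlist would be configured (for example through `.env`) is left open.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def select_indices(all_indices: Iterable[str],
                   allowed_patterns: Iterable[str],
                   search: str = "") -> list[str]:
    """Keep indices that match an allowed pattern and contain the search text."""
    patterns = list(allowed_patterns)
    needle = search.strip().lower()
    return sorted(
        name
        for name in all_indices
        if any(fnmatchcase(name, pattern) for pattern in patterns)
        and needle in name.lower()  # an empty needle matches every name
    )
```

For example, with the pattern `cloud-governance-*`, system indices such as `.kibana_1` would drop out of the dropdown, and typing `cost` in a filter box would narrow the remaining names further.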

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 10

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cloud-governance-mcp/app.py`:
- Line 251: The st.error call uses an unnecessary f-string: change the call to
st.error("❌ Failed to connect to MCP server") (remove the leading f) so the
literal isn't treated as an f-string; update the string in the existing st.error
invocation where the message "❌ Failed to connect to MCP server" is used.
- Around line 336-345: The current bare "except:" around the json.loads(result)
block should be replaced with a specific exception handler for JSON parsing
(e.g., catch json.JSONDecodeError or ValueError) so non-JSON parsing failures
aren't swallowed; update the try/except that surrounds json.loads(result) to
"except json.JSONDecodeError:" (import JSONDecodeError if needed) and keep the
existing fallback behavior that truncates and calls st.text(result) when parsing
fails.
- Line 350: The function signature for run_agent_loop_gemini uses an implicit
optional type for previous_messages ("list = None"); update the parameter
annotation to explicitly show nullability by changing previous_messages to use
an explicit union with None (e.g., previous_messages: list | None = None) so the
type hint conforms to PEP 484; modify the function definition of
run_agent_loop_gemini accordingly and update any callers or imports if they rely
on a different typing style.

In `@cloud-governance-mcp/Dockerfile`:
- Line 6: The Dockerfile currently writes a Streamlit config with
showErrorDetails enabled via the printf command that contains
"[client]\nshowErrorDetails = true"; change that default to false by updating
the printf invocation (the line that writes to /root/.streamlit/config.toml) so
the file contains "showErrorDetails = false" instead of "true".
- Around line 1-15: The Dockerfile currently runs as root; create and switch to
a non-root user before CMD by adding a user/group (e.g., appuser), chowning
WORKDIR and any config files (like /root/.streamlit or move config to
/home/appuser/.streamlit) and then set USER appuser so that the final CMD
["streamlit", "run", "app.py", ...] executes unprivileged; update any file paths
or ownership for files copied (app.py, mcp_server.py, requirements) and ensure
the home/config directory exists and is owned by the new user before switching.

In `@cloud-governance-mcp/mcp_server.py`:
- Around line 619-622: The _tool_raw_search function accepts arbitrary Query
DSL; validate and sanitize query_body before calling _get_client().search by (1)
enforcing a capped size: if query_body.get("size") is missing or >1000, set it
to 1000; (2) reject or remove dangerous keys such as "script", "script_fields",
"scroll", and any top-level "stored_fields" that could cause heavy work; (3)
block requests containing "from" values greater than a safe threshold (e.g.,
10000) or convert them to use safe pagination; and (4) pass a client.search
timeout (e.g., timeout="30s" or request_timeout) to the call in _tool_raw_search
to bound execution time. Apply these checks in _tool_raw_search before calling
client.search and return a clear error message when input is rejected.

In `@cloud-governance-mcp/README.md`:
- Around line 147-149: The README currently suggests running the literal command
"cat .env | grep OPENSEARCH" which can expose secrets; replace that step with a
safer approach that only reveals variable names or redacts values—i.e., locate
lines matching OPENSEARCH but then remove or mask the value portion instead of
printing the raw .env contents (use standard text-processing tools such as grep
combined with a value-strip or redaction step, or output only the variable names
with cut), and update the README to show that safer command in place of the
existing one.

In `@cloud-governance-mcp/requirements.txt`:
- Line 3: Replace the requirements entry that currently reads "mcp[all]>=1.0.0"
with a pinned stdio-only dependency "mcp>=1.23.0,<2"; specifically update the
token mcp[all]>=1.0.0 in the requirements file to mcp>=1.23.0,<2 to remove
extras used only by non-stdio transports, tighten the minimum version to 1.23.0
for security fixes, and add the <2 upper bound for reproducible installs.

In `@cloud-governance-mcp/run_agent.sh`:
- Around line 41-43: The configuration snippet in run_agent.sh currently sets
showErrorDetails = true which exposes internal traces; change this default to
showErrorDetails = false in the [client] block (replace the true value with
false) so detailed error traces are disabled by default; ensure any comments or
downstream logic that relies on showErrorDetails are updated to expect false as
the safe default.
- Line 33: Replace the unsafe hard-kill line `kill -9 $(lsof -ti tcp:8501)` with
a graceful shutdown sequence: use `lsof -ti tcp:8501` to capture PIDs, send
SIGTERM (kill -15) to those PIDs, wait a short interval and re-check lsof; only
if any PIDs remain, escalate to SIGKILL (kill -9) for those remaining PIDs. Also
consider filtering the PIDs by expected command/user before sending signals to
avoid killing unrelated processes on port 8501.
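
The `_tool_raw_search` hardening requested above (size cap, rejected keys, `from` limit) can be sketched as a pure validation step that runs before the search call. This is an illustrative sketch: `sanitize_query_body` and `find_forbidden_key` are hypothetical helpers, and the sketch is deliberately stricter than the review text in that it rejects the listed keys at any depth, which would also reject a document field that happens to be named `script`. The search timeout is a separate setting on the client call and is not shown here.

```python
MAX_SIZE = 1000    # size cap suggested in the review
MAX_FROM = 10_000  # example `from` threshold from the review
FORBIDDEN_KEYS = {"script", "script_fields", "scroll", "stored_fields"}


def _is_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def find_forbidden_key(node):
    """Return the first forbidden key found anywhere in the body, else None."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in FORBIDDEN_KEYS:
                return key
            found = find_forbidden_key(value)
            if found is not None:
                return found
    elif isinstance(node, list):
        for item in node:
            found = find_forbidden_key(item)
            if found is not None:
                return found
    return None


def sanitize_query_body(query_body: dict) -> dict:
    """Validate a raw Query DSL body and return a copy with a bounded size."""
    if not isinstance(query_body, dict):
        raise ValueError("query_body must be a JSON object")
    bad_key = find_forbidden_key(query_body)
    if bad_key is not None:
        raise ValueError(f"'{bad_key}' is not allowed in raw_search")
    offset = query_body.get("from", 0)
    if not _is_int(offset) or not 0 <= offset <= MAX_FROM:
        raise ValueError(f"'from' must be an integer between 0 and {MAX_FROM}")
    safe = dict(query_body)  # shallow copy; the caller's dict is not modified
    size = safe.get("size")
    if not _is_int(size) or not 0 <= size <= MAX_SIZE:
        safe["size"] = MAX_SIZE  # missing, invalid, or oversized: use the cap
    return safe
```

Raising `ValueError` with a specific message lets the tool return a clear rejection to the model instead of a generic failure.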

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: d1b2ebb5-f92c-4ad4-b4d2-9530bc506420

📥 Commits

Reviewing files that changed from the base of the PR and between d307bce and 2f87ea3.

📒 Files selected for processing (9)
  • cloud-governance-mcp/.dockerignore
  • cloud-governance-mcp/.env.example
  • cloud-governance-mcp/.gitignore
  • cloud-governance-mcp/Dockerfile
  • cloud-governance-mcp/README.md
  • cloud-governance-mcp/app.py
  • cloud-governance-mcp/mcp_server.py
  • cloud-governance-mcp/requirements.txt
  • cloud-governance-mcp/run_agent.sh

Comment thread cloud-governance-mcp/app.py
Comment thread cloud-governance-mcp/app.py
Comment thread cloud-governance-mcp/app.py Outdated
Comment thread cloud-governance-mcp/Dockerfile
Comment thread cloud-governance-mcp/Dockerfile Outdated
Comment thread cloud-governance-mcp/mcp_server.py
Comment thread cloud-governance-mcp/README.md Outdated
Comment thread cloud-governance-mcp/requirements.txt
Comment thread cloud-governance-mcp/run_agent.sh Outdated
Comment thread cloud-governance-mcp/run_agent.sh

ebattat commented May 26, 2026

Member

/approved

ebattat commented May 26, 2026

Member

@pragya811, did you review the coderabbitai comments?

@ebattat ebattat left a comment

Member

/approve

@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (1)
cloud-governance-mcp/run_agent.sh (1)

33-37: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restrict port cleanup to this app’s Streamlit process before signaling.

This still signals any process on port 8501, so it can stop unrelated services on shared machines. Filter PIDs by command (e.g., streamlit run app.py) and only escalate to -9 for survivors after re-check.

Suggested fix
-pids="$(lsof -ti tcp:8501 || true)"
-if [ -n "$pids" ]; then
-  kill $pids 2>/dev/null || true
-  sleep 1
-  kill -9 $pids 2>/dev/null || true
-fi
+pids="$(lsof -ti tcp:8501 -sTCP:LISTEN || true)"
+target_pids=""
+for pid in $pids; do
+  cmd="$(ps -p "$pid" -o args= 2>/dev/null || true)"
+  case "$cmd" in
+    *"streamlit run app.py"*) target_pids="$target_pids $pid" ;;
+  esac
+done
+if [ -n "$target_pids" ]; then
+  kill $target_pids 2>/dev/null || true
+  sleep 1
+  still_running="$(lsof -ti tcp:8501 -sTCP:LISTEN || true)"
+  [ -n "$still_running" ] && kill -9 $still_running 2>/dev/null || true
+fi
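
The terminate, wait, then force-kill ordering from this comment can also be expressed in Python. The sketch below assumes a different setup from `run_agent.sh`: a launcher that started Streamlit itself and still holds the `subprocess.Popen` handle, which avoids the port lookup and the risk of signaling an unrelated process. `stop_gracefully` is a hypothetical helper name.

```python
import subprocess
import sys


def stop_gracefully(proc: subprocess.Popen, grace_seconds: float = 1.0) -> int:
    """Ask a child process to exit, and force-kill it only if it does not."""
    proc.terminate()  # SIGTERM on POSIX: gives the process a chance to clean up
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: used only for a process that survived
        return proc.wait()


if __name__ == "__main__":
    # Demonstration with a harmless sleeping child instead of Streamlit.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    stop_gracefully(child, grace_seconds=5.0)
```

Because the handle identifies exactly one process, no command-line filtering is needed in this variant.
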
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cloud-governance-mcp/run_agent.sh` around lines 33 - 37, The current port
cleanup grabs any PID listening on tcp:8501; change the logic in run_agent.sh so
you first collect candidate PIDs from lsof -ti tcp:8501 into the pids variable,
then filter that list to only include processes whose command matches the
Streamlit invocation (e.g., "streamlit run app.py") by checking each PID's
command (ps -o cmd= -p <pid> or similar) before signaling; send a graceful
SIGTERM to the filtered list, wait a short period, re-check which of those same
PIDs are still running and only then escalate to kill -9 for survivors. Ensure
variable names (pids and the filtered list) and the ordering (terminate, sleep,
re-check, force-kill) match the existing script structure.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@cloud-governance-mcp/run_agent.sh`:
- Around line 33-37: The current port cleanup grabs any PID listening on
tcp:8501; change the logic in run_agent.sh so you first collect candidate PIDs
from lsof -ti tcp:8501 into the pids variable, then filter that list to only
include processes whose command matches the Streamlit invocation (e.g.,
"streamlit run app.py") by checking each PID's command (ps -o cmd= -p <pid> or
similar) before signaling; send a graceful SIGTERM to the filtered list, wait a
short period, re-check which of those same PIDs are still running and only then
escalate to kill -9 for survivors. Ensure variable names (pids and the filtered
list) and the ordering (terminate, sleep, re-check, force-kill) match the
existing script structure.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: a876e573-4888-40f9-ae24-a5bd341c418f

📥 Commits

Reviewing files that changed from the base of the PR and between 2f87ea3 and d963cd6.

📒 Files selected for processing (6)
  • cloud-governance-mcp/.env.example
  • cloud-governance-mcp/Dockerfile
  • cloud-governance-mcp/README.md
  • cloud-governance-mcp/app.py
  • cloud-governance-mcp/mcp_server.py
  • cloud-governance-mcp/run_agent.sh
✅ Files skipped from review due to trivial changes (2)
  • cloud-governance-mcp/README.md
  • cloud-governance-mcp/.env.example
🚧 Files skipped from review as they are similar to previous changes (2)
  • cloud-governance-mcp/app.py
  • cloud-governance-mcp/mcp_server.py

@pragya811 pragya811 marked this pull request as ready for review June 9, 2026 07:41

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cloud-governance-mcp/app.py (1)

534-546: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing check for empty candidates can crash the agent loop.

If Gemini blocks or filters the response (e.g., safety filters triggered), response.candidates may be empty. Accessing candidates[0] will raise IndexError, which gets caught by the generic exception handler but provides a poor user experience.

🛡️ Proposed fix
+            # Check for valid response
+            if not response.candidates:
+                return "❌ No response from Gemini. The request may have been filtered or blocked."
+
             # Add assistant response to history
             history.append(response.candidates[0].content)
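
The guard in the proposed fix can be checked in isolation with stand-in objects. In this sketch, `first_candidate_content` is a hypothetical helper and `SimpleNamespace` only mimics the `candidates[0].content` shape referenced in the review; it is not the Gemini SDK response type.

```python
from types import SimpleNamespace


def first_candidate_content(response):
    """Return the first candidate's content, or None if there is no candidate."""
    candidates = getattr(response, "candidates", None)
    if not candidates:  # covers both a missing attribute and an empty list
        return None
    return candidates[0].content


# Stand-ins for a filtered response and a normal response.
blocked = SimpleNamespace(candidates=[])
answered = SimpleNamespace(candidates=[SimpleNamespace(content="hello")])

if first_candidate_content(blocked) is None:
    print("No response from the model; the request may have been filtered.")
```

Returning `None` lets the caller choose between ending the loop with a message, as in the proposed fix, or retrying.
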
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cloud-governance-mcp/app.py` around lines 534 - 546, The code assumes
response.candidates[0] exists and will IndexError when Gemini returns no
candidates; before appending to history or accessing .content, add a guard like
if not response.candidates: log/warn about the empty response (include any
response.status/error if available), optionally set final_answer = response.text
or continue the agent loop, and skip extracting function calls; update the
blocks that reference response.candidates[0] (history.append(...), fcalls =
[...]) to run only after the guard so you never access candidates[0] when the
list is empty.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@cloud-governance-mcp/app.py`:
- Around line 534-546: The code assumes response.candidates[0] exists and will
IndexError when Gemini returns no candidates; before appending to history or
accessing .content, add a guard like if not response.candidates: log/warn about
the empty response (include any response.status/error if available), optionally
set final_answer = response.text or continue the agent loop, and skip extracting
function calls; update the blocks that reference response.candidates[0]
(history.append(...), fcalls = [...]) to run only after the guard so you never
access candidates[0] when the list is empty.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 7f79fdbe-520c-42bc-8ebf-3775b585cf49

📥 Commits

Reviewing files that changed from the base of the PR and between 8bc04a0 and 1d6c2bd.

📒 Files selected for processing (1)
  • cloud-governance-mcp/app.py

Labels

enhancement New feature or request

Projects

Status: In progress

2 participants