Add Cloud Governance AI Agent (MCP-based OpenSearch query interface) by pragya811 · Pull Request #997 · redhat-performance/cloud-governance

pragya811 · 2026-05-12T08:12:10Z

Type of change

Note: Fill x in []

bug
enhancement
documentation
dependencies

Summary

Adds a Streamlit-based AI agent that provides a natural language interface to query cloud governance data in OpenSearch
Uses a custom MCP (Model Context Protocol) server with high-level query tools that abstract away raw OpenSearch Query DSL
AI (Google Gemini) calls simple tools like search_documents(filters=[...]) and count_by_field(group_by="policy") instead of constructing complex JSON queries
The MCP server auto-handles .keyword field resolution, type coercion, and case-insensitive field matching
Includes a Dockerfile for containerized deployment and a sidebar index selector dropdown

Architecture

Streamlit UI → Gemini AI → MCP stdio → mcp_server.py subprocess → OpenSearch

Tools Available

Tool	Description
`list_indices`	List all OpenSearch indices with doc counts
`get_fields`	Discover field names, types, and aggregatability
`search_documents`	Filtered search with auto `.keyword` handling
`count_by_field`	Group-by aggregation (terms)
`aggregate`	Metric aggregation (sum/avg/max/min) grouped by field
`date_range_search`	Time-range search with text/date field support
`raw_search`	Escape hatch for raw Query DSL

Setup

cd cloud-governance-mcp
cp .env.example .env  # configure GEMINI_API_KEY, OPENSEARCH_HOSTS
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
streamlit run app.py

Test plan

- Configure .env with valid OpenSearch host and Gemini API key
- Run streamlit run app.py and verify 7 tools load
- Test: "What fields are available in this index?"
- Test: "Show me 5 sample documents"
- Test: "Count documents by [field]"
- Test aggregations with filters
- Switch index via sidebar dropdown and verify session resets

## For security reasons, all pull requests need to be approved first before running any automated CI

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-12T08:12:18Z

📝 Walkthrough

Summary by CodeRabbit

Release Notes

New Features
- Cloud Governance AI Agent with conversational chat interface
- Conversation history persistence and export capability
- OpenSearch integration with query tools (search, filtering, aggregations, date ranges)
- Docker containerization for streamlined deployment
Documentation
- Comprehensive setup and usage guide with troubleshooting steps
Chores
- Environment configuration templates and dependency specifications

Walkthrough

This pull request introduces a complete Cloud Governance AI agent feature. It adds a Streamlit application that discovers OpenSearch indices, runs a Gemini-powered agent loop with tool-calling enabled, and uses a custom MCP server to expose OpenSearch query tools. The PR includes full deployment infrastructure (Docker, shell bootstrap), dependency manifest, configuration templates, comprehensive documentation, and all application code.

Changes

Cloud Governance MCP AI Agent

Layer / File(s)	Summary
Infrastructure & Deployment Setup `cloud-governance-mcp/.dockerignore`, `cloud-governance-mcp/.gitignore`, `cloud-governance-mcp/requirements.txt`, `cloud-governance-mcp/.env.example`, `cloud-governance-mcp/Dockerfile`, `cloud-governance-mcp/run_agent.sh`	Docker ignore and build config, git ignore for local artifacts, Python dependency pinning, environment variable templates, Python 3.11 Slim container definition with non-root user and Streamlit configuration, plus shell script for bootstrapping virtual environment, port cleanup, dependency installation, and Streamlit startup.
MCP Server: OpenSearch Query Tools `cloud-governance-mcp/mcp_server.py`	Backend MCP server initialized over stdio that caches OpenSearch field mappings, resolves case-insensitive field names, coerces filter values by mapped type, and implements seven query tools: `list_indices`, `get_fields`, `search_documents`, `count_by_field`, `aggregate`, `date_range_search`, and `raw_search`. Results are formatted as markdown tables with debug query dumps for zero-result cases.
Streamlit App: Config, Persistence & MCP Wrappers `cloud-governance-mcp/app.py` (modules: imports, index discovery, config, persistence, schema normalization, async tool wrappers)	Configuration reloading on each Streamlit rerun, cached OpenSearch index discovery for sidebar selector, conversation history load/save/clear to JSON, Gemini schema normalization to fix MCP-style schemas, and async-to-sync wrapper functions for discovering and executing MCP tools with optional debug output.
Streamlit App: Gemini Agent Loop & UI `cloud-governance-mcp/app.py` (modules: agent loop, main)	Multi-turn Gemini tool-calling loop that forces tool-calling on first turn, executes extracted function calls via MCP tools, appends responses to conversation, iterates up to 10 turns, and returns final answer with tool attribution. Main Streamlit entrypoint initializes session state (index selection, message history, tool cache), renders sidebar controls and chat interface, processes user input, runs the agent loop with prior context, and saves updated conversation to disk.
Documentation `cloud-governance-mcp/README.md`	Architecture overview, prerequisites, quick-start workflow, catalog of seven MCP query tools with examples, step-by-step agent flow, troubleshooting for startup and connectivity, management commands, expected OpenSearch indices, security notes on credentials and access control, advanced configuration for Gemini model and OpenSearch cluster switching, project structure, and licensing.

Sequence Diagram(s)

sequenceDiagram
    participant User as User
    participant UI as Streamlit<br/>Chat UI
    participant Agent as Gemini<br/>Agent Loop
    participant MCP as MCP<br/>Tools
    participant OS as OpenSearch
    
    User->>UI: Enter natural<br/>language question
    UI->>Agent: run_agent_loop_gemini<br/>(question, tools, history)
    Agent->>Agent: Load prior<br/>conversation
    Agent->>Agent: Force<br/>tool_choice=ANY
    Agent->>Agent: Call Gemini<br/>with tools
    loop Up to 10 turns (while tool_calls)
        Gemini-->>Agent: Function<br/>call(s)
        Agent->>Agent: Extract function<br/>names & args
        Agent->>MCP: execute_mcp_tool
        MCP->>OS: search_documents /<br/>count_by_field / etc.
        OS-->>MCP: Query results
        MCP-->>Agent: Result text<br/>(markdown)
        Agent->>Agent: Append to<br/>message history
        Agent->>Agent: Call Gemini<br/>again
    end
    Agent-->>Agent: No more<br/>tool calls
    Agent-->>UI: Final answer<br/>(with citations)
    UI->>UI: Append to<br/>chat history
    UI->>UI: Save to<br/>conversations.json
    UI-->>User: Display in<br/>chat interface

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 51.43% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main change: adding a Cloud Governance AI Agent based on MCP and OpenSearch, which matches the comprehensive feature additions across the changeset.
Description check	✅ Passed	The description is well-related to the changeset, providing a clear summary of the enhancement, architectural overview, available tools, and setup instructions that align with the files added.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

ebattat

Maybe we can limit it only to the indexes that we are using, instead of all the existing indexes. Also, it would be great if you could add a filter to the index list.

coderabbitai

Actionable comments posted: 10

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cloud-governance-mcp/app.py`:
- Line 251: The st.error call uses an unnecessary f-string: change the call to
st.error("❌ Failed to connect to MCP server") (remove the leading f) so the
literal isn't treated as an f-string; update the string in the existing st.error
invocation where the message "❌ Failed to connect to MCP server" is used.
- Around line 336-345: The current bare "except:" around the json.loads(result)
block should be replaced with a specific exception handler for JSON parsing
(e.g., catch json.JSONDecodeError or ValueError) so non-JSON parsing failures
aren't swallowed; update the try/except that surrounds json.loads(result) to
"except json.JSONDecodeError:" (import JSONDecodeError if needed) and keep the
existing fallback behavior that truncates and calls st.text(result) when parsing
fails.
- Line 350: The function signature for run_agent_loop_gemini uses an implicit
optional type for previous_messages ("list = None"); update the parameter
annotation to explicitly show nullability by changing previous_messages to use
an explicit union with None (e.g., previous_messages: list | None = None) so the
type hint conforms to PEP 484; modify the function definition of
run_agent_loop_gemini accordingly and update any callers or imports if they rely
on a different typing style.

In `@cloud-governance-mcp/Dockerfile`:
- Line 6: The Dockerfile currently writes a Streamlit config with
showErrorDetails enabled via the printf command that contains
"[client]\nshowErrorDetails = true"; change that default to false by updating
the printf invocation (the line that writes to /root/.streamlit/config.toml) so
the file contains "showErrorDetails = false" instead of "true".
- Around line 1-15: The Dockerfile currently runs as root; create and switch to
a non-root user before CMD by adding a user/group (e.g., appuser), chowning
WORKDIR and any config files (like /root/.streamlit or move config to
/home/appuser/.streamlit) and then set USER appuser so that the final CMD
["streamlit", "run", "app.py", ...] executes unprivileged; update any file paths
or ownership for files copied (app.py, mcp_server.py, requirements) and ensure
the home/config directory exists and is owned by the new user before switching.

In `@cloud-governance-mcp/mcp_server.py`:
- Around line 619-622: The _tool_raw_search function accepts arbitrary Query
DSL; validate and sanitize query_body before calling _get_client().search by (1)
enforcing a capped size: if query_body.get("size") is missing or >1000, set it
to 1000; (2) reject or remove dangerous keys such as "script", "script_fields",
"scroll", and any top-level "stored_fields" that could cause heavy work; (3)
block requests containing "from" values greater than a safe threshold (e.g.,
10000) or convert them to use safe pagination; and (4) pass a client.search
timeout (e.g., timeout="30s" or request_timeout) to the call in _tool_raw_search
to bound execution time. Apply these checks in _tool_raw_search before calling
client.search and return a clear error message when input is rejected.

In `@cloud-governance-mcp/README.md`:
- Around line 147-149: The README currently suggests running the literal command
"cat .env | grep OPENSEARCH" which can expose secrets; replace that step with a
safer approach that only reveals variable names or redacts values—i.e., locate
lines matching OPENSEARCH but then remove or mask the value portion instead of
printing the raw .env contents (use standard text-processing tools such as grep
combined with a value-strip or redaction step, or output only the variable names
with cut), and update the README to show that safer command in place of the
existing one.

In `@cloud-governance-mcp/requirements.txt`:
- Line 3: Replace the requirements entry that currently reads "mcp[all]>=1.0.0"
with a pinned stdio-only dependency "mcp>=1.23.0,<2"; specifically update the
token mcp[all]>=1.0.0 in the requirements file to mcp>=1.23.0,<2 to remove
extras used only by non-stdio transports, tighten the minimum version to 1.23.0
for security fixes, and add the <2 upper bound for reproducible installs.

In `@cloud-governance-mcp/run_agent.sh`:
- Around line 41-43: The configuration snippet in run_agent.sh currently sets
showErrorDetails = true which exposes internal traces; change this default to
showErrorDetails = false in the [client] block (replace the true value with
false) so detailed error traces are disabled by default; ensure any comments or
downstream logic that relies on showErrorDetails are updated to expect false as
the safe default.
- Line 33: Replace the unsafe hard-kill line `kill -9 $(lsof -ti tcp:8501)` with
a graceful shutdown sequence: use `lsof -ti tcp:8501` to capture PIDs, send
SIGTERM (kill -15) to those PIDs, wait a short interval and re-check lsof; only
if any PIDs remain, escalate to SIGKILL (kill -9) for those remaining PIDs. Also
consider filtering the PIDs by expected command/user before sending signals to
avoid killing unrelated processes on port 8501.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: d1b2ebb5-f92c-4ad4-b4d2-9530bc506420

📥 Commits

Reviewing files that changed from the base of the PR and between d307bce and 2f87ea3.

📒 Files selected for processing (9)

cloud-governance-mcp/.dockerignore
cloud-governance-mcp/.env.example
cloud-governance-mcp/.gitignore
cloud-governance-mcp/Dockerfile
cloud-governance-mcp/README.md
cloud-governance-mcp/app.py
cloud-governance-mcp/mcp_server.py
cloud-governance-mcp/requirements.txt
cloud-governance-mcp/run_agent.sh

ebattat · 2026-05-26T11:52:32Z

/approved

ebattat · 2026-05-26T14:32:36Z

@pragya811, did u review coderabbitai comments ?

ebattat

/approve

coderabbitai

♻️ Duplicate comments (1)

cloud-governance-mcp/run_agent.sh (1)

33-37: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restrict port cleanup to this app’s Streamlit process before signaling.

This still signals any process on port 8501, so it can stop unrelated services on shared machines. Filter PIDs by command (e.g., streamlit run app.py) and only escalate to -9 for survivors after re-check.

Suggested fix

-pids="$(lsof -ti tcp:8501 || true)"
-if [ -n "$pids" ]; then
-  kill $pids 2>/dev/null || true
-  sleep 1
-  kill -9 $pids 2>/dev/null || true
-fi
+pids="$(lsof -ti tcp:8501 -sTCP:LISTEN || true)"
+target_pids=""
+for pid in $pids; do
+  cmd="$(ps -p "$pid" -o args= 2>/dev/null || true)"
+  case "$cmd" in
+    *"streamlit run app.py"*) target_pids="$target_pids $pid" ;;
+  esac
+done
+if [ -n "$target_pids" ]; then
+  kill $target_pids 2>/dev/null || true
+  sleep 1
+  still_running="$(lsof -ti tcp:8501 -sTCP:LISTEN || true)"
+  [ -n "$still_running" ] && kill -9 $still_running 2>/dev/null || true
+fi

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cloud-governance-mcp/run_agent.sh` around lines 33 - 37, The current port
cleanup grabs any PID listening on tcp:8501; change the logic in run_agent.sh so
you first collect candidate PIDs from lsof -ti tcp:8501 into the pids variable,
then filter that list to only include processes whose command matches the
Streamlit invocation (e.g., "streamlit run app.py") by checking each PID's
command (ps -o cmd= -p <pid> or similar) before signaling; send a graceful
SIGTERM to the filtered list, wait a short period, re-check which of those same
PIDs are still running and only then escalate to kill -9 for survivors. Ensure
variable names (pids and the filtered list) and the ordering (terminate, sleep,
re-check, force-kill) match the existing script structure.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@cloud-governance-mcp/run_agent.sh`:
- Around line 33-37: The current port cleanup grabs any PID listening on
tcp:8501; change the logic in run_agent.sh so you first collect candidate PIDs
from lsof -ti tcp:8501 into the pids variable, then filter that list to only
include processes whose command matches the Streamlit invocation (e.g.,
"streamlit run app.py") by checking each PID's command (ps -o cmd= -p <pid> or
similar) before signaling; send a graceful SIGTERM to the filtered list, wait a
short period, re-check which of those same PIDs are still running and only then
escalate to kill -9 for survivors. Ensure variable names (pids and the filtered
list) and the ordering (terminate, sleep, re-check, force-kill) match the
existing script structure.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: a876e573-4888-40f9-ae24-a5bd341c418f

📥 Commits

Reviewing files that changed from the base of the PR and between 2f87ea3 and d963cd6.

📒 Files selected for processing (6)

cloud-governance-mcp/.env.example
cloud-governance-mcp/Dockerfile
cloud-governance-mcp/README.md
cloud-governance-mcp/app.py
cloud-governance-mcp/mcp_server.py
cloud-governance-mcp/run_agent.sh

✅ Files skipped from review due to trivial changes (2)

cloud-governance-mcp/README.md
cloud-governance-mcp/.env.example

🚧 Files skipped from review as they are similar to previous changes (2)

cloud-governance-mcp/app.py
cloud-governance-mcp/mcp_server.py

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

cloud-governance-mcp/app.py (1)

534-546: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing check for empty candidates can crash the agent loop.

If Gemini blocks or filters the response (e.g., safety filters triggered), response.candidates may be empty. Accessing candidates[0] will raise IndexError, which gets caught by the generic exception handler but provides a poor user experience.

🛡️ Proposed fix

+            # Check for valid response
+            if not response.candidates:
+                return "❌ No response from Gemini. The request may have been filtered or blocked."
+
             # Add assistant response to history
             history.append(response.candidates[0].content)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cloud-governance-mcp/app.py` around lines 534 - 546, The code assumes
response.candidates[0] exists and will IndexError when Gemini returns no
candidates; before appending to history or accessing .content, add a guard like
if not response.candidates: log/warn about the empty response (include any
response.status/error if available), optionally set final_answer = response.text
or continue the agent loop, and skip extracting function calls; update the
blocks that reference response.candidates[0] (history.append(...), fcalls =
[...]) to run only after the guard so you never access candidates[0] when the
list is empty.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@cloud-governance-mcp/app.py`:
- Around line 534-546: The code assumes response.candidates[0] exists and will
IndexError when Gemini returns no candidates; before appending to history or
accessing .content, add a guard like if not response.candidates: log/warn about
the empty response (include any response.status/error if available), optionally
set final_answer = response.text or continue the agent loop, and skip extracting
function calls; update the blocks that reference response.candidates[0]
(history.append(...), fcalls = [...]) to run only after the guard so you never
access candidates[0] when the list is empty.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 7f79fdbe-520c-42bc-8ebf-3775b585cf49

📥 Commits

Reviewing files that changed from the base of the PR and between 8bc04a0 and 1d6c2bd.

📒 Files selected for processing (1)

cloud-governance-mcp/app.py

Add Cloud Governance AI Agent (MCP-based OpenSearch query interface)

2f87ea3

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ebattat reviewed May 18, 2026

View reviewed changes

ebattat assigned pragya811 May 18, 2026

ebattat added the enhancement New feature or request label May 18, 2026

ebattat added this to Cloud-Governance project May 18, 2026

github-project-automation Bot moved this to In progress in Cloud-Governance project May 18, 2026

coderabbitai Bot reviewed May 18, 2026

View reviewed changes

ebattat approved these changes May 28, 2026

View reviewed changes

Address coderabbit comments, limit indices

d963cd6

coderabbitai Bot reviewed Jun 4, 2026

View reviewed changes

Address coderabbit comments

8bc04a0

pragya811 marked this pull request as ready for review June 9, 2026 07:41

Fix dropdown greyed out error

1d6c2bd

coderabbitai Bot reviewed Jun 11, 2026

View reviewed changes

Uh oh!

Conversation

pragya811 commented May 12, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Type of change

Summary

Architecture

Tools Available

Setup

Uh oh!

coderabbitai Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Walkthrough

Changes

Sequence Diagram(s)

❌ Failed checks (1 warning)

Uh oh!

ebattat left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ebattat commented May 26, 2026

Uh oh!

ebattat commented May 26, 2026

Uh oh!

ebattat left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pragya811 commented May 12, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 12, 2026 •

edited

Loading