
@YazIbrahim
Contributor

This PR adds caching to the Gemini client to avoid re-sending the entire context as input, which also includes the tool descriptions. This should reduce the cost of using Gemini models.

There is also now a link to a new security blog in the README.

Checklist

Please ensure you have done the following:

  • I have run application tests ensuring nothing has broken.
  • I have updated the documentation if required.
  • I have added tests which cover my changes.

Type of change

Make sure to update label on right hand panel.

MacOS tests

To trigger the CI to run on a macOS backed workflow, add the macos-ci-test label to the pull request (PR).

Our advice is to only run this workflow when testing compatibility between operating systems for a change you've made, e.g., adding a new dependency to the virtual environment.

Note: This can take up to 5 minutes to run. This workflow costs 10x more than a Linux-based workflow, so use it at your discretion.

Steve Moss and others added 30 commits June 6, 2025 14:53
…into add-gemini-and-gke-support

# Conflicts:
#	compose.ecr.yaml
#	compose.yaml
#	sre_agent/client/client.py
#	sre_agent/firewall/startup.sh
#	sre_agent/llm/utils/clients.py
#	sre_agent/servers/prompt_server/server.py
#	uv.lock
Member

@tomstockton tomstockton left a comment


Code Review for PR #82: "Added caching to gemini client"

Overview

This PR implements caching functionality for the Gemini client to reduce API costs by avoiding re-transmission of tool descriptions and context. It also includes some documentation updates and configuration changes.

Analysis

✅ Positive Changes

Caching Implementation:

  • Smart approach using Gemini's CachedContent API with 600s TTL
  • Proper error handling with warning logs if cache creation fails
  • Conditional logic to use cached content when available vs. full tools

Token Usage Tracking:

  • Enhanced token usage logging includes cache-specific metrics
  • Proper handling of cache_creation_token_count and cached_content_token_count

⚠️ Areas for Improvement

1. Configuration Changes Need Review:

  • max_tokens increased from 1000 → 8000 in schemas.py:71
  • Default example changed from 10000 → 8000 in setup_credentials.py:94
  • Risk: This significantly increases potential API costs per request

2. Cache Management Issues:

  • Cache is created per client instance but never invalidated or refreshed
  • No mechanism to handle cache expiration gracefully
  • Cache TTL of 600s (10 minutes) may be too short for some use cases

3. Model Version Inconsistency:

  • README shows claude-3-7-sonnet-latest but this model doesn't exist
  • Should likely be claude-3-5-sonnet-latest

4. Missing Error Handling:

  • cache_tools() method catches all exceptions but only logs warnings
  • No fallback strategy if cache operations fail repeatedly
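One possible fallback strategy is a small circuit-breaker guard: after a few consecutive cache failures, skip caching entirely and send the full context. This is a sketch of the idea, not the PR's code; the class and threshold are assumptions:

```python
class CacheCircuitBreaker:
    """Hypothetical guard: after `max_failures` consecutive cache errors,
    stop attempting cache operations and fall back to full context."""

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self._consecutive_failures = 0

    def record_failure(self) -> None:
        self._consecutive_failures += 1

    def record_success(self) -> None:
        # Any success resets the failure streak.
        self._consecutive_failures = 0

    @property
    def caching_enabled(self) -> bool:
        return self._consecutive_failures < self.max_failures
```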

Specific Suggestions

Code Quality

# Consider adding cache invalidation logic
def _invalidate_cache(self) -> None:
    """Invalidate the current cache if it exists."""
    if self._cache:
        try:
            # Add cache deletion logic here
            self._cache = None
        except Exception as e:
            logger.warning(f"Failed to invalidate cache: {e}")

Performance Considerations

  • Consider making cache TTL configurable via environment variable
  • Add cache hit/miss metrics for monitoring effectiveness
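Making the TTL configurable could be as simple as reading an environment variable with a safe fallback. `GEMINI_CACHE_TTL` is a hypothetical variable name, not one the PR defines:

```python
import os

DEFAULT_CACHE_TTL_SECONDS = 600  # the value currently hard-coded in the PR


def cache_ttl_seconds() -> int:
    """Read the cache TTL from GEMINI_CACHE_TTL (hypothetical name),
    falling back to the default on missing, non-numeric, or non-positive values."""
    raw = os.environ.get("GEMINI_CACHE_TTL", "")
    try:
        ttl = int(raw)
    except ValueError:
        return DEFAULT_CACHE_TTL_SECONDS
    return ttl if ttl > 0 else DEFAULT_CACHE_TTL_SECONDS
```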

Testing Coverage

  • No tests added for the new caching functionality
  • Should include unit tests for cache creation, usage, and failure scenarios
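A failure-scenario test could look like the sketch below. `FakeGeminiClient` is a minimal stand-in mirroring the catch-and-warn behaviour described in this review; the real client's attribute names will differ:

```python
import logging
from unittest import mock

logger = logging.getLogger(__name__)


class FakeGeminiClient:
    """Stand-in for the PR's client; cache_tools mirrors its
    catch-and-warn-only error handling."""

    def __init__(self, api) -> None:
        self._api = api
        self._cache = None

    def cache_tools(self) -> None:
        try:
            self._cache = self._api.create_cache()
        except Exception as exc:
            logger.warning("Cache creation failed: %s", exc)
            self._cache = None


def test_cache_tools_failure_leaves_cache_unset() -> None:
    api = mock.Mock()
    api.create_cache.side_effect = RuntimeError("quota exceeded")
    client = FakeGeminiClient(api)
    client.cache_tools()  # must not raise
    assert client._cache is None


def test_cache_tools_success_stores_cache() -> None:
    api = mock.Mock()
    api.create_cache.return_value = "caches/abc"
    client = FakeGeminiClient(api)
    client.cache_tools()
    assert client._cache == "caches/abc"
```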

Security & Compliance

  • ✅ No security concerns identified
  • ✅ API keys still properly handled via environment variables

Recommendations

Must Fix:

  1. Correct the model name in README.md (claude-3-7-sonnet-latest → claude-3-5-sonnet-latest)
  2. Add unit tests for the caching functionality
  3. Document the max_tokens change - explain why it was increased

Should Consider:

  1. Make cache TTL configurable
  2. Add cache invalidation mechanism
  3. Add monitoring/metrics for cache effectiveness
  4. Consider the cost implications of the max_tokens increase

Minor:

  1. Add type hints to cache_tools() return type
  2. Consider more specific exception handling in cache operations

Overall Assessment

Good implementation of a cost-saving feature with proper error handling. The caching logic is sound, but needs better lifecycle management and testing coverage. The configuration changes should be reviewed for cost implications.

Recommendation: Request changes for model name correction, test coverage, and documentation of configuration changes before approval.
