
@YazIbrahim
Contributor

This PR adds caching to the Gemini client to avoid re-sending the entire context as input, which also includes the tool descriptions. This should reduce the cost of using Gemini models.

There is also now a link to a new security blog in the README.

Checklist

Please ensure you have done the following:

  • I have run application tests ensuring nothing has broken.
  • I have updated the documentation if required.
  • I have added tests which cover my changes.

Type of change

Make sure to update label on right hand panel.

MacOS tests

To trigger the CI to run on a macOS backed workflow, add the macos-ci-test label to the pull request (PR).

Our advice is to only run this workflow when testing compatibility between operating systems for a change you've made, e.g., adding a new dependency to the virtual environment.

Note: This can take up to 5 minutes to run. This workflow costs 10x more than a Linux-based workflow, so use it at your discretion.

Steve Moss and others added 30 commits June 6, 2025 14:53
…into add-gemini-and-gke-support

# Conflicts:
#	compose.ecr.yaml
#	compose.yaml
#	sre_agent/client/client.py
#	sre_agent/firewall/startup.sh
#	sre_agent/llm/utils/clients.py
#	sre_agent/servers/prompt_server/server.py
#	uv.lock
Member

@tomstockton tomstockton left a comment


Code Review for PR #82: "Added caching to gemini client"

Overview

This PR implements caching functionality for the Gemini client to reduce API costs by avoiding re-transmission of tool descriptions and context. It also includes some documentation updates and configuration changes.

Analysis

✅ Positive Changes

Caching Implementation:

  • Smart approach using Gemini's CachedContent API with 600s TTL
  • Proper error handling with warning logs if cache creation fails
  • Conditional logic to use cached content when available vs. full tools

Token Usage Tracking:

  • Enhanced token usage logging includes cache-specific metrics
  • Proper handling of cache_creation_token_count and cached_content_token_count

⚠️ Areas for Improvement

1. Configuration Changes Need Review:

  • max_tokens increased from 1000 → 8000 in schemas.py:71
  • Default example changed from 10000 → 8000 in setup_credentials.py:94
  • Risk: This significantly increases potential API costs per request

2. Cache Management Issues:

  • Cache is created per client instance but never invalidated or refreshed
  • No mechanism to handle cache expiration gracefully
  • Cache TTL of 600s (10 minutes) may be too short for some use cases

3. Model Version Inconsistency:

  • README shows claude-3-7-sonnet-latest but this model doesn't exist
  • Should likely be claude-3-5-sonnet-latest

4. Missing Error Handling:

  • cache_tools() method catches all exceptions but only logs warnings
  • No fallback strategy if cache operations fail repeatedly
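One possible fallback strategy is a small circuit-breaker guard: after a few consecutive cache failures, skip caching entirely and send the full context. This is a sketch of the idea, not the PR's code; the class and threshold are assumptions:

```python
class CacheCircuitBreaker:
    """Hypothetical guard: after `max_failures` consecutive cache errors,
    stop attempting cache operations and fall back to full context."""

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self._consecutive_failures = 0

    def record_failure(self) -> None:
        self._consecutive_failures += 1

    def record_success(self) -> None:
        # Any success resets the failure streak.
        self._consecutive_failures = 0

    @property
    def caching_enabled(self) -> bool:
        return self._consecutive_failures < self.max_failures
```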

Specific Suggestions

Code Quality

# Consider adding cache invalidation logic
def _invalidate_cache(self) -> None:
    """Invalidate the current cache if it exists."""
    if self._cache:
        try:
            # Add cache deletion logic here
            self._cache = None
        except Exception as e:
            logger.warning(f"Failed to invalidate cache: {e}")

Performance Considerations

  • Consider making cache TTL configurable via environment variable
  • Add cache hit/miss metrics for monitoring effectiveness
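Making the TTL configurable could be as simple as reading an environment variable with a safe fallback. `GEMINI_CACHE_TTL` is a hypothetical variable name, not one the PR defines:

```python
import os

DEFAULT_CACHE_TTL_SECONDS = 600  # the value currently hard-coded in the PR


def cache_ttl_seconds() -> int:
    """Read the cache TTL from GEMINI_CACHE_TTL (hypothetical name),
    falling back to the default on missing, non-numeric, or non-positive values."""
    raw = os.environ.get("GEMINI_CACHE_TTL", "")
    try:
        ttl = int(raw)
    except ValueError:
        return DEFAULT_CACHE_TTL_SECONDS
    return ttl if ttl > 0 else DEFAULT_CACHE_TTL_SECONDS
```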

Testing Coverage

  • No tests added for the new caching functionality
  • Should include unit tests for cache creation, usage, and failure scenarios
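A failure-scenario test could look like the sketch below. `FakeGeminiClient` is a minimal stand-in mirroring the catch-and-warn behaviour described in this review; the real client's attribute names will differ:

```python
import logging
from unittest import mock

logger = logging.getLogger(__name__)


class FakeGeminiClient:
    """Stand-in for the PR's client; cache_tools mirrors its
    catch-and-warn-only error handling."""

    def __init__(self, api) -> None:
        self._api = api
        self._cache = None

    def cache_tools(self) -> None:
        try:
            self._cache = self._api.create_cache()
        except Exception as exc:
            logger.warning("Cache creation failed: %s", exc)
            self._cache = None


def test_cache_tools_failure_leaves_cache_unset() -> None:
    api = mock.Mock()
    api.create_cache.side_effect = RuntimeError("quota exceeded")
    client = FakeGeminiClient(api)
    client.cache_tools()  # must not raise
    assert client._cache is None


def test_cache_tools_success_stores_cache() -> None:
    api = mock.Mock()
    api.create_cache.return_value = "caches/abc"
    client = FakeGeminiClient(api)
    client.cache_tools()
    assert client._cache == "caches/abc"
```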

Security & Compliance

  • ✅ No security concerns identified
  • ✅ API keys still properly handled via environment variables

Recommendations

Must Fix:

  1. Correct the model name in README.md (claude-3-7-sonnet-latest → claude-3-5-sonnet-latest)
  2. Add unit tests for the caching functionality
  3. Document the max_tokens change - explain why it was increased

Should Consider:

  1. Make cache TTL configurable
  2. Add cache invalidation mechanism
  3. Add monitoring/metrics for cache effectiveness
  4. Consider the cost implications of the max_tokens increase

Minor:

  1. Add type hints to cache_tools() return type
  2. Consider more specific exception handling in cache operations

Overall Assessment

Good implementation of a cost-saving feature with proper error handling. The caching logic is sound, but needs better lifecycle management and testing coverage. The configuration changes should be reviewed for cost implications.

Recommendation: Request changes for model name correction, test coverage, and documentation of configuration changes before approval.
