-
Notifications
You must be signed in to change notification settings - Fork 24
Added caching to gemini client #82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…into add-gemini-and-gke-support # Conflicts: # compose.ecr.yaml # compose.yaml # sre_agent/client/client.py # sre_agent/firewall/startup.sh # sre_agent/llm/utils/clients.py # sre_agent/servers/prompt_server/server.py # uv.lock
Update credentials setup for GKE and EKS
tomstockton
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review for PR #82: "Added caching to gemini client"
Overview
This PR implements caching functionality for the Gemini client to reduce API costs by avoiding re-transmission of tool descriptions and context. It also includes some documentation updates and configuration changes.
Analysis
✅ Positive Changes
Caching Implementation:
- Smart approach using Gemini's
CachedContentAPI with 600s TTL - Proper error handling with warning logs if cache creation fails
- Conditional logic to use cached content when available vs. full tools
Token Usage Tracking:
- Enhanced token usage logging includes cache-specific metrics
- Proper handling of
cache_creation_token_countandcached_content_token_count
⚠️ Areas for Improvement
1. Configuration Changes Need Review:
max_tokensincreased from 1000 → 8000 inschemas.py:71- Default example changed from 10000 → 8000 in
setup_credentials.py:94 - Risk: This significantly increases potential API costs per request
2. Cache Management Issues:
- Cache is created per client instance but never invalidated or refreshed
- No mechanism to handle cache expiration gracefully
- Cache TTL of 600s (10 minutes) may be too short for some use cases
3. Model Version Inconsistency:
- README shows
claude-3-7-sonnet-latestbut this model doesn't exist - Should likely be
claude-3-5-sonnet-latest
4. Missing Error Handling:
cache_tools()method catches all exceptions but only logs warnings- No fallback strategy if cache operations fail repeatedly
Specific Suggestions
Code Quality
# Consider adding cache invalidation logic
def _invalidate_cache(self) -> None:
"""Invalidate the current cache if it exists."""
if self._cache:
try:
# Add cache deletion logic here
self._cache = None
except Exception as e:
logger.warning(f"Failed to invalidate cache: {e}")Performance Considerations
- Consider making cache TTL configurable via environment variable
- Add cache hit/miss metrics for monitoring effectiveness
Testing Coverage
- No tests added for the new caching functionality
- Should include unit tests for cache creation, usage, and failure scenarios
Security & Compliance
- ✅ No security concerns identified
- ✅ API keys still properly handled via environment variables
Recommendations
Must Fix:
- Correct the model name in README.md (
claude-3-7-sonnet-latest→claude-3-5-sonnet-latest) - Add unit tests for the caching functionality
- Document the max_tokens change - explain why it was increased
Should Consider:
- Make cache TTL configurable
- Add cache invalidation mechanism
- Add monitoring/metrics for cache effectiveness
- Consider the cost implications of the max_tokens increase
Minor:
- Add type hints to
cache_tools()return type - Consider more specific exception handling in cache operations
Overall Assessment
Good implementation of a cost-saving feature with proper error handling. The caching logic is sound, but needs better lifecycle management and testing coverage. The configuration changes should be reviewed for cost implications.
Recommendation: Request changes for model name correction, test coverage, and documentation of configuration changes before approval.
This PR adds caching to the gemini client to prevent having to re-add the entire context as input which also includes the tool descriptions. Which should reduce the cost of using gemini models.
There is also now a link a new security blog in the readme.
Checklist
Please ensure you have done the following:
Type of change
Make sure to update label on right hand panel.
MacOS tests
To trigger the CI to run on a macOS backed workflow, add the
macos-ci-testlabel to the pull request (PR).Our advice is to only run this workflow when testing the compatability between operating systems for a change that you've made, e.g., adding a new dependency to the virtual environment.