
Set up logging and alerting with CloudWatch #13

@jeancochrane

Motivation

The API does not currently push its logs to a remote log storage service like AWS CloudWatch. Instead, if you want to see what's going on in the app, you need to SSH into the server, find the directory that contains the service, and run docker compose logs to view the stdout/stderr of the running app container. This system has two drawbacks:

  1. It's hard to understand and remember unless you have detailed knowledge of the app deployment structure
  2. We can't set up automated alerts based on errors that show up in the logs, so we have to wait for our users to report problems before we find out about them

We can fix both of these problems by pushing our logs to CloudWatch and setting up some simple alerting based on those logs.

Requirements

The new logging system should:

  • Emit a structured log to stdout for every request that the API receives and every response that it sends
    • Logs should include the following info:
      • Log level
      • Timestamp
      • Log message
      • Additional info specific to requests/responses
        • IP of the requester
        • Execution time for the request
        • Possibly the params that the user sent and the estimated value that we sent back, though that will involve logging a lot of data (at least one key for each of the ~100 features in the model) and I'm not sure if it's worth it; perhaps we should save that for a future iteration
  • Emit errors to stderr with full traceback and an ERROR log level
  • Ship stdout and stderr to a new CloudWatch log group /ccao/services/api-res-avm under a log stream corresponding to the date of the log (e.g. 2026-05-14)
  • Include a new alert in https://github.com/ccao-data/alerts that watches the CloudWatch log group for the string "error"
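
To make the log shape in the requirements above concrete, here is a minimal sketch. It is written in Python rather than R, so it illustrates the target output and stream routing, not the eventual implementation. The names JsonFormatter, BelowError, build_logger, and log_stream_name, and the field names ip and execution_time_ms, are all hypothetical. The sketch assumes one JSON object per line is an acceptable structured format, and it assumes UTC for both timestamps and the date-based log stream name, neither of which this issue specifies.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    # Hypothetical names for the request-specific fields.
    EXTRA_FIELDS = ("ip", "execution_time_ms")

    def format(self, record):
        payload = {
            "level": record.levelname,
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "message": record.getMessage(),
        }
        # Copy request-specific fields only when the caller supplied them.
        for field in self.EXTRA_FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        # json.dumps escapes newlines, so the full traceback stays on one line.
        if record.exc_info:
            payload["traceback"] = self.formatException(record.exc_info)
        return json.dumps(payload)


class BelowError(logging.Filter):
    """Let through only records below ERROR, so errors skip stdout."""

    def filter(self, record):
        return record.levelno < logging.ERROR


def build_logger(name="api", stdout=None, stderr=None):
    """Route records below ERROR to stdout and ERROR and above to stderr."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False

    out_handler = logging.StreamHandler(sys.stdout if stdout is None else stdout)
    out_handler.addFilter(BelowError())
    err_handler = logging.StreamHandler(sys.stderr if stderr is None else stderr)
    err_handler.setLevel(logging.ERROR)

    for handler in (out_handler, err_handler):
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger


def log_stream_name(now=None):
    """Date-based stream name such as 2026-05-14 (UTC is an assumption)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now.strftime("%Y-%m-%d")
```

A request handler would call something like logger.info("request handled", extra={"ip": ..., "execution_time_ms": ...}) after each response, and logger.exception("request failed") inside an except block to get the ERROR level plus traceback on stderr. Keeping each record on one line is a deliberate choice in this sketch, so that a multi-line traceback does not get split across several log lines.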

Notes

We've never built a logging system like this in R, so I expect implementation will involve some trial and error. My instinct is that we will use a combination of the following tools:

For a custom logging system we built that has a similar degree of complexity, see the create_python_logger function and the logging loop in our Spark ingest code (service-spark-iasworld). Hopefully we won't need to manually forward logs to CloudWatch in this case, since the Docker driver promises to do that for us.
