Skip to content

Commit 2b3cb4b

Browse files
committed
[Fix] fix naming style and remove registry
1 parent 5438839 commit 2b3cb4b

File tree

14 files changed

+153
-367
lines changed

14 files changed

+153
-367
lines changed

docs/source/developer-guide/add_metrics.md

Lines changed: 6 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -3,8 +3,8 @@ UCM supports custom metrics with bidirectional updates from both Python and C++
33

44
## Architecture Overview
55
The metrics consists of these components below:
6-
- **metrics** : Central stats registry that manages all metric lifecycle operations (registration, creation, updates, queries)
7-
- **observability.py** : Prometheus integration layer that handles metric exposition and multiprocess collection
6+
- **monitor** : Central stats registry that manages all metric lifecycle operations (registration, creation, updates, queries)
7+
- **observability.py** : Prometheus integration layer that handles metric exposition
88
- **metrics_config.yaml** : Declarative configuration that defines which custom metrics to register and their properties
99

1010
## Getting Started
@@ -31,7 +31,7 @@ prometheus:
3131

3232
# Gauge metrics configuration
3333
gauges:
34-
- name: "lookup_hit_rate"
34+
- name: "external_lookup_hit_rate"
3535
documentation: "Hit rate of ucm lookup requests"
3636
multiprocess_mode: "livemostrecent"
3737

@@ -43,15 +43,14 @@ prometheus:
4343
```
4444
4545
### Use Monitor APIs to Update Stats
46-
The monitor provides a unified interface for metric operations. Note that the workflow requires registering a stats class before creating an instance.
46+
The monitor provides a unified interface for metric operations. Users only need to create stats and update them, while the observability component is responsible for fetching the stats and pushing them to Prometheus.
4747
:::::{tab-set}
4848
:sync-group: install
4949
5050
::::{tab-item} Python side interfaces
5151
:selected:
5252
:sync: py
5353
**Lifecycle Methods**
54-
- `register_istats(name, py::object)`: Register a new stats class implementation.
5554
- `create_stats(name)`: Create and initialize a registered stats object.
5655

5756
**Operation Methods**
@@ -64,11 +63,8 @@ The monitor provides a unified interface for metric operations. Note that the wo
6463

6564
**Example:** Using built-in ConnStats
6665
```python
67-
from ucm.integration.vllm.conn_stats import ConnStats
6866
from ucm.shared.metrics import ucmmonitor
6967
70-
conn_stats = ConnStats(name="ConnStats")
71-
ucmmonitor.register_stats("ConnStats", conn_stats) # Register stats
7268
ucmmonitor.create_stats("ConnStats") # Create a stats obj
7369
7470
# Update stats
@@ -85,7 +81,6 @@ See more detailed example in [test case](https://github.com/ModelEngine-Group/un
8581
::::{tab-item} C++ side interfaces
8682
:sync: cc
8783
**Lifecycle Methods**
88-
- `RegistStats(std::string name, Creator creator)`: Register a new stats class implementation.
8984
- `CreateStats(const std::string& name)`: Create and initialize a registered stats object.
9085

9186
**Operation Methods**
@@ -102,10 +97,8 @@ UCM supports custom metrics by following steps:
10297
```c++
10398
target_link_libraries(xxxstore PUBLIC storeinfra monitor_static)
10499
```
105-
- Step 2: Inheriting from the IStats class to implement custom stats classes
106-
- Step 3: Register stats class to monitor
107-
- Step 4: Create stats object in monitor
108-
- Step 5: Update or get stats info using operation methods
100+
- Step 2: Create stats object using function **CreateStats**
101+
- Step 3: Update using function **UpdateStats**
109102

110103
See more detailed example in [test case](https://github.com/ModelEngine-Group/unified-cache-management/tree/develop/ucm/shared/test/case).
111104

ucm/integration/vllm/conn_stats.py

Lines changed: 0 additions & 20 deletions
This file was deleted.

ucm/integration/vllm/ucm_connector.py

Lines changed: 0 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -17,7 +17,6 @@
1717
from vllm.platforms import current_platform
1818
from vllm.v1.core.sched.output import SchedulerOutput
1919

20-
from ucm.integration.vllm.conn_stats import ConnStats
2120
from ucm.logger import init_logger
2221
from ucm.observability import UCMStatsLogger
2322
from ucm.shared.metrics import ucmmonitor
@@ -173,8 +172,6 @@ def __init__(self, vllm_config: "VllmConfig", role: KVConnectorRole):
173172

174173
self.metrics_config = self.launch_config.get("metrics_config_path", "")
175174
if self.metrics_config:
176-
conn_stats = ConnStats(name="ConnStats")
177-
ucmmonitor.register_stats("ConnStats", conn_stats)
178175
ucmmonitor.create_stats("ConnStats")
179176
self.stats_logger = UCMStatsLogger(
180177
vllm_config.model_config.served_model_name,

ucm/observability.py

Lines changed: 26 additions & 63 deletions
Original file line numberDiff line numberDiff line change
@@ -21,22 +21,17 @@
2121
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
2222
# SOFTWARE.
2323
#
24-
24+
# Be impressed by https://github.com/LMCache/LMCache/blob/dev/lmcache/observability.py
2525

2626
import os
2727
import threading
2828
import time
2929
from dataclasses import dataclass
30-
from pathlib import Path
31-
from typing import Any, Dict, List, Optional, Union
30+
from typing import Any, List
3231

3332
import prometheus_client
3433
import yaml
3534

36-
# Third Party
37-
from prometheus_client import REGISTRY
38-
from vllm.distributed.parallel_state import get_world_group
39-
4035
from ucm.logger import init_logger
4136
from ucm.shared.metrics import ucmmonitor
4237

@@ -56,7 +51,7 @@ class PrometheusLogger:
5651
_counter_cls = prometheus_client.Counter
5752
_histogram_cls = prometheus_client.Histogram
5853

59-
def __init__(self, metadata: UCMEngineMetadata, config: Dict[str, Any]):
54+
def __init__(self, metadata: UCMEngineMetadata, config: dict[str, Any]):
6055
# Ensure PROMETHEUS_MULTIPROC_DIR is set before any metric registration
6156
prometheus_config = config.get("prometheus", {})
6257
multiproc_dir = prometheus_config.get("multiproc_dir", "/vllm-workspace")
@@ -74,7 +69,7 @@ def __init__(self, metadata: UCMEngineMetadata, config: Dict[str, Any]):
7469
self._init_metrics_from_config(labelnames, prometheus_config)
7570

7671
def _init_metrics_from_config(
77-
self, labelnames: List[str], prometheus_config: Dict[str, Any]
72+
self, labelnames: List[str], prometheus_config: dict[str, Any]
7873
):
7974
"""Initialize metrics based on configuration"""
8075
enabled = prometheus_config.get("enabled_metrics", {})
@@ -85,7 +80,7 @@ def _init_metrics_from_config(
8580

8681
# Store metric mapping: metric_name -> (metric_type, attribute_name, stats_field_name)
8782
# This mapping will be used in log_prometheus to dynamically log metrics
88-
self.metric_mappings: Dict[str, Dict[str, str]] = {}
83+
self.metric_mappings: dict[str, dict[str, str]] = {}
8984

9085
# Initialize counters
9186
if enabled.get("counters", True):
@@ -172,23 +167,23 @@ def _init_metrics_from_config(
172167
"attr": attr_name,
173168
}
174169

175-
def _log_gauge(self, gauge, data: Union[int, float]) -> None:
170+
def _set_gauge(self, gauge, data: List) -> None:
176171
# Convenience function for logging to gauge.
172+
if not data:
173+
return
177174
gauge.labels(**self.labels).set(data)
178175

179-
def _log_counter(self, counter, data: Union[int, float]) -> None:
176+
def _inc_counter(self, counter, data: List) -> None:
180177
# Convenience function for logging to counter.
181178
# Prevent ValueError from negative increment
182-
if data < 0:
183-
return
184-
counter.labels(**self.labels).inc(data)
179+
counter.labels(**self.labels).inc(sum(data))
185180

186-
def _log_histogram(self, histogram, data: Union[List[int], List[float]]) -> None:
181+
def _observe_histogram(self, histogram, data: List) -> None:
187182
# Convenience function for logging to histogram.
188183
for value in data:
189184
histogram.labels(**self.labels).observe(value)
190185

191-
def log_prometheus(self, stats: Any):
186+
def update_stats(self, stats: dict[str, List]):
192187
"""Log metrics to Prometheus based on configuration file"""
193188
# Dynamically log metrics based on what's configured in YAML
194189
for stat_name, value in stats.items():
@@ -202,17 +197,14 @@ def log_prometheus(self, stats: Any):
202197

203198
# Log based on metric type
204199
if metric_type == "counter":
205-
self._log_counter(metric_obj, value)
200+
self._set_gauge(metric_obj, value)
206201
elif metric_type == "gauge":
207-
self._log_gauge(metric_obj, value)
202+
self._inc_counter(metric_obj, value)
208203
elif metric_type == "histogram":
209204
# Histograms expect a list
210-
if not isinstance(value, list):
211-
if value:
212-
value = [value]
213-
else:
214-
value = []
215-
self._log_histogram(metric_obj, value)
205+
self._observe_histogram(metric_obj, value)
206+
else:
207+
logger.error(f"Not found metric type for {stat_name}")
216208
except Exception:
217209
logger.debug(f"Failed to log metric {stat_name}")
218210

@@ -223,40 +215,6 @@ def _metadata_to_labels(metadata: UCMEngineMetadata):
223215
"worker_id": metadata.worker_id,
224216
}
225217

226-
_instance = None
227-
228-
@staticmethod
229-
def GetOrCreate(
230-
metadata: UCMEngineMetadata,
231-
config_path: str = "",
232-
) -> "PrometheusLogger":
233-
if PrometheusLogger._instance is None:
234-
PrometheusLogger._instance = PrometheusLogger(metadata, config_path)
235-
# assert PrometheusLogger._instance.metadata == metadata, \
236-
# "PrometheusLogger instance already created with different metadata"
237-
if PrometheusLogger._instance.metadata != metadata:
238-
logger.error(
239-
"PrometheusLogger instance already created with"
240-
"different metadata. This should not happen except "
241-
"in test"
242-
)
243-
return PrometheusLogger._instance
244-
245-
@staticmethod
246-
def GetInstance() -> "PrometheusLogger":
247-
assert (
248-
PrometheusLogger._instance is not None
249-
), "PrometheusLogger instance not created yet"
250-
return PrometheusLogger._instance
251-
252-
@staticmethod
253-
def GetInstanceOrNone() -> Optional["PrometheusLogger"]:
254-
"""
255-
Returns the singleton instance of PrometheusLogger if it exists,
256-
otherwise returns None.
257-
"""
258-
return PrometheusLogger._instance
259-
260218

261219
class UCMStatsLogger:
262220
def __init__(self, model_name: str, rank: int, config_path: str = ""):
@@ -267,13 +225,13 @@ def __init__(self, model_name: str, rank: int, config_path: str = ""):
267225
# Load configuration
268226
config = self._load_config(config_path)
269227
self.log_interval = config.get("log_interval", 10)
270-
self.prometheus_logger = PrometheusLogger.GetOrCreate(self.metadata, config)
228+
self.prometheus_logger = PrometheusLogger(self.metadata, config)
271229
self.is_running = True
272230

273231
self.thread = threading.Thread(target=self.log_worker, daemon=True)
274232
self.thread.start()
275233

276-
def _load_config(self, config_path: str) -> Dict[str, Any]:
234+
def _load_config(self, config_path: str) -> dict[str, Any]:
277235
"""Load configuration from YAML file"""
278236
try:
279237
with open(config_path, "r") as f:
@@ -293,11 +251,16 @@ def _load_config(self, config_path: str) -> Dict[str, Any]:
293251

294252
def log_worker(self):
295253
while self.is_running:
296-
# Use UCMStatsMonitor.get_states_and_clear() from external import
297254
stats = ucmmonitor.get_all_stats_and_clear().data
298-
self.prometheus_logger.log_prometheus(stats)
255+
self.prometheus_logger.update_stats(stats)
299256
time.sleep(self.log_interval)
300257

301258
def shutdown(self):
302259
self.is_running = False
303260
self.thread.join()
261+
262+
def __del__(self):
263+
try:
264+
self.shutdown()
265+
except Exception:
266+
pass

ucm/shared/metrics/cc/api/stats_monitor_api.cc

Lines changed: 0 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -24,11 +24,6 @@
2424
#include "stats_monitor_api.h"
2525
namespace UC::Metrics {
2626

27-
void RegistStats(std::string name, Creator creator)
28-
{
29-
StatsRegistry::GetInstance().RegisterStats(name, creator);
30-
}
31-
3227
void CreateStats(const std::string& name) { StatsMonitor::GetInstance().CreateStats(name); }
3328

3429
void UpdateStats(const std::string& name, const std::unordered_map<std::string, double>& params)

ucm/shared/metrics/cc/api/stats_monitor_api.h

Lines changed: 0 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -23,8 +23,6 @@
2323
* */
2424
#ifndef UNIFIEDCACHE_MONITOR_API_H
2525
#define UNIFIEDCACHE_MONITOR_API_H
26-
#include <string>
27-
#include <unordered_map>
2826
#include "stats_monitor.h"
2927

3028
namespace UC::Metrics {
@@ -33,7 +31,6 @@ struct StatsResult {
3331
std::unordered_map<std::string, std::vector<double>> data;
3432
};
3533

36-
void RegistStats(std::string name, Creator creator);
3734
void CreateStats(const std::string& name);
3835
void UpdateStats(const std::string& name, const std::unordered_map<std::string, double>& params);
3936
void ResetStats(const std::string& name);

ucm/shared/metrics/cc/domain/stats/istats.h renamed to ucm/shared/metrics/cc/domain/stats/stats.h

Lines changed: 15 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -21,24 +21,29 @@
2121
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
2222
* SOFTWARE.
2323
* */
24-
#ifndef UNIFIEDCACHE_ISTATS_H
25-
#define UNIFIEDCACHE_ISTATS_H
24+
#ifndef UNIFIEDCACHE_STATS_H
25+
#define UNIFIEDCACHE_STATS_H
2626

27-
#include <functional>
28-
#include <memory>
2927
#include <string>
3028
#include <unordered_map>
3129
#include <vector>
3230

3331
namespace UC::Metrics {
3432

35-
class IStats {
33+
class Stats {
3634
public:
37-
virtual ~IStats() = default;
38-
virtual std::string Name() const = 0;
39-
virtual void Update(const std::unordered_map<std::string, double>& params) = 0;
40-
virtual void Reset() = 0;
41-
virtual std::unordered_map<std::string, std::vector<double>> Data() = 0;
35+
explicit Stats(const std::string& name) : name_(name) {}
36+
std::string Name() { return name_; }
37+
void Update(const std::unordered_map<std::string, double>& params)
38+
{
39+
for (const auto& [key, val] : params) { data_[key].push_back(val); }
40+
}
41+
void Reset() { data_.clear(); }
42+
std::unordered_map<std::string, std::vector<double>> Data() { return data_; }
43+
44+
private:
45+
std::string name_;
46+
std::unordered_map<std::string, std::vector<double>> data_;
4247
};
4348

4449
} // namespace UC::Metrics

0 commit comments

Comments
 (0)