Commit 320326c

Merge pull request #626 from graphistry/dev/dev-skrub
Dev/dev skrub
2 parents 105487b + 4a5edb8 commit 320326c

38 files changed, +1407 / -869 lines changed

.github/workflows/ci.yml

Lines changed: 18 additions & 5 deletions
```diff
@@ -54,6 +54,8 @@ jobs:
         ./bin/lint.sh
 
     - name: Type check
+      env:
+        PYTHON_VERSION: ${{ matrix.python-version }}
       run: |
         source pygraphistry/bin/activate
         ./bin/typecheck.sh
```
```diff
@@ -101,6 +103,8 @@ jobs:
         ./bin/lint.sh
 
     - name: Type check
+      env:
+        PYTHON_VERSION: ${{ matrix.python-version }}
       run: |
         source pygraphistry/bin/activate
         ./bin/typecheck.sh
```
```diff
@@ -143,6 +147,8 @@ jobs:
         python -m pip install -e .[test,pygraphviz]
 
     - name: Type check
+      env:
+        PYTHON_VERSION: ${{ matrix.python-version }}
       run: |
         source pygraphistry/bin/activate
         ./bin/typecheck.sh
```
```diff
@@ -159,8 +165,7 @@ jobs:
 
     strategy:
       matrix:
-        #python-version: [3.8, 3.9, '3.10', 3.11, 3.12]
-        python-version: [3.8, 3.9]
+        python-version: [3.9, '3.10', 3.11, 3.12]
 
     steps:
 
```
```diff
@@ -185,6 +190,8 @@ jobs:
         python -m pip install -e .[test,testai,umap-learn]
 
     - name: Type check
+      env:
+        PYTHON_VERSION: ${{ matrix.python-version }}
       run: |
         source pygraphistry/bin/activate
         ./bin/typecheck.sh
```
```diff
@@ -206,8 +213,7 @@ jobs:
 
     strategy:
       matrix:
-        python-version: [3.8, 3.9]
-        #python-version: [3.8, 3.9, '3.10', 3.11, 3.12]
+        python-version: [3.9, '3.10', 3.11, 3.12]
         #include:
         # - python-version: 3.12
         # continue-on-error: true
```
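The new matrices quote `'3.10'` but leave `3.9`, `3.11` and `3.12` bare. The quoting matters: a YAML parser types an unquoted `3.10` as the float `3.1`, so the job would ask for Python 3.1. The sketch below reproduces that coercion with a deliberately simplified, hypothetical stand-in (`yaml_like_scalar`) rather than a real YAML library.

```python
# Illustrates why '3.10' is quoted in the CI matrix: YAML types an unquoted
# numeric scalar as a float, and a float drops the trailing zero of "3.10".
# yaml_like_scalar is a hypothetical simplification, not a YAML parser.
from typing import List

matrix_as_written: List[str] = ["3.9", "'3.10'", "3.11", "3.12"]


def yaml_like_scalar(token: str) -> str:
    """Quoted token -> string as-is; bare numeric token -> float, then str."""
    if token.startswith("'") and token.endswith("'"):
        return token[1:-1]
    return str(float(token))


resolved = [yaml_like_scalar(t) for t in matrix_as_written]
print(resolved)                  # ['3.9', '3.10', '3.11', '3.12']
print(yaml_like_scalar("3.10"))  # 3.1  (the unquoted form loses the minor version)
```

Only `3.10` needs quoting among these versions, because it is the only one whose float form differs from the intended string.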
```diff
@@ -233,14 +239,16 @@ jobs:
         source pygraphistry/bin/activate
         python -m pip install --upgrade pip
         python -m pip install -e .[test,testai,ai]
-        echo "dirty-cat: `pip show dirty-cat | grep Version`"
+        echo "skrub: `pip show skrub | grep Version`"
         echo "pandas: `pip show pandas | grep Version`"
         echo "numpy: `pip show numpy | grep Version`"
         echo "scikit-learn: `pip show scikit-learn | grep Version`"
         echo "scipy: `pip show scipy | grep Version`"
         echo "umap-learn: `pip show umap-learn | grep Version`"
 
     - name: Type check
+      env:
+        PYTHON_VERSION: ${{ matrix.python-version }}
       run: |
         source pygraphistry/bin/activate
         ./bin/typecheck.sh
```
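The `echo` lines above report installed dependency versions by shelling out to `pip show`. As an alternative, not what the workflow does, the same report can be produced in-process with the standard library's `importlib.metadata`. The helper name `installed_versions` is hypothetical.

```python
# Sketch: report installed versions of the packages the CI step echoes,
# using importlib.metadata instead of `pip show ... | grep Version`.
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

CI_PACKAGES = ["skrub", "pandas", "numpy", "scikit-learn", "scipy", "umap-learn"]


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


for pkg, ver in installed_versions(CI_PACKAGES).items():
    print(f"{pkg}: {ver or 'not installed'}")
```

Unlike the `grep` pipeline, a missing package yields an explicit "not installed" instead of an empty string.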
```diff
@@ -270,6 +278,11 @@ jobs:
         source pygraphistry/bin/activate
         ./bin/test-embed.sh
 
+    - name: Full DGL tests (rich featurize)
+      run: |
+        source pygraphistry/bin/activate
+        ./bin/test-dgl.sh
+
 
   test-neo4j:
 
```
CHANGELOG.md

Lines changed: 37 additions & 0 deletions
```diff
@@ -7,6 +7,43 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
 
 ## [Development]
 
+## [0.36.0 - 2025-02-05]
+
+### Breaking
+
+* `from_cugraph` returns using the src/dst bindings of `cugraph.Graph` object instead of base `Plottable`
+* `pip install graphistry[umap-learn]` and `pip install graphistry[ai]` are now Python 3.9+ (was 3.8+)
+* `Plottable`'s fields `_node_dbscan` / `_edge_dbscan` are now `_dbscan_nodes` / `_dbscan_edges`
+
+### Feat
+
+* Switch to `skrub` for feature engineering
+* More AI methods support GPU path
+* Support cugraph 26.10+, numpy 2.0+
+* Add more umap, dbscan fields to `Plottable`
+
+### Infra
+
+* `[umap-learn]` + `[ai]` unpin deps - scikit, scipy, torch (now 2), etc
+
+### Refactor
+
+* Move more type models to models/compute/{feature,umap,cluster}
+* Turn more print => logger
+
+### Fixes
+
+* Remove lint/type ignores and fix root causes
+
+### Tests
+
+* Stop ignoring warnings in featurize and umap
+* python version tests use corresponding python version for mypy
+* ci umap tests: py 3.8, 3.9 => 3.9..3.12
+* ci ai tests: py 3.8, 3.9 => 3.9..3.12
+* ci tests dgl
+* plugin tests check for module imports
+
 ## [0.35.10 - 2025-01-24]
 
 ### Fixes:
```

bin/test-dgl.sh

File mode: 100644 → 100755
Lines changed: 13 additions & 0 deletions
```diff
@@ -0,0 +1,13 @@
+#!/bin/bash
+set -ex
+
+# Run from project root
+# - Args get passed to pytest phase
+# Non-zero exit code on fail
+
+# Assume [umap-learn,test]
+
+python -m pytest --version
+
+python -B -m pytest -vv \
+    graphistry/tests/test_dgl_utils.py
```
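As committed, `bin/test-dgl.sh` always runs the single DGL test file. Its header comment says that arguments are passed to the pytest phase. The sketch below is a hypothetical Python rendering of that intent: extra CLI arguments are placed before the test path. The names `build_pytest_cmd` and `run` are illustrative and do not come from the repository.

```python
# Hypothetical Python equivalent of bin/test-dgl.sh's pytest call, forwarding
# extra arguments to pytest as the script's header comment describes.
import subprocess
import sys
from typing import List, Sequence

DGL_TESTS = "graphistry/tests/test_dgl_utils.py"


def build_pytest_cmd(extra_args: Sequence[str] = ()) -> List[str]:
    # -B: don't write .pyc files; -vv: print each test name verbosely
    return [sys.executable, "-B", "-m", "pytest", "-vv", *extra_args, DGL_TESTS]


def run(extra_args: Sequence[str] = ()) -> int:
    # check=False so the caller receives pytest's non-zero exit code on failure
    return subprocess.run(build_pytest_cmd(extra_args), check=False).returncode
```

For example, `run(["-k", "featurize"])` would run only the matching DGL tests from the project root and return pytest's exit status.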

bin/typecheck.sh

Lines changed: 6 additions & 2 deletions
```diff
@@ -6,5 +6,9 @@ set -ex
 
 mypy --version
 
-# Check core
-mypy --config-file mypy.ini graphistry
+if [ -n "$PYTHON_VERSION" ]; then
+    SHORT_VERSION=$(echo "$PYTHON_VERSION" | cut -d. -f1,2)
+    mypy --python-version "$SHORT_VERSION" --config-file mypy.ini graphistry
+else
+    mypy --config-file mypy.ini graphistry
+fi
```
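The new `typecheck.sh` logic trims `PYTHON_VERSION` to `major.minor` with `cut -d. -f1,2` and passes the result to mypy's `--python-version`. When the variable is unset or empty, mypy runs without that flag. The Python sketch below mirrors this branching. The helpers `short_version` and `mypy_cmd` are hypothetical, and the sketch only builds the command without running mypy.

```python
# Sketch of bin/typecheck.sh's version handling: trim PYTHON_VERSION to
# major.minor (like `cut -d. -f1,2`) and add mypy's --python-version flag.
import os
from typing import List, Mapping, Optional


def short_version(full: str) -> str:
    # "3.12.1" -> "3.12"; "3.10" -> "3.10"; "3" -> "3" (cut keeps what exists)
    return ".".join(full.split(".")[:2])


def mypy_cmd(env: Optional[Mapping[str, str]] = None) -> List[str]:
    env = os.environ if env is None else env
    cmd = ["mypy"]
    py = env.get("PYTHON_VERSION", "")
    if py:  # mirrors `[ -n "$PYTHON_VERSION" ]`
        cmd += ["--python-version", short_version(py)]
    return cmd + ["--config-file", "mypy.ini", "graphistry"]
```

For example, `mypy_cmd({"PYTHON_VERSION": "3.12.1"})` yields `mypy --python-version 3.12 --config-file mypy.ini graphistry`, which checks the code against the same minor version the CI matrix job runs.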

docker/test-cpu-umap-ai.sh

Lines changed: 4 additions & 4 deletions
```diff
@@ -2,7 +2,9 @@
 set -ex
 
 
-PYTHON_VERSION=${PYTHON_VERSION:-3.8} \
+TEST_FILES=${@:-"graphistry/tests/test_feature_utils.py graphistry/tests/test_umap_utils.py"}
+
+PYTHON_VERSION=${PYTHON_VERSION:-3.10} \
 PIP_DEPS=${PIP_DEPS:--e .[ai,test,testai]} \
 WITH_LINT=${WITH_LINT:-1} \
 WITH_TYPECHECK=${WITH_TYPECHECK:-1} \
```
```diff
@@ -11,6 +13,4 @@ WITH_TEST=${WITH_TEST:-1} \
 SENTENCE_TRANSFORMER=${SENTENCE_TRANSFORMER-average_word_embeddings_komninos} \
 SENTENCE_TRANSFORMER=${SENTENCE_TRANSFORMER} \
 ./test-cpu-local.sh \
-    graphistry/tests/test_feature_utils.py \
-    graphistry/tests/test_umap_utils.py \
-    $@
+    $TEST_FILES
```
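This change alters argument handling in the docker test scripts. Previously, the feature and UMAP test files were always passed, with `$@` appended after them. Now `TEST_FILES=${@:-"..."}` uses the two default files only when no arguments are given; any arguments replace the defaults. The unquoted `$TEST_FILES` is then word-split again. The Python sketch below roughly models those shell semantics and ignores glob expansion. `resolve_test_files` is a hypothetical name.

```python
# Rough model of the docker scripts' TEST_FILES defaulting:
#   TEST_FILES=${@:-"a.py b.py"}   then   ./test-cpu-local.sh $TEST_FILES
# (glob expansion of the unquoted variable is not modeled here)
from typing import List, Sequence

DEFAULT_TEST_FILES = (
    "graphistry/tests/test_feature_utils.py graphistry/tests/test_umap_utils.py"
)


def resolve_test_files(argv: Sequence[str]) -> List[str]:
    # In an assignment, "$@" is joined with spaces into a single string;
    # with no arguments, the :- default string is used instead.
    joined = " ".join(argv) if argv else DEFAULT_TEST_FILES
    # The unquoted $TEST_FILES is then split on whitespace again.
    return joined.split()
```

One consequence of the join-then-split is that a test path containing spaces would be split into separate arguments.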

docker/test-cpu-umap.sh

Lines changed: 4 additions & 4 deletions
```diff
@@ -2,7 +2,9 @@
 set -ex
 
 
-PYTHON_VERSION=${PYTHON_VERSION:-3.8} \
+TEST_FILES=${@:-"graphistry/tests/test_feature_utils.py graphistry/tests/test_umap_utils.py"}
+
+PYTHON_VERSION=${PYTHON_VERSION:-3.9} \
 PIP_DEPS=${PIP_DEPS:--e .[umap-learn,test,testai]} \
 WITH_LINT=${WITH_LINT:-1} \
 WITH_TYPECHECK=${WITH_TYPECHECK:-1} \
```
```diff
@@ -11,6 +13,4 @@ WITH_TEST=${WITH_TEST:-1} \
 SENTENCE_TRANSFORMER=${SENTENCE_TRANSFORMER-average_word_embeddings_komninos} \
 SENTENCE_TRANSFORMER=${SENTENCE_TRANSFORMER} \
 ./test-cpu-local.sh \
-    graphistry/tests/test_feature_utils.py \
-    graphistry/tests/test_umap_utils.py \
-    $@
+    $TEST_FILES
```

docker/test-gpu-local.sh

Lines changed: 0 additions & 1 deletion
```diff
@@ -47,5 +47,4 @@ docker run \
     ${NETWORK} \
     graphistry/test-gpu:${TEST_CPU_VERSION} \
     --maxfail=1 \
-    --ignore=graphistry/tests/test_feature_utils.py \
     $@
```

docs/source/conf.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -239,10 +239,10 @@
     ('py:class', 'torch'),
     ('py:class', 'umap'),
     ('py:class', 'sentence_transformers'),
-    ('py:class', 'dirty_cat'),
     ('py:class', 'sklearn'),
     ('py:class', 'scipy'),
     ('py:class', 'seaborn'),
+    ('py:class', 'skrub'),
     ('py:class', 'annoy'),
     ('py:class', 'NetworkX graph'),
     ('py:class', 'Pandas dataframe'),
```

graphistry/Engine.py

Lines changed: 5 additions & 2 deletions
```diff
@@ -1,9 +1,9 @@
 from inspect import getmodule
+import warnings
 import numpy as np
 import pandas as pd
 from typing import Any, Optional, Union
 from enum import Enum
-from graphistry.utils.lazy_import import lazy_cudf_import
 
 
 class Engine(Enum):
```
```diff
@@ -29,6 +29,8 @@ def resolve_engine(
     g_or_df: Optional[Any] = None,
 ) -> Engine:
 
+    from graphistry.utils.lazy_import import lazy_cudf_import
+
     if isinstance(engine, str):
         engine = EngineAbstract(engine)
 
```
```diff
@@ -42,7 +44,8 @@ def resolve_engine(
         if isinstance(g_or_df, Plottable):
             if g_or_df._nodes is not None and g_or_df._edges is not None:
                 if not isinstance(g_or_df._nodes, type(g_or_df._edges)):
-                    raise ValueError(f'Edges and nodes must be same type for auto engine selection, got: {type(g_or_df._edges)} and {type(g_or_df._nodes)}')
+                    #raise ValueError(f'Edges and nodes must be same type for auto engine selection, got: {type(g_or_df._edges)} and {type(g_or_df._nodes)}')
+                    warnings.warn(f'Edges and nodes must be same type for auto engine selection, got: {type(g_or_df._edges)} and {type(g_or_df._nodes)}')
             g_or_df = g_or_df._edges if g_or_df._edges is not None else g_or_df._nodes
 
         if g_or_df is not None:
```
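With this change, `resolve_engine` no longer raises `ValueError` when nodes and edges have different dataframe types. It emits a `UserWarning`, the default category of `warnings.warn`, and still selects the edges table when present, otherwise the nodes table. The sketch below reproduces that control flow. `FakeGraph` and `pick_frame` are hypothetical stand-ins, not graphistry APIs.

```python
# Minimal sketch of the new mismatch handling: warn instead of raising,
# then fall through to edges (else nodes) for engine detection.
# FakeGraph and pick_frame are hypothetical, not graphistry classes.
import warnings
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeGraph:
    _nodes: Optional[Any] = None
    _edges: Optional[Any] = None


def pick_frame(g: FakeGraph) -> Optional[Any]:
    if g._nodes is not None and g._edges is not None:
        if not isinstance(g._nodes, type(g._edges)):
            warnings.warn(
                "Edges and nodes must be same type for auto engine selection, "
                f"got: {type(g._edges)} and {type(g._nodes)}"
            )
    # Edges take priority when present, matching the diff
    return g._edges if g._edges is not None else g._nodes
```

Callers that previously caught `ValueError` on mixed pandas/cuDF inputs will now see a warning instead. To keep the old fail-fast behavior, they can escalate warnings, for example with `warnings.simplefilter("error")`.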

graphistry/Plottable.py

Lines changed: 23 additions & 1 deletion
```diff
@@ -2,7 +2,10 @@
 from typing_extensions import Literal
 import pandas as pd
 
+from graphistry.models.ModelDict import ModelDict
 from graphistry.models.compute.chain_remote import FormatType, OutputTypeAll, OutputTypeDf, OutputTypeGraph
+from graphistry.models.compute.dbscan import DBSCANEngine
+from graphistry.models.compute.umap import UMAPEngineConcrete
 from graphistry.plugins_types.cugraph_types import CuGraphKind
 from graphistry.Engine import Engine, EngineAbstract
 from graphistry.utils.json import JSONVal
```
```diff
@@ -72,11 +75,13 @@ class Plottable(object):
     _node_embedding : Optional[pd.DataFrame]
     _node_encoder : Optional[Any]
     _node_features : Optional[pd.DataFrame]
+    _node_features_raw: Optional[pd.DataFrame]
     _node_target : Optional[pd.DataFrame]
 
     _edge_embedding : Optional[pd.DataFrame]
     _edge_encoder : Optional[Any]
     _edge_features : Optional[pd.DataFrame]
+    _edge_features_raw: Optional[pd.DataFrame]
     _edge_target : Optional[pd.DataFrame]
 
     _weighted_adjacency: Optional[Any]
```
```diff
@@ -88,10 +93,27 @@ class Plottable(object):
     _xy: Optional[pd.DataFrame]
 
     _umap : Optional[UMAP]
-    _umap_params: Optional[Dict[str, Any]]
+    _umap_engine: Optional[UMAPEngineConcrete]
+    _umap_params: Optional[Union[ModelDict, Dict[str, Any]]]
     _umap_fit_kwargs: Optional[Dict[str, Any]]
     _umap_transform_kwargs: Optional[Dict[str, Any]]
 
+    # extra umap
+    _n_components: int
+    _metric: str
+    _n_neighbors: int
+    _min_dist: float
+    _spread: float
+    _local_connectivity: int
+    _repulsion_strength: float
+    _negative_sample_rate: float
+    _suffix: str
+
+    _dbscan_engine: Optional[DBSCANEngine]
+    _dbscan_params: Optional[ModelDict]
+    _dbscan_nodes: Optional[Any] # fit model
+    _dbscan_edges: Optional[Any] # fit model
+
     _adjacency : Optional[Any]
     _entity_to_index : Optional[dict]
     _index_to_entity : Optional[dict]
```
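The fit DBSCAN models now live in `_dbscan_nodes` / `_dbscan_edges`. The CHANGELOG lists the removal of `_node_dbscan` / `_edge_dbscan` as a breaking change. Downstream code that reads these private fields and must handle objects from both before and after the rename could use a small fallback lookup. The sketch below assumes such a need. `get_dbscan_model` is hypothetical and not part of graphistry.

```python
# Hypothetical compatibility helper: read the fit DBSCAN model from the new
# Plottable field names, falling back to the pre-rename names.
from types import SimpleNamespace
from typing import Any, Dict, Optional, Tuple

# kind -> (new field name, old field name)
_RENAMED: Dict[str, Tuple[str, str]] = {
    "nodes": ("_dbscan_nodes", "_node_dbscan"),
    "edges": ("_dbscan_edges", "_edge_dbscan"),
}


def get_dbscan_model(g: Any, kind: str = "nodes") -> Optional[Any]:
    new_name, old_name = _RENAMED[kind]
    # The fields are annotations on Plottable, so they may be unset; default to None
    model = getattr(g, new_name, None)
    if model is None:
        model = getattr(g, old_name, None)  # objects using the old field names
    return model


legacy = SimpleNamespace(_node_dbscan="fitted-dbscan")  # stand-in object
print(get_dbscan_model(legacy))  # fitted-dbscan
```

The new name is checked first, so the fallback only applies to objects that never received the renamed attribute.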
