Commit b163d24

make README the cluster function docstring; remove boilerplate code
1 parent 468bd68 commit b163d24

3 files changed: 45 additions and 57 deletions

README.md

Lines changed: 39 additions & 2 deletions
@@ -13,14 +13,51 @@ which interplay with the functions:
 - `cluster_labels`
 - `cluster_probs`
 
+## `cluster` documentation
+
+```julia
+cluster(ca::ClusteringAlgorithm, data) → cr::ClusteringResults
+```
+
+Cluster input `data` according to the algorithm specified by `ca`.
+All options related to the algorithm are given as keyword arguments when
+constructing `ca`.
+
+The input `data` is a length-m iterable of "vectors" (data points).
+"Vector" here is considered in the generalized sense, i.e., any objects on
+which a distance can be defined, so that they can be clustered.
+In the majority of cases these are vectors of real numbers.
+If you have a matrix with each row a data point, simply pass in `eachrow(matrix)`.
+
+The output is always a subtype of `ClusteringResults` that can be further queried.
+The cluster labels are always the
+positive integers `1:n` with `n::Int` the number of created clusters.
+Data points that couldn't get clustered (e.g., outliers or noise)
+get assigned negative integers, typically just `-1`.
+
+`ClusteringResults` subtypes always implement the following functions:
+
+- `cluster_labels(cr)` returns a length-m vector `labels::Vector{Int}` containing
+the clustering labels, so that `data[i]` has label `labels[i]`.
+- `cluster_probs(cr)` returns `probs`, a length-m vector of length-`n` vectors
+containing the "probabilities" or "score" of each point belonging to one of
+the created clusters (useful for fuzzy clustering algorithms).
+- `cluster_number(cr)` returns `n`.
+
+Other algorithm-related output can be obtained as a field of the result type,
+or by using other specific functions of the result type.
+This is described in the docstrings of the individual algorithm implementations.
+
+## For developers
+
 To create new clustering algorithms simply create a new
 subtype of `ClusteringAlgorithm` that extends `cluster`
 so that it returns a new subtype of `ClusteringResult`.
 This result must extend `cluster_number, cluster_labels`
 and optionally `cluster_probs`.
 
-For developers: see two helper functions `each_data_point, input_data_size`
-so that you can support matrix input while abiding the declared api
+See also the two helper functions `each_data_point, input_data_size`
+which help you support matrix input while abiding by the declared API
 of iterable of vectors as input.
 
 For more, see the docstring of `cluster`.
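
The API described in the README hunk above can be illustrated with a minimal sketch. To keep it self-contained, the block redeclares stand-ins for the package's abstract types and generic functions instead of loading the package. `NormThreshold` and `NormThresholdResults` are hypothetical names invented for this example and are not part of the commit.

```julia
# Stand-ins for the package declarations (assumption: they mirror src/ClusteringAPI.jl).
abstract type ClusteringAlgorithm end
abstract type ClusteringResults end

function cluster(ca::ClusteringAlgorithm, data)
    throw(ArgumentError("No implementation for `cluster` for $(typeof(ca))."))
end
function cluster_number end
function cluster_labels end
function cluster_probs end

# Hypothetical algorithm: points whose Euclidean norm is below `radius`
# form cluster 1; all other points are treated as noise (label -1).
struct NormThreshold <: ClusteringAlgorithm
    radius::Float64
end
# Options are given as keyword arguments when constructing the algorithm.
NormThreshold(; radius = 1.0) = NormThreshold(radius)

struct NormThresholdResults <: ClusteringResults
    labels::Vector{Int}
end

function cluster(ca::NormThreshold, data)
    labels = [sqrt(sum(abs2, x)) < ca.radius ? 1 : -1 for x in data]
    return NormThresholdResults(labels)
end

cluster_labels(cr::NormThresholdResults) = cr.labels
cluster_number(cr::NormThresholdResults) = any(==(1), cr.labels) ? 1 : 0
function cluster_probs(cr::NormThresholdResults)
    n = cluster_number(cr)
    # One length-n score vector per data point; noise points score 0 everywhere.
    return [Float64[l == k for k in 1:n] for l in cr.labels]
end

# Usage: a matrix with one data point per row is passed via `eachrow`.
matrix = [0.1 0.2; 3.0 4.0; 0.3 0.1]
cr = cluster(NormThreshold(radius = 1.0), eachrow(matrix))
labels = cluster_labels(cr)
```

The results type stores only the labels and derives the cluster count and scores from them, which keeps the three query functions consistent with each other.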

src/ClusteringAPI.jl

Lines changed: 6 additions & 48 deletions
@@ -6,41 +6,17 @@ export cluster, cluster_number, cluster_labels, cluster_probs
 abstract type ClusteringAlgorithm end
 abstract type ClusteringResults end
 
-"""
-    cluster(ca::ClusteringAlgortihm, data) → cr::ClusteringResults
-
-Cluster input `data` according to the algorithm specified by `ca`.
-All options related to the algorithm are given as keyword arguments when
-constructing `ca`.
-
-The input `data` is a length-m iterable of "vectors" (data points).
-"Vector" here is considered in the generalized sense, i.e., any objects that
-a distance can be defined on them so that they can be clustered.
-In the majority of cases these are vectors of real numbers.
-
-The output is always a subtype of `ClusteringResults` that can be further queried.
-The cluster labels are always the
-positive integers `1:n` with `n::Int` the number of created clusters,
-Data points that couldn't get clustered (e.g., outliers or noise)
-get assigned negative integers, typically just `-1`.
 
-`ClusteringResults` subtypes always implement the following functions:
-
-- `cluster_labels(cr)` returns a length-m vector `labels::Vector{Int}` containing
-the clustering labels , so that `data[i]` has label `labels[i]`.
-- `cluster_probs(cr)` returns `probs` a length-m vector of length-`n` vectors
-containing the "probabilities" or "score" of each point belonging to one of
-the created clusters (useful for fuzzy clustering algorithms).
-- `cluster_number(cr)` returns `n`.
-
-Other algorithm-related output can be obtained as a field of the result type,
-or by using other specific functions of the result type.
-This is described in the individual algorithm implementations docstrings.
-"""
 function cluster(ca::ClusteringAlgorithm, data)
     throw(ArgumentError("No implementation for `cluster` for $(typeof(ca))."))
 end
 
+@doc let # make README the `cluster` function docstring.
+    path = joinpath(dirname(@__DIR__), "README.md")
+    include_dependency(path)
+    read(path, String)
+end cluster
+
 """
     cluster_number(cr::ClusteringResults) → n::Int
 
@@ -76,22 +52,4 @@ function cluster_probs(cr::ClusteringResults)
     return probs
 end
 
-# two helper functions for agnostic input data type
-"""
-    input_data_size(data) → (d, m)
-
-Return the data point dimension and number of data points.
-"""
-input_data_size(A::AbstractMatrix) = size(A)
-input_data_size(A::AbstractVector{<:AbstractVector}) = (length(first(A)), length(A))
-
-"""
-    each_data_point(data)
-
-Return an indexable iterator over each data point in `data`, that can be
-indexed with indices `1:m`.
-"""
-each_data_point(A::AbstractMatrix) = eachcol(A)
-each_data_point(A::AbstractVector{<:AbstractVector}) = A
-
 end

test/runtests.jl

Lines changed: 0 additions & 7 deletions
@@ -16,10 +16,3 @@ cr = cluster(TestClustering(), randn(100))
 @test cluster_number(cr) == 1
 @test cluster_labels(cr) == fill(1, 100)
 @test cluster_probs(cr) == fill([1.0], 100)
-
-@test ClusteringAPI.input_data_size([rand(3) for _ in 1:30]) == (3, 30)
-@test ClusteringAPI.input_data_size(rand(3,30)) == (3, 30)
-
-v = [ones(3) for _ in 1:30]
-@test ClusteringAPI.each_data_point(v) == v
-@test ClusteringAPI.each_data_point(ones(3,30)) == v
