**Read this in other languages: [简体中文](./README.zh-CN.md)**
The go_numcalc package is developed in Go. Its main function is to perform basic numerical processing, such as data conversion, data grouping, and data smoothing, on numeric data.
All functions support only two data types: Int32 and Float32.
> [!TIP]
> This package is written in Go and contains a C++ part, accessed via cgo.
<!-- TOC -->
* [go_numcalc](#go_numcalc)
  * [Installation](#installation)
  * [Dependencies](#dependencies)
  * [Usage](#usage)
  * [Project structure](#project-structure)
  * [Development](#development)
    * [First phase functions](#first-phase-functions)
    * [Phase II functions (TODO)](#phase-ii-functions-todo)
  * [License](#license)
<!-- TOC -->
## Installation
Use `go get` to install go_numcalc.

```shell
go get github.com/dingyuqi/go_numcalc
```
## Dependencies
There are two main external libraries used in the NumCalc package:
1. Go language: [Gonum](https://www.gonum.org/)
2. C++ language: Armadillo
Armadillo does not need to be installed; it is linked into the cgo part as a static library.
## Usage
The following is a simple example of the `LogInt32()` method in conversion. The usage of the other functions is consistent with this demo.

1. Call the `NewCalculator()` method in conversion to initialize a numerical conversion object.
2. Call the `LogInt32()` method. This method returns the logarithm of each element of the slice `data`, at the corresponding index.
```go
package main

import (
	"log"

	numcalc "github.com/dingyuqi/go_numcalc"
)

func main() {
	// NOTE: the middle of this example was elided in the source; the data
	// values, import alias, and exact LogInt32 signature shown here are
	// illustrative. See example/example_test.go for the runnable version.
	data := []int32{1, 10, 100}
	calculator := numcalc.NewCalculator()
	result := calculator.LogInt32(data)
	log.Println("result is: ", result)
}
```
> [!TIP]
> This test code is in `example/example_test.go` and can be run directly.
> ```shell
> go build -o example.exe .
> ```
> `example_test.go` contains a pure Go implementation of the same function (logarithmic calculation), used to compare calculation speed against the cgo version.
## Development
As of August 2023, only the first phase of functions has been implemented, entirely in Go.
### First phase functions
| Serial number | Type | Function | Detailed description | Remarks |
|---|---|---|---|---|
| 1 | Data conversion | Min-max standardization | Performs a linear transformation on the data series so that the processed data all fall within the interval [0, 1] | |
| 2 | Data conversion | Z-score standardization | Subtracts the mean and divides by the standard deviation for each data point in the data series, so that the processed data approximately follows the standard normal distribution N(0, 1) | |
| 3 | Data conversion | Logarithmic transformation | y = log(base, x) | 1. Negative value handling <br> 2. base value |
| 4 | Data conversion | Square root transformation | y = √x | Negative value handling |
| 5 | Data grouping | Cluster grouping | Uses cluster analysis to group data points into clusters with similar characteristics. Cluster grouping can be used to discover clustering patterns and categories in the data, which is useful for data mining and classification tasks. | 1. Clustering method (random_subset, static_subset, etc.) <br> 2. Number of clusters |
| 6 | Data grouping | Equal-width grouping | Divides the value range of the data into intervals of equal width. This method is simple and intuitive, but may not reflect the distribution of the data well, especially with imbalanced data or outliers. | Group width |
| 7 | Data grouping | Equal-frequency grouping | Divides the data into groups containing the same number of data points. This method better reflects the distribution of the data, but for data containing many repeated values it may cause some groups to contain the same values. | Number of groups |
| 8 | Data grouping | Grouping based on statistics | Divides the data into groups based on quantiles. Common methods include quartile grouping, decile grouping, etc. This method divides data into groups of equal data density, which is more effective for skewed distributions. | Grouping conditions _(not implemented for now due to overlap with equal-frequency grouping)_ |
| 9 | Outlier judgment | Standard deviation | Computes the difference between each data point and the mean; values exceeding a certain threshold are considered outliers. Usually, values more than 3 standard deviations from the mean are considered outliers. | 1. true indicates an outlier <br> 2. false indicates a non-outlier <br> 3. Threshold for outlier judgment |
| 10 | Outlier judgment | Box plot | Based on the quartiles and outlier range of the data, values beyond the upper and lower boundaries are considered outliers. | 1. true indicates an outlier <br> 2. false indicates a non-outlier |
### Phase II functions (TODO)
| Serial number | Type | Function | Function description | Remarks |
|---|---|---|---|---|
| 1 | Data smoothing | Wavelet filtering | Decomposes and reconstructs signals by applying a wavelet transform to remove noise or abrupt changes while retaining the important features of the signal. Wavelet filtering offers good analysis and processing capability in both the time and frequency domains. | Different basis functions have a great impact on the results; different data require different basis functions and frequency ranges depending on the analysis requirements. <br> 1. Wavelet basis function (Daubechies, Haar, Morlet) <br> 2. Scale parameter (determines the scaling factor of each wavelet basis function; a smaller scale captures higher-frequency, detailed features, while a larger scale captures lower-frequency, overall-trend features) <br> 3. Decomposition level (determines the order of the wavelet transform; a higher level provides more detailed frequency and scale information) <br> 4. Threshold processing method (keep/discard) |
| 2 | Data smoothing | Moving average | Smooths the data by computing the average within a window of a certain size around each data point. The window size determines the degree of smoothing; a larger window smooths out more fluctuations. Common variants include the simple moving average and the weighted moving average. | If a full window cannot be constructed at the boundaries, the boundary points are usually dropped from the output, so input and output lengths may differ. <br> 1. Window size <br> 2. Weights <br> 3. Boundary handling method |
| 3 | Data smoothing | Exponential smoothing | A recursive smoothing method that gives higher weight to recent data. The weight of past observations is controlled by a smoothing coefficient; the larger the coefficient, the greater the influence of recent data. Often used to smooth time series data. | Smoothing factor |
| 4 | Data smoothing | Savitzky-Golay smoothing | A smoothing method based on polynomial fitting: the neighboring data around each point is fitted to a polynomial curve. It preserves the overall shape and trend of the data and has a good noise suppression effect. | 1. Smoothing window <br> 2. Polynomial order <br> 3. Derivative order (optional) |
| 5 | Data smoothing | Loess smoothing | Like Lowess smoothing, Loess is a nonparametric local regression method that smooths data by fitting a polynomial to the neighboring points around each data point. Unlike Lowess, Loess uses adaptive weighted least squares to better handle nonlinear relationships in the data. | 1. Smoothing coefficient (controls the weight given to past observations) <br> 2. Weighting function (library default) |
| 6 | Data smoothing | Lowess smoothing | A nonparametric local regression method that smooths data by fitting a local linear regression model. It uses weighted least squares to estimate the smoothed value of each point, with weights assigned by distance from the point. | 1. Smoothing coefficient (controls the weight given to past observations) <br> 2. Weighting function (library default) |
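To make the moving-average row above concrete, here is a minimal sketch of a simple moving average in Go that drops boundary points which cannot fill a full window, as the remarks describe. The function name `movingAverage` is hypothetical, not a planned go_numcalc API.

```go
package main

import "fmt"

// movingAverage computes a simple moving average over a sliding window.
// Boundary points without a full window are dropped, so the output has
// len(data)-window+1 elements.
func movingAverage(data []float32, window int) []float32 {
	if window <= 0 || window > len(data) {
		return nil
	}
	out := make([]float32, 0, len(data)-window+1)
	var sum float32
	for i, v := range data {
		sum += v
		if i >= window {
			sum -= data[i-window] // slide the window: drop the oldest value
		}
		if i >= window-1 {
			out = append(out, sum/float32(window))
		}
	}
	return out
}

func main() {
	data := []float32{1, 2, 3, 4, 5, 6}
	fmt.Println(movingAverage(data, 3)) // [2 3 4 5]
}
```

Note the output is two elements shorter than the input: the first two points cannot anchor a window of size 3, which is exactly the boundary behavior the table warns about.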