Complete-Data-Science-Toolkits

This toolkit is a free collection of data analysis and machine learning notebooks and code snippets for data science work, intended to get you started in a matter of minutes. You can run the collection as Jupyter notebooks or as plain Python scripts, or read the HTML versions.

Features

Machine Learning

Cross-Validation
Evaluating Classification Metrics
Evaluating Clustering Metrics
Evaluating Regression Metrics
Grid Search
Preprocessing Encoding Categorical Features
Preprocessing Binarization
Preprocessing Imputing Missing Values
Preprocessing Normalization
Preprocessing StandardScaler
Randomized Parameter Optimization

Numpy

Adding, Removing, and Splitting Arrays
Sorting arrays
Matrix object
Statistics, Vector Math
Structured Arrays
Import, Export, Slicing, Indexing
Data to/from string

Pandas

Complete pandas
Groupby in Pandas
Mapping
Filtering
Applying

Visualization

BarPlots
Customization Matplotlib
Working with Image
Working with text

Naming Conventions

The naming convention I followed is:
[yyyy-mm-dd-in-project-name-library].extension
yyyy = year
mm = month
dd = day
in = my initials, for example: Saleban Olow = so
project-name = the name of each project
library = numpy, pandas, sklearn, matplotlib
extension = .ipynb, .py, .html
Example: 2017-11-25-so-cross-validation-sklearn.ipynb
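
The pattern can also be generated programmatically. The following is a minimal sketch, not part of the toolkit itself: `build_filename` and its parameters are hypothetical names introduced for illustration, and it assumes the date is written in yyyy-mm-dd order as the pattern states.

```python
from datetime import date


def build_filename(day, initials, project_name, library, extension):
    """Return a name shaped like yyyy-mm-dd-in-project-name-library.extension."""
    # Lowercase the project name and join its words with hyphens
    slug = "-".join(project_name.lower().split())
    # date.isoformat() yields yyyy-mm-dd
    return f"{day.isoformat()}-{initials}-{slug}-{library}.{extension.lstrip('.')}"


filename = build_filename(date(2017, 11, 25), "so", "cross validation", "sklearn", ".ipynb")
print(filename)
```

Using `date.isoformat()` keeps the year, month, and day zero-padded, so the files sort chronologically by name.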

Code Samples:

Cross Validation

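from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
# X (features) and y (labels) are assumed to be loaded already
model = SVC(kernel='linear', C=1)
# estimate accuracy with 5-fold cross-validation
scores = cross_val_score(model, X, y, cv=5)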
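
The snippet above leaves `X` and `y` undefined. As a self-contained sketch, the version below assumes the iris dataset bundled with scikit-learn as stand-in data (the toolkit's notebooks may use different data) and summarizes the per-fold scores.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Stand-in data: 150 iris samples with 4 features each (an assumption for this example)
X, y = load_iris(return_X_y=True)

model = SVC(kernel="linear", C=1)
# cv=5 splits the data into 5 folds and returns one accuracy score per fold
scores = cross_val_score(model, X, y, cv=5)

print("fold scores:", scores)
print("mean accuracy: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))
```

Reporting the mean together with the standard deviation gives a sense of how stable the estimate is across folds.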

Grid Search

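import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier()
params = {"n_neighbors": np.arange(1,5), "metric": ["euclidean", "cityblock"]}
grid = GridSearchCV(estimator=knn, param_grid=params)
grid.fit(X_train, y_train)
# attributes set by fit() end with an underscore
print(grid.best_score_)
print(grid.best_estimator_.n_neighbors)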
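
For a version that runs end to end, the sketch below assumes the iris dataset and a hold-out split made with `train_test_split`; both are illustrative choices, not something the snippet above prescribes.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data and split (assumptions for this example)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

knn = KNeighborsClassifier()
params = {"n_neighbors": np.arange(1, 5), "metric": ["euclidean", "cityblock"]}

# Every combination in params (4 x 2 = 8) is scored with 5-fold cross-validation
grid = GridSearchCV(estimator=knn, param_grid=params, cv=5)
grid.fit(X_train, y_train)

print(grid.best_params_)            # best combination found on the training folds
print(grid.best_score_)             # its mean cross-validated accuracy
print(grid.score(X_test, y_test))   # accuracy of the refit model on held-out data
```

Scoring on `X_test` afterwards matters because `best_score_` comes from the same data used to choose the parameters.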

Preprocessing Imputing Missing Values

from sklearn.impute import SimpleImputer
# treat 0 as the missing-value marker and replace it with the column mean
impute = SimpleImputer(missing_values=0, strategy='mean')
impute.fit_transform(X_train)
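
To make the effect concrete, here is a small sketch on a hand-made array (the values are invented for illustration). It assumes a scikit-learn version that provides `sklearn.impute.SimpleImputer`, which always imputes column by column.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data in which 0 marks a missing entry (illustrative values)
X_train = np.array([
    [1.0, 2.0],
    [0.0, 4.0],
    [5.0, 0.0],
])

impute = SimpleImputer(missing_values=0, strategy="mean")
# Each 0 is replaced by the mean of the non-missing values in its column
X_filled = impute.fit_transform(X_train)
print(X_filled)
```

Both columns have a mean of 3.0 over their non-missing entries, so each 0 becomes 3.0.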

Randomized Parameter Optimization

from sklearn.model_selection import RandomizedSearchCV
params = {"n_neighbors" : range(1,5), "weights": ["uniform", "distance"]}
rsearch = RandomizedSearchCV(estimator=knn, param_distributions=params, cv=4, n_iter=8, random_state=5)
rsearch.fit(X_train, y_train)
print(rsearch.best_score_)

Model fitting supervised and unsupervised learning

# supervised learning
from sklearn import neighbors
knn = neighbors.KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
# unsupervised learning: keep enough components to explain 95% of the variance
from sklearn.decomposition import PCA
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_train)

Working with numpy arrays

import numpy as np
# returns a new array with values appended to the end of arr
np.append(arr, values)
# returns a new array with values inserted before index 2
np.insert(arr, 2, values)
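
Both functions return a new array and leave the original untouched, which is easy to miss. A short sketch with made-up values:

```python
import numpy as np

arr = np.array([1, 2, 3, 4])
values = np.array([9, 9])

# np.append returns a copy with values added at the end
appended = np.append(arr, values)
# np.insert returns a copy with values placed before index 2
inserted = np.insert(arr, 2, values)

print(appended)  # [1 2 3 4 9 9]
print(inserted)  # [1 2 9 9 3 4]
print(arr)       # [1 2 3 4], unchanged
```

Because each call copies the data, building a large array by repeated appends is slow; collecting values in a Python list and converting once is usually a better fit.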

Indexing and Slicing arrays

import numpy as np
# 1D array: return the element at index 5
arr = np.array([1,2,3,4,5,6,7])
arr[5]
# assign the value 4 to the element at index 1
arr[1] = 4
# 2D array with 3 rows and 6 columns: return the element at row 2, column 5
arr2d = np.arange(18).reshape(3,6)
arr2d[2,5]
# assign the value 10 to the element at row 1, column 3
arr2d[1,3] = 10

Creating DataFrame

import pandas as pd
# specify the values row by row, then the index and column labels
df = pd.DataFrame(
	[[4,7,10],
	 [5,8,11],
	 [6,9,12]],
	 index=[1,2,3],
	 columns=['a','b','c'])

groupby pandas

import pandas as pd
# return a GroupBy object, grouped by the values in the column named 'Cities'
df.groupby(by="Cities")
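
A GroupBy object does nothing visible until an aggregation is applied. The sketch below uses an invented DataFrame; the `Sales` column and the city names are assumptions made for this example.

```python
import pandas as pd

# Illustrative data (not from the toolkit)
df = pd.DataFrame({
    "Cities": ["Oslo", "Oslo", "Bergen"],
    "Sales": [10, 20, 5],
})

# Group rows by city, then sum the Sales column within each group
totals = df.groupby(by="Cities")["Sales"].sum()
print(totals)
```

The result is a Series indexed by the group keys, so `totals["Oslo"]` gives the total for that city.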

handling missing values

import pandas as pd 
#drop rows with any column having NA/null data.
df.dropna()
#replace all NA/null data with value
df.fillna(value)
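
As a quick illustration of the difference between the two calls, here is a sketch on a small invented DataFrame; the fill value 0 is an arbitrary choice for the example.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value in each column
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [4.0, 5.0, np.nan],
})

# dropna() keeps only the rows that have no missing values
dropped = df.dropna()
# fillna(0) keeps every row and replaces each missing value with 0
filled = df.fillna(0)

print(dropped)
print(filled)
```

Both methods return a new DataFrame, so `df` itself still contains the missing values afterwards.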

Melt function

import pandas as pd
# most pandas methods return a DataFrame, so calls can be chained,
# which improves the readability of the code
# pd.melt names its output columns 'variable' and 'value' by default
df = (pd.melt(df)
      .rename(columns={'variable':'var', 'value':'val'})
      .query('val >= 200')
)
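
To show what the chain produces, here is a sketch on an invented two-column DataFrame. It relies on `pd.melt` naming its output columns `variable` and `value` by default; the short names `var` and `val` are illustrative.

```python
import pandas as pd

# Illustrative wide-format data
df = pd.DataFrame({"a": [100, 250], "b": [300, 150]})

long_df = (
    pd.melt(df)                                              # wide to long: one row per cell
    .rename(columns={"variable": "var", "value": "val"})     # shorter column names
    .query("val >= 200")                                     # keep only the large values
)
print(long_df)
```

Melting yields four rows, (a, 100), (a, 250), (b, 300), and (b, 150), and the query keeps the two with `val` of at least 200.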

Save plot

import matplotlib.pyplot as plt
#saves plot/figure to image
plt.savefig('pic_name.png')

Marker, lines

import matplotlib.pyplot as plt 
#add * for every data point
plt.plot(x,y, marker='*')
#adds dot for every data point
plt.plot(x,y, marker='.')

Figures, Axis

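import matplotlib.pyplot as plt
# a figure is the container that holds all plot elements
fig = plt.figure()
# add an axes at [left, bottom, width, height], given as fractions of the figure
fig.add_axes([0.1, 0.1, 0.8, 0.8])
# a subplot is an axes on a grid: 222 means 2 rows, 2 columns, position 2
a = fig.add_subplot(222)
# create a new figure with a 3x2 grid of subplots
fig, b = plt.subplots(nrows=3, ncols=2)
# plt.subplots returns both the figure and an array of axes
fig, ax = plt.subplots(2,2)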

Working with text plot

import matplotlib.pyplot as plt
# get the current axes so that annotate() has something to draw on
ax = plt.gca()
# place text at data coordinates (1, 1)
plt.text(1,1, 'Example text', style='italic')
# annotate the point at coordinates xy with text
ax.annotate('some annotation', xy=(10,10))
# titles can contain math formulas written in TeX syntax
plt.title(r'$\delta_i=20$',fontsize=10)
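
The visualization snippets above can be combined into one short script. This is a sketch under a few assumptions: the data points are invented, the non-interactive `Agg` backend is selected so the script runs without a display, and the figure is saved to an in-memory buffer rather than to `pic_name.png`.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed (an assumption)
import matplotlib.pyplot as plt

# One figure holding a 2x2 grid of axes
fig, axes = plt.subplots(nrows=2, ncols=2)

# Line plot with a star marker on every data point
axes[0, 0].plot([1, 2, 3], [1, 4, 9], marker="*")
axes[0, 0].set_title(r"$\delta_i=20$", fontsize=10)

# Italic text placed at data coordinates (0.5, 0.5)
axes[0, 1].text(0.5, 0.5, "Example text", style="italic")

# Annotation attached to the point (0.5, 0.5)
axes[1, 0].annotate("some annotation", xy=(0.5, 0.5))

# Save the whole figure as PNG; pass a file name instead of buf to write to disk
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```

Calling methods on the individual axes, rather than on `plt`, makes it explicit which subplot each element belongs to.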


syedDS/Complete-Data-Science-Toolkits
