# Sklearn Custom Models and Functions
Custom models can extend the functionality of scikit-learn to meet the requirements of specific application scenarios.
In scikit-learn, in addition to using ready-made models and preprocessing functions, users can also create custom models, transformers, and functions according to their needs.
Implementing custom models and functions usually involves inheriting from scikit-learn's base classes, such as `BaseEstimator` and `TransformerMixin`, and then implementing the required methods: `fit` and `transform` for transformers, or `fit` and `predict` for estimators.
This chapter covers the following topics:
1. **Custom Transformers**
2. **Custom Estimators**
3. **Custom Pipeline Steps**
4. **How to Implement Custom Algorithms by Inheriting `BaseEstimator` and `TransformerMixin`**
## 1. Custom Transformers
Transformers are components that transform data, for example by standardizing it or selecting features. A custom transformer usually inherits from `BaseEstimator` and `TransformerMixin` and implements the `fit` and `transform` methods.
**Steps to create a custom transformer:**
* **`fit` method**: Learns properties of the data (such as the mean, the variance, or feature selection criteria). It returns the transformer itself (`self`) to allow method chaining.
* **`transform` method**: Applies the learned properties to transform or process the data.
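To make the two steps concrete, here is a minimal sketch of a different transformer, a hypothetical `MaxAbsDivider` (not a scikit-learn class), which learns each column's largest absolute value in `fit` and divides by it in `transform`. scikit-learn's built-in `MaxAbsScaler` does something similar with more input handling. Because `fit` returns `self`, the two calls can be chained.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class MaxAbsDivider(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: divide each column by its largest absolute value."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Learned property: one scale factor per column
        # (assumes no column is entirely zero)
        self.max_abs_ = np.max(np.abs(X), axis=0)
        return self  # returning self enables fit(...).transform(...)

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # Apply the property learned in fit
        return X / self.max_abs_


X = np.array([[1.0, -2.0], [3.0, 4.0]])
X_out = MaxAbsDivider().fit(X).transform(X)  # chained calls
print(X_out)
```

Every value in the output lies in the range [-1, 1], and the transformer stores nothing except what `fit` learned.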
**Custom transformer example: Custom Standardization Transformer:**
Suppose we want to implement a custom standardization transformer. Standardization scales each feature of the data to have a mean of 0 and variance of 1.
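Written out, with $\mu_j$ and $\sigma_j$ the mean and the population standard deviation of feature $j$ over the $n$ training samples, each value $x_{ij}$ is mapped to $z_{ij}$:

$$
z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}, \qquad
\mu_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
\sigma_j = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_{ij} - \mu_j\right)^2}
$$

The $1/n$ factor in $\sigma_j$ matches the default of `np.std` (`ddof=0`), which is what the example below uses.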
```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split


# Mixins go to the left of BaseEstimator (scikit-learn convention)
class CustomScaler(TransformerMixin, BaseEstimator):
    def fit(self, X, y=None):
        """
        Calculate the mean and standard deviation of each feature
        """
        self.mean_ = np.mean(X, axis=0)
        self.std_ = np.std(X, axis=0)
        return self  # Return the object itself to allow method chaining

    def transform(self, X):
        """
        Standardize the data using the statistics learned in fit
        """
        return (X - self.mean_) / self.std_


# Load data and split it into training and test sets
data = load_iris()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the custom transformer object
scaler = CustomScaler()

# Learn the statistics on the training set only, then transform both sets
scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
print("Scaled training data:\n", X_train_scaled)
```

Output:

```
Scaled training data:
 [[-1.47393679  1.20365799 -1.56253475 -1.31260282]
 [-0.13307079  2.99237573 -1.27600637 -1.04563275]
 [ 1.08589829  0.08570939  0.38585821  0.28921757]
 [-1.23014297  0.75647855 -1.2187007  -1.31260282]
 [-1.7177306   0.30929911 -1.39061772 -1.31260282]
 ...
```
**Explanation:**
The `CustomScaler` class implements standardization, similar to scikit-learn's `StandardScaler`.
* The **`fit`** method calculates the mean and standard deviation of the training data and stores these values.
* The **`transform`** method transforms the data based on the mean and standard deviation calculated in `fit`.
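Since `CustomScaler` is meant to mimic `StandardScaler`, a quick sanity check is to fit both on the same training data and compare their outputs. `StandardScaler` also uses the population standard deviation (`ddof=0`, the same default as `np.std`), so the results should agree up to floating-point error. One difference worth knowing: `StandardScaler` only centers constant (zero-variance) features, whereas `CustomScaler` would divide by zero for them.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


class CustomScaler(TransformerMixin, BaseEstimator):
    def fit(self, X, y=None):
        self.mean_ = np.mean(X, axis=0)
        self.std_ = np.std(X, axis=0)  # population std (ddof=0)
        return self

    def transform(self, X):
        return (X - self.mean_) / self.std_


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

custom = CustomScaler().fit(X_train)
reference = StandardScaler().fit(X_train)

# Both transformers should produce the same values on training and test data
same_train = np.allclose(custom.transform(X_train), reference.transform(X_train))
same_test = np.allclose(custom.transform(X_test), reference.transform(X_test))
print(same_train, same_test)
```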
---
## 2. Custom Estimators
Estimators are the models themselves, such as regressors and classifiers.
Custom estimators typically inherit from the `BaseEstimator` class and implement the `fit` and `predict` methods.
**Steps to create a custom estimator:**
* **`fit` method**: Used to train the model and calculate required parameters (such as weights, bias, etc.).
* **`predict` method**: Makes predictions on input data based on the trained parameters.
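The split between `fit` and `predict` is easiest to see in a model that actually learns weights and a bias. The sketch below is a hypothetical `LeastSquaresRegressor` (an illustration, not a scikit-learn API; `sklearn.linear_model.LinearRegression` already does this job) that fits ordinary least squares with `np.linalg.lstsq`. It calls scikit-learn's `check_is_fitted` in `predict`, so using the model before `fit` raises `NotFittedError`, and it inherits `RegressorMixin`, which adds a `score` method returning R².

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted


class LeastSquaresRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical example: ordinary least squares with a bias term."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Append a column of ones so the last coefficient acts as the bias
        X_design = np.hstack([X, np.ones((X.shape[0], 1))])
        params, *_ = np.linalg.lstsq(X_design, y, rcond=None)
        self.coef_ = params[:-1]      # learned weights
        self.intercept_ = params[-1]  # learned bias
        return self

    def predict(self, X):
        check_is_fitted(self)  # raises NotFittedError if fit has not been called
        X = np.asarray(X, dtype=float)
        return X @ self.coef_ + self.intercept_


# Data generated from y = 2x + 1
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2 * X.ravel() + 1

model = LeastSquaresRegressor().fit(X, y)
print(model.coef_, model.intercept_)
print(model.predict([[4.0]]))

# Predicting with an unfitted model fails with a clear error
try:
    LeastSquaresRegressor().predict([[1.0]])
except NotFittedError as exc:
    print("Not fitted yet:", exc)
```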
### Custom Estimator Example: Simple Classifier
Suppose we want to implement a very simple classifier: use the mean of each feature as a threshold. If the value exceeds the mean, predict class 1; otherwise, predict class 0.
```python
import numpy as np
from sklearn.base import BaseEstimator


class SimpleClassifier(BaseEstimator):
    def fit(self, X, y):
        """
        Train the model: calculate the mean of each feature
        """
        self.mean_ = np.mean(X, axis=0)
        return self  # Return the object itself

    def predict(self, X):
        """
        Predict class 1 if a sample's value exceeds the mean on every feature,
        otherwise predict class 0
        """
        # (X > self.mean_) has one column per feature;
        # all(axis=1) reduces it to one label per sample
        return (X > self.mean_).all(axis=1).astype(int)


# Toy training data: two features, two classes
X_train = np.array([[1.5, 2.5], [2.0, 3.0], [3.5, 4.5], [4.0, 5.0]])
y_train = np.array([0, 0, 1, 1])

# Create and train the custom classifier
classifier = SimpleClassifier()
classifier.fit(X_train, y_train)

# Make predictions
X_test = np.array([[2.5, 3.5], [1.0, 2.0]])
y_pred = classifier.predict(X_test)
print("Predictions:", y_pred)
```

Output:

```
Predictions: [0 0]
```
**Explanation:**
* The **`fit`** method calculates the mean of the training data and stores it in `self.mean_`.
* The **`predict`** method makes classification predictions by comparing the test data with the mean.
This classifier is too simple to be useful in practice, but it demonstrates how to create a model with `BaseEstimator`.
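Because `SimpleClassifier` inherits only from `BaseEstimator`, it has no `score` method. A minimal variation, here called `ScoredSimpleClassifier` (a hypothetical name), also inherits scikit-learn's `ClassifierMixin`, which supplies `score()` (mean accuracy) on top of our own `predict`. It predicts class 1 only when every feature exceeds its training mean.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin


class ScoredSimpleClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical variant: mean-threshold rule plus ClassifierMixin.score."""

    def fit(self, X, y):
        self.mean_ = np.mean(X, axis=0)
        self.classes_ = np.unique(y)  # conventional attribute for classifiers
        return self

    def predict(self, X):
        # One label per sample: 1 only if every feature exceeds its training mean
        return (np.asarray(X) > self.mean_).all(axis=1).astype(int)


X_train = np.array([[1.5, 2.5], [2.0, 3.0], [3.5, 4.5], [4.0, 5.0]])
y_train = np.array([0, 0, 1, 1])

clf = ScoredSimpleClassifier().fit(X_train, y_train)
print(clf.predict(X_train))         # [0 0 1 1]
print(clf.score(X_train, y_train))  # 1.0, mean accuracy from ClassifierMixin
```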
---
## 3. Custom Pipeline Steps
scikit-learn allows custom transformers and estimators to be used as pipeline steps. This enables integrating data preprocessing and model training into a single workflow. Custom models and functions can work with pipelines just like built-in transformers and estimators.
### Using Custom Transformers and Estimators in Pipeline
```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class CustomScaler(TransformerMixin, BaseEstimator):
    def fit(self, X, y=None):
        """
        Calculate the mean and standard deviation of each feature
        """
        self.mean_ = np.mean(X, axis=0)
        self.std_ = np.std(X, axis=0)
        return self  # Return the object itself

    def transform(self, X):
        """
        Standardize data
        """
        return (X - self.mean_) / self.std_


class SimpleClassifier(BaseEstimator):
    def fit(self, X, y):
        """
        Train model: calculate the mean of each feature
        """
        self.mean_ = np.mean(X, axis=0)
        return self  # Return the object itself

    def predict(self, X):
        """
        Predict 1 if every feature value exceeds its mean, otherwise predict 0
        """
        return (X > self.mean_).all(axis=1).astype(int)


# Toy training data
X_train = np.array([[1.5, 2.5], [2.0, 3.0], [3.5, 4.5], [4.0, 5.0]])
y_train = np.array([0, 0, 1, 1])

# Create a pipeline with the custom scaler and the custom classifier
pipeline = Pipeline([
    ('scaler', CustomScaler()),          # Custom standardization
    ('classifier', SimpleClassifier()),  # Custom classifier
])

# Train the pipeline: the scaler is fitted first, then the classifier on the scaled data
pipeline.fit(X_train, y_train)

# Predict: test data is scaled with the training statistics before classification
X_test = np.array([[2.5, 3.5], [1.0, 2.0]])
y_pred = pipeline.predict(X_test)
print("Predictions:", y_pred)
```

Output:

```
Predictions: [0 0]
```
**Explanation:**
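We use `CustomScaler` and `SimpleClassifier` as pipeline steps. During `pipeline.fit`, the pipeline calls the scaler's `fit_transform` on the training data and passes the result to the classifier's `fit`; during `pipeline.predict`, it applies the scaler's `transform` (with the statistics learned from the training data) before calling the classifier's `predict`.
This way, the whole workflow runs through single calls to `fit` and `predict`, which keeps the code short and ensures that test data is preprocessed exactly like the training data.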
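Besides writing transformer classes, a plain function can become a pipeline step through scikit-learn's `FunctionTransformer`. The sketch below chains a log transform (`np.log1p`), a custom scaler, and a built-in `LogisticRegression` on the iris data; the choice of `np.log1p` and `LogisticRegression` is illustrative, not part of the example above. `FunctionTransformer` holds no learned state, so it applies the same function during both fitting and prediction.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


class CustomScaler(TransformerMixin, BaseEstimator):
    def fit(self, X, y=None):
        self.mean_ = np.mean(X, axis=0)
        self.std_ = np.std(X, axis=0)
        return self

    def transform(self, X):
        return (X - self.mean_) / self.std_


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

pipeline = Pipeline([
    ('log', FunctionTransformer(np.log1p)),             # plain function as a step
    ('scaler', CustomScaler()),                         # custom class as a step
    ('classifier', LogisticRegression(max_iter=1000)),  # built-in estimator
])

pipeline.fit(X_train, y_train)
# Pipeline.score delegates to the final step; for LogisticRegression this is accuracy
print("Test accuracy:", pipeline.score(X_test, y_test))
```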
---
## 4. How to Implement Custom Algorithms by Inheriting `BaseEstimator` and `TransformerMixin`
* **`BaseEstimator`**: The base class for all scikit-learn estimators. It provides the `get_params()` and `set_params()` methods, which let tools such as `GridSearchCV` and `clone` read and modify a custom estimator's hyperparameters.
* **`TransformerMixin`**: The mixin class for scikit-learn transformers. It provides a default `fit_transform()` method that calls `fit()` followed by `transform()`; `Pipeline` uses `fit_transform()` on intermediate steps when it is available.
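`get_params()` works by inspecting the arguments of the class's `__init__`, so the convention is that `__init__` only stores each hyperparameter under an attribute of the same name, and learned attributes (with a trailing underscore) are created in `fit`. The sketch below gives the scaler a hypothetical `with_std` hyperparameter to show `get_params`, `set_params`, and `clone` from `BaseEstimator`, plus the `fit_transform` inherited from `TransformerMixin`.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone


class CustomScaler(TransformerMixin, BaseEstimator):
    def __init__(self, with_std=True):
        # Hypothetical hyperparameter, stored as-is under the same name
        self.with_std = with_std

    def fit(self, X, y=None):
        self.mean_ = np.mean(X, axis=0)
        # Divide by the standard deviation only when with_std is True
        self.std_ = np.std(X, axis=0) if self.with_std else np.ones(np.shape(X)[1])
        return self

    def transform(self, X):
        return (X - self.mean_) / self.std_


scaler = CustomScaler()
print(scaler.get_params())         # {'with_std': True}, from BaseEstimator
scaler.set_params(with_std=False)  # also from BaseEstimator
print(scaler.get_params())         # {'with_std': False}

# clone copies the hyperparameters but not the fitted state
X = np.array([[1.0, 10.0], [3.0, 30.0]])
fitted = CustomScaler().fit(X)
fresh = clone(fitted)
print(hasattr(fitted, "mean_"), hasattr(fresh, "mean_"))  # True False

# fit_transform comes from TransformerMixin: fit(X) followed by transform(X)
print(CustomScaler(with_std=False).fit_transform(X))
```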
By inheriting from these two classes, we can create custom transformers and models that work with scikit-learn tools such as `Pipeline` and `GridSearchCV`.
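Because `get_params()` and `set_params()` come from `BaseEstimator`, a custom step's hyperparameters can be tuned through a pipeline using the `<step name>__<parameter>` naming. A minimal sketch, assuming a hypothetical `with_std` hyperparameter on the scaler and a built-in `LogisticRegression` as the final step:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class CustomScaler(TransformerMixin, BaseEstimator):
    def __init__(self, with_std=True):
        self.with_std = with_std  # hypothetical hyperparameter

    def fit(self, X, y=None):
        self.mean_ = np.mean(X, axis=0)
        self.std_ = np.std(X, axis=0) if self.with_std else np.ones(np.shape(X)[1])
        return self

    def transform(self, X):
        return (X - self.mean_) / self.std_


X, y = load_iris(return_X_y=True)

pipeline = Pipeline([
    ('scaler', CustomScaler()),
    ('classifier', LogisticRegression(max_iter=1000)),
])

# "<step name>__<parameter>" addresses a parameter of one pipeline step
param_grid = {
    'scaler__with_std': [True, False],
    'classifier__C': [0.1, 1.0, 10.0],
}
# GridSearchCV clones the pipeline for each candidate, which relies on get_params
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
print(search.best_score_)
```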