Scikit-learn
sklearn, full name scikit-learn, is a machine learning library for Python, built on top of data science packages such as NumPy, SciPy, and matplotlib. It covers almost every aspect of machine learning: sample datasets, data preprocessing, model validation, feature selection, classification, regression, clustering, and dimensionality reduction. Unlike the deep learning stack, where several frameworks such as PyTorch and TensorFlow compete, sklearn is the go-to library for traditional machine learning in Python and has no real competitor.
Extension for Scikit-learn
As a classic machine learning framework, scikit-learn has been in development for more than a decade since its birth, but its computing speed has long been criticized by users. Extension for Scikit-learn is a free AI software accelerator that speeds up existing scikit-learn code by 10 to 100x. The acceleration is achieved through vector instructions, memory optimization for AI hardware, and multithreading.
With the Scikit-learn extension, you can:
- Speed up training and inference by up to 100x while keeping the same mathematical precision
- Benefit from performance gains across a range of hardware configurations, from CPUs to GPU and multi-GPU setups
- Integrate extensions into your existing Scikit-learn applications without modifying code
- Continue to use the open-source scikit-learn API
- Enable or disable the extension with a few lines of code or from the command line
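For example, the documented command-line switch patches scikit-learn without touching the script at all (my_script.py is a placeholder file name):

```shell
# Run an unmodified scikit-learn script with the accelerator patched in
python -m sklearnex my_script.py
```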
Test
First, use conda to prepare the environment and install the necessary packages with the following command:
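The original command is not shown; a plausible setup looks like the following, assuming an environment named sklearn-test (the extension is published on PyPI as scikit-learn-intelex):

```shell
# Create and activate a fresh conda environment
conda create -n sklearn-test python=3.10 -y
conda activate sklearn-test
# Install scikit-learn and the accelerator extension
pip install scikit-learn scikit-learn-intelex
```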
The plain Python test code (without acceleration) is as follows:
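The original listing is missing, but the line-by-line interpretation below implies code along these lines (the time.time() timing is an assumption; the parameters match the walkthrough):

```python
import time

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Generate a large synthetic regression dataset
X, y = make_regression(n_samples=2000000, n_features=100,
                       noise=1, random_state=42)

# Keep 67% of the data for training; discard the test portion
X_train, _, y_train, _ = train_test_split(X, y, test_size=0.33,
                                          random_state=42)
del X, y  # free the unsplit copies to reduce peak memory

model = LinearRegression()

start = time.time()
model.fit(X_train, y_train)  # train with ordinary least squares
print(f"Training time: {time.time() - start:.2f} s")
```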
The accelerated code is as follows:
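The accelerated version presumably differs only in its first lines: patch_sklearn() is the documented entry point of the extension and must run before the scikit-learn estimators are imported (it is guarded with try/except here so the sketch still runs where the extension is not installed):

```python
import time

# Patch scikit-learn with the accelerator BEFORE importing estimators
try:
    from sklearnex import patch_sklearn
    patch_sklearn()
except ImportError:
    pass  # extension not installed: fall back to stock scikit-learn

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000000, n_features=100,
                       noise=1, random_state=42)
X_train, _, y_train, _ = train_test_split(X, y, test_size=0.33,
                                          random_state=42)
del X, y  # free the unsplit copies to reduce peak memory

model = LinearRegression()
start = time.time()
model.fit(X_train, y_train)  # now dispatched to the accelerated backend
print(f"Training time: {time.time() - start:.2f} s")
```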
Source code interpretation:
1. X, y = make_regression(n_samples=2000000, n_features=100, noise=1, random_state=42)
make_regression is a scikit-learn function that generates a dataset for a linear regression problem.
- n_samples=2000000: the generated dataset contains 2,000,000 samples.
- n_features=100: each sample has 100 features (input variables).
- noise=1: noise with a standard deviation of 1 is added to the target value y, so the targets carry some random perturbation.
- random_state=42: fixes the random seed so the same data is generated on every run (reproducibility).
Result: X is a NumPy array of shape (2000000, 100) holding 2,000,000 samples with 100 features each; y is a one-dimensional array of length 2,000,000 holding the target values.
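A quick check of these shapes with a much smaller sample count (1,000 instead of 2,000,000, so it runs instantly) confirms the behavior:

```python
from sklearn.datasets import make_regression

# Same parameters as above, only n_samples is reduced
X, y = make_regression(n_samples=1000, n_features=100,
                       noise=1, random_state=42)
print(X.shape)  # (1000, 100)
print(y.shape)  # (1000,)
```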
2. X_train, _, y_train, _ = train_test_split(X, y, test_size=0.33, random_state=42)
train_test_split is a scikit-learn function that splits a dataset into training and test sets.
- test_size=0.33: 33% of the dataset goes to the test set and the remaining 67% to the training set.
- random_state=42: fixes the random seed so the split is identical on every run.
- _: the underscores discard unwanted return values; only the training portion (X_train and y_train) is kept and the test portion is ignored.
Result: X_train is the training feature matrix, with shape of about (1340000, 100) (roughly 67% of the data); y_train is the training target vector of length about 1,340,000.
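The split proportions are easy to verify on a small dataset (here the test portion is kept rather than discarded, so both sizes can be shown):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=100,
                       noise=1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33,
                                                    random_state=42)
# Roughly a 67/33 split of the 1,000 samples
print(len(X_train), len(X_test))
```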
3. model = LinearRegression()
LinearRegression is the scikit-learn class that implements the linear regression model. This line creates an instance of the model and assigns it to the variable model. By default, LinearRegression applies no regularization (neither L1 nor L2).
4. model.fit(X_train, y_train)
fit is the method scikit-learn models use for training. Here it fits the linear regression model on the training data X_train and targets y_train. The model computes the set of weights (coefficients) and the intercept that minimize the error between predicted and true values (ordinary least squares). After training, the parameters are stored in model.coef_ (the feature weights) and model.intercept_ (the intercept).
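One way to see the least-squares fit at work: with coef=True, make_regression also returns the true coefficients it used, and on noise-free data the fitted model recovers them almost exactly:

```python
import numpy as np

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# noise=0: the targets are an exact linear function of the features
X, y, true_coef = make_regression(n_samples=500, n_features=5, noise=0,
                                  coef=True, random_state=42)
model = LinearRegression()
model.fit(X, y)

print(np.allclose(model.coef_, true_coef))  # True: weights recovered
print(abs(model.intercept_) < 1e-6)         # True: no intercept in the data
```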
The execution time is as follows:
You can see that the effect is very obvious: the run takes about 16 seconds without acceleration and only about 0.1 seconds with acceleration!