Implement a new converter using other converters

In many cases, a custom model leverages existing models which already have an associated converter. To convert this patchwork, the existing converters must be called. This example shows how to do that. The example Implement a new converter can be rewritten by using a PCA. We can then reuse the converter associated with this model.

Custom model

Let's implement a simple custom model with the scikit-learn API. The model is a preprocessing step which decorrelates correlated random variables. If X is a matrix of features, V=\frac{1}{n}X'X is the covariance matrix. We compute X V^{-1/2}.
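
The formula above can be checked numerically. The snippet below is a minimal standalone sketch (not part of the original example): it builds correlated features, computes V^{-1/2} through an eigendecomposition and verifies that the transformed features are decorrelated.

import numpy

# Minimal sketch: decorrelate features by multiplying with V^{-1/2},
# the inverse square root of the covariance matrix.
rng = numpy.random.RandomState(0)
X0 = rng.randn(100, 3) @ rng.randn(3, 3)    # correlated features
Xc = X0 - X0.mean(axis=0)
V = Xc.T @ Xc / Xc.shape[0]                 # covariance matrix
w, P = numpy.linalg.eigh(V)
root = P @ numpy.diag(w ** -0.5) @ P.T      # V^{-1/2}
Xd = Xc @ root
print((Xd.T @ Xd / Xd.shape[0]).round(6))   # approximately the identity

The model below obtains the same effect with a PCA, which lets us reuse the PCA converter later on.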

import pickle
from io import BytesIO
import numpy
from numpy.testing import assert_almost_equal
from onnxruntime import InferenceSession
from sklearn.base import TransformerMixin, BaseEstimator
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from skl2onnx import update_registered_converter
from skl2onnx.algebra.onnx_operator import OnnxSubEstimator
from skl2onnx import to_onnx


class DecorrelateTransformer(TransformerMixin, BaseEstimator):
    """
    Decorrelates correlated gaussian features.

    :param alpha: avoids non-invertible matrices
        by adding *alpha* identity matrix

    *Attributes*

    * `self.pca_`: PCA fitted on the training data, used to decorrelate the features
    """

    def __init__(self, alpha=0.0):
        BaseEstimator.__init__(self)
        TransformerMixin.__init__(self)
        self.alpha = alpha

    def fit(self, X, y=None, sample_weights=None):
        self.pca_ = PCA(X.shape[1])
        self.pca_.fit(X)
        return self

    def transform(self, X):
        return self.pca_.transform(X)


def test_decorrelate_transformer():
    data = load_iris()
    X = data.data

    dec = DecorrelateTransformer()
    dec.fit(X)
    pred = dec.transform(X)
    cov = pred.T @ pred
    for i in range(cov.shape[0]):
        cov[i, i] = 1.0
    assert_almost_equal(numpy.identity(4), cov)

    st = BytesIO()
    pickle.dump(dec, st)
    dec2 = pickle.load(BytesIO(st.getvalue()))
    assert_almost_equal(dec.transform(X), dec2.transform(X))


test_decorrelate_transformer()

data = load_iris()
X = data.data

dec = DecorrelateTransformer()
dec.fit(X)
pred = dec.transform(X[:5])
print(pred)
[[-2.68412563e+00  3.19397247e-01 -2.79148276e-02 -2.26243707e-03]
 [-2.71414169e+00 -1.77001225e-01 -2.10464272e-01 -9.90265503e-02]
 [-2.88899057e+00 -1.44949426e-01  1.79002563e-02 -1.99683897e-02]
 [-2.74534286e+00 -3.18298979e-01  3.15593736e-02  7.55758166e-02]
 [-2.72871654e+00  3.26754513e-01  9.00792406e-02  6.12585926e-02]]

Conversion into ONNX

Let's try to convert it and see what happens.

try:
    to_onnx(dec, X.astype(numpy.float32))
except Exception as e:
    print(e)
Unable to find a shape calculator for type '<class '__main__.DecorrelateTransformer'>'.
It usually means the pipeline being converted contains a
transformer or a predictor with no corresponding converter
implemented in sklearn-onnx. If the converted is implemented
in another library, you need to register
the converted so that it can be used by sklearn-onnx (function
update_registered_converter). If the model is not yet covered
by sklearn-onnx, you may raise an issue to
https://github.com/onnx/sklearn-onnx/issues
to get the converter implemented or even contribute to the
project. If the model is a custom model, a new converter must
be implemented. Examples can be found in the gallery.

This error means there is no converter associated with DecorrelateTransformer. Let's write one. It requires implementing the two following functions: a shape calculator and a converter, with the same signatures as below. First the shape calculator: we retrieve the input type and tell the output that it has the same type, the same number of rows, and a specific number of columns.

def decorrelate_transformer_shape_calculator(operator):
    op = operator.raw_operator
    # The output keeps the input type and number of rows; the number of
    # columns is the number of PCA components.
    input_type = operator.inputs[0].type.__class__
    input_dim = operator.inputs[0].type.shape[0]
    output_type = input_type([input_dim, op.pca_.components_.shape[0]])
    operator.outputs[0].type = output_type

Then the converter. One thing we need to pay attention to is the target opset. This information is important to make sure that every node is defined following the specifications of that opset.

def decorrelate_transformer_converter(scope, operator, container):
    op = operator.raw_operator
    opv = container.target_opset
    out = operator.outputs

    # We retrieve the unique input.
    X = operator.inputs[0]

    # We tell in ONNX language how to compute the unique output.
    # op_version=opv tells which opset is requested
    Y = OnnxSubEstimator(op.pca_, X, op_version=opv, output_names=out[:1])
    Y.add_to(scope, container)
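
For comparison, the same converter could be written without OnnxSubEstimator by spelling out the PCA computation with ONNX operators, as the example Implement a new converter does. The sketch below is hypothetical and not registered anywhere; it assumes the PCA was fitted with whiten=False, in which case PCA.transform(X) equals (X - mean_) @ components_.T.

from skl2onnx.algebra.onnx_ops import OnnxMatMul, OnnxSub
from skl2onnx.common.data_types import guess_numpy_type


def decorrelate_transformer_converter_manual(scope, operator, container):
    # Hypothetical alternative: reproduce PCA.transform explicitly,
    # (X - mean_) @ components_.T, instead of delegating to the PCA converter.
    op = operator.raw_operator
    opv = container.target_opset
    out = operator.outputs
    X = operator.inputs[0]
    dtype = guess_numpy_type(X.type)
    Y = OnnxMatMul(
        OnnxSub(X, op.pca_.mean_.astype(dtype), op_version=opv),
        op.pca_.components_.T.astype(dtype),
        op_version=opv,
        output_names=out[:1],
    )
    Y.add_to(scope, container)

Delegating to OnnxSubEstimator keeps the custom converter shorter and reuses whatever logic the PCA converter already implements.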

We need to let skl2onnx know about the new converter.

update_registered_converter(
    DecorrelateTransformer,
    "SklearnDecorrelateTransformer",
    decorrelate_transformer_shape_calculator,
    decorrelate_transformer_converter,
)


onx = to_onnx(dec, X.astype(numpy.float32))

sess = InferenceSession(onx.SerializeToString(), providers=["CPUExecutionProvider"])

exp = dec.transform(X.astype(numpy.float32))
got = sess.run(None, {"X": X.astype(numpy.float32)})[0]


def diff(p1, p2):
    p1 = p1.ravel()
    p2 = p2.ravel()
    d = numpy.abs(p2 - p1)
    return d.max(), (d / numpy.abs(p1)).max()


print(diff(exp, got))
(3.560125949597648e-07, 0.0003158352661960492)
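
The nodes of the graph come from the reused PCA converter. As a quick sanity check (not part of the original example), the operator types of the generated graph can be listed; the exact list depends on the installed versions of skl2onnx and ONNX.

# List the operators produced by the reused PCA converter.
print([node.op_type for node in onx.graph.node])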

Let's check it also works with double.

onx = to_onnx(dec, X.astype(numpy.float64))

sess = InferenceSession(onx.SerializeToString(), providers=["CPUExecutionProvider"])

exp = dec.transform(X.astype(numpy.float64))
got = sess.run(None, {"X": X.astype(numpy.float64)})[0]
print(diff(exp, got))
(0.0, 0.0)

As expected, the differences are smaller with double.
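
The original example stops here, but a converted model is typically saved to disk so that onnxruntime can load it later. A minimal sketch, with a hypothetical file name:

# Save the converted model (hypothetical file name) and reload it for inference.
with open("decorrelate_transformer.onnx", "wb") as f:
    f.write(onx.SerializeToString())

sess2 = InferenceSession(
    "decorrelate_transformer.onnx", providers=["CPUExecutionProvider"]
)
print(sess2.run(None, {"X": X[:5].astype(numpy.float64)})[0])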

Total running time of the script: (0 minutes 0.056 seconds)

Gallery generated by Sphinx-Gallery