SHAP Kernel Explainer for Tabular Data via Contextual AI¶
This tutorial demonstrates how to generate explanations using SHAP’s Kernel Explainer implemented by the Contextual AI library. Much of the tutorial overlaps with what is covered in the LIME tabular tutorial. To recap, the main steps for generating explanations are:
1. Get an explainer via the ExplainerFactory class
2. Build the explainer
3. Call explain_instance
Step 1: Import libraries¶
[1]:
# Some auxiliary imports for the tutorial
import sys
import random
import numpy as np
from pprint import pprint
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import shap
import os
# Set seed for reproducibility
np.random.seed(123456)
# Set the path so that we can import ExplainerFactory
sys.path.append('../../')
# Main Contextual AI imports
import xai
from xai.explainer import ExplainerFactory
Step 2: Train a model on a sample dataset¶
[2]:
# Load the dataset and prepare training and test sets
raw_data = datasets.load_breast_cancer()
X, y = raw_data['data'], raw_data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Instantiate a classifier, train, and evaluate on test set
clf = RandomForestClassifier()
clf.fit(X_train, y_train)
clf.score(X_test, y_test)
/Users/i330688/venv_xai/lib/python3.6/site-packages/sklearn/ensemble/forest.py:245: FutureWarning: The default value of n_estimators will change from 10 in version 0.20 to 100 in 0.22.
"10 in version 0.20 to 100 in 0.22.", FutureWarning)
[2]:
0.956140350877193
[3]:
raw_data['feature_names']
[3]:
array(['mean radius', 'mean texture', 'mean perimeter', 'mean area',
'mean smoothness', 'mean compactness', 'mean concavity',
'mean concave points', 'mean symmetry', 'mean fractal dimension',
'radius error', 'texture error', 'perimeter error', 'area error',
'smoothness error', 'compactness error', 'concavity error',
'concave points error', 'symmetry error',
'fractal dimension error', 'worst radius', 'worst texture',
'worst perimeter', 'worst area', 'worst smoothness',
'worst compactness', 'worst concavity', 'worst concave points',
'worst symmetry', 'worst fractal dimension'], dtype='<U23')
Step 3: Instantiate the explainer¶
[4]:
# Instantiate the SHAP Kernel Explainer via the ExplainerFactory interface
explainer = ExplainerFactory.get_explainer(domain=xai.DOMAIN.TABULAR, algorithm=xai.ALG.SHAP)
Step 4: Build the explainer¶
Like any explainer in Contextual AI, the SHAP Kernel Explainer implements a build_explainer method to initialize the explainer (this can include pre-training a model or initializing some parameters). Note, however, that build_explainer for SHAP requires a different set of parameters than that of the LIME Tabular Explainer. The same goes for explain_instance.
[5]:
explainer.build_explainer(
predict_fn=clf.predict_proba,
training_data=X_train,
feature_names=raw_data['feature_names']
)
Using 455 background data samples could cause slower run times. Consider using shap.kmeans(data, K) to summarize the background as K weighted samples.
Step 5: Generate some explanations¶
[6]:
clf.predict_proba(X_test[0].reshape(1, -1))
[6]:
array([[0., 1.]])
[7]:
exp = explainer.explain_instance(
instance=X_test[0],
num_samples=None,
num_features=10
)
pprint(exp)
{0: {'confidence': 0.0,
'explanation': [{'feature': 'mean texture = 11.97',
'score': -0.05705665528004522},
{'feature': 'mean area = 288.5',
'score': -0.059909773253767146},
{'feature': 'worst radius = 10.62',
'score': -0.06309303920038445},
{'feature': 'worst perimeter = 66.53',
'score': -0.10050683860714402},
{'feature': 'worst area = 342.9',
'score': -0.08910402332898809}]},
1: {'confidence': 1.0,
'explanation': [{'feature': 'mean texture = 11.97',
'score': 0.057056655280045665},
{'feature': 'mean area = 288.5',
'score': 0.05990977325376773},
{'feature': 'worst radius = 10.62',
'score': 0.06309303920038489},
{'feature': 'worst perimeter = 66.53',
'score': 0.10050683860714457},
{'feature': 'worst area = 342.9',
'score': 0.08910402332898854}]}}
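A quick sanity check on the output above: for a binary classifier whose two predicted probabilities sum to 1, the SHAP scores for class 0 and class 1 are mirror images of each other. The printed scores confirm this:

```python
# Scores copied from the explanation output above
class0 = [-0.05705665528004522, -0.059909773253767146,
          -0.06309303920038445, -0.10050683860714402,
          -0.08910402332898809]
class1 = [0.057056655280045665, 0.05990977325376773,
          0.06309303920038489, 0.10050683860714457,
          0.08910402332898854]

# Each pair sums to zero up to floating-point error
assert all(abs(a + b) < 1e-12 for a, b in zip(class0, class1))
```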
Step 6: Save and load the explainer¶
Any explanation algorithm in Contextual AI can be saved and loaded via save_explainer and load_explainer, respectively. Note that Kernel SHAP estimates attribution scores by sampling feature coalitions, so the scores produced by the reloaded explainer below differ slightly from those above.
[8]:
# Save the explainer somewhere
explainer.save_explainer('artefacts/shap_tabular.pkl')
[9]:
# Load the saved explainer in a new Explainer instance
new_explainer = ExplainerFactory.get_explainer(domain=xai.DOMAIN.TABULAR, algorithm=xai.ALG.SHAP)
new_explainer.load_explainer('artefacts/shap_tabular.pkl')
exp = new_explainer.explain_instance(
instance=X_test[0],
num_samples=None,
num_features=10
)
pprint(exp)
{0: {'confidence': 0.0,
'explanation': [{'feature': 'mean texture = 11.97',
'score': -0.05275998109397245},
{'feature': 'mean area = 288.5',
'score': -0.06533435495159351},
{'feature': 'worst radius = 10.62',
'score': -0.06228954336585418},
{'feature': 'worst perimeter = 66.53',
'score': -0.09468562012851084},
{'feature': 'worst area = 342.9',
'score': -0.09460083013039794}]},
1: {'confidence': 1.0,
'explanation': [{'feature': 'mean texture = 11.97',
'score': 0.05275998109397295},
{'feature': 'mean area = 288.5',
'score': 0.06533435495159409},
{'feature': 'worst radius = 10.62',
'score': 0.062289543365854405},
{'feature': 'worst perimeter = 66.53',
'score': 0.09468562012851142},
{'feature': 'worst area = 342.9',
'score': 0.0946008301303985}]}}
[10]:
# The SHAP model is pretty large, so remove it from disk
os.remove('artefacts/shap_tabular.pkl')