SHAP Kernel Explainer for Tabular Data via Contextual AI

This tutorial demonstrates how to generate explanations using SHAP's Kernel Explainer as implemented in the Contextual AI library. Much of the tutorial overlaps with the LIME tabular tutorial. To recap, the main steps for generating explanations are:

  1. Get an explainer via the ExplainerFactory class

  2. Build the explainer

  3. Call explain_instance

Step 1: Import libraries

[1]:
# Some auxiliary imports for the tutorial
import sys
import random
import numpy as np
from pprint import pprint
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import shap
import os

# Set seed for reproducibility
np.random.seed(123456)

# Set the path so that we can import ExplainerFactory
sys.path.append('../../')

# Main Contextual AI imports
import xai
from xai.explainer import ExplainerFactory

Step 2: Train a model on a sample dataset

[2]:
# Load the dataset and prepare training and test sets
raw_data = datasets.load_breast_cancer()
X, y = raw_data['data'], raw_data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Instantiate a classifier, train, and evaluate on test set
clf = RandomForestClassifier()
clf.fit(X_train, y_train)
clf.score(X_test, y_test)
/Users/i330688/venv_xai/lib/python3.6/site-packages/sklearn/ensemble/forest.py:245: FutureWarning: The default value of n_estimators will change from 10 in version 0.20 to 100 in 0.22.
  "10 in version 0.20 to 100 in 0.22.", FutureWarning)
[2]:
0.956140350877193
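The FutureWarning above comes from scikit-learn changing the default number of trees from 10 to 100. It can be silenced by pinning n_estimators explicitly; a minimal sketch (the random_state values here are illustrative additions, and the exact accuracy will vary with the split):

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

raw_data = datasets.load_breast_cancer()
X, y = raw_data['data'], raw_data['target']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Pin n_estimators to the future default to avoid the FutureWarning
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
score = clf.score(X_test, y_test)
```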
[3]:
raw_data['feature_names']
[3]:
array(['mean radius', 'mean texture', 'mean perimeter', 'mean area',
       'mean smoothness', 'mean compactness', 'mean concavity',
       'mean concave points', 'mean symmetry', 'mean fractal dimension',
       'radius error', 'texture error', 'perimeter error', 'area error',
       'smoothness error', 'compactness error', 'concavity error',
       'concave points error', 'symmetry error',
       'fractal dimension error', 'worst radius', 'worst texture',
       'worst perimeter', 'worst area', 'worst smoothness',
       'worst compactness', 'worst concavity', 'worst concave points',
       'worst symmetry', 'worst fractal dimension'], dtype='<U23')

Step 3: Instantiate the explainer

[4]:
# Instantiate the SHAP Kernel Explainer via the ExplainerFactory interface
explainer = ExplainerFactory.get_explainer(domain=xai.DOMAIN.TABULAR, algorithm=xai.ALG.SHAP)

Step 4: Build the explainer

Like any explainer in Contextual AI, the SHAP Kernel Explainer implements a build_explainer method to initialize the explainer (this can include pre-training a model or initializing some parameters). Note, however, that build_explainer for SHAP requires a different set of parameters than that of the LIME Tabular Explainer; the same applies to explain_instance.

[5]:
explainer.build_explainer(
    predict_fn=clf.predict_proba,
    training_data=X_train,
    feature_names=raw_data['feature_names']
)
Using 455 background data samples could cause slower run times. Consider using shap.kmeans(data, K) to summarize the background as K weighted samples.
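As the warning suggests, Kernel SHAP runs faster when the background data is summarized. SHAP ships shap.kmeans for this; the same idea can be sketched with scikit-learn's KMeans (whether build_explainer accepts a pre-summarized array as training_data is an assumption here — shap.kmeans additionally keeps cluster weights):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer

X, _ = load_breast_cancer(return_X_y=True)

# Summarize the full background set as 30 prototype rows,
# analogous to shap.kmeans(data, 30)
km = KMeans(n_clusters=30, random_state=0, n_init=10).fit(X)
background = km.cluster_centers_

# `background` could then be passed as `training_data` to build_explainer
```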

Step 5: Generate some explanations

[6]:
clf.predict_proba(X_test[0].reshape(1, -1))
[6]:
array([[0., 1.]])
[7]:
exp = explainer.explain_instance(
    instance=X_test[0],
    num_samples=None,
    num_features=10
)

pprint(exp)

{0: {'confidence': 0.0,
     'explanation': [{'feature': 'mean texture = 11.97',
                      'score': -0.05705665528004522},
                     {'feature': 'mean area = 288.5',
                      'score': -0.059909773253767146},
                     {'feature': 'worst radius = 10.62',
                      'score': -0.06309303920038445},
                     {'feature': 'worst perimeter = 66.53',
                      'score': -0.10050683860714402},
                     {'feature': 'worst area = 342.9',
                      'score': -0.08910402332898809}]},
 1: {'confidence': 1.0,
     'explanation': [{'feature': 'mean texture = 11.97',
                      'score': 0.057056655280045665},
                     {'feature': 'mean area = 288.5',
                      'score': 0.05990977325376773},
                     {'feature': 'worst radius = 10.62',
                      'score': 0.06309303920038489},
                     {'feature': 'worst perimeter = 66.53',
                      'score': 0.10050683860714457},
                     {'feature': 'worst area = 342.9',
                      'score': 0.08910402332898854}]}}
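Note that for a binary classifier the two classes receive mirror-image SHAP scores, since the predict_proba columns sum to one. A small sketch of post-processing the returned dict, using a trimmed copy of the structure shown above (the values are copied from the output, rounded):

```python
# Trimmed copy of the explanation dict returned by explain_instance
exp = {
    0: {'confidence': 0.0,
        'explanation': [{'feature': 'worst perimeter = 66.53', 'score': -0.1005},
                        {'feature': 'worst area = 342.9', 'score': -0.0891}]},
    1: {'confidence': 1.0,
        'explanation': [{'feature': 'worst perimeter = 66.53', 'score': 0.1005},
                        {'feature': 'worst area = 342.9', 'score': 0.0891}]},
}

# Pick the predicted class (highest confidence) and rank its features
# by absolute SHAP contribution
predicted = max(exp, key=lambda c: exp[c]['confidence'])
ranked = sorted(exp[predicted]['explanation'],
                key=lambda e: abs(e['score']), reverse=True)
top_feature = ranked[0]['feature']
```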

Step 6: Save and load the explainer

Any explanation algorithm in Contextual AI can be saved/loaded via save_explainer and load_explainer, respectively.

[8]:
# Save the explainer somewhere

explainer.save_explainer('artefacts/shap_tabular.pkl')
[9]:
# Load the saved explainer in a new Explainer instance

new_explainer = ExplainerFactory.get_explainer(domain=xai.DOMAIN.TABULAR, algorithm=xai.ALG.SHAP)
new_explainer.load_explainer('artefacts/shap_tabular.pkl')

exp = new_explainer.explain_instance(
    instance=X_test[0],
    num_samples=None,
    num_features=10
)

pprint(exp)

{0: {'confidence': 0.0,
     'explanation': [{'feature': 'mean texture = 11.97',
                      'score': -0.05275998109397245},
                     {'feature': 'mean area = 288.5',
                      'score': -0.06533435495159351},
                     {'feature': 'worst radius = 10.62',
                      'score': -0.06228954336585418},
                     {'feature': 'worst perimeter = 66.53',
                      'score': -0.09468562012851084},
                     {'feature': 'worst area = 342.9',
                      'score': -0.09460083013039794}]},
 1: {'confidence': 1.0,
     'explanation': [{'feature': 'mean texture = 11.97',
                      'score': 0.05275998109397295},
                     {'feature': 'mean area = 288.5',
                      'score': 0.06533435495159409},
                     {'feature': 'worst radius = 10.62',
                      'score': 0.062289543365854405},
                     {'feature': 'worst perimeter = 66.53',
                      'score': 0.09468562012851142},
                     {'feature': 'worst area = 342.9',
                      'score': 0.0946008301303985}]}}
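The reloaded explainer's scores differ slightly from the original run because Kernel SHAP draws random perturbation samples each time. A sanity check can compare the two runs within a tolerance rather than exactly (scores below are copied from the class-1 outputs above, rounded):

```python
import numpy as np

# Class-1 scores for the same five features from the original
# and reloaded explainers above
original = np.array([0.0570, 0.0599, 0.0631, 0.1005, 0.0891])
reloaded = np.array([0.0528, 0.0653, 0.0623, 0.0947, 0.0946])

# Signs and rough magnitudes agree even though exact values differ
same_sign = np.all(np.sign(original) == np.sign(reloaded))
close = np.allclose(original, reloaded, atol=0.01)
```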
[10]:
# The SHAP model is pretty large, so remove it from disk
os.remove('artefacts/shap_tabular.pkl')