Explain your ML model: no more black boxes 🎁
1. What's in a black box?
- Explain individual predictions
- Understand models' behaviour
- Detect errors & biases
- Generate insights about data & create new features
2. Different types of interpretation
- Visualization
- Textual description
- Formulae
$\text{House price} = 2800 \cdot \text{rooms} + 10000 \cdot \text{swimming pool} + 5000 \cdot \text{garage}$
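To make the "formula" type of interpretation concrete, here is a minimal sketch: fit a linear model on toy housing data and read the learned coefficients as the formula above. All feature names and numbers below are invented purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy housing data: rooms, swimming pool (0/1), garage (0/1) -> price.
# The numbers are made up for illustration only.
rng = np.random.default_rng(0)
rooms = rng.integers(1, 6, size=200)
pool = rng.integers(0, 2, size=200)
garage = rng.integers(0, 2, size=200)
X = np.column_stack([rooms, pool, garage])
y = 2800 * rooms + 10000 * pool + 5000 * garage + rng.normal(0, 500, size=200)

model = LinearRegression().fit(X, y)

# The fitted coefficients *are* the explanation:
# price ≈ coef_rooms * rooms + coef_pool * pool + coef_garage * garage + intercept
for name, coef in zip(["rooms", "swimming pool", "garage"], model.coef_):
    print(f"{name}: {coef:.0f}")
```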
3. Trade-off between Accuracy and Interpretability
- Find a reasonable balance between accuracy and interpretability.
- Be able to explain the choice of a particular algorithm to a client.
4. Feature importance
- Train the model
- Shuffle all values of feature X and make a prediction on the updated data.
- Compute $Importance(X) = Accuracy_{actual} - Accuracy_{permuted}$.
- Restore the original order of the feature's values and repeat steps 2-3 with the next feature (see the sketch after this list).
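Below is a minimal sketch of these steps, assuming a fitted classifier `model`, a pandas DataFrame `X_val`, and labels `y_val` (all hypothetical names). scikit-learn also ships a ready-made version as `sklearn.inspection.permutation_importance`.

```python
import numpy as np
from sklearn.metrics import accuracy_score

def permutation_importance(model, X_val, y_val, seed=0):
    """Importance(X) = Accuracy_actual - Accuracy_permuted, one feature at a time."""
    rng = np.random.default_rng(seed)
    baseline = accuracy_score(y_val, model.predict(X_val))
    importances = {}
    for name in X_val.columns:
        X_permuted = X_val.copy()
        # Mix up all values of this feature; the other columns stay untouched.
        X_permuted[name] = rng.permutation(X_permuted[name].values)
        permuted_acc = accuracy_score(y_val, model.predict(X_permuted))
        importances[name] = baseline - permuted_acc
        # X_val itself is never modified, so "restoring" the order is implicit.
    return importances
```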
Advantages:
- Concise global explanation of the model's behaviour.
- Easy to interpret.
- No need to re-train the model for every feature.
Disadvantages:
- Requires the ground-truth values of the target.
- Tied to the model's error. That is not always bad, but in some cases we care about how the predictions change rather than how the accuracy changes.
5. Dependency plots
- Take one sample: a single student, no loans, balance of around $1000.
- Increase the balance to 5000.
- Make a prediction on the updated sample.
- What is the model's output if balance == 10? And so on.
- Moving along the x axis from smaller to larger values, plot the resulting predictions on the y axis (a minimal sketch follows this list).
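Here is a minimal sketch of that procedure, assuming a fitted binary classifier `model` with `predict_proba`, a pandas DataFrame `X` containing a `balance` column, and matplotlib (all hypothetical names). scikit-learn offers the same idea out of the box via `sklearn.inspection.PartialDependenceDisplay`.

```python
import numpy as np
import matplotlib.pyplot as plt

def partial_dependence(model, X, feature, grid_size=50):
    """Sweep one feature over a grid and average the model's predictions."""
    grid = np.linspace(X[feature].min(), X[feature].max(), grid_size)
    averaged = []
    for value in grid:
        X_modified = X.copy()
        X_modified[feature] = value  # set this value for *all* samples
        averaged.append(model.predict_proba(X_modified)[:, 1].mean())
    return grid, np.array(averaged)

grid, pdp = partial_dependence(model, X, "balance")
plt.plot(grid, pdp)
plt.xlabel("balance")
plt.ylabel("average predicted probability (positive class)")
plt.show()
```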
Advantages:
- Easy to interpret.
- Suggests a causal interpretation, although only with respect to the model, not necessarily to the real world.
Disadvantages:
- One plot can give you the analysis of only one or two features. Plots with more features would be difficult for humans to comprehend.
- An assumption of feature independence. However, this assumption is often violated in real life. Why is this a problem? Imagine that we want to draw a PDP for data with correlated features. While we change the values of one feature, the values of the related feature stay the same; as a result, we can get unrealistic data points. For instance, we are interested in the feature Weight, but the dataset also contains the feature Height. As we change the value of Weight, the value of Height stays fixed, so we can end up with a sample where Weight == 200 kg and Height == 150 cm.
- Opposite effects can cancel out the feature's impact. Imagine that for half of the samples a feature is positively associated with the target (the higher the value, the higher the model's output), while for the other half it is negatively associated (the lower the value, the higher the prediction). In this case, the PDP may be a flat horizontal line because the positive effects are cancelled out by the negative ones.
6. Local interpretation
Source: "Why Should I Trust You?": Explaining the Predictions of Any Classifier (Ribeiro et al., 2016), the paper that introduced LIME; a short usage sketch follows the lists below.
Advantages:
- Concise and clear explanations.
- Compatible with most data types: text, images, tabular data.
- Fast to compute, since we focus on one sample at a time.
Disadvantages:
- Only linear models are used to approximate the model's local behaviour.
- No global explanations.
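A minimal sketch of LIME on tabular data using the `lime` package. The names `model`, `X_train`, `x`, `feature_names`, and `class_names` are assumptions: a fitted classifier with `predict_proba`, a numpy training array, the single row to explain, and the corresponding label lists.

```python
from lime.lime_tabular import LimeTabularExplainer

# Assumed to exist: a fitted classifier `model` with predict_proba,
# a numpy array `X_train`, one row `x` to explain,
# and lists `feature_names` / `class_names`.
explainer = LimeTabularExplainer(
    X_train,
    feature_names=feature_names,
    class_names=class_names,
    mode="classification",
)

explanation = explainer.explain_instance(x, model.predict_proba, num_features=5)
# Each pair: (human-readable condition, weight in the local linear surrogate model).
print(explanation.as_list())
```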
7. SHAP
Advantages:
- Global and local interpretation.
- Intuitively clear local explanations: the prediction is represented as a game outcome where the features are the team players.
Disadvantages:
- SHAP returns a single value per feature rather than an interpretable surrogate model, as LIME does.
- Slow when creating a global interpretation.
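To close the SHAP section, here is a minimal sketch using the `shap` package, assuming a fitted tree-based regression model `model` and a pandas DataFrame `X` (both hypothetical); for multi-class models you would additionally index the class dimension of the SHAP values.

```python
import shap

# Assumed to exist: a fitted tree-based regressor `model`
# and a pandas DataFrame X with the columns it was trained on.
explainer = shap.TreeExplainer(model)
shap_values = explainer(X)

# Local explanation: how each feature pushes one prediction away from the base value.
shap.plots.waterfall(shap_values[0])

# Global explanation: distribution of SHAP values over the whole dataset.
shap.plots.beeswarm(shap_values)
```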
P.S. A quick (but totally not comprehensive!) overview of some tools for interpretable ML.