Slow `DiscretizedIntegratedGradientAttribution` method, also on GPU #161

MoritzLaurer · 2023-01-20T14:58:25Z

🐛 Bug Report

Inference on a Google Colab GPU is very slow, and there is no significant difference between running the model on CUDA and on CPU.

🔬 How To Reproduce

The following `model.attribute(...)` call takes around 33 to 47 seconds on both a Colab CPU and a Colab GPU. I tried passing the device to the model, and `model.device` confirms that it is running on CUDA, but attributing just 2 sentences still takes very long. (I don't know the underlying attribution computations well enough to tell whether this is expected or whether it should be faster. If it is always this slow, analysing larger corpora seems practically infeasible.)

import inseq
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"

print(inseq.list_feature_attribution_methods())
model = inseq.load_model("google/flan-t5-small", attribution_method="discretized_integrated_gradients", device=device)

model.to(device)

out = model.attribute(
    input_texts=["We were attacked by hackers. Was there a cyber attack?", "We were not attacked by hackers. Was there a cyber attack?"],
)

print(model.device)  # show which device the model ended up on

Environment

OS: Linux, Google Colab
Python version: Python 3.8.10
Inseq version: 0.3.3

Expected behavior

Faster inference with a GPU/cuda

(Thanks btw, for the fix for returning the per-token scores in a dictionary, the new method works well :) )


gsarti · 2023-01-20T15:29:47Z

Hi @MoritzLaurer, thanks for your comment!

The slowness you report is most likely specific to the `discretized_integrated_gradients` method, since the current implementation builds non-linear interpolation paths sequentially. Issue #113 currently tracks a batching bug in this method, and we are in touch with the authors.
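One way to picture the cost of handling interpolation steps one at a time is the toy sketch below. It is not Inseq's implementation and does not build the non-linear paths of the discretized method: it assumes plain integrated gradients along a straight line, a made-up model `F(x) = sum(x**2)`, and hypothetical helper names (`grad_f`, `ig_sequential`, `ig_batched`). The sequential version evaluates the gradient once per step, while the batched version evaluates all steps in a single call. For a real network, each gradient evaluation is a forward and backward pass, so the sequential form cannot benefit much from GPU parallelism.

```python
import numpy as np


def grad_f(points: np.ndarray) -> np.ndarray:
    """Gradient of the toy model F(x) = sum(x**2), for one point or a batch."""
    return 2.0 * points


def ig_sequential(x: np.ndarray, baseline: np.ndarray, n_steps: int = 50) -> np.ndarray:
    """Integrated gradients with one gradient evaluation per interpolation step."""
    total = np.zeros_like(x, dtype=float)
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps  # midpoint rule on [0, 1]
        total += grad_f(baseline + alpha * (x - baseline))
    return (x - baseline) * total / n_steps


def ig_batched(x: np.ndarray, baseline: np.ndarray, n_steps: int = 50) -> np.ndarray:
    """Same approximation, but all interpolation steps go through one call."""
    alphas = (np.arange(n_steps) + 0.5) / n_steps
    # Shape (n_steps, d): every interpolated point stacked into a single batch.
    points = baseline + alphas[:, None] * (x - baseline)
    return (x - baseline) * grad_f(points).mean(axis=0)


x = np.array([1.0, -2.0, 3.0])
baseline = np.zeros_like(x)
print(ig_sequential(x, baseline))
print(ig_batched(x, baseline))
```

Both functions compute the same attribution; only the number of gradient calls differs (`n_steps` versus one).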

In the meantime, I suggest using the more common `saliency` or `integrated_gradients` approaches, which should be considerably faster on GPU. Bastings et al. (2022) show that Gradient L2 (the default output of `saliency` in Inseq since v0.3.3) works well in terms of faithfulness on Transformer-based classifiers, so that could be a good starting point! Alternatively, attention attribution only requires forward passes, but it's less principled.
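To check how the suggested methods behave on a given setup, a small timing harness along these lines may help. It is only a sketch: `time_calls` and `compare_inseq_methods` are hypothetical helper names, the `inseq.load_model` and `model.attribute` calls mirror the ones in the report with only the method identifier changed, and the model is loaded before timing starts so that only attribution is measured.

```python
import time
from typing import Callable, Dict


def time_calls(calls: Dict[str, Callable[[], object]]) -> Dict[str, float]:
    """Run each zero-argument callable once and return its wall-clock seconds."""
    timings = {}
    for name, call in calls.items():
        start = time.perf_counter()
        call()
        timings[name] = time.perf_counter() - start
    return timings


def compare_inseq_methods() -> Dict[str, float]:
    """Time model.attribute for two gradient-based methods (downloads the model)."""
    # Imported here so time_calls stays usable without Inseq installed.
    import inseq
    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    texts = [
        "We were attacked by hackers. Was there a cyber attack?",
        "We were not attacked by hackers. Was there a cyber attack?",
    ]
    calls = {}
    for method in ("saliency", "integrated_gradients"):
        # Same load_model call as in the report, with only the method changed.
        model = inseq.load_model(
            "google/flan-t5-small", attribution_method=method, device=device
        )
        # Bind the model as a default argument so each callable keeps its own.
        calls[method] = lambda m=model: m.attribute(input_texts=texts)
    return time_calls(calls)


# Uncomment to run the comparison (requires Inseq and a model download):
# print(compare_inseq_methods())
```

A single run per method is noisy, especially on a shared Colab instance, so repeating the measurement a few times gives a more reliable picture.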

Hope it helps!

MoritzLaurer · 2023-01-22T14:31:33Z

OK, thanks, I'll try the other methods. (Good to know that there might be a fix at some point. In my ad-hoc tests, the `discretized_integrated_gradients` method seems to produce the most interpretable attributions.)

MoritzLaurer added the bug label on Jan 20, 2023

gsarti changed the title ~~Attribute is very slow, also on google colab GPU~~ Slow DiscretizedIntegratedGradientAttribution method, also on GPU Jan 20, 2023

gsarti added the enhancement label and removed the bug label on Feb 7, 2023
