
Bug-Tracker MPS issues #154

Open
lsickert opened this issue Dec 16, 2022 · 5 comments
Labels
bug (Something isn't working), help wanted (Extra attention is needed)

Comments

@lsickert
Collaborator

lsickert commented Dec 16, 2022

🐛 Bug Report

Even after updating to the newest PyTorch version (1.13.1), several issues with the MPS backend remain when it is enabled in the code. There still seems to be some inconsistency between devices depending on which operations are run, as shown below.

The goal of this issue is primarily to collect and highlight these problems.

🔬 How To Reproduce

Steps to reproduce the behavior:

  1. Go to `inseq/utils/torch_utils.py` and change `cpu` to `mps` in line 229 to enable the MPS backend (see the sketch after this list)
  2. Run `make fast-test` to run the tests
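
For reference, the change amounts to something like the following. This is a minimal sketch; the actual function name and surrounding code in torch_utils differ, and get_default_device is a hypothetical name:

import torch

def get_default_device() -> str:
    # Hypothetical sketch of the device selection around line 229 of
    # inseq/utils/torch_utils.py; not the actual inseq code.
    if torch.backends.mps.is_available():
        return "mps"  # changed from "cpu" to enable the MPS backend
    return "cpu"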

Code sample

see above

Environment

  • OS: macOS
  • Python version: 3.9.7

Screenshots

Running the tests this way generates the following error report:

========================================================================================== short test summary info ===========================================================================================
FAILED tests/attr/feat/test_feature_attribution.py::test_mcd_weighted_attribution - NotImplementedError: The operator 'aten::remainder.Tensor_out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
FAILED tests/models/test_huggingface_model.py::test_attribute_slice_seq2seq - RuntimeError: shape '[2, 1]' is invalid for input of size 1
FAILED tests/models/test_huggingface_model.py::test_attribute_decoder - NotImplementedError: The operator 'aten::cumsum.out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
==================================================================== 3 failed, 25 passed, 442 deselected, 6 warnings in 76.36s (0:01:16) =====================================================================
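
As the error message notes, the CPU fallback can be enabled via PYTORCH_ENABLE_MPS_FALLBACK=1. A minimal sketch, assuming the flag is set from Python; it must happen before torch is imported to take effect:

import os

# Must be set before the first `import torch`, otherwise it has no effect.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch  # noqa: E402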

When the tests are run with the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 set, the following errors still occur:

========================================================================================== short test summary info ===========================================================================================
FAILED tests/models/test_huggingface_model.py::test_attribute_slice_seq2seq - RuntimeError: shape '[2, 1]' is invalid for input of size 1
FAILED tests/models/test_huggingface_model.py::test_attribute_decoder - AssertionError: assert 26 == 27
==================================================================== 2 failed, 26 passed, 442 deselected, 6 warnings in 113.36s (0:01:53) ====================================================================

These errors do not occur when running the tests on other backends, implying that there is still some inconsistency between MPS and the other torch backends.

📈 Expected behavior

All tests should run consistently across all torch backends.

📎 Additional context

@lsickert lsickert added the bug (Something isn't working) and help wanted (Extra attention is needed) labels on Dec 16, 2022
@gsarti
Member

gsarti commented Dec 17, 2022

I wouldn't necessarily dismiss the last errors (with PYTORCH_ENABLE_MPS_FALLBACK=1) as MPS problems, since they do not seem to be related to the content of the computed matrices, but rather to their shape.

Does enabling MPS fallback let you run regular calls to model.attribute in your code @lsickert ?

@lsickert
Collaborator Author

I have not tested this any further yet, but I found it strange that these test errors only occur when MPS is enabled and not on CUDA or CPU, so I feel there might still be something wrong with the backend here. I will test a bit more once I get my new MacBook in a few days (since I will then be able to test on both M1 and AMD GPUs for a few days before returning my old one).

@lsickert
Collaborator Author

After some more testing, I believe it is better to stay away from MPS for now until it is more mature. For example, just running this basic setup

import inseq

model = inseq.load_model("gpt2", "saliency")
out = model.attribute(
    "The developer argued",
    generation_args={"max_new_tokens": 10},
)

with MPS, the generation output is: The developer argued that\n\n\n\n\n\n\n\n\n
without MPS, the generation output is: The developer argued that the new system would be more efficient and more

Considering that the first output consists almost entirely of empty tokens, this certainly explains the different shapes in the error messages above.
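
To check whether the divergence already appears in plain transformers generation, without inseq involved, a minimal sketch along these lines could be used; greedy decoding is an assumption made here to keep the comparison deterministic:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
input_ids = tokenizer("The developer argued", return_tensors="pt").input_ids

devices = ["cpu"] + (["mps"] if torch.backends.mps.is_available() else [])
for device in devices:
    model = AutoModelForCausalLM.from_pretrained("gpt2").to(device)
    # Greedy decoding keeps the two runs deterministic and comparable.
    out = model.generate(input_ids.to(device), max_new_tokens=10, do_sample=False)
    print(device, repr(tokenizer.decode(out[0])))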

@gsarti
Member

gsarti commented Jan 19, 2023

Thanks for testing this @lsickert! It may be worth raising an issue on HF transformers to understand where the issue lies, since it's clearly a problem with generation. Would you be willing to take the matter into your hands?

@lsickert
Collaborator Author

Yes, sure, I can raise it with them, although I am fairly confident that it is an underlying PyTorch issue, since I managed to trace it back to a place in the gpt2 code where transformers starts calling some tensor methods. To be sure, I will have to investigate a bit more.
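
If it is indeed a tensor-level problem, a minimal PyTorch-only comparison could help narrow it down before filing upstream. A sketch, where the specific operation and shapes are hypothetical examples rather than the confirmed culprit:

import torch

# Shape loosely modeled on GPT-2 attention tensors; purely illustrative.
x = torch.randn(2, 12, 4, 64)

cpu_out = torch.matmul(x, x.transpose(-1, -2))
mps_out = torch.matmul(x.to("mps"), x.to("mps").transpose(-1, -2)).cpu()

# A large difference here would point at an MPS kernel rather than transformers.
print(torch.max(torch.abs(cpu_out - mps_out)))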
