Pathfinder task #11

cifkao · 2020-12-07T20:31:46Z

Could you please specify which pathfinder task is used in the paper? I'm assuming it's pathfinder32, but which difficulty?

Also, the task is broken. There is no way to specify the path to the data, and the pipeline code tries to reference _PATHFINER_TFDS_PATH (note the typo), which is never defined (even without the typo).

MostafaDehghani · 2020-12-07T22:52:13Z

Hi @cifkao,

Thanks for raising these issues and for helping us improve the codebase.

For LRA, we use pathfinder32_hard for the Pathfinder experiments and pathfinder128_hard for Path-X.
I'll send a PR to add this to the readme for the task.
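
As a rough illustration of how these task names relate to the data, the sketch below splits a name such as pathfinder32_hard into the dataset name, the image resolution, and the difficulty. The helper `parse_task_name` is hypothetical and not part of the LRA codebase; the `[:80%]` training slice is taken from the traceback further down in this thread.

```python
import re


def parse_task_name(task_name):
    """Split e.g. 'pathfinder32_hard' into ('pathfinder32', 32, 'hard').

    Hypothetical helper for illustration only; not taken from the LRA code.
    """
    match = re.fullmatch(r"(pathfinder(\d+))_(\w+)", task_name)
    if match is None:
        raise ValueError(f"Unrecognized task name: {task_name!r}")
    dataset, resolution, difficulty = match.groups()
    return dataset, int(resolution), difficulty


dataset, resolution, difficulty = parse_task_name("pathfinder32_hard")
# The traceback in this thread shows the training split being built this way.
train_split = f"{difficulty}[:80%]"
print(dataset, resolution, train_split)
```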

Also regarding _PATHFINER_TFDS_PATH, it should be set to the directory where you have the unzipped data from https://storage.googleapis.com/long-range-arena/lra_release.gz, i.e. where the following data for all the pathfinder tasks live:

pathfinder128/
pathfinder256/
pathfinder32/
pathfinder64/

I'll address this issue in my PR as well.

cifkao · 2020-12-08T11:06:34Z

Thanks. I set _PATHFINER_TFDS_PATH as you advised, and now I'm getting this error:

I1208 11:46:39.121981 140014513174336 dataset_builder.py:529] Constructing tf.data.Dataset for split hard[:80%], from /mnt/beegfs/projects/tpt-s2a-4/data/lra_release/pathfinder32/1.0.0
Traceback (most recent call last):
  File "lra_benchmarks/image/train.py", line 420, in <module>
    app.run(main)
  File "/mnt/beegfs/home/cifka/venv/lra/lib/python3.7/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/mnt/beegfs/home/cifka/venv/lra/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "lra_benchmarks/image/train.py", line 337, in main
    normalize=normalize)
  File "/mnt/beegfs/home/cifka/d/projects/long-range-arena/lra_benchmarks/image/input_pipeline.py", line 182, in get_pathfinder_base_datasets
    train_dataset = get_split(f'{split}[:80%]')
  File "/mnt/beegfs/home/cifka/d/projects/long-range-arena/lra_benchmarks/image/input_pipeline.py", line 175, in get_split
    split=split, decoders={'image': tfds.decode.SkipDecoding()})
  File "/mnt/beegfs/home/cifka/venv/lra/lib/python3.7/site-packages/tensorflow_datasets/core/dataset_builder.py", line 535, in as_dataset
    ) % (self.name, self._data_dir_root))
AssertionError: Dataset pathfinder32: could not find data in /mnt/beegfs/projects/tpt-s2a-4/data/lra_release/. Please make sure to call dataset_builder.download_and_prepare(), or pass download=True to tfds.load() before trying to access the tf.data.Dataset object.

I think the problem is my pathfinder32 doesn't contain a directory called 1.0.0:

$ ls /mnt/beegfs/projects/tpt-s2a-4/data/lra_release/pathfinder32/
curv_baseline  curv_contour_length_14  curv_contour_length_9
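
The log line above shows TFDS looking for the data under pathfinder32/1.0.0. A minimal sketch of a check for that layout follows; the function name `has_prepared_version` and the placeholder path are assumptions for illustration, and the default version string is simply the one that appears in the log.

```python
from pathlib import Path


def has_prepared_version(data_dir, dataset_name, version="1.0.0"):
    """Return True if data_dir/dataset_name/version exists and is non-empty."""
    version_dir = Path(data_dir) / dataset_name / version
    return version_dir.is_dir() and any(version_dir.iterdir())


# Placeholder path: point this at the directory given to the input pipeline.
data_dir = "/path/to/lra_release"
print(has_prepared_version(data_dir, "pathfinder32"))
```

A directory that only holds raw-image folders such as curv_baseline would make this check return False.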

MostafaDehghani · 2020-12-08T11:47:09Z

There is a small problem with the zip file we released. Some extra files slipped into it while archiving. We are fixing that and will upload a new zip file with a better directory structure and no unnecessary files.
In the meantime, can you use this path /mnt/beegfs/projects/tpt-s2a-4/data/lra_release/lra_release and see if it works?

cifkao · 2020-12-08T14:40:55Z

> In the meantime, can you use this path /mnt/beegfs/projects/tpt-s2a-4/data/lra_release/lra_release and see if it works?

I had in fact already deleted everything except for lra_release/lra_release, because the rest seemed to be a subset of what is under lra_release/lra_release. So /mnt/beegfs/projects/tpt-s2a-4/data/lra_release/ should already have all the files (I do have the pathfinder32 directory there).

Either way, I checked the archive and I cannot find anything called 1.0.0 (which is where it's trying to load the data from).

MostafaDehghani · 2020-12-08T17:03:07Z

Thank you @cifkao for checking this and sorry for the trouble.

We checked, and it turned out that we had released the raw images for the pathfinder datasets, so you need to create the TFDS files yourself, which you can do using this code:
https://github.com/google-research/long-range-arena/blob/main/lra_benchmarks/data/pathfinder.py

However, we now also have the generated TFDS files available to make it convenient for people to use LRA. Here you can download the TFDS files for pathfinder: https://storage.cloud.google.com/long-range-arena/pathfinder_tfds.gz

Then set _PATHFINER_TFDS_PATH to the unzipped directory. Let us know if you hit any other issues.
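
For the unpacking step, a minimal sketch is given below. It assumes the downloaded .gz file is a gzip-compressed tar archive, which this thread does not state explicitly; the function name `extract_archive` and the local file and directory names are placeholders.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a (possibly compressed) tar archive and list the top-level entries."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect the compression (gzip, bz2, xz, or none).
    with tarfile.open(archive_path, mode="r:*") as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters for safety.
            archive.extractall(dest, filter="data")
        else:
            archive.extractall(dest)
    return sorted(entry.name for entry in dest.iterdir())


# Placeholder location: wherever the download was saved.
archive = Path("pathfinder_tfds.gz")
if archive.exists():
    print(extract_archive(archive, "pathfinder_tfds"))
```

The printed top-level entries indicate which directory to use as the data path.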

cifkao · 2020-12-08T22:43:50Z

Seems to work, thanks!

MostafaDehghani · 2020-12-08T22:51:13Z

Perfect! Let's keep this issue open until I send the PR that adds this information to the README :)

renebidart · 2021-06-12T15:52:17Z

Is anyone able to reproduce the paper's results using performer on pathfinder? Accuracy is much worse (62% vs. 77%). I was able to approximately reproduce with transformer and bigbird.

Splend1d · 2021-08-20T16:58:47Z

> Is anyone able to reproduce the paper's results using performer on pathfinder? Accuracy is much worse (62% vs. 77%). I was able to approximately reproduce with transformer and bigbird.

@renebidart
Same here, although my results were even worse (52% for performer).
bigbird is reproducible (73.48%).
My training shell script is as follows:

export _PATHFINER_TFDS_PATH=./TFDS
difficulty=hard
PYTHONPATH="$(pwd)":"$PYTHON_PATH" python lra_benchmarks/image/train.py \
      --config=lra_benchmarks/image/configs/pathfinder32/bigbird_base.py \
      --model_dir=./results/pathfinder32_${difficulty} \
      --task_name pathfinder32_${difficulty}

alexmathfb · 2021-09-01T15:51:50Z

This version of the link does not require you to log in to a Google account: https://storage.googleapis.com/long-range-arena/pathfinder_tfds.gz

yinzhangyue · 2021-11-16T10:34:26Z

I can't reproduce performer's result on the pathfinder32_hard task either. I get a best eval result of only 50.47%.
My training shell script is as follows:

PYTHONPATH="$(pwd)":"$PYTHON_PATH" python lra_benchmarks/image/train.py \
      --config=lra_benchmarks/image/configs/pathfinder32/performer_base.py \
      --model_dir=./tmp/pathfinder_F \
      --task_name=pathfinder32_hard

vladyorsh · 2021-11-19T18:24:09Z

@yinzhangyue

> I can't reproduce performer's result in pathfinder32_hard task either. Get just 50.47% best eval result.
> My training shell script is as follow :
> PYTHONPATH="$(pwd)":"$PYTHON_PATH" python lra_benchmarks/image/train.py \
>       --config=lra_benchmarks/image/configs/pathfinder32/performer_base.py \
>       --model_dir=./tmp/pathfinder_F \
>       --task_name=pathfinder32_hard

Me neither. Furthermore, I've taken a look at the model config and it doesn't make sense: the QKV dim is set to 16, while MLP and hidden are 32. I've skimmed through the code, and these are the actual dimensions, not the per-head ones after the split.

The hyper-parameters used for the xformer model are as follow:
4 layers, 8 heads, 128 as the hidden dimensions of FFN blocks, 128 as the query/key/value hidden
dimensions, and the learning rate of 0.01.

Similar problems exist with other tasks.
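
To make the mismatch concrete, the sketch below compares the values reported in this comment against the hyper-parameters quoted above. The dictionary keys and the helper `diff_hparams` are hypothetical names chosen for illustration; they are not necessarily the field names used in the LRA config files.

```python
# Hyper-parameters as quoted in the comment above (key names are hypothetical).
PAPER_HPARAMS = {
    "num_layers": 4,
    "num_heads": 8,
    "mlp_dim": 128,
    "qkv_dim": 128,
    "learning_rate": 0.01,
}

# Values reported for the pathfinder32 config in the comment above.
config_values = {"qkv_dim": 16, "mlp_dim": 32, "hidden_dim": 32}


def diff_hparams(config, reference):
    """Return {key: (config value, reference value)} for keys that disagree."""
    return {
        key: (config[key], expected)
        for key, expected in reference.items()
        if key in config and config[key] != expected
    }


print(diff_hparams(config_values, PAPER_HPARAMS))
```

Keys that appear in only one of the two dictionaries are ignored, so the comparison covers only the settings both sides actually mention.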

MostafaDehghani · 2021-11-20T15:35:15Z

@EternalSorrrow,
Please take a look at my comment here: #37 (comment)

I'll soon send a fix for the issue with the configs of pathfinder.

vladyorsh · 2021-11-29T16:11:15Z

@MostafaDehghani
Thanks, these make sense, although I'm still struggling to reproduce the results after re-implementing.

NathanLeroux-git · 2023-10-02T18:09:20Z

> Thank you @cifkao for checking this and sorry for the trouble.
>
> We checked and it turned out that we released the raw images for the pathfinder datasets and you need to make a TFDS files that you can generate using this code: https://github.com/google-research/long-range-arena/blob/main/lra_benchmarks/data/pathfinder.py
>
> However, we now also have the generated TFDS files available to make it convenient for people to use LRA. Here you can download the TFDS files for pathfinder: https://storage.cloud.google.com/long-range-arena/pathfinder_tfds.gz
>
> and then set _PATHFINER_TFDS_PATH to the unzipped directory. Let us know if you hit any other issue.

Hello,
I followed these steps and used this file https://storage.cloud.google.com/long-range-arena/pathfinder_tfds.gz, but I still got the following error:
"AssertionError: Dataset pathfinder32: could not find data in /Users/user/tensorflow_datasets"
Here "tensorflow_datasets" is the directory where I extracted the .gz file, and I set _PATHFINER_TFDS_PATH="/Users/user/tensorflow_datasets".
Have you had a chance to fix these previous issues?
Thanks
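
One possible cause of this error, offered only as a guess, is an extra level of nesting in the extracted archive, similar to the lra_release/lra_release case earlier in this thread. The sketch below searches an extracted tree for directories that directly contain pathfinder32/1.0.0, the layout TFDS looked for in the earlier log. The function name `find_tfds_roots` is hypothetical, and the search path is the one from the error message.

```python
from pathlib import Path


def find_tfds_roots(search_dir, dataset_name="pathfinder32", version="1.0.0"):
    """Return directories under search_dir that contain dataset_name/version."""
    base = Path(search_dir)
    if not base.is_dir():
        return []
    roots = []
    for candidate in base.rglob(version):
        # Keep only version directories that sit directly inside the dataset folder.
        if candidate.is_dir() and candidate.parent.name == dataset_name:
            roots.append(candidate.parent.parent)
    return sorted(roots)


# Path taken from the error message above.
for root in find_tfds_roots("/Users/user/tensorflow_datasets"):
    print(root)
```

If a directory is printed and it differs from the configured path, that directory is a candidate value for _PATHFINER_TFDS_PATH.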

liuyang148 mentioned this issue Aug 31, 2021

Pathfinder task cannot converge. #37

Closed

GeoffNN mentioned this issue Apr 20, 2022

Added instructions for loading the TFDS pathfinder data #47

Open
