Issues with Evaluation Scripts #7

Open · A-Ayerh opened this issue Jan 4, 2024 · 9 comments

A-Ayerh commented Jan 4, 2024

This issue is related to commit fbaf82d

After running the script bash ./scripts/eval_decoding.sh, the results came out to be:

corpus BLEU-1 score: 0
corpus BLEU-2 score: 0
corpus BLEU-3 score: 0
corpus BLEU-4 score: 0

{'rouge-1': {'r': 0.0960104371521744, 'p': 0.13671808632706614, 'f': 0.10633835733307583}, 'rouge-2': {'r': 0.011719396402741052, 'p': 0.013988694184239035, 'f': 0.01133032845861094}, 'rouge-l': {'r': 0.09090843088332022, 'p': 0.12862700453138184, 'f': 0.10046980133298505}}


Removing the .squeeze and .tolist may have some effect on the results...
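For reference, a tiny hypothetical illustration (not the repository's code; the id values are made up) of what .squeeze()/.tolist() do to a generated-id tensor, and why dropping them could change what downstream decoding/BLEU code receives:

```python
# Hypothetical illustration (not the repository's code): generation output is a
# [batch, seq_len] tensor of token ids, while downstream decoding/BLEU code
# often expects a flat Python list of ids per sample.
import torch

output_ids = torch.tensor([[2, 0, 50, 21, 2421, 11, 5, 121, 2]])  # shape [1, seq_len], values made up
flat_ids = output_ids.squeeze().tolist()                           # flat list of ints

print(type(output_ids), output_ids.shape)  # <class 'torch.Tensor'> torch.Size([1, 9])
print(type(flat_ids), flat_ids)            # <class 'list'> [2, 0, 50, 21, 2421, 11, 5, 121, 2]
```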

I'll be working on this as well @MikeWangWZHL, thanks for acting fast!

A-Ayerh changed the title from "Issues with Evaluation Scripts and Missing Files for EEG-to-Text Decoding Model" to "Issues with Evaluation Scripts" on Jan 8, 2024
A-Ayerh (Author) commented Jan 8, 2024

My predicted strings in the BrainTranslator-all_decoding_result.txt file are all the same, strangely.

Ex:

target string: Everything its title implies, a standard-issue crime drama spat out from the Tinseltown assembly line.
predicted string: </s><s><s><s>He was born in the United States and raised in the UK.</s>
################################################

target string: This odd, poetic road movie, spiked by jolts of pop music, pretty much takes place in Morton's ever-watchful gaze -- and it's a tribute to the actress, and to her inventive director, that the journey is such a mesmerizing one.
predicted string: </s><s><s><s>He was born in the United States and raised in the UK.</s>
################################################

Perhaps my terminal output before the BLEU scores is relevant:

[INFO]subjects: ALL
[INFO]eeg type: GD
[INFO]using bands: ['_t1', '_t2', '_a1', '_a2', '_b1', '_b2', '_g1', '_g2']
[INFO]using device cuda:1

[INFO]loading 3 task datasets
[INFO]using subjects: ['ZAB', 'ZDM', 'ZDN', 'ZGW', 'ZJM', 'ZJN', 'ZJS', 'ZKB', 'ZKH', 'ZKW', 'ZMG', 'ZPH']
train divider = 320
dev divider = 360
[INFO]initializing a test set...
++ adding task to dataset, now we have: 456
[INFO]using subjects: ['ZAB', 'ZDM', 'ZDN', 'ZGW', 'ZJM', 'ZJN', 'ZJS', 'ZKB', 'ZKH', 'ZKW', 'ZMG', 'ZPH']
train divider = 240
dev divider = 270
[INFO]initializing a test set...
discard length zero instance: He was the son of a blacksmith Timothy Bush, Jr. and Lydia Newcomb and was born in Penfield, Monroe Co., New York on January 28, 1797.
discard length zero instance: Mary Lilian Baels (November 28, 1916 - June 7, 2002) was best known as Princess de Ruthy, the controversial morganatic second wife of King Leopold III of the Belgians.
++ adding task to dataset, now we have: 806
[INFO]using subjects: ['YSD', 'YSL', 'YDG', 'YLS', 'YMS', 'YAC', 'YFS', 'YDR', 'YAG', 'YTL', 'YFR', 'YMD', 'YRK', 'YAK', 'YIS', 'YRH', 'YRP', 'YHS']
train divider = 279
dev divider = 313
[INFO]initializing a test set...
expect word eeg embedding dim to be 840, but got 0, return None
(the line above is repeated 33 times in the original log)
++ adding task to dataset, now we have: 1407
[INFO]input tensor size: torch.Size([56, 840])

[INFO]test_set size: 1407

underkongkong commented

I'm facing the same problem: all generated sentences are identical. It seems that the pre-trained encoder makes all the features similar.
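A quick, hypothetical way to check this (not code from this repo; the function name and the pooled-feature tensor are illustrative): if the pairwise cosine similarity between the encoder's features for different test sentences is close to 1.0, that supports the "collapsed features" hypothesis.

```python
# Hypothetical diagnostic (not from the repository): measure how similar the
# encoder's pooled features are across different test sentences.
import torch
import torch.nn.functional as F

def mean_pairwise_cosine(features: torch.Tensor) -> float:
    """features: [N, D] tensor, one pooled encoder feature vector per sentence."""
    normed = F.normalize(features, dim=-1)
    sims = normed @ normed.T                                   # [N, N] cosine similarities
    off_diag = sims[~torch.eye(len(sims), dtype=torch.bool)]   # drop self-similarities
    return off_diag.mean().item()

# Sanity check with random features: should be near 0, not near 1.
print(mean_pairwise_cosine(torch.randn(8, 840)))
# In practice, stack the model's pooled encoder outputs for a few test samples
# into an [N, D] tensor; a value near 1.0 means the features have collapsed.
```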

A-Ayerh (Author) commented Jan 8, 2024

@underkongkong Have you tried playing around with the config file parameters yet?

I wasn't sure if that would make a big difference.

aysrox commented Jan 14, 2024

In my case, the predicted string was always something like:
predicted string: He was born in the United States and studied at New York University.

Not sure how to fix this...

yanlirock commented

same here

underkongkong commented

Anyone solved this problem?

MikeWangWZHL (Owner) commented

Thanks for everyone's effort in the discussion; I haven't had time to test out the issue yet, but I will work on it later.
Until further notice, please STOP using the code for the purpose of reproducing the results in the paper; as mentioned in the README note, it will probably fail. Nevertheless, the overall idea is still valid for potential future work with stronger LLMs. Sorry again for the inconvenience!

girlsending0 commented

I found out how to fix this problem.

In the eval_decoding.py file, the line predictions=tokenizer.encode(predicted_string) should be changed to predictions=tokenizer.encode(predicted_string[0]).

predicted_string is a list, so we should pass only the string inside it. In our case the batch size is 1, so we change predicted_string to predicted_string[0].
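A minimal sketch of the fixed evaluation step (illustrative only, not the exact eval_decoding.py code; the BART tokenizer, the sample strings, and the NLTK corpus_bleu call are my assumptions):

```python
# Minimal sketch of the fix (illustrative, not the repository's exact code).
# With batch_size = 1, predicted_string is a one-element list, so the string
# itself has to be indexed out before calling tokenizer.encode().
from transformers import BartTokenizer
from nltk.translate.bleu_score import corpus_bleu

tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')

target_string = "Everything its title implies, a standard-issue crime drama."
predicted_string = ["He was born in the United States and raised in the UK."]  # batch of 1

# before: predictions = tokenizer.encode(predicted_string)       # encodes the LIST
predictions = tokenizer.encode(predicted_string[0])               # after: encode the string
references = tokenizer.encode(target_string)

pred_tokens = tokenizer.convert_ids_to_tokens(predictions, skip_special_tokens=True)
ref_tokens = tokenizer.convert_ids_to_tokens(references, skip_special_tokens=True)

print('corpus BLEU-1 score:',
      corpus_bleu([[ref_tokens]], [pred_tokens], weights=(1, 0, 0, 0)))
```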

This change improves the results from:
corpus BLEU-1 score: 0
corpus BLEU-2 score: 0
corpus BLEU-3 score: 0
corpus BLEU-4 score: 0

{'rouge-1': {'r': 0.0960104371521744, 'p': 0.13671808632706614, 'f': 0.10633835733307583}, 'rouge-2': {'r': 0.011719396402741052, 'p': 0.013988694184239035, 'f': 0.01133032845861094}, 'rouge-l': {'r': 0.09090843088332022, 'p': 0.12862700453138184, 'f': 0.10046980133298505}}

to (in my case):

corpus BLEU-1 score: 0.11137150833175373
corpus BLEU-2 score: 0.02308700455643944
corpus BLEU-3 score: 0.0057795258674805845
corpus BLEU-4 score: 0.0018112469683353798

But in my case, the BrainTranslator model still generates only one distinct sentence...

I am doing research based on the author's code and will post updates here if there are any further corrections.

Thanks to @MikeWangWZHL.

underkongkong commented

> In the eval_decoding.py file, the line predictions=tokenizer.encode(predicted_string) should be changed to predictions=tokenizer.encode(predicted_string[0]). [quoting @girlsending0's comment above]

I can't find this code in this project: predictions=tokenizer.encode(predicted_string)
