SpanProto: recall is always 1.0 #248

Open
JayShJi opened this issue Nov 8, 2022 · 4 comments
Comments


JayShJi commented Nov 8, 2022

Hello!

After running the script you provided, I found that recall is always 1.0. Could the cause be the following line, which concatenates the ground-truth spans into the candidate spans and thus leaks the labels into the later evaluation? https://github.com/wjn1996/SpanProto/blob/f1e0acb8672f0bfcbb7c827c48b06b3e8ccb295a/models/span_proto.py#L588

Also, after changing this line to query_all_spans = query_predict_spans, the results differ considerably from those reported in the paper, and I am not sure where the problem lies.
FEW-NERD 5-way 1-shot: inter: span_f1 0.5826, class_f1 0.4618; intra: span_f1 0.4606, class_f1 0.3548
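The leak described above can be illustrated with a minimal, self-contained sketch (the function and span values below are hypothetical illustrations, not SpanProto's actual code): if the gold spans are concatenated into the candidate list before evaluation, every gold span trivially matches itself, so recall is pinned at 1.0 regardless of the model.

```python
def span_recall(pred_spans, gold_spans):
    """Fraction of gold spans that appear among the predicted spans."""
    if not gold_spans:
        return 0.0
    hits = sum(1 for span in gold_spans if span in pred_spans)
    return hits / len(gold_spans)

pred = [(0, 2), (5, 7)]          # spans the model actually predicted
gold = [(0, 2), (3, 4), (5, 7)]  # ground-truth spans

leaky = pred + gold              # gold spans concatenated in -> label leak
print(span_recall(leaky, gold))  # 1.0, no matter how bad the model is
print(span_recall(pred, gold))   # 0.666..., the honest recall
```

This is consistent with evaluating on query_all_spans (predictions plus gold) versus query_predict_spans (predictions only).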


Shajiu commented Dec 7, 2022

> (quoting @JayShJi's report above)

Hi! How did you manage to reproduce this? I can't get the code to run at all. Is there a problem with the source code?


Shajiu commented Dec 11, 2022

How did you get it to run? This is the error I hit:
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Fail to resize token embeddings.
Running tokenizer on dataset: 0%| | 0/20 [00:00<?, ?ba/s]
Traceback (most recent call last):
File "/code/SpanProto/nlp_trainer.py", line 285, in <module>
main()
File "/code/SpanProto/nlp_trainer.py", line 135, in main
tokenized_datasets = processor.get_tokenized_datasets()
File "/code/SpanProto/processor/ProcessorBase.py", line 308, in get_tokenized_datasets
raw_datasets = raw_datasets.map(
File "/home/shajiu/anaconda3/lib/python3.9/site-packages/datasets/dataset_dict.py", line 494, in map
{
File "/home/shajiu/anaconda3/lib/python3.9/site-packages/datasets/dataset_dict.py", line 495, in <dictcomp>
k: dataset.map(
File "/home/shajiu/anaconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2092, in map
return self._map_single(
File "/home/shajiu/anaconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 518, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/home/shajiu/anaconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 485, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/home/shajiu/anaconda3/lib/python3.9/site-packages/datasets/fingerprint.py", line 411, in wrapper
out = func(self, *args, **kwargs)
File "/home/shajiu/anaconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2486, in _map_single
writer.write_batch(batch)
File "/home/shajiu/anaconda3/lib/python3.9/site-packages/datasets/arrow_writer.py", line 458, in write_batch
pa_table = pa.Table.from_pydict(typed_sequence_examples)
File "pyarrow/table.pxi", line 1868, in pyarrow.lib.Table.from_pydict
File "pyarrow/table.pxi", line 2658, in pyarrow.lib._from_pydict
File "pyarrow/array.pxi", line 342, in pyarrow.lib.asarray
File "pyarrow/array.pxi", line 230, in pyarrow.lib.array
File "pyarrow/array.pxi", line 110, in pyarrow.lib._handle_arrow_array_protocol
File "/home/shajiu/anaconda3/lib/python3.9/site-packages/datasets/arrow_writer.py", line 140, in arrow_array
out = pa.array(cast_to_python_objects(self.data, only_1d_for_numpy=True), type=type)
File "pyarrow/array.pxi", line 316, in pyarrow.lib.array
File "pyarrow/array.pxi", line 39, in pyarrow.lib._sequence_to_array
File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Could not convert {'input_ids': [[101, 23236, 24853, 2578, 2038, 11241, 1999, 10594, 5480, 2104, 1996, 2231, 1997, 2634, 17531, 2080, 4712, 5679, 1006, 21307, 4523, 1007, 1012, 102], [101, 2087, 1997, 9666, 2884, 1005, 1055, 3934, 2031, 2042, 1999, 5726, 28649, 2412, 2144, 1012, 102], [101, 1999, 2804, 2027, 8678, 2006, 1036, 1036, 1996, 4918, 2829, 1998, 29044, 2100, 2265, 1036, 1036, 1010, 2029, 2743, 2006, 5095, 16956, 2076, 1996, 3865, 1012, 102], [101, 1036, 1036, 8952, 2866, 1036, 1036, 3964, 2008, 1996, 4234, 2792, 1006, 1036, 1036, 18712, 2891, 15851, 2051, 24901, 2015, 999, 102], [101, 2016, 2363, 2014, 5065, 1997, 2671, 3014, 1999, 9440, 1998, 14266, 2013, 2624, 5277, 2110, 2118, 1010, 1998, 2038, 3687, 13099, 10618, 2015, 2004, 2019, 5057, 2966, 16661, 1010, 10516, 9450, 1010, 1998, 18440, 13592, 2015, 9450, 1012, 102]], 'token_type_ids': [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], 'offset_mapping': [[(0, 0), (0, 4), (5, 16), (17, 25), (26, 29), (30, 38), (39, 41), (42, 46), (46, 48), (49, 54), (55, 58), (59, 69), (70, 72), (73, 78), (79, 81), (81, 82), (83, 92), (93, 99), (100, 101), (102, 104), (104, 106), (107, 108), (109, 110), (0, 0)], [(0, 0), (0, 
4), (5, 7), (8, 11), (11, 13), (14, 15), (15, 16), (17, 25), (26, 30), (31, 35), (36, 38), (39, 43), (43, 49), (50, 54), (55, 60), (61, 62), (0, 0)], [(0, 0), (0, 2), (3, 11), (12, 16), (17, 29), (30, 32), (33, 34), (34, 35), (36, 39), (40, 47), (48, 53), (54, 57), (58, 63), (63, 64), (65, 69), (70, 71), (71, 72), (73, 74), (75, 80), (81, 84), (85, 87), (88, 96), (97, 105), (106, 112), (113, 116), (117, 122), (123, 124), (0, 0)], [(0, 0), (0, 1), (1, 2), (3, 8), (9, 11), (12, 13), (13, 14), (15, 20), (21, 25), (26, 29), (30, 39), (40, 47), (48, 49), (50, 51), (51, 52), (53, 55), (55, 57), (57, 60), (61, 65), (65, 69), (69, 70), (71, 72), (0, 0)], [(0, 0), (0, 3), (4, 12), (13, 16), (17, 25), (26, 28), (29, 36), (37, 43), (44, 46), (47, 52), (53, 56), (57, 66), (67, 71), (72, 75), (76, 81), (82, 87), (88, 98), (99, 100), (101, 104), (105, 108), (109, 115), (116, 126), (127, 140), (140, 141), (142, 144), (145, 147), (148, 157), (158, 165), (166, 176), (177, 178), (179, 186), (187, 197), (198, 199), (200, 203), (204, 208), (208, 211), (211, 212), (213, 223), (224, 225), (0, 0)]]} with type BatchEncoding: did not recognize Python value type when inferring an Arrow data type
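The ArrowInvalid "Could not convert ... with type BatchEncoding" failure is a known incompatibility: older datasets releases cannot infer an Arrow schema from the BatchEncoding object (a dict subclass) that fast tokenizers return from the map function. One common workaround, sketched below with a stand-in class so the example is self-contained (FakeBatchEncoding and tokenize_fn are illustrative, not SpanProto's code), is to return a plain dict from the function passed to .map; upgrading datasets is the other usual fix.

```python
from collections import UserDict

# Stand-in for transformers.BatchEncoding: a dict-like wrapper that older
# pyarrow/datasets versions fail to recognize when inferring column types.
class FakeBatchEncoding(UserDict):
    pass

def tokenize_fn(examples):
    encoded = FakeBatchEncoding({
        "input_ids": [[101, 7592, 102]],
        "attention_mask": [[1, 1, 1]],
    })
    # Workaround: hand `datasets` a plain dict so Arrow can infer the schema.
    return dict(encoded)

out = tokenize_fn(None)
print(type(out) is dict)  # True: Arrow can now infer the column types
```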

@liyongqi2002

> (quoting @JayShJi's report above)

I ran into the same problem and hope the author can answer it.

@swaggy66

> (quoting @JayShJi's report above)

This is probably a common-subsequence issue introduced by the tokenization, which is what drives the recall to 1.0.


4 participants