
Task Parsing Part for Pytorch Implementation #552

Draft · wants to merge 134 commits into master
Conversation

ZhengTang1120

@MihaiSurdeanu @bethard @kwalcock
Here is my current code for the MTL in PyTorch. Thanks to Steve, I have already fixed a few bugs in the code.

This is just the task manager and file reader part; I will open another pull request after I get the NER task implemented.

@MihaiSurdeanu
Contributor

Looks nice so far!

Member

@kwalcock kwalcock left a comment


How in the world can I delete this comment?

Comment on lines +15 to +16
w2i = {}
i = 0
Member


I think that this might help. In the previous version, with its head start of i = 1, it seems like the wrong vectors might have been used: if one looked up "," in w2i, it might have been mapped to 2 instead of 1.

Author


This is because we treated the empty string "" and the unknown "<UNK>" differently in the previous version: 0 was taken by "<UNK>", and i started from 1.
In the current version, "" and "<UNK>" share the same embedding, so we do not need an extra id for ""/"<UNK>".
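
A toy sketch of the scheme described above (the three-word vocabulary and the lookup helper are made up for illustration, not taken from the PR):

w2i = {}
i = 0
for word in ["<UNK>", ",", "the"]:  # toy vocabulary, in embedding-file order
    w2i[word] = i
    i += 1

def lookup(word):
    # Unknown words, and the empty string (folded into "<UNK>" at load
    # time), all resolve to the single "<UNK>" id / embedding row.
    return w2i.get("<UNK>" if word == "" else word, w2i["<UNK>"])

assert lookup(",") == 1               # ids now line up with matrix rows
assert lookup("") == lookup("<UNK>")  # one shared id for ""/"<UNK>"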

@@ -21,15 +21,13 @@ def load(config):
             else:
                 delimiter = " "
             word, *rest = line.rstrip().split(delimiter)
+            word = "<UNK>" if word == "" else word
Member


If Python is OK using an empty string as a key, this should not be necessary.

Author


It is easier to change the key here instead of changing all the tokens throughout the code...

emb_dict["<UNK>"] = vector
else:
emb_dict[word] = vector
emb_dict[word] = vector
Member


Are two copies of the arrays being kept temporarily: one in emb_dict and another in weights? If memory is an issue, it seems like one could record this vector right away in weights.

Author


You are right, I will refine this later. Thanks!
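
For reference, one way the single-pass version could look, appending each vector to weights as it is read instead of staging everything in emb_dict first; load_embeddings, its signature, and the file format are assumptions here, not the actual code:

import numpy as np

def load_embeddings(path, delimiter=" "):
    # Sketch: build w2i and the weight rows in one pass, so only one
    # copy of each vector is held in memory (the concern raised above).
    w2i = {}
    weights = []
    with open(path) as f:
        for line in f:
            word, *rest = line.rstrip().split(delimiter)
            word = "<UNK>" if word == "" else word
            if word in w2i:
                continue  # keep the first occurrence of a duplicate word
            w2i[word] = len(weights)
            weights.append(np.asarray(rest, dtype=np.float32))
    return w2i, np.stack(weights)

The stacked matrix can then be handed to torch.nn.Embedding.from_pretrained(torch.from_numpy(weights)) to build the embedding layer.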

@kwalcock kwalcock marked this pull request as draft February 15, 2023 17:03