
Elastic Weight Consolidation + regularization refactor #742

Open, wants to merge 22 commits into master

Conversation

@varisd (Member) commented on Jul 27, 2018

  • Implemented EWC from: https://arxiv.org/pdf/1612.00796.pdf
  • Moved the regularization to a separate module/classes.
  • Added GradientRunner for easier gradient output.
  • Added script for gradient averaging.

NOTE: this currently does not work with DelayedUpdateTrainer due to a problem with trainer variable restoration.
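For context, the penalty from the cited paper adds, for every trainable variable, a quadratic term anchored at the previous task's weights and scaled by an estimate of the Fisher information (approximated here by averaged squared gradients). A minimal TensorFlow sketch of that idea, with hypothetical names that are not taken from this PR:

```python
import numpy as np
import tensorflow as tf

def ewc_penalty(variables, old_values, fisher_estimates):
    """Compute sum_i F_i * (theta_i - theta*_i)^2 over trainable variables.

    `old_values` and `fisher_estimates` are hypothetical dicts of numpy
    arrays keyed by variable name (the PR loads them from a checkpoint
    and an .npz file instead).
    """
    penalty = tf.constant(0.0)
    for var in variables:
        name = var.op.name
        if name not in old_values or name not in fisher_estimates:
            continue
        old = tf.constant(old_values[name], name="{}_old".format(name))
        fisher = tf.constant(fisher_estimates[name],
                             name="{}_fisher".format(name))
        penalty += tf.reduce_sum(fisher * tf.square(var - old))
    return penalty
```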

@@ -423,7 +423,7 @@ def _check_savable_dict(data):
return False

supported_type = Union[
List[Dict[str, np.ndarray]],
List[Dict[str, Union[np.ndarray, np.float32]]],
Contributor

Is this really np.float32? Shouldn't it be a plain float?

Member Author

It is because of the gradient_runner, which returns a dict of gradients, and those are the output of a session.run call, so I think np.float32 is right here.

Member

@varisd is right. session.run() returns numpy floats, which are not subclasses of Python floats, so it would crash otherwise.
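A quick illustration of that point with the TF 1.x session API (not code from this PR):

```python
import tensorflow as tf

with tf.Session() as sess:
    x = sess.run(tf.constant(1.0))

print(type(x))               # <class 'numpy.float32'>
print(isinstance(x, float))  # False: np.float32 does not subclass Python float
```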

image_summaries=None)


class GradientRunner(BaseRunner[SupportedDecoder]):
Contributor

Documentation: what it does and what it is potentially good for.

gradient_dict[name] = val

self.result = ExecutionResult(
outputs=[gradient_dict],
Contributor

Does it have to be wrapped in that list?

Member Author

Yes, otherwise it will return only the names of the variables and not the dicts. This line is to blame:
https://github.com/ufal/neuralmonkey/blob/master/neuralmonkey/runners/base_runner.py#L77

@@ -25,10 +26,9 @@ class CrossEntropyTrainer(GenericTrainer):
def __init__(self,
decoders: List[Any],
decoder_weights: List[ObjectiveWeight] = None,
l1_weight: float = 0.,
l2_weight: float = 0.,
Contributor

I would keep the L1 and L2 arguments here as syntactic sugar and have the constructor build the regularizers and append them to the list.
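A rough sketch of that suggestion (hypothetical helper, not the actual CrossEntropyTrainer constructor): the float arguments stay as sugar and are expanded into regularizer objects.

```python
from neuralmonkey.trainers.regularizers import L1Regularizer, L2Regularizer

def build_regularizers(regularizers=None, l1_weight=0., l2_weight=0.):
    # Sketch only: the l1_weight/l2_weight floats are just syntactic sugar
    # that the trainer constructor could expand into regularizer objects.
    regularizers = list(regularizers) if regularizers else []
    if l1_weight > 0.:
        regularizers.append(L1Regularizer(weight=l1_weight))
    if l2_weight > 0.:
        regularizers.append(L2Regularizer(weight=l2_weight))
    return regularizers
```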

class EWCRegularizer(BaseRegularizer):
"""Regularizer based on the Elastic Weight Consolidation.

TODO description
Contributor

TODO

from neuralmonkey.logging import log


class BaseRegularizer:
Contributor

I would name it just Regularizer.

Contributor

Just without that typo :-D

@varisd (Member Author) commented on Jul 31, 2018

Note that this branch currently removes the default L1/L2 logging into the summaries.

@jlibovicky (Contributor)

Oh, I didn't even notice. I'd like to keep the L2 plot in TensorBoard.

@varisd (Member Author) commented on Aug 1, 2018

The L2 plot is back.

@jindrahelcl (Member) left a comment

Even though you changed a million tests, the functionality you introduce here (namely the gradient runner and the EWC regularizer) remains untested.

The refactor into regularizers seems useful to me, but I would still think about whether the regularizer's weight really belongs to the regularizer. If it did not, the trouble with default weights that I go into in this review would disappear.

regularizers = []
if l1_weight > 0.:
if L1Regularizer in [type(r) for r in regularizers]:
warn("You specified both trainer l1_weight "
Member

I would not just warn here, I would die right away.

Member

... and now that I look at it, it is quite a mess, because you do not impose any constraints on the regularizers list, so there can easily be two L1 regularizers in it and it will not complain.

... and what if somebody subclasses the L1 regularizer (e.g. to give it a predefined weight)? Then it passes without a warning, because type(r) will not be L1Regularizer.

@varisd (Member Author) commented on Aug 7, 2018

That is why there is a warn (in case the user did not notice they defined it in two ways)... I do not want to tie the users' hands; they can put as many L1Regularizers into the list as they like.

Member

Passing several identical regularizers is nonsense and it should fail, because it is almost certainly something you do not want. Just sum the weights and a single one will do.

Member Author

For all I care, I can drop the check entirely.


if l2_weight > 0.:
if L2Regularizer in [type(r) for r in regularizers]:
warn("You specified both trainer l2_weight "
Member

same here


from neuralmonkey.model.model_part import ModelPart
from neuralmonkey.runners.base_runner import (
Executable, ExecutionResult, NextExecute)
from neuralmonkey.trainers.regularizers import (
Regularizer, L2Regularizer)
Member

Doesn't that fit on a single line?

collections=["summary_train"])
self.losses = [o.loss for o in objectives] + reg_values

# we always want to include l2 values in the summary
Member

L1 too!

Member Author

L2 is enough.

Member

I do not get it. The plots are not the same, and every piece of diagnostic information is worth having, no?

Member Author

I think they correlate quite a lot + have you ever actually used them? (usually L2 is all you need)

Member

The fact that two quantities correlate does not mean that only one of them is interesting. We have already seen plots where L1 was rising while L2 was falling.

Member Author

Give me a use case where analysing L1 helped you in the past.

Member

Give me a proof that it cannot help me in the future.

Member Author

OK, I will add it.


# we always want to include l2 values in the summary
if L2Regularizer not in [type(r) for r in self.regularizers]:
reg_values.append(L2Regularizer().value(regularizable))
Member

Couldn't this be done with some static methods? Constructing here something that is otherwise constructed from the config seems hideous to me.

Member

Besides, what kind of mess is this anyway? :-) You are appending to reg_values, which you then zip with something you did not append to, so AFAIK it will not show up there anyway. reg_values must not be modified (also because of the semantics: it holds the values of the regularizers, but I am not using L2 as a regularizer, so its value should not be in this list).

I would put a separate if here and call tf.summary.scalar in it directly.

Member Author

The L2Regularizer should also be added to self.regularizers.

After being added to the summaries, reg_values is not used anywhere else, so there is no problem with adding this "default" regularizer to it.

Member

But it is not a regularizer. So when somebody gets to this in a month and wants to do something with the regularizers, this semantic bug will show up.

All I am after is that variables with a certain name contain things that correspond to it.

And you did not address my point that the constructor should not be called here; it should be either a function or a static method.
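A minimal sketch of the "separate if plus tf.summary.scalar" variant suggested above (illustrative only; it logs the L2 norm directly instead of constructing a throwaway regularizer and touching reg_values):

```python
# Sketch: log L2 for diagnostics without adding anything to reg_values.
if L2Regularizer not in [type(r) for r in self.regularizers]:
    l2_value = tf.add_n([tf.reduce_sum(tf.square(v)) for v in regularizable])
    tf.summary.scalar("train_l2", l2_value, collections=["summary_train"])
```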


def __init__(self,
name: str = "train_l1",
weight: float = 1.0e-8) -> None:
Member

I already wrote it below: throw out the unjustified default values for both name and weight.

If some value is recommended (which is not the case for L1 or L2), I would put it in the docstring.


log("Loading gradient estimates from {}".format(gradients_file))
self.gradients = np.load(gradients_file)
log("Gradient estimates loaded")
Member

Full stops at the ends of sentences in the log messages (4x) :-)

Member Author

OK, but this does not hold across the whole codebase.

Member

Well yes, but when I see it, I see that it probably should be there...

self.gradients = np.load(gradients_file)
log("Gradient estimates loaded")

def value(self, variables: List[tf.Tensor]) -> float:
Member

This function needs some comments or a docstring explaining what is going on here.

if (var_name in self.gradients.files
and self.init_vars.has_tensor(var_name)):
init_var = self.init_vars.get_tensor(var_name)
gradient = tf.constant(
Member

Why do you wrap it in a constant here? Does tf.square not accept numbers? And what is more, can't you precompute the square outside TensorFlow?

@varisd (Member Author) commented on Aug 7, 2018

Re the constant: I want the gradients to be named in the graph.
Re precomputation: I can, good idea.

Member

If it is because of the name, then it is probably fine as it is, because if you precomputed the square on the side, you would not have the gradient in the graph at all.

Member Author

I am dealing with it right now... it is good to have it named, the square is precomputed one step earlier in numpy... I have not pushed it yet.

return ewc_value


L1 = L1Regularizer()
Member

Here I would probably put the default weight in, but I would write it into the module docstring. Even so, I am afraid that having a default value available for these regularizers will give users the impression that the default value is what they are supposed to use, which is not true at all; this 1e-8 is based on some other default value for which we have no explanation.

Member Author

Then I will throw these shortcuts out.

Member

I am for it.

@jindrahelcl (Member) commented on Aug 3, 2018 via email

@jindrahelcl (Member) commented on Aug 3, 2018 via email

@@ -7,8 +7,7 @@
from neuralmonkey.model.model_part import ModelPart
from neuralmonkey.runners.base_runner import (
Executable, ExecutionResult, NextExecute)
from neuralmonkey.trainers.regularizers import (
Regularizer, L2Regularizer)
from neuralmonkey.trainers.regularizers import (Regularizer, L2Regularizer)
Member

Without the parentheses.

@@ -40,6 +39,7 @@ class Objective(NamedTuple(


# pylint: disable=too-few-public-methods,too-many-locals,too-many-branches
# pylint: disable=too-many-statements
Member

"Too many statements" means it should be split into functions. Can't that be done somehow?

Member Author

I agree, I just did not feel like digging into it yesterday... anyway, I will still try to work it into this PR.

collections=["summary_train"])
self.losses = [o.loss for o in objectives] + reg_values

# we always want to include l2 values in the summary
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hoď mi důkaz, že mi v budoucnosti nemůže pomoct

class Regularizer:
"""Base class for the regularizers."""
class Regularizer(metaclass=ABCMeta):
"""Base clas s for regularizers.
Member

A stray tab slipped in here.

"""Base clas s for regularizers.

Regularizer objects are used to introduce additional loss terms to
the trainerthus constraining the model variable during training. These
Member

s/trainerthus/trainer, thus/

@@ -84,15 +95,18 @@ class EWCRegularizer(Regularizer):

Implements Elastic Weight Consolidation from the "Overcoming catastrophic
forgetting in neural networks" paper.
The regularizer applies separate regularization weight to each trainable
Member

a separate

@@ -84,15 +95,18 @@ class EWCRegularizer(Regularizer):

Implements Elastic Weight Consolidation from the "Overcoming catastrophic
forgetting in neural networks" paper.
The regularizer applies separate regularization weight to each trainable
variable based on how important the variable was for the previously
Member

based on its importance for the previously learned task


log("Loading initial variables for EWC from {}".format(variables_file))
log("Loading initial variables for EWC from "
"{}.".format(variables_file))
Member

Do not concatenate strings here, and start the second line (if it really does not fit on one) with .format.

ewc_value = tf.constant(0.0)
for var in variables:
var_name = var.name.split(":")[0]
var_name = var.name
Member

Do you realize that attribute access via the dot costs nothing, and that this is basically copying one local variable into another whose name differs by a single character?

name="{}_init_value".format(init_var_name))
grad_squared = tf.constant(
np.square(self.gradients[var_name]),
name="{}_ewc_weight".format(init_var_name))
Member

But now you are naming not the gradients in the graph, but their squares. You could have achieved that simply by writing tf.square(gradient, name="kráva").

Member Author

I don't know; this way I am adding the precomputed squares to the graph, whereas in the other case (with tf.square) the computation would run every time you need the result.

In the end there will probably not be much of a speed difference, but precomputing the constants seems more reasonable to me than blowing them into the graph.

@jindrahelcl (Member) left a comment

Do I understand correctly that GradientRunner is used with a trained model, where I apply the trainer to some single batch and it returns some gradients?

Btw. it is still not covered by any test. I suggest that one test run the runner, which stores the output somewhere, and a second test run the EWC regularizer. Or just slip it cleverly into a test that already has saving sorted out. In short, both the EWC regularizer and the gradient runner should get exercised.

Otherwise I like how nicely it is converging towards mergeability. :-)

@@ -87,7 +87,7 @@ rnn_cell="NematusGRU"
; This block just fills the arguments of the trainer __init__ method.
class=trainers.cross_entropy_trainer.CrossEntropyTrainer
decoders=[<decoder>]
l2_weight=1.0e-8
regularizers=[trainers.regularizers.L2]
Member

I would just drop it from these tests and test several things in a single one.

Member Author

Yeah, we can throw those out.

@varisd (Member Author) commented on Aug 9, 2018

Yeah, I have not dealt with the tests yet.

I use the gradient runner to dump the gradients over a whole dataset (usually the validation set, because it is not that big). From that I then compute the average gradient for each model weight on the side; those can then be used in EWC.
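A rough sketch of that offline averaging step (hypothetical script; it assumes GradientRunner produced one dict of numpy gradients per batch and that the result is saved as the .npz file which EWCRegularizer later reads with np.load):

```python
import numpy as np

def average_gradients(batch_grad_dicts):
    """Average the per-variable gradients collected over a whole dataset."""
    sums = {}
    counts = {}
    for grads in batch_grad_dicts:
        for name, value in grads.items():
            sums[name] = sums.get(name, 0.) + value
            counts[name] = counts.get(name, 0) + 1
    return {name: sums[name] / counts[name] for name in sums}

# Usage sketch:
# np.savez("gradients.npz", **average_gradients(all_batch_gradients))
```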

@jindrahelcl (Member)

Well, if you managed to simulate this pipeline in tests/tests_run.sh, that would be the best of all.

@varisd (Member Author) commented on Jan 31, 2019

Ping, I have already rebased this branch at least three times... if you do not plan to merge it, let me know.

@jlibovicky (Contributor)

Well, I thought it would get merged once the little refactor with the tf dataset is done (just like the pile of commits I am hoarding on my shelf without turning them into a PR).
