
Elastic Weight Consolidation + regularization refactor #742

Open, wants to merge 22 commits into master

Conversation

@varisd (Member) commented on Jul 27, 2018

  • Implemented EWC from: https://arxiv.org/pdf/1612.00796.pdf
  • Moved the regularization to a separate module/classes.
  • Added GradientRunner for easier gradient output.
  • Added script for gradient averaging.

NOTE: this currently does not work with DelayedUpdateTrainer due to a problem with trainer variable restoration.
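For context, the penalty from the cited paper adds, for every trainable variable, a quadratic term anchored at the previous task's weights and scaled by an estimate of the Fisher information (approximated here by averaged squared gradients). A minimal TensorFlow sketch of that idea, with hypothetical names that are not taken from this PR:

```python
import numpy as np
import tensorflow as tf

def ewc_penalty(variables, old_values, fisher_estimates):
    """Compute sum_i F_i * (theta_i - theta*_i)^2 over trainable variables.

    `old_values` and `fisher_estimates` are hypothetical dicts of numpy
    arrays keyed by variable name (the PR loads them from a checkpoint
    and an .npz file instead).
    """
    penalty = tf.constant(0.0)
    for var in variables:
        name = var.op.name
        if name not in old_values or name not in fisher_estimates:
            continue
        old = tf.constant(old_values[name], name="{}_old".format(name))
        fisher = tf.constant(fisher_estimates[name],
                             name="{}_fisher".format(name))
        penalty += tf.reduce_sum(fisher * tf.square(var - old))
    return penalty
```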

@@ -423,7 +423,7 @@ def _check_savable_dict(data):
return False

supported_type = Union[
List[Dict[str, np.ndarray]],
List[Dict[str, Union[np.ndarray, np.float32]]],
Contributor

Is this really np.float32? Shouldn't it be a plain float?

Member Author

It is because of the gradient_runner, which returns a dict of gradients, and those are the output of a session.run call, so I think np.float32 is right here.

Member

@varisd is right. session.run() returns numpy floats, which are not subclasses of Python floats, so it would crash otherwise.
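A quick illustration of that point with the TF 1.x session API (not code from this PR):

```python
import tensorflow as tf

with tf.Session() as sess:
    x = sess.run(tf.constant(1.0))

print(type(x))               # <class 'numpy.float32'>
print(isinstance(x, float))  # False: np.float32 does not subclass Python float
```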

image_summaries=None)


class GradientRunner(BaseRunner[SupportedDecoder]):
Contributor

Documentation: what it does and what it is potentially good for.

gradient_dict[name] = val

self.result = ExecutionResult(
outputs=[gradient_dict],
Contributor

Does it have to be wrapped in that list?

Member Author

Yes, otherwise it will return only the names of the variables and not the dicts. This line is to blame:
https://github.com/ufal/neuralmonkey/blob/master/neuralmonkey/runners/base_runner.py#L77

@@ -25,10 +26,9 @@ class CrossEntropyTrainer(GenericTrainer):
def __init__(self,
decoders: List[Any],
decoder_weights: List[ObjectiveWeight] = None,
l1_weight: float = 0.,
l2_weight: float = 0.,
Contributor

I would keep the L1 and L2 arguments here as syntactic sugar and have the constructor build the regularizers and append them to the list.
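A rough sketch of that suggestion (hypothetical helper, not the actual CrossEntropyTrainer constructor): the float arguments stay as sugar and are expanded into regularizer objects.

```python
from neuralmonkey.trainers.regularizers import L1Regularizer, L2Regularizer

def build_regularizers(regularizers=None, l1_weight=0., l2_weight=0.):
    # Sketch only: the l1_weight/l2_weight floats are just syntactic sugar
    # that the trainer constructor could expand into regularizer objects.
    regularizers = list(regularizers) if regularizers else []
    if l1_weight > 0.:
        regularizers.append(L1Regularizer(weight=l1_weight))
    if l2_weight > 0.:
        regularizers.append(L2Regularizer(weight=l2_weight))
    return regularizers
```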

class EWCRegularizer(BaseRegularizer):
"""Regularizer based on the Elastic Weight Consolidation.

TODO description
Contributor

TODO

from neuralmonkey.logging import log


class BaseRegularizer:
Contributor

I would name it just Regularizer.

Contributor

Just without that typo :-D

@varisd (Member Author) commented on Jul 31, 2018

Note that this branch currently removes the default L1/L2 logging into the summaries.

@jlibovicky (Contributor)

Oh, I didn't even notice. I'd like to keep the L2 plot in TensorBoard.

@varisd (Member Author) commented on Aug 1, 2018

The L2 plot is back.

@jindrahelcl (Member) left a comment

Even though you changed a million tests, the functionality you introduce here (namely the gradient runner and the EWC regularizer) remains untested.

The refactor into regularizers seems useful to me, but I would still think about whether the regularizer's weight really belongs to the regularizer. If it did not, the trouble with default weights that I go into in this review would disappear.

regularizers = []
if l1_weight > 0.:
if L1Regularizer in [type(r) for r in regularizers]:
warn("You specified both trainer l1_weight "
Member

I would not just warn here, I would die right away.

Member

... and now that I look at it, it is quite a mess, because you do not impose any constraints on the regularizers list, so there can easily be two L1 regularizers in it and it will not complain.

... and what if somebody subclasses the L1 regularizer (e.g. to give it a predefined weight)? Then it passes without a warning, because type(r) will not be L1Regularizer.

@varisd (Member Author) commented on Aug 7, 2018

That is why there is a warn (in case the user did not notice they defined it in two ways)... I do not want to tie the users' hands; they can put as many L1Regularizers into the list as they like.

Member

Passing several identical regularizers is nonsense and it should fail, because it is almost certainly something you do not want. Just sum the weights and a single one will do.

Member Author

For all I care, I can drop the check entirely.


if l2_weight > 0.:
if L2Regularizer in [type(r) for r in regularizers]:
warn("You specified both trainer l2_weight "
Member

same here


from neuralmonkey.model.model_part import ModelPart
from neuralmonkey.runners.base_runner import (
Executable, ExecutionResult, NextExecute)
from neuralmonkey.trainers.regularizers import (
Regularizer, L2Regularizer)
Member

Doesn't that fit on a single line?

collections=["summary_train"])
self.losses = [o.loss for o in objectives] + reg_values

# we always want to include l2 values in the summary
Member

L1 too!

Member Author

L2 is enough.

Member

I do not get it. The plots are not the same, and every piece of diagnostic information is worth having, no?

Member Author

I think they correlate quite a lot + have you ever actually used them? (usually L2 is all you need)

Member

The fact that two quantities correlate does not mean that only one of them is interesting. We have already seen plots where L1 was rising while L2 was falling.

Member Author

Give me a use case where analysing L1 helped you in the past.

Member

Give me a proof that it cannot help me in the future.

Member Author

OK, I will add it.


# we always want to include l2 values in the summary
if L2Regularizer not in [type(r) for r in self.regularizers]:
reg_values.append(L2Regularizer().value(regularizable))
Member

Couldn't this be done with some static methods? Constructing here something that is otherwise constructed from the config seems hideous to me.

Member

Besides, what kind of mess is this anyway? :-) You are appending to reg_values, which you then zip with something you did not append to, so AFAIK it will not show up there anyway. reg_values must not be modified (also because of the semantics: it holds the values of the regularizers, but I am not using L2 as a regularizer, so its value should not be in this list).

I would put a separate if here and call tf.summary.scalar in it directly.

Member Author

The L2Regularizer should also be added to self.regularizers.

After being added to the summaries, reg_values is not used anywhere else, so there is no problem with adding this "default" regularizer to it.

Member

But it is not a regularizer. So when somebody gets to this in a month and wants to do something with the regularizers, this semantic bug will show up.

All I am after is that variables with a certain name contain things that correspond to it.

And you did not address my point that the constructor should not be called here; it should be either a function or a static method.
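A minimal sketch of the "separate if plus tf.summary.scalar" variant suggested above (illustrative only; it logs the L2 norm directly instead of constructing a throwaway regularizer and touching reg_values):

```python
# Sketch: log L2 for diagnostics without adding anything to reg_values.
if L2Regularizer not in [type(r) for r in self.regularizers]:
    l2_value = tf.add_n([tf.reduce_sum(tf.square(v)) for v in regularizable])
    tf.summary.scalar("train_l2", l2_value, collections=["summary_train"])
```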


def __init__(self,
name: str = "train_l1",
weight: float = 1.0e-8) -> None:
Member

I already wrote it below: throw out the unjustified default values for both name and weight.

If some value is recommended (which is not the case for L1 or L2), I would put it in the docstring.


log("Loading gradient estimates from {}".format(gradients_file))
self.gradients = np.load(gradients_file)
log("Gradient estimates loaded")
Member

Full stops at the ends of sentences in the log messages (4x) :-)

Member Author

OK, but this does not hold across the whole codebase.

Member

Well yes, but when I see it, I see that it probably should be there...

self.gradients = np.load(gradients_file)
log("Gradient estimates loaded")

def value(self, variables: List[tf.Tensor]) -> float:
Member

This function needs some comments or a docstring explaining what is going on here.

if (var_name in self.gradients.files
and self.init_vars.has_tensor(var_name)):
init_var = self.init_vars.get_tensor(var_name)
gradient = tf.constant(
Member

Why do you wrap it in a constant here? Does tf.square not accept numbers? And what is more, can't you precompute the square outside TensorFlow?

@varisd (Member Author) commented on Aug 7, 2018

Re the constant: I want the gradients to be named in the graph.
Re precomputation: I can, good idea.

Member

If it is because of the name, then it is probably fine as it is, because if you precomputed the square on the side, you would not have the gradient in the graph at all.

Member Author

I am dealing with it right now... it is good to have it named, the square is precomputed one step earlier in numpy... I have not pushed it yet.

return ewc_value


L1 = L1Regularizer()
Member

Here I would probably put the default weight in, but I would write it into the module docstring. Even so, I am afraid that having a default value available for these regularizers will give users the impression that the default value is what they are supposed to use, which is not true at all; this 1e-8 is based on some other default value for which we have no explanation.

Member Author

Then I will throw these shortcuts out.

Member

I am for it.

@jindrahelcl (Member) commented on Aug 3, 2018 via email

@jindrahelcl (Member) commented on Aug 3, 2018 via email

@@ -7,8 +7,7 @@
from neuralmonkey.model.model_part import ModelPart
from neuralmonkey.runners.base_runner import (
Executable, ExecutionResult, NextExecute)
from neuralmonkey.trainers.regularizers import (
Regularizer, L2Regularizer)
from neuralmonkey.trainers.regularizers import (Regularizer, L2Regularizer)
Member

Without the parentheses.

@@ -40,6 +39,7 @@ class Objective(NamedTuple(


# pylint: disable=too-few-public-methods,too-many-locals,too-many-branches
# pylint: disable=too-many-statements
Member

"Too many statements" means it should be split into functions. Can't that be done somehow?

Member Author

I agree, I just did not feel like digging into it yesterday... anyway, I will still try to work it into this PR.

collections=["summary_train"])
self.losses = [o.loss for o in objectives] + reg_values

# we always want to include l2 values in the summary
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hoď mi důkaz, že mi v budoucnosti nemůže pomoct

class Regularizer:
"""Base class for the regularizers."""
class Regularizer(metaclass=ABCMeta):
"""Base clas s for regularizers.
Member

A stray tab slipped in here.

"""Base clas s for regularizers.

Regularizer objects are used to introduce additional loss terms to
the trainerthus constraining the model variable during training. These
Member

s/trainerthus/trainer, thus/

@@ -84,15 +95,18 @@ class EWCRegularizer(Regularizer):

Implements Elastic Weight Consolidation from the "Overcoming catastrophic
forgetting in neural networks" paper.
The regularizer applies separate regularization weight to each trainable
Member

a separate

@@ -84,15 +95,18 @@ class EWCRegularizer(Regularizer):

Implements Elastic Weight Consolidation from the "Overcoming catastrophic
forgetting in neural networks" paper.
The regularizer applies separate regularization weight to each trainable
variable based on how important the variable was for the previously
Member

based on its importance for the previously learned task


log("Loading initial variables for EWC from {}".format(variables_file))
log("Loading initial variables for EWC from "
"{}.".format(variables_file))
Member

Do not concatenate strings here, and start the second line (if it really does not fit on one) with .format.

ewc_value = tf.constant(0.0)
for var in variables:
var_name = var.name.split(":")[0]
var_name = var.name
Member

Do you realize that attribute access via the dot costs nothing, and that this is basically copying one local variable into another whose name differs by a single character?

name="{}_init_value".format(init_var_name))
grad_squared = tf.constant(
np.square(self.gradients[var_name]),
name="{}_ewc_weight".format(init_var_name))
Member

But now you are naming not the gradients in the graph, but their squares. You could have achieved that simply by writing tf.square(gradient, name="kráva").

Member Author

I don't know; this way I am adding the precomputed squares to the graph, whereas in the other case (with tf.square) the computation would run every time you need the result.

In the end there will probably not be much of a speed difference, but precomputing the constants seems more reasonable to me than blowing them into the graph.

@jindrahelcl (Member) left a comment

Do I understand correctly that GradientRunner is used with a trained model, where I apply the trainer to some single batch and it returns some gradients?

Btw. it is still not covered by any test. I suggest that one test run the runner, which stores the output somewhere, and a second test run the EWC regularizer. Or just slip it cleverly into a test that already has saving sorted out. In short, both the EWC regularizer and the gradient runner should get exercised.

Otherwise I like how nicely it is converging towards mergeability. :-)

@@ -87,7 +87,7 @@ rnn_cell="NematusGRU"
; This block just fills the arguments of the trainer __init__ method.
class=trainers.cross_entropy_trainer.CrossEntropyTrainer
decoders=[<decoder>]
l2_weight=1.0e-8
regularizers=[trainers.regularizers.L2]
Member

I would just drop it from these tests and test several things in a single one.

Member Author

Yeah, we can throw those out.

@varisd (Member Author) commented on Aug 9, 2018

Yeah, I have not dealt with the tests yet.

I use the gradient runner to dump the gradients over a whole dataset (usually the validation set, because it is not that big). From that I then compute the average gradient for each model weight on the side; those can then be used in EWC.
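A rough sketch of that offline averaging step (hypothetical script; it assumes GradientRunner produced one dict of numpy gradients per batch and that the result is saved as the .npz file which EWCRegularizer later reads with np.load):

```python
import numpy as np

def average_gradients(batch_grad_dicts):
    """Average the per-variable gradients collected over a whole dataset."""
    sums = {}
    counts = {}
    for grads in batch_grad_dicts:
        for name, value in grads.items():
            sums[name] = sums.get(name, 0.) + value
            counts[name] = counts.get(name, 0) + 1
    return {name: sums[name] / counts[name] for name in sums}

# Usage sketch:
# np.savez("gradients.npz", **average_gradients(all_batch_gradients))
```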

@jindrahelcl (Member)

Well, if you managed to simulate this pipeline in tests/tests_run.sh, that would be the best of all.

@varisd (Member Author) commented on Jan 31, 2019

Ping, I have already rebased this branch at least three times... if you do not plan to merge it, let me know.

@jlibovicky (Contributor)

Well, I thought it would get merged once the little refactor with the tf dataset is done (just like the pile of commits I am hoarding on my shelf without turning them into a PR).
