Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AnyParameterProperties refactor #4412

Merged
merged 7 commits into from Nov 15, 2018

Conversation

gf712
Copy link
Member

@gf712 gf712 commented Nov 7, 2018

This PR refactors the AnyParameterProperties class to use a mask (m_mask_attribute) to store parameter options:

  • HYPERPARAMETER: the parameter determines how the training is performed, e.g. regularisation
  • GRADIENT_PARAM: the parameter is used in gradient updates
  • MODEL_PARAM: model parameter that is trained, e.g. weight and bias
    This makes the use of Enums in AnyParameter.h not necessary anymore, and the corresponding getter methods can be replaced with bool return type.
    The mask uses bit shift operations to keep information in a hexadecimal representation.

Gil added 3 commits November 7, 2018 14:20
* added mask_attribute to keep track of parameter availabilities
* updated constructors and getters
* added some documentation
`AnyParameterProperties::HYPERPARAMETER |  AnyParameterProperties::GRADIENT_PARAM | AnyParameterProperties::MODELSELECTION_PARAM` is equivalent to all parameter property flags being false (`0x00000000`)
modelselection_param is actually hyperparameter now and model_param is a seperate definition, e.g. bias and weights
AnyParameterProperties()
: m_description(), m_model_selection(MS_NOT_AVAILABLE),
m_gradient(GRADIENT_NOT_AVAILABLE)
static const int32_t HYPERPARAMETER = 0x00000001;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about making it an int enum?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that's neat! I'll implement that.

return m_gradient;
EGradientAvailability return_val;
if (m_mask_attribute & GRADIENT_PARAM)
return_val = EGradientAvailability::GRADIENT_AVAILABLE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it makes sense to return immediately without return_val

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean with a one liner like this?

return static_cast<EGradientAvailability>((m_mask_attribute & GRADIENT_PARAM) > 0);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah maybe with some helper function so it looks shorter? There is some repeated code in these flags.

/** @brief Class AnyParameterProperties keeps track of parameter properties.
*
* Currently there are three types of parameters:
* -# HYPERPARAMETER: the parameter determines how the training is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls don’t put such things into the docs, they outdated at light speed.
Better describe that there are attributes, maybe an example, but not losing them all

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK!

: m_description(), m_model_selection(MS_NOT_AVAILABLE),
m_gradient(GRADIENT_NOT_AVAILABLE)
static const int32_t HYPERPARAMETER = 0x00000001;
static const int32_t GRADIENT_PARAM = 0x00000010;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also use the << operator to shift the 1 a desired number of digits
See the stack overflow post

Copy link
Member Author

@gf712 gf712 Nov 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a look at the post, and the Enum with bit operations is pretty cool. I just thought that the hexadecimal representation is more readable than doing the bit shifts, no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at the end of the day, it is style. You can either define the mask as you did with hexadecimal as you did there. Or, instead, you can generate those masks on the fly using a shift operator, and then the HYPERPARAMETER constant is the number of bits that you shift, i.e.

my_mask &&
(1 << HYPERPARAMETER) &&
(1 << GRADIENT_AVAILABLE) &&

The latter has the advantage that somebody reading the code never is exposed to hexadecimal numbers, nor powers or two (which can be confusing). On the other hand, masking operations (as what I wrote above) becomes a bit more notation heavy. Idk what is best to be honest. I think the shifting is more standard and therefore I think we should follow that.

Ah one thing I think is good about the shifting approach is that you can change the size of the mask easily. You are using a 32bit int above, but you only write down the last 8 bits, which is weird. Imagine you now changed the mask to 64 bit. Then the hexadecimals would be quite long, whereas the shifts wouldnt be.

@lisitsyn @vigsterkr ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK! I think you're right, I have changed the class members to use bit shifts, it is a bit cleaner.

static const int32_t GRADIENT_PARAM = 0x00000010;
static const int32_t MODEL_PARAM = 0x00000100;

/** default constructor where all parameter types are switched to false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: sentences start with capital letters

{
}
/** legacy constructor */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove that. Until we haven't removed it, it is not legacy.

}

private:
std::string m_description;
EModelSelectionAvailability m_model_selection;
EGradientAvailability m_gradient;
/** mask to store all param flags*/
int32_t m_mask_attribute;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about calling it "attribute_mask" ? :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I just noticed that I switched the order...

if (hyperparameter)
m_mask_attribute |= HYPERPARAMETER;
}
/** mask constructor */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you add doxygen docs, pls document all parameters
minor: capital M

: m_description(description), m_model_selection(model_selection),
m_gradient(gradient)
{
m_mask_attribute = 0x00000000;
if (model_selection)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldnt this work?

m_attribute_mask = 
(MODEL_PARAM & model_selection) |
(GRADIENT_PARAM & gradient) |
(HYPERPARAMETER & hyperparameter)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or in the new style

m_attribute_mask = 
((1<< MODEL_PARAM) & model_selection) |
((1<< GRADIENT_PARAM) & gradient) |
((1<<HYPERPARAMETER) & hyperparameter)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, that would be more concise. Is it worth writing a static member that is the default, i.e. 0?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the default is 0, then we don't need that.
I would be explicit about a non-zero default, and add a private init method that sets it and that is called by all ctors.

: m_description(), m_model_selection(MS_NOT_AVAILABLE),
m_gradient(GRADIENT_NOT_AVAILABLE)
static const int32_t HYPERPARAMETER = 0x00000001;
static const int32_t GRADIENT_PARAM = 0x00000010;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would call this GRADIENT_AVAILABLE

AnyParameterProperties()
: m_description(), m_model_selection(MS_NOT_AVAILABLE),
m_gradient(GRADIENT_NOT_AVAILABLE)
static const int32_t HYPERPARAMETER = 0x00000001;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we keep the naming self-consistent
HYPER_PARAM
or
HYPER_PARAMETER
or name the other one
MODELPARAMETER
or rather
MODEL_PARAMETER

Actually, in favour of short and concise names, why dont we use
HYPER
MODEL
GRADIENT
?
And then the constant itself can have a doxygen string to say what that means?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK! But hyperparameter is a term that could be misinterpreted if it written as HYPER, no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is pretty clear since the class is ending with "Parameter", so everything in there is one ....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name MODEL causes conflicts if we use an enum, so will have to use something more specific, or just have these values as static class members of AnyParameterProperties.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, please don't define them globally, but as a static member

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to use an enum within the class scope?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fine as well

{
}

/** description getter */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

superflous doc, as the method name explains it, i would just remove it

{
}

/** description getter */
std::string get_description() const
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor what about returning a const string reference

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we could do that. It seems to be pretty much standard now to return const references for std::string. It can sometimes lead to odd behaviours, but I don't think we need to worry about those..
http://thesyntacticsugar.blogspot.com/2011/09/evil-side-of-returning-member-as-const.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but the first rule of the C++ committee is:
"The C++ language does not prevent people from shooting themselves in their feet" ;)

std::string get_description() const
{
return m_description;
}

/** model selection flag getter */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, would remove the comment

@@ -676,7 +676,10 @@ class CSGObject
void watch_param(
const std::string& name, T* value,
AnyParameterProperties properties = AnyParameterProperties(
"Unknown parameter", MS_NOT_AVAILABLE, GRADIENT_NOT_AVAILABLE))
"Unknown parameter",
AnyParameterProperties::HYPERPARAMETER |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of doing this here, I think it might be better to offer a constructor that does not take any properties, and then set the default values inside there. We need to discuss clever default values as well btw

@@ -715,7 +721,10 @@ class CSGObject
void watch_param(
const std::string& name, T** value, S* rows, S* cols,
AnyParameterProperties properties = AnyParameterProperties(
"Unknown parameter", MS_NOT_AVAILABLE, GRADIENT_NOT_AVAILABLE))
"Unknown parameter",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same with the description, I would offer a ctor that does not ask for one, and then set it to the empty string or "No description given" in the ctor, but not here in SGObject.h

Copy link
Member

@karlnapf karlnapf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great first start. We need a few changes and clean-ups, but this is essentially doing what we want already.
Maybe we can have a test that checks the bitmasking for one of the parameters ... just to be on the safe side ... :)

Gil added 2 commits November 9, 2018 10:56
* also shorted class member names
* changed default constructor to give default description
* -# GRADIENT_PARAM: the parameter is used in gradient updates
* -# MODEL_PARAM: model parameter that is trained, e.g. weight and bias
* vectors and bias
* The parameter properties can be either true or false.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like!
Maybe also mention descriptions in the first sentence " ... of parameter meta information, such as properties and descriptions"

static const int32_t MODEL_PARAM = 0x00000100;
static const int32_t HYPER = 1u << 0;
static const int32_t GRADIENT = 1u << 1;
static const int32_t MODEL = 1u << 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah wait. the thing I had in mind was

static const int HYPER = 1;

and then

mask = (1 << HYPER) | (1 << MODEL);

This is now difference, as you are effectively storing masks themselves in these const members, not the number of bit shifts. Or am I wrong? Let me stare a bit more to understand

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean:

static const int HYPER = 0;
static const int GRADIENT = 1;
static const int MODEL = 2;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but start at 1

Copy link
Member Author

@gf712 gf712 Nov 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, if we have:

  static const int HYPER = 1;
  static const int GRADIENT = 2;
  static const int MODEL = 3;

  bool model_selection = true;
  bool gradient = true;

  int mask = ((1 << HYPER) & model_selection) | ((1 << GRADIENT) & gradient);

Then the above mask will always be false/0 no?

Copy link
Member Author

@gf712 gf712 Nov 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean more like this:

int mask = (model_selection << HYPER) | (gradient << GRADIENT)

which is the equivalent of: shift bit at position X (1,2 or 3 in this case) if the function parameter is true.

Copy link
Member Author

@gf712 gf712 Nov 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah now I remember why I did it this way.. Basically if you want to provide an interface to create your own mask like this

AnyParameterProperties("My description",  
                       AnyParameterProperties::MODEL |
                       AnyParameterProperties::GRADIENT)

you need to have the values of the static member with the shift. Unless I am missing something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean, but what I have seen (and been doing) would result in

AnyParameterProperties("My description",  
                       (1 << AnyParameterProperties::MODEL) |
                       (1 << AnyParameterProperties::GRADIENT))

It is not as neat, you are totally right!. BUT it can be hidden quite easily either.
I mean, it is your code, Ill leave it to you. Not sure whether it is possible to say what is better now, we will have to commit to something and then see if it works :) So go ahead with the above solution and then we see where we get

Copy link
Member Author

@gf712 gf712 Nov 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK! Btw why does it need to start at 1, instead of 0?

Copy link
Member

@karlnapf karlnapf Nov 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, no that was nonsense (just needed to make sure that you never get an all zero mask)

BTW I also realised why the micro-controller people store the number of bits rather than the full masks: it uses less memory (which is not really relevant here, but still good to think about it for a moment).

byte HYPER = 2
...
byte BLA = 255

so with an 8bit integer, you can do 255 attributes if you generate the masks on the fly a la (1 << HYPER). If you wanted to store the mask for each attribute instead (i.e. HYPER = 0x00000010 (each mask is 8bit), you would have to store 255 * 8bit, which is quite a difference , especially if you have only 4k memory as on an AVR ;)

How many parameters are registered during a shogun run? Let's say 100 is realistic. Then we have 100 byte usage if we store only the offset, and 2.5kB if we use the masks. In a data heavy environment, and since we won't register many many more parameters this doesn't matter. For 1000 parameters it is still ok, 10^6 will result in some pretty big differences.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets typedef the actual type so we can change later.

m_mask_attribute |= GRADIENT_PARAM;
if (hyperparameter)
m_mask_attribute |= HYPERPARAMETER;
m_attribute_mask = (hyperparameter << 0 & HYPER) |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was more thinking
(1 << HYPER) & model_selection | (1 << GRADIENT) && gradient etc

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that's too low-level. Can we introduce a constexpr function for that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah actually good point! :)

@karlnapf
Copy link
Member

Any news here? :)

@gf712
Copy link
Member Author

gf712 commented Nov 12, 2018

I did another round of clean up, but I haven't pushed it yet. Got a bit distracted looking into constexpr, as I have never used them before in my own code.. As for using a typedef I am not sure how it would be used as we are storing data in a int mask..?
Also if it is too level couldn't we use a structwith booleans instead? It would be a bit less space efficient, but would be quite easy to expand in the future?

@karlnapf
Copy link
Member

karlnapf commented Nov 12, 2018

I googled a bit and found this, maybe interesting?
http://blog.bitwigglers.org/using-enum-classes-as-type-safe-bitmasks/
A bit less old school ...

@gf712
Copy link
Member Author

gf712 commented Nov 13, 2018

So do you think we could just copy this? https://www.justsoftwaresolutions.co.uk/files/bitmask_operators.hpp
It seems to be the original code of that post and seems to cover most operations of interest.. Otherwise he wrote this more complete example.. https://github.com/Dalzhim/ArticleEnumClass-v2

@karlnapf
Copy link
Member

Yeah ok, I think this is cool (I think the first simple version is fine)

  • We do explicitly store the mask (1,2,4,8,16, etc), rather than the offset. Easy explicit usage AnyParameterProperties::MODEL | AnyParameterProperties::HYPER
  • we make this typesafe through the code snippet

@gf712
Copy link
Member Author

gf712 commented Nov 13, 2018

Is it worth overloading the enum operators in a separate header file or does it make sense to keep it all in AnyParameter.h?

@gf712
Copy link
Member Author

gf712 commented Nov 13, 2018

Is it worth overloading the enum operators in a separate header file or does it make sense to keep it all in AnyParameter.h?

@karlnapf
Copy link
Member

As it is general purpose code, I would copy it to lib or so

@karlnapf
Copy link
Member

E.g. lib/bitmasking.h

@gf712
Copy link
Member Author

gf712 commented Nov 14, 2018

Not too happy with:

m_attribute_mask = ParameterProperties();
if (hyperparameter)
    m_attribute_mask |= ParameterProperties::HYPER;
if (gradient)
    m_attribute_mask |= ParameterProperties::GRADIENT;
if (model)
    m_attribute_mask |= ParameterProperties::MODEL;

but given the restrictions that are imposed in bitmask_operators.h it seems to me to be the best solution..

@karlnapf
Copy link
Member

This comes from the fact that since it is now typesafe, you cannot & the mask with a boolean anymore?
It is not a big issue as we want to get rid of those booleans eventually anyways

// LIABLE FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN
// CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
// CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
// THE SOFTWARE.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could put "modified by G.F" here if you want

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok! I wasn't sure how it works with the boost software license!

!AnyParameterProperties::MODEL);
ParameterProperties::HYPER |
ParameterProperties::GRADIENT |
ParameterProperties::MODEL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the whitespaces seem weird here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, and should change it to default to false instead of all true.

Copy link
Member

@karlnapf karlnapf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@karlnapf
Copy link
Member

Ok, we might have to touch this again at some point, but I think we can move ahead now

@karlnapf karlnapf merged commit b45243c into shogun-toolbox:develop Nov 15, 2018
@karlnapf
Copy link
Member

Shall we have a little test with a minimal mock class where we register some parameters with certain properties and make sure everything makes sense?

Next step would be: Let's observe every model parameter when put is called.

@gf712
Copy link
Member Author

gf712 commented Nov 15, 2018

This comes from the fact that since it is now typesafe, you cannot & the mask with a boolean anymore?

Yes, and adding bool to the allowed types would defeat the point of using enum class.

@gf712 gf712 deleted the parameter_refactor branch November 15, 2018 11:11
vigsterkr pushed a commit to vigsterkr/shogun that referenced this pull request Mar 9, 2019
* combined parameter flags in a single mask
* added mask_attribute to keep track of parameter availabilities
* updated constructors and getters
* added some documentation
* changed properties default
* typesafe bitmasking
* refactored code to use enum class and bitmask operators
ktiefe pushed a commit to ktiefe/shogun that referenced this pull request Jul 30, 2019
* combined parameter flags in a single mask
* added mask_attribute to keep track of parameter availabilities
* updated constructors and getters
* added some documentation
* changed properties default
* typesafe bitmasking
* refactored code to use enum class and bitmask operators
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants