
Working on documentation update - feedback desired #92

Closed
drallensmith opened this issue Jun 25, 2017 · 22 comments

Comments

@drallensmith
Contributor

drallensmith commented Jun 25, 2017

NOTE: I am now working on documentation in my fork's config_work branch; see drallensmith.github.io for a recent version.

Hi. I'm working on a documentation update, based on the current master files (although I'll need to pull the most recent few changes, of course...). It's nowhere near ready to even think about merging, but I'd appreciate any feedback people want to give on how it's going so far. I've added a glossary, started putting in some module summaries, enabled an index (and a module index), and a few other things. I've put it in a docs_update branch on my fork https://github.com/drallensmith/neat-python/tree/docs_update and have also put up the Sphinx-built HTML version at https://drallensmith.github.io/.

-Allen

drallensmith changed the title from "Working on documentation update" to "Working on documentation update - feedback welcome" on Jun 25, 2017
drallensmith changed the title from "Working on documentation update - feedback welcome" to "Working on documentation update - feedback desired" on Jun 28, 2017
@drallensmith
Contributor Author

BTW, I have enabled the "Issues" mechanism on both repositories, if people prefer to comment that way. The documentation now has a rather more complete set of module summaries, including interfaces (although I'm still working on expanding them further; they are linked to a highlighted copy of the source), clarifies some things regarding defaults in the configuration file, and has quite a few other improvements. I'd greatly appreciate some feedback at this point, although I'll continue to work on it on my own. As well as completing the module summaries, I'm also thinking of putting more material into the NEAT summary regarding speciation, genomic distance determination, and similar matters.

-Allen

@drallensmith
Contributor Author

Going through the code for the documentation led me to find the above-referenced bug, incidentally. "My" documentation version does not yet have the code correction.

@CodeReclaimers
Owner

Hey, thanks for improving the docs! I'll try to carve out some time over the weekend to check it out.

@drallensmith
Contributor Author

drallensmith commented Jun 30, 2017

Quite welcome; I'm doing it partly to teach myself Python, partly to keep in practice at technical editing/writing, and partly because I like helping... plus I'm a perfectionist. :-} And I understand about time limits...

-Allen

@drallensmith
Contributor Author

I'm now working on documentation for my fork's config_work branch (mentioned in some places in the current documentation); I will work on the documentation for the update by @bennr01 as part of that.

-Allen

@bennr01
Contributor

bennr01 commented Jul 17, 2017

@drallensmith thanks for improving the documentation and for documenting #96. If you need any explanation or help on the features I added, please tell me.

@drallensmith
Contributor Author

Well, I've added the threaded.py module to module_summaries.rst in my fork's config_work branch; I did have one question regarding _worker() - what would stop it, i.e., what turns off self.working?

For distributed.py, I have the difficulty that node is extensively used in its other meaning in the rest of NEAT, and is in the glossary as such; OTOH, it does seem to be the standard term in distributed computing. None of the code itself uses node as a method/variable/whatever name, which does make it easier to substitute another term. Machine/host? Process? I'm currently favoring "process".

BTW, multiprocessing.cpu_count() can raise a NotImplementedError (I've wrapped the call in a try block) and can also be inaccurate - it returns 4 for my 2-core, 1-processor machine. Users may need to check this per machine; where available (Python 3.5+ on some Unix varieties only, apparently not including OS X), len(os.sched_getaffinity(0)) might work better.
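A minimal sketch of that fallback logic, using only the standard library (the helper name is hypothetical, not code from either fork):

```python
import multiprocessing
import os

def guess_num_workers(default=1):
    # Hypothetical helper illustrating the fallbacks discussed above.
    # Prefer the set of CPUs this process may actually run on
    # (Python 3.5+, some Unix variants only).
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:
        pass
    # Fall back to cpu_count(), which may raise NotImplementedError
    # and may count hyper-threads as cores.
    try:
        return multiprocessing.cpu_count()
    except NotImplementedError:
        return default
```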

@bennr01
Contributor

bennr01 commented Jul 19, 2017

@drallensmith the loop used by _worker() is a while self.working loop. The condition (self.working == True) is checked after a genome has been evaluated, or after self.inqueue.get() has failed to return a genome to evaluate within 0.2 seconds. This means the worker loop checks at least once every 0.2 seconds whether it should still be working, and returns otherwise. You can stop the worker simply by setting self.working to False.
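Roughly, the loop has this shape (an approximate sketch, not the exact neat-python source; attribute names are illustrative):

```python
try:
    import queue  # Python 3
except ImportError:
    import Queue as queue  # Python 2

def _worker(self):
    # Method of a ThreadedEvaluator-like class, run in each worker thread.
    # The 0.2s timeout guarantees self.working is re-checked at least
    # every 0.2 seconds, so setting it to False stops the thread promptly.
    while self.working:
        try:
            genome_id, genome = self.inqueue.get(block=True, timeout=0.2)
        except queue.Empty:
            continue  # nothing to evaluate yet; re-check self.working
        fitness = self.eval_function(genome, self.config)
        self.outqueue.put((genome_id, genome, fitness))
```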

"Process" would be the most accurate term, but it still has two issues:

  1. The auto-mode determination can only check whether the node should be the master node; it cannot check whether a process should be the master process. This may lead to confusion.
  2. A slave node may start multiple worker processes to evaluate the genomes.

However, both of these issues are minor, and a simple note or hint in the documentation should be enough to avoid confusion. Another alternative would be to add these terms as (Computer)Node in the glossary and add something like "Not to be confused with computer nodes, which are used by..." to the normal node definition in the glossary. But changing the term should be enough.

Regarding multiprocessing.cpu_count(): Good catch. But I think you misunderstood something about the return value of multiprocessing.cpu_count(). It does not return the number of physical cores, but the number of cores as the OS sees them. For example, when using processors with hyper-threading, each core can execute two threads, so the OS thinks it has twice the number of cores it actually has. I have an i7-4770K (4-core) with hyper-threading, which means I have 4 cores which can each execute two threads; the OS thinks I have 8 cores.
This is probably also the case for your processor. If you have a 2-core machine with hyper-threading (most Intel processors use hyper-threading), your OS will think it has 4 cores.

@drallensmith
Contributor Author

drallensmith commented Jul 19, 2017

Hmm... my question may not have been clear. Ah. Given that all of the implementations of Python that I'm working with use a GIL, I haven't looked at threading much; I hadn't realized that self.working would be shared between the threads. (BTW, I just modified my copy of threaded.py to wrap the import threading in a try block and import dummy_threading as threading instead if there's an ImportError, with a warning if someone then tries to use ThreadedEvaluator. Ditto for distributed.py, without the warning - where would it be appropriate to put one in?)
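A rough sketch of that import wrapper (the flag name and warning text here are illustrative, not necessarily what will be merged):

```python
import warnings

try:
    import threading
    HAVE_THREADS = True
except ImportError:
    # dummy_threading has the same interface but provides no real
    # concurrency; a ThreadedEvaluator would effectively run serially.
    import dummy_threading as threading
    HAVE_THREADS = False

# Later, e.g. in ThreadedEvaluator.__init__:
if not HAVE_THREADS:
    warnings.warn("No threads available; ThreadedEvaluator will not actually be parallel.")
```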

Perhaps "compute node"?

I suspected that hyperthreading would be why it returns more than the number of physical cores. However, hyperthreading (at least in a 2011 machine; some things apparently changed with 2015+ Intel processors) primarily benefits non-CPU-bound threads, as I understand it; it allows them to wait on I/O (usually) while being quickly switched back into execution when that other resource becomes available. In other words, whether your machine, for instance, should be running 4 or 8 evaluation processes will depend on whether the fitness function is CPU-bound or not. I'll see if I can put something into the documentation about this. (I confess I hadn't thought about this aspect of hyperthreading quite this explicitly before; I just know that anything above a load average of 2 radically slows down my machine.)

@drallensmith
Contributor Author

drallensmith commented Jul 19, 2017

Oh - BTW, @bennr01, did you have a chance to take a look at #97 to make sure it's OK? I'm currently checking re Travis and OS X - it looks like it may have been a temporary error... sigh. No, "just" an intermittent one; I've filed an issue with Travis.

@bennr01
Contributor

bennr01 commented Jul 19, 2017

@drallensmith Yes, self.working (actually, the whole instance self) is shared between threads (BTW, this is one of the reasons why Python has a GIL).
Regarding hyper-threading: I am definitely not an expert in this field, so I am probably wrong. But when I use 4 processes on a 4-core hyper-threading processor, it does not seem like the processor is used efficiently (only 4 of the 8 CPUs listed seem to be used on Debian). Also, IIRC, the term I/O-bound in the processor context may be confusing (at least for me). I was under the impression that hyper-threading was mainly used to make RAM (and maybe cache) access more efficient by performing another instruction while waiting on the result of the last load/store instruction. Maybe that is what was meant by I/O? It's probably best to let the user decide what value to use.

The term compute node is fine.

Regarding distributed.py: Do you want to wrap the import of multiprocessing or the import of threading in a try: ... except: ... block?
In the case of multiprocessing: we could also try to remove the import of neat.distributed from __init__.py. This way the distributed module would need to be loaded explicitly. Otherwise, it may be better to put try: import neat.distributed except ImportError: distributed = None in the __init__.py file; see the sketch below. distributed is unusable without multiprocessing, so it may be better to raise an Exception than to warn the user.
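That guard would look something like this in __init__.py (a sketch of the option described above):

```python
# neat/__init__.py (sketch): make the distributed module optional,
# leaving an obvious None placeholder if multiprocessing is missing.
try:
    from neat import distributed
except ImportError:
    distributed = None
```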
In the case of threading: It may be possible to remove threading completely from distributed.py and replace stop_event = threading.Event() with an Event returned by self.manager. Maybe something like self.stop_event = None instead of stop_event = threading.Event(), and callable=lambda: self.stop_event if self.stop_event is not None else self.manager.Event()? (Note: these changes are untested; a rough sketch follows.)
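Untested, but the manager-based version might look like this (class and attribute names are illustrative only):

```python
import multiprocessing

class EvaluatorSketch(object):
    # Illustrative fragment: create the stop event through the manager
    # instead of via threading.Event(), so distributed.py itself would
    # no longer need to import threading.
    def __init__(self):
        self.manager = multiprocessing.Manager()
        self.stop_event = self.manager.Event()  # proxy usable across processes

    def stop(self):
        self.stop_event.set()
```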

I just took a look at #97. Thanks for trying to fix/improve host_is_local(). The changes look good, though I am a bit surprised that socket.getfqdn() returns an address from the reverse DNS lookup.

@drallensmith
Contributor Author

drallensmith commented Jul 19, 2017

I/O-bound was an unfortunately confusing example - the real question, as far as I can tell, is whether the process would be sitting around waiting for something other than access to the CPU's execution machinery. However, I am most definitely not an expert in the field either. Perhaps users should be advised to experiment, and told that the best value may be eval_function-dependent?

OK, I'm going with compute node.

Regarding distributed.py: My concern is with threading - note that it is absent frequently enough that the standard library provides a dummy_threading module. I'll have to take a further look at the various Event() implementations to comment further...

Regarding host_is_local(): Quite welcome. I found it surprising myself (especially that it was an IPv6 address on a machine used entirely for IPv4... I will have to check whether the local ISP does IPv6). The socket.getfqdn() documentation (3.5, but IIRC 2.7.13 is the same) states that it uses the hostname returned by socket.gethostbyaddr(), and selects the first name returned that contains a "." - not, say, socket.getaddrinfo() with the socket.AI_CANONNAME flag and the canonname it returns (probably because that's likely to be even more OS-variant...). BTW, the socket.gethostname() documentation states that it doesn't always return a fully-qualified domain name, and to use socket.getfqdn() for that; from the results so far, this does not seem advisable. (Perhaps the hostnames from both before and after socket.getfqdn() should be checked, for both this and the passed-in hostname? A sketch of that idea is below.)
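A sketch of that "check both" idea (a hypothetical helper, not the code actually in #97):

```python
import socket

def _local_name_candidates():
    # Collect both the raw hostname and its (possibly surprising) FQDN form.
    short = socket.gethostname()
    return {short, socket.getfqdn(short), "localhost", "127.0.0.1"}

def host_seems_local(hostname):
    # Compare both forms of the passed-in hostname against the local forms.
    candidates = _local_name_candidates()
    return hostname in candidates or socket.getfqdn(hostname) in candidates
```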

@drallensmith
Contributor Author

Re multiprocessing.managers.SyncManager.Event: It turns out that this uses threading.Event() internally anyway. It looks like substituting dummy_threading if threading is unavailable will work best. (Admittedly, I am a bit surprised that a threading Event would work across different machines...) BTW, speaking of events, given that distributed.py's evaluate() is for master mode only, I am confused as to why the if stopevent.is_set() exit code is present.

@bennr01
Contributor

bennr01 commented Jul 19, 2017

@drallensmith regarding multiprocessing.cpu_count(): That would probably be best.
Regarding host_is_local(): I have to admit that some time has passed since I last read the documentation of socket, so I was not aware that these functions are actually OS-dependent.
Regarding (I like this word, BTW) Event: IIRC, most synchronization primitives implemented by dummy_threading only work as much as is needed to make the code run somehow. For example, I think dummy_threading.Lock().acquire() does not actually wait until a release() has been called.
I just noticed that there is no real reason for using an Event anyway. The original idea was to reduce network overhead by using an Event implemented by the manager instead of a proxy of the Event, but then I used threading.Event instead. It may be possible to simply synchronize / use a proxy around a simple boolean value.
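That proxied-flag idea might look like this (an untested sketch; names illustrative):

```python
import multiprocessing

if __name__ == "__main__":
    manager = multiprocessing.Manager()
    stop_flag = manager.Value("b", False)  # shared boolean-ish flag (proxy)

    # Master side, when stopping:
    stop_flag.value = True

    # Slave side, inside the evaluation loop:
    if stop_flag.value:
        pass  # stop evaluating and return
```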
The if stopevent.is_set() check in evaluate() is for the case where some other thread (like a GUI) has called stop(). In that case, the slave nodes would stop evaluating, but the evaluate() call would still try to wait for the results. The check exits in case this has happened.

@drallensmith
Contributor Author

drallensmith commented Jul 19, 2017

@bennr01 Regarding host_is_local(): Now that I think about it, I'm actually uncertain whether it's OS dependence causing problems, or dependence on whether or not the machine is behind a cable modem + router doing NAT (as in my case), with no registered domain name.
Regarding threading and Event: I am not surprised re (the short form of "regarding", I suspect... mainly used for memos and subject headers, but I like using it in other places too) dummy_threading. A proxied boolean variable sounds good.
Regarding evaluate(): Got it; good thinking!

@drallensmith
Contributor Author

drallensmith.github.io has an updated version of the documentation for my fork's config_work branch, including preliminary documentation for distributed.py and threaded.py. @bennr01, if you could take a look at that, I'd appreciate it. I've also made some preliminary pylint-and-similar fixes to those files and the corresponding test files; again, please take a look!

@bennr01
Contributor

bennr01 commented Jul 21, 2017

@drallensmith The documentation of threaded and distributed is nearly perfect, but I noticed a few minor mistakes, most of them simple formatting errors.

in distributed:

  • In usage step 7: py:meth:de.stop(). Formatting error?

  • The description of MODE_ and RoleError is bold. Is this intended? This seems to cause some formatting errors. Also, ModeError instead of RoleError should be fine. Thanks for the suggestion.

  • In DistributedEvaluator:

    authkey (bytes) – The password used to restrict access to the manager. All DistributedEvaluators need to use the same authkey. Defaults to the authkey attribute of multiprocessing.current_process(); however, this will not be the same for processes on different machines, including virtual machines.

    While the authkey used by multiprocessing.managers.SyncManager defaults to the authkey attribute of multiprocessing.current_process(), the DistributedEvaluator has no default for authkey and thus expects it to be passed in explicitly.

@drallensmith
Contributor Author

drallensmith commented Jul 21, 2017

@bennr01 - thanks! I've corrected the above; an updated version should be up. (Regarding the last of the above, I suspect I am still getting my brain to understand the differences between Perl's undef and Python's None.) I also noted that authkey is of type bytes, not type str, although they are equivalent on 2.7. (The usage instructions thus need b'some_password', for instance.) I also have a couple of further TODOs noted - one possible one re terminology for distributed.py, for instance.

A documentation TODO that isn't for your code, but that I'm curious whether you - or anyone else - could help me understand, is for six_util, in which a parameter is being added to (inside the parentheses of) keys()... This is... strange.

@bennr01
Contributor

bennr01 commented Jul 21, 2017

@drallensmith when defining and/or calling a function in Python, there are 4 different kinds of arguments.
Let's assume we have a function def f(arg1, arg2="abcd", *args, **kwargs). I assume you know about arg1 and arg2, so I'll skip the explanation of them. *args and **kwargs are special arguments. They can have any name (as long as they start with */**), but each can only be defined once per function. They consume additional arguments. If additional arguments are passed to f() without a keyword (e.g. f("a", "b", "c", "d", "e")), they are available as args. In this example, arg1 would be "a", arg2 would be "b", and args would be ("c", "d", "e"), while kwargs would be an empty dictionary. When additional keyword arguments are passed (e.g. f("a", arg2="b", arg3="c", arg4="d")), they are added to the dictionary kwargs. In this example, arg1 would be "a", arg2 would be "b", and kwargs would be {"arg3": "c", "arg4": "d"}.
This also works in reverse. When you call a function, you can use f2(arg1, defined_arg="test", *more_args, **kwargs) to use the contents of a tuple (in the case of *more_args) or a dictionary (in the case of **kwargs) as arguments.
neat.six_util.iterkeys(d, **kwargs) requires only one argument to be called (d), but accepts any number of keyword arguments. However, it does not accept more than one non-keyword argument.
The following calls to iterkeys() are valid: iterkeys({}), iterkeys({}, test="test", wasd="xyz", python="love"), iterkeys(d={}, test="i am running out of values"), iterkeys({}, **{"test": "???"}), iterkeys(**{"d": {}, "arg1": "..."}).
The following calls are invalid: iterkeys({}, "test"), iterkeys(), iterkeys(some_arg={}).
Important: I am not really sure whether these calls are valid. They are syntactically valid, judging from the definition in six_util. However, **kwargs is passed on to d.iterkeys(), so the keyword arguments defined in **kwargs have to be valid for d.iterkeys(), which I did not check. I hope you can still understand this explanation; a runnable demonstration is below.
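A quick runnable version of the examples above:

```python
def f(arg1, arg2="abcd", *args, **kwargs):
    return arg1, arg2, args, kwargs

print(f("a", "b", "c", "d", "e"))
# -> ('a', 'b', ('c', 'd', 'e'), {})

print(f("a", arg2="b", arg3="c", arg4="d"))
# -> ('a', 'b', (), {'arg3': 'c', 'arg4': 'd'})

# Unpacking works in the call direction too:
print(f(*("a", "b"), **{"arg3": "c"}))
# -> ('a', 'b', (), {'arg3': 'c'})
```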

Regarding authkey: Sorry, my mistake. I mainly use Python 2 and not Python 3.

I could not yet take a look at the other TODOs.

Regarding None: Should I try to explain the difference to you, or did you mean that you know the difference but often forget it?

@drallensmith
Contributor Author

drallensmith commented Jul 21, 2017

@bennr01: Please take a look at issue #101 if you would.

Regarding iterkeys (and similarly for the other two): Thank you; I think I wasn't clear. My puzzlement is over how the function works internally - it calls, for Python 3, d.keys(**kw); for Python 2, d.iterkeys(**kw). I am confused by calling, effectively, keys(self, **kw) with self being d, and similarly for iterkeys. I've checked the documentation for keys (Python 3) and iterkeys (Python 2) and neither mentions putting anything in the parentheses. Yes, it's apparently valid... I'm just trying to figure out what's happening (and why nothing else mentions it). Why is it doing this? My best guess is that it's in case it gets fed a series of key=value pairs... but as well as a dict, which still seems rather odd (the d, as you noted, should still be required; OTOH, **kw in the call, as opposed to in the function definition, should result in the key=value pairs being converted into a dict, IIRC). I may try inquiring via the six GitHub issues page (or equivalent elsewhere) if it keeps frustrating me. (I've actually been working with *args and **kwargs myself in my fork's multiparam_funcs branch, in which I'm working on activation and aggregation functions that, as well as the usual inputs, take one or more evolvable parameters. The docs directory for it has some multiparam activation function images to show what sort of thing I mean.)
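For reference, the definitions under discussion are essentially this (simplified from the six source, which neat's six_util appears to mirror; PY3 stands in for six's Python-3 flag):

```python
import sys

PY3 = sys.version_info[0] == 3

if PY3:
    def iterkeys(d, **kw):
        # For a plain dict, kw is always empty; presumably the pass-through
        # exists so the helper also works with mapping types whose
        # keys()/iterkeys() accept extra keyword arguments.
        return iter(d.keys(**kw))
else:
    def iterkeys(d, **kw):
        return d.iterkeys(**kw)
```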

Regarding authkey: No problem; I usually use Python 2 myself, but do try to look in both the 2.7.13 and 3.5 libraries. (I believe it's bytes because of network issues and because it is evidently being used in a cryptographic hash procedure or equivalent - one way to do it would be for the master to send a random token and the slave to send back hash(authkey + token).)

Regarding other TODOs: Again, no problem; I understand fully.

Regarding None: I do know the difference, but often forget it, or more importantly forget how it affects, for instance, function calls. (Ironically, with Perl I set the "undefined variable" warning to a fatal error, matching Python's behavior, but am used to reading other people's code that does not.) Thanks for the offer to explain!

@CodeReclaimers
Owner

Again, big thanks for the massive update to the docs!

@drallensmith
Contributor Author

Again, quite welcome; happy to help!
