Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP/RFC: metadata implemented, REPL-help ported, user-made doc now possible (issue #3988) #5572

Closed
wants to merge 5 commits into from

Conversation

mauro3
Copy link
Contributor

@mauro3 mauro3 commented Jan 27, 2014

(updated 30 Jan)

I added basic support to associate metadata with most objects using @stevengj's suggested global dict as backend. It is really rather quite trivial but seems to work fine (except below issues). I use a function interface to the META dict so the backend could be changed. So, main thing here is whether we're happy with that function-interface or not.

As illustration of its power ;-), I ported the existing REPL-help system over into this system and it works as before, except that user-made help is now possible. I'll post a usage-demo to #3988. The help system as in this pull-request is not meant to stay like that. It's thought as a transitional state to something fancy.

So:

  1. Added basic support to associate metadata to most objects. The data is stored in the ObjectIdDict Base.META. However, interaction should be done through the functions: getmeta, setmeta! & hasmeta

  2. Ported the existing REPL-help system over to work with this system.

  • help and ? work as before (except 3-objects don't work, see test/help.jl)
  • user-made help is now possible through a @doc-macro and doc-function
  • Documentation of objects is kept in the metadata storage.
    Documentation of non-objects, e.g. keywords, and macros
    is kept in: Base.Help.NONOBJ_DICT.

Issues:

  • every now and then I get a ERROR: BoundsError() when initializing the help from helpdb.jl
  • There is a strange bug with eval not picking up its documentation, no idea why.
  • how can macros be referred to, other than representing them as a string?
    • I use now a separate Dict with string-keys for documenting macros and non-objects: keywords, &&, etc.
  • should some functions take the metadata into account? Examples are:
    • copy, deepcopy, sizeof

@mauro3
Copy link
Contributor Author

mauro3 commented Jan 29, 2014

(Sorry the rebasing I did added lots of people to this pull-request. I'll not do that next time.)

@pao
Copy link
Member

pao commented Jan 29, 2014

Unfortunately the PR still has an unclear history and is not mergeable as-is.

@mauro3
Copy link
Contributor Author

mauro3 commented Jan 29, 2014

@pao: Sorry, I'm a git noob: what should I do? I could restart and make a new pull request without all the history-garbage coming from the rebasing?

@pao
Copy link
Member

pao commented Jan 29, 2014

You can do that by resetting to master (git reset master), which will leave all your changes uncommitted, then add and commit your changes. Check that your patch looks right, then push it to your branch on github with the -f (force) option, and it will update the pull request. You don't need to create a new one.

I always recommend making a backup branch first (git branch my-backup-branch) just in case you need to undo (git reset --hard my-backup-branch).

(It's not the only way to cleanup history. You can probably also eliminate the extra commits by deleting them in an interactive rebase, git rebase -i master. I'm not sure which will work better here; I'd have to try it myself.)

@StefanKarpinski
Copy link
Sponsor Member

Taking a bunch of changes and sculpting them into targeted, logically separate commits is probably best done the way @pao recommends with git reset (after making a backup branch, although technically you probably don't need this since git reflog can recover where you were). One additional trick is that you can stage some changes using git add <files and dirs> and/or git add -p, which interactively asks you which parts of a change you want to stage. Once you've staged what you think is a reasonable, isolated commit, you want to test it. You can temporarily stash all the unstaged changes by doing git stash save --keep-index; after doing that, you can verify that what you plan to commit actually works. Once you've gotten it to work and committed it, you can do git stash pop to apply the rest of the changes and work on the figuring out the next commit. It's actually pretty easy once you get used to it.

@mauro3
Copy link
Contributor Author

mauro3 commented Jan 29, 2014

Thanks for those explanations! I'll try them once I finish with the @doc macro and some tests.

@JeffBezanson
Copy link
Sponsor Member

One concern I have with this approach is that there is no clear way to clean up the dictionary --- it never releases references to anything. It also seems like we'd need to add special handling of this dictionary if we ever separately compile packages. A package would have to store its part of the dictionary, which would need to be merged in when loaded. That's doable, but why not just leave the per-package data in place instead of merging?

@mauro3
Copy link
Contributor Author

mauro3 commented Jan 30, 2014

@pao & @StefanKarpinski : I tidied up the commits: you should be able to have a look at it now. (I did it with a git rebase -i master, thanks for the help)

I updated the original message to reflect all the features of the pull-request.

@JeffBezanson : I think that the main point is whether we are happy with the function-interface to the metadata. We can always swap the backend later if there are problems.

@stevengj and I had some discussion about the backend in #3988 along the lines of your concern. This pull-request implements more or less as @stevengj suggested. I think the good thing about this backend is that it's super trivial. So why not go with this for now?

Concerning improvement of the backend: The META dictionary could be based on a combination of a WeakKeyDict with a ObjectIdDict. This should clean up after itself, right? I could work on this if we think that this is generally the way ahead.

The nice thing about getting a metadata system into master would be that folks could start working on a proper documentation system.

@mauro3
Copy link
Contributor Author

mauro3 commented Jan 30, 2014

I just saw that the Travis-build failed for this. This seems to be a bug with evalfile not working properly when running in parallel and not a bug in the pull-request.

running in directory test/ the command:
../julia runtests.jl file help errors with the Travis error (can use any test instead of file)
whereas this runs just fine:
../julia runtests.jl help

This looks quite like this closed issue:
#3855
(and may also have something with the ERROR: BoundsError() I sometimes get?)

@mauro3
Copy link
Contributor Author

mauro3 commented Feb 2, 2014

Filed an issue for evalfile not working properly when running in parallel: #5648

@mauro3
Copy link
Contributor Author

mauro3 commented Feb 6, 2014

Even after #5648 is fixed, for reasons I could not debug, the tests for help.jl were failing in parallel. So now help.jl gets tested separately in serial.

Other than that, I would appreciate some comments on whether the metadata strategy as implemented in here has a future or not. Just say "no" if you see no future and maybe suggest/implement something else. (Again, note, my implementation of the REPL-help system is just thought as an interim solution, albeit fully functional. This is about metadata handling.)

@stevengj
Copy link
Member

stevengj commented Feb 6, 2014

@JeffBezanson, as a practical matter we are unlikely to be using the metadata for anything but documentation of const objects, for which garbage-collection is not an issue. In the longer term we can revisit the backend if needed.

@mauro3
Copy link
Contributor Author

mauro3 commented Feb 7, 2014

I cannot reproduce the Travis error, even when building from a fresh clone (gcc 4.8.2, arch linux 64-bit).

@stevengj
Copy link
Member

A WeakKeyDict sounds reasonable to me, though I'm skeptical that garbage collection of the metadata is actually necessary.

I really wish we could make some more progress on this.

@tknopp
Copy link
Contributor

tknopp commented Jun 12, 2014

Yes, lets make 0.4 the "documentation release".

@mauro3
Copy link
Contributor Author

mauro3 commented Jun 12, 2014

As @stevengj says, do we really need a weak reference dict? AFAIK modules persist once loaded, thus all their contents also persist. So the only time metadata should be garbage collected is when a user attaches it to type instances which then go out of scope. (or are there other cases?) Whilst bad, there are easier ways to shoot oneself in the foot. Plus the user could define another metadata storage for a specific type with this PR.

WeakKeyDict does not work for instances of immutable types:

julia> wd = WeakKeyDict()
WeakKeyDict{Any,Any} with 0 entries

julia> wd[pi] = 5
ERROR: objects of type MathConst{:π} cannot be finalized
 in add_weak_key at dict.jl:701
 in setindex! at dict.jl:726

Also, we probably want an object-id dict, because otherwise attaching metadata to instances of some mutable types (e.g. Dict) will not work once they change. Thus we need a WeakKeyObjectIdDict which falls back to a ObjectIdDict for immutables. The WeakKeyObjectIdDict does not exist yet, although one was started in #3002. A few month back I had a look at the weak and object-id stuff but couldn't figure it out. Maybe I'll try again.

Also, note that Jeff seemed to be opposed the other type of metadata storage, add a special field to most types as I suggested above: https://groups.google.com/d/msg/julia-dev/kBTqHsyP_0E/bgAltGuF9c4J

end

function getdoc(obj, default)
metadata = getmeta(obj, nothing)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nice to simplify this pattern by defining getmeta(obj, key, default) to return getmeta(obj)[key] if it exists and default otherwise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking at this! Yes that would probably be good. I tried to model the interface on associative collections, so this could be good.

@mauro3
Copy link
Contributor Author

mauro3 commented Jun 19, 2014

I put a WeakObjectIdDict together to work as back-end for metadata storage. The WeakObjectIdDict still needs a lot of work but it's here as a "proof of concept". Also I updated the function-interface to it as suggested by @stevengj.

This solves @JeffBezanson concern of releasing references to both mutable and immutables as they get garbage collected (as far as I understand it and the WeakObjectIdDict probably needs some tweaking for immutables).

This does not solve @JeffBezanson concern of keeping the metadata with the package it belongs to. When loading a package the META dict would have to be updated. Anyway, where would a package specific META go, without the package author having to provide a dictionary within his/her package? One way could be to make META a Dict{Module, WeakObjectIdDict} and add individual package-meta-dicts to that. Macros still would need special treatment though and for variables it is a bit tricky to find the module they belong to (see https://groups.google.com/d/msg/julia-users/I45YliJ34EM/5gBBkMRrFw0J).

@mauro3
Copy link
Contributor Author

mauro3 commented Jun 20, 2014

Maybe a META dict could be added to the module when it is made un-bare (http://docs.julialang.org/en/latest/manual/modules/#default-top-level-definitions-and-bare-modules).

@mauro3 mauro3 mentioned this pull request Jun 21, 2014
8 tasks
@stevengj
Copy link
Member

Honestly, keeping metadata with the package seems like backend implementation detail that could be fixed later.

   data is stored in the ObjectIdDict Base.META.  However, interaction
   should be done through the functions:
   getmeta, setmeta! & hasmeta

2) Ported the existing REPL-help system over to work with this system.
   - `help` and `?` work as before (except 3-objects don't work, see test/help.jl)
   - user-made help is now possible through a `@doc`-macro and `doc`-function
   - Documentation of objects is kept in the metadata storage.
     Documentation of non-objects, e.g. keywords, and macros
     is kept in: `Base.Help.NONOBJ_DICT`.

Tests in test/metadata.jl and test/help.jl

Issues:
- every now and then I get a `ERROR: BoundsError()` when initializing
  the help from helpdb.jl
- There is a strange bug with the help test running in parallel:
  -> fixed the test by moving the help.jl test into a serial execution at the end
currently a very crude implementation but it seems to work.

Made changes to the interface as @stevengj suggested.

Oddness: APROPOS_DICT cannot be a WeakObjectIdDict for some reason
Test are passing/failing somewhat eratically, probably due to the
WeakObjectIdDict.
@mauro3
Copy link
Contributor Author

mauro3 commented Nov 12, 2014

Superseded by #8791

@mauro3 mauro3 closed this Nov 12, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:docs This change adds or pertains to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants