Guided tour to the CPython source code #6

Open
gvanrossum opened this issue Jul 27, 2016 · 53 comments

Comments

@gvanrossum

gvanrossum commented Jul 27, 2016

I propose that we collaborate on creating a guided tour to the CPython source code. I think such a thing is vastly overdue, and I am not aware of existing resources (though I'll gladly take pointers that prove me wrong).

I've more or less promised @emilyemorehouse to help her get started with something like this, and I'm happy to put time in. @willingc I hope you think this would be a useful document to maintain in this repo? I suppose it would go under the cpython tree.

The proposed document differs from the existing guides like the devguide -- it doesn't tell you anything about the patch submission flow, nor even how to get things to build -- it should focus on how to read the CPython source code (using your editor of choice).

Some topics could include:

  • structure of the source code (what goes into Include, Objects, Python etc.)
  • bytecode (could start with @akaptur's videos)
  • reference counting
  • how to find the definition of something
  • how to find where a given error comes from
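
To make the bytecode and reference-counting topics a little more concrete, here is a minimal sketch, assuming a standard CPython build. The function `add` and the names `payload` and `alias` are made up for illustration. Opcode names and exact reference counts vary between CPython versions, so the sketch prints them rather than promising specific values.

```python
import dis
import sys


def add(a, b):
    """Tiny function whose bytecode we want to inspect."""
    return a + b


# dis.get_instructions yields one Instruction per bytecode operation.
# Opcode names differ between CPython versions, so none are hard-coded here.
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)

# sys.getrefcount reports an object's reference count (a CPython detail);
# the temporary reference held by the call itself is included in the number.
payload = object()
before = sys.getrefcount(payload)
alias = payload  # binding a second name adds one reference
after = sys.getrefcount(payload)
print(before, after)
```

Running `dis.dis(add)` instead prints the same instructions as a readable listing, which is usually the quickest way to connect a line of Python to the opcodes the interpreter loop executes.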
@brettcannon

Some of this is lightly touched in the devguide (e.g. http://cpython-devguide.readthedocs.io/en/latest/setup.html#directory-structure), but in no way thoroughly.

@gvanrossum
Author

Ah, thanks. I would like to go a step deeper for the directories containing most of the fundamental C code:

  • Include
  • Objects
  • Python
  • Modules
  • Parser

Other topics should probably include

  • how the parser works
  • how to define an object type (or is this in the C/API docs already?)

@lorenanicole

@gvanrossum are you soliciting assistance on this from others that aren't core devs too?

@brettcannon

For documentation about defining a type, it depends. There's the C API tutorial and then there's xxsubtype.c.

@lorenanicole Guido can say if I'm wrong, but there's absolutely no reason to restrict this to core devs. Honestly, it would be best to have someone who isn't a core dev involved so that no bad assumptions of pre-existing knowledge muddle the tour.

@gvanrossum
Author

gvanrossum commented Jul 27, 2016

@lorenanicole: What Brett said. The core devs already know this and probably have huge blind spots. My own blind spots are even bigger. (I wrote some of the stuff Brett pointed to and I had forgotten about it. :-)

I also believe that a lot of the existing docs in this area assume lots of other skills that aren't so easy to come by. While I don't think we should include a C tutorial, there are a lot of C patterns that aren't unique to Python but are still worth explaining in some detail, either because Python's version has certain important details, or just because they aren't that widely known.

Some more random topics:

@brettcannon

  • How to handle errors (e.g. when to use a goto, making sure you decref appropriately)

@willingc
Member

@gvanrossum Love this idea! It's similar to what I want to do with notebooks and Phillip Guo's grad school course. It also captures well the spirit of our PyCon conversation.

All,

The next paragraph that I'm writing is done with gentle kindness, respect, and thoughtfulness:

I do want the documentation developed here to be written primarily, or at least 50%, by those who are not core developers. A few reasons:

    1. to reinforce the truth that technical talent already exists in the PyLadies community;
    2. to best communicate technical information in a manner that resonates with the PyLadies interested in being mentored (of course, all information will be openly available on GitHub so no one is being excluded);
    3. to provide an opportunity for PyLadies to own some of the maintainer responsibility for the document (i.e. merge privileges);
    4. to simplify the contribution process, review, and merging of changes.

Just open a PR with the folder name and outline that we would like to iterate on over time.

Also, please let me know if you would like to be added as a maintainer on the repo. I'll be on vacation with limited internet access beginning Friday to the 2nd week of August.

@jackiekazil
Member

IMHO, I think that it would be great to frame out by core devs and then have non-core devs tear it apart, ask questions, restructure and edit. However, I do not do work like this every day. But @Bradamant3 does! :-)

@Bradamant3, do you have thoughts on ways to approach this?

Background on @Bradamant3
Recent talks she has given...

@Bradamant3

@jackiekazil and all -- I am honored to be mentioned here, and I'd be thrilled to be involved. I'm anything but a core dev -- I'm still pretty new to Python, still working my way through Learn Python the Hard Way, although I've taken a couple of handfuls of programming classes (no C, sadly) and write sample code for part of my living. (I'm a technical writer by trade, on the more technical side of things.) I'll need to spend some time over the next week looking through what's already in the repo and the other resources y'all have mentioned to see where I think I can best help out. I have much to learn, and if I can help others learn by documenting my own learning as I help out with this project, that would be the biggest thrill of all.

@willingc
Member

Hi @Bradamant3, We would love for you to be part of this. There's a special place in my heart for technical writing since it's key to breaking down barriers for users to use your software and critical for onboarding developers 🌻

Please feel free to ask questions (large or small). Looking forward to you helping out.

@akaptur

akaptur commented Jul 28, 2016

My bytecode chapter for the Architecture of Open Source Applications book might be a better example than my bytecode talk (although the content substantially overlaps): http://aosabook.org/en/500L/a-python-interpreter-written-in-python.html

One of the hardest parts of writing the chapter was getting clarity about the intended audience and the intended goal. I eventually settled on an audience of Python programmers who want to learn more about the language for its own sake. (I personally don't think knowing about the guts of the interpreter helps you write better Python, but I do think it helps you deepen your knowledge of computer science and is darn fun.)

Given that, for this project, who's the intended audience, and what's the goal? What is the reader hoping to get out of the guides?

Some possible goals I can imagine:

  • Onboarding people who want to be core developers
  • Providing shortcuts, pointers, and hints for people who want to contribute to CPython but haven't worked with a large codebase before
  • Satisfying the curiosity of people who want to know how stuff works (I heard someone describe this recently as "Be the Julia Evans you want to see in the world")
  • Teaching people fundamental programming & debugging skills (e.g. Guido's last two bullets above)

We don't need perfect agreement on the goals, but I think some clarity early on will pay off significantly later on!

With more clarity on the goals, I'd be happy to contribute to this. I think the object model and garbage collection would be particularly rich topics.

@Bradamant3

@akaptur and all:
Just some preliminary thoughts as I look superficially through SO MANY AMAZING RESOURCES and read Allison's brilliant questions about audience:
It seems to me that Allison's goals are fundamentally all compatible. Folks who want to be core developers may not have worked with a large codebase, they may have found the Python community to be as amazingly welcoming as I have and therefore find it the right place to satisfy their curiosity -- and addressing all the above can also help teach those crucial fundamental programming and debugging skills. (These are all personal goals of mine, and I have zero aspirations toward becoming a professional developer. Just sayin'. There's a huge contribution to make to the world by making it easier for as many people as possible to understand these things.)
+10K to object model and garbage collection (as someone who has far more to learn about both than she currently understands)
also +10K to @jackiekazil's suggestion about framing out by core devs and then letting the rest of us adjacently-obsessed have at it.

@lorenanicole

+10k as well to all the above :-) once a more definitive outline is settled I would love to step in and help. @jackiekazil + @willingc I'll defer to you two for directions.

@willingc
Member

Nice to see @akaptur here. Hope to see you at PyBay.

One of the things that I have wanted to do for over a year is to edit down @pgbovine's great 10 hour course http://www.pgbovine.net/cpython-internals.htm that walked through the Python2 codebase. I wanted to create a series of Jupyter notebooks that would correspond to each of his videos and link to source code. Here's a link to one of the notebooks that I started last year: https://github.com/willingc/pyladies-cpython/blob/master/Notes%20on%20Lecture%201.ipynb

I'm wondering (if @pgbovine is cool with us referring to the content and @gvanrossum and @brettcannon see value) if we could use the @pgbovine course as a starting point and outline linking out to other resources and docs as appropriate. Perhaps creating a study group to do two lectures a month or something.

This is everyone's group so no need to defer to me. If you have a good idea, please run with it. Personally, I want to see something that digs into the guts and I see some value in the video lecture combined with written word.

Overall, I'm better on code guts than usability so @Bradamant3's talent and insights are very valued. Thoughts?

P.S. @akaptur I think your bullet list covers the audience.

@willingc
Member

Also leaving for Japan for a long overdue vacation. @jackiekazil, would you mind working with @estherbester and @audreyr to get yourself and others push access to this repo? I tried to add folks last night but needed org access not just repo admin access.

@annakoppad

I am willing to contribute! Thanks

@brettcannon

My only worry with videos that are tied to something like the internals of CPython is they will become outdated and videos are nowhere near as easy to update as written documentation. It's one thing for a professor to record lectures he's going to give anyway, and it's another to get a volunteer to re-record an entire video for some key type just because we tweaked some detail in a release. Now that's not to say people couldn't do videos, but I think they should be entirely ancillary to anything written and simply something that could be pointed to instead of building something around the videos.

@pgbovine

Thanks for the kind words! Yes, you have my permission to use and remix the content as you wish. Please add a link back to the original source material webpage ( http://www.pgbovine.net/cpython-internals.htm ) in the appropriate place(s). Best wishes.

@gvanrossum
Author

gvanrossum commented Jul 29, 2016 via email

@jackiekazil
Member

@gvanrossum it depends.

@lorenanicole

Video sounds quite exciting; however, I think that's a stretch goal? Let's start with text, then we can regroup and see about video.

@willingc
Member

Text is fine by me. Let's proceed forward with an outline. I'm really pleased to see the interest building 😄

If I have time when I return from Japan, I may do the Jupyter notebooks as a work related thing since there is no additional video production needed. Something that could be used with some Jupyter/JupyterHub work that I'm doing for education.

@brettcannon

OK, so with a general agreement to start out with a tutorial document, what's the next step? Outline? Basically this sounds like it's going to be a mix of "this is how Python works to help you debug problems", and "this is how Python is structured to help you find your way around".

If we need a jumping-off point then simply starting with explaining what's in all the top-level directories of a source checkout is as good a place as any. That can feed into navigating around, e.g. you can always use a ^ regex anchor on a function name in C when searching in the source to find its definition thanks to how we format C code. Then once we think we're done explaining how Python is structured code-wise we can then start talking about how stuff works to help with debugging.

@gvanrossum
Author

I just got off the phone with @emilyemorehouse who is also interested in this issue. I promised her one specific deliverable: an overview of what's in the most important top-level directories and the main entry points underneath there. I will spend some time today writing up what I can.

While we're collecting useful links, maybe my old Python history blogs could be of some use. There's a wealth of technical information there: http://python-history.blogspot.com/. A few highlights:

@emilyemorehouse
Collaborator

@brettcannon I think that an outline is a great next step. What's the best way for us to do this?

I've started on a set of docs based on some of @gvanrossum's direction that I've divided into a few sections -- resources (with tracking on what has been scoured for information), notes on meetings/project goals, notes from resources, and notes from digging into CPython itself. I've currently only gone through @akaptur's Bytecode talk but I've got a decent list of resources.

My personal goal from this is to be able to gain a strong understanding of the codebase in order to work towards becoming a core contributor, so I'm certainly going to lean more towards documenting how Python and its code works.

I'm more than happy to share all of my docs, both here and from a personal repo as not all notes will be entirely relevant. @jackiekazil, I can start adding if you give me access.

@brettcannon

@emilyemorehouse Without settling on an overall direction exactly, I don't know if we could do an outline, hence why I suggested we just start with an overview of the directories and see where it takes us. Part of the problem is I don't know what does or doesn't need to be covered for a new contributor as I have 13 years of knowledge which completely separates me from a beginner's perspective of what's not obvious. Otherwise I would just start from what it takes for someone to fix a bug and trying to cover each of the steps (navigating the code, how Python works to understand how to diagnose, and how best to write C code).

@Bradamant3

As the newest newcomer, but with some background in helping get similar projects off the ground, may I make a suggestion?
It seems to me that some of us could work from the docs that @emilyemorehouse has offered to share, to come up with an outline (and as we see fit, work from other resources already listed as well). One item in that outline would, of course, be the overview of the directories that @brettcannon and @gvanrossum (and really everyone) have called for.
So ... consider the directories overview and the outline perhaps as two branches of the project? (Not suggesting a repo structure, just a metaphor :-) )
Again as that newcomer, I'd look for more than one entry point. My tendency is to go for the big picture -- give me an overview of the code structure and I'll start rummaging (looking only) to familiarize myself with the forest map as a whole. Only later will I start looking for information about "what it takes for someone to fix a bug" (after all, someone has to learn to identify and prioritize said bugs first, right?). But others might want to dive in and find a nice tree or two to work with first. I think there's enough direction in this issue discussion already for work to begin on both outline and overview.

@gvanrossum
Author

Ladies and gentlemen, I have something to share.

I spent a few hours writing a rambling draft that walks you through what happens when Python starts up, from main() to the >>> prompt. It's a work in progress, but I'm sharing it here early in the hope that it's already useful. I suppose I should eventually post this as a series of blog posts -- or we can work together on turning this into a useful and more structured document to be committed in the pyladies-maintainers repo. Thoughts?

For now I wrote it using a new Dropbox product, "Paper".

READ THIS BEFORE YOU CLICK ON THE LINK: Because Paper lets you comment and edit, it will also reveal your identity to others viewing the document, even if you're just viewing yourself, unless you use anonymous browsing.

https://paper.dropbox.com/doc/Yet-another-guided-tour-of-CPython-XY7KgFGn88zMNivGJ4Jzv

@willingc
Member

willingc commented Aug 1, 2016

@gvanrossum Thanks for sharing this early. It's wonderful. I really like the conversational style. I love the look and readability in Paper too.

Huge +1 to adding this to this repo in any form that you and others like. Thank you for doing this, Guido.

@matrixise

Hello,

I received a notification from @lorenanicole about this thread because I am interested in this topic.

In fact, I have presented my talk https://speakerdeck.com/matrixise/exploring-our-python-interpreter at EuroPython 2016, PythonFOSDEM 2016, PyCon.CA 2015 and PyCon.IE 2015. This talk has been shared on the core-mentorship mailing list. If you are on this mailing list, here is the link to my message: https://mail.python.org/mailman/private/core-mentorship/2015-November/003274.html

I have already discussed a possible book on the topic with Victor Stinner (@Haypo), in which I would like to present the internals of CPython, and he is interested.

In fact, through giving my talk, I have observed that some people ask me how to start contributing to CPython (OK, I am not a very active contributor, but I contribute as time allows), and they think that a good introduction or just a good "book" would be a good starting point.

For my part, I wanted to start the project during August, just after EuroPython, with some articles or maybe a GitBook, working first on a table of contents and then on the content.

So, I am interested because I want to become a CPython contributor, fix some issues in the interpreter, learn for myself, and explain it to everybody.

@brettcannon

Just an FYI, @matrixise , the core-mentorship mailing list is private so only members of that list can see your link.

@matrixise

maybe a good reason to subscribe to this mailing list (not you, I know you are already subscribed 😉) but you are right

@gvanrossum
Author

By the way, I'm looking for help on the topic of tools for spelunking C code. Emily started reading my guided tour and discovered that she could use a better editor to navigate such a large code base. My own setup is actually quite crusty (Emacs and make TAGS, then ESC-period and ESC-comma) and I have no idea what would be the state of the art. Eclipse? IntelliJ? Vim plus extensions? What do other people use? Is there useful lore about this topic that is helpful if you haven't been writing Linux kernel code for 30 years?

@emilyemorehouse
Collaborator

I actually found a great plugin for Sublime using CTags and it's a huge improvement. Goto definitions actually work the way they should now.

I think the bottom line is that if you're staying away from a full IDE, tags are the way to go.

@brettcannon

I don't know if we want to go down the route of IDE/editor-specific recommendations w/o being very selective of only editors core devs actively use in case someone asks for help (suggesting you find an editor/IDE that supports "goto definition" seems entirely reasonable to me, though).

If we're talking about finding definitions of functions using just standard CLI tools, I just do, e.g. ack "^PyImport_Import" Modules/ and that will find the function definition thanks to our C style (which can probably be turned into a grep command using Modules/*.c).
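
For anyone without `ack` at hand, the same idea can be sketched in Python. This is only an illustration, not part of any CPython tooling: the helper name `find_c_definition` and the `SAMPLE` C text are made up, and the sketch relies on the style convention mentioned above, where the return type sits on its own line so that the function name of a definition starts in column 0. Unlike the plain `^PyImport_Import` pattern, it adds a word boundary so that longer names sharing the prefix are not reported.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical C text written in the CPython style: return type on its own
# line, function name at the start of the next line.
SAMPLE = """\
static PyObject *
helper(void)
{
    return PyImport_Import(NULL);   /* call site: indented */
}

PyObject *
PyImport_Import(PyObject *module_name)
{
    return NULL;
}
"""


def find_c_definition(name, root):
    """Return (path, line_number) pairs where `name` starts a line in a .c file."""
    # ^ with re.MULTILINE anchors at the start of every line; \b rejects
    # longer identifiers that merely begin with `name`.
    pattern = re.compile(r"^" + re.escape(name) + r"\b", re.MULTILINE)
    hits = []
    for path in sorted(Path(root).rglob("*.c")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for match in pattern.finditer(text):
            line_number = text.count("\n", 0, match.start()) + 1
            hits.append((str(path), line_number))
    return hits


# Demo on a throwaway directory so the sketch runs without a CPython checkout.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sample.c").write_text(SAMPLE, encoding="utf-8")
    demo_hits = find_c_definition("PyImport_Import", tmp)
    print(demo_hits)
```

In a real checkout you would pass a source directory as `root` instead of the temporary one; only the definition is reported, because the call site inside `helper` is indented.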

@emilyemorehouse
Collaborator

@brettcannon I definitely agree, staying editor/IDE agnostic is definitely important for use in the guide, this was more of a general question of "how do people do this, and how can I improve my methods". I don't necessarily see this information as the highest value to include in these efforts (yet), but knowing that it's a question that people may have is at least something to keep in mind later.

On another note, I've pushed most of my notes to a GitHub repo here as promised. It's very casual and some of it (probably) isn't relevant, but I've at least started compiling some lists of resources and content that we can pull from.

@willingc
Member

willingc commented Aug 5, 2016

@emilyemorehouse Good stuff in your repo. I return from vacation on Tuesday and would be happy to merge anything that you would like into this repo.

@emilyemorehouse Good tip on sublime. Do you know if Atom supports CTags? I'm assuming it does too.

One of the nice things about Atom, Sublime, or even Visual Studio Code (thanks @brettcannon for the tip at PyCon sprints) is the ability to easily visualize a large codebase.

@emilyemorehouse
Collaborator

@willingc I believe that CTags support is native in Atom; it will detect any tag files and use them automatically.

@brettcannon

VS Code supports GTAGS (which can use CTAGS). I suspect any of the mature editors will support tags in some fashion and recommending one that uses them as a way to navigate source is fine.

@gvanrossum
Author

These days, is there a difference between make tags vs. make TAGS?

@emilyemorehouse
Collaborator

@gvanrossum from what I can gather, yes, there's still a difference. make tags generates a richer, more universal tags file using ctags (whichever variant the user has installed), whereas make TAGS generates an Emacs-specific tag file using etags.
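
To show what an editor actually gets out of a ctags-style `tags` file, here is a rough sketch of a reader for that format. It assumes the common layout of one tag per line with tab-separated name, file, and search address, plus `!_TAG_` metadata lines. The helper `parse_tags` and the `SAMPLE_TAGS` lines are illustrative only, not output copied from a real `make tags` run, and the sketch ignores edge cases such as tabs inside the search pattern. The Emacs `TAGS` file written by etags uses a different layout and is not handled here.

```python
# Illustrative lines in ctags format: name <TAB> file <TAB> address;" <TAB> kind
SAMPLE_TAGS = [
    "!_TAG_FILE_SORTED\t1\t/0=unsorted, 1=sorted, 2=foldcase/\n",
    'PyImport_Import\tPython/import.c\t/^PyImport_Import(PyObject *module_name)$/;"\tf\n',
    'PyList_New\tObjects/listobject.c\t/^PyList_New(Py_ssize_t size)$/;"\tf\n',
]


def parse_tags(lines):
    """Map each tag name to a list of (file, address) pairs."""
    index = {}
    for line in lines:
        if line.startswith("!_TAG_"):
            continue  # metadata about the tags file itself, not a real tag
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3:
            continue  # not a well-formed tag line
        name, filename, address = parts[0], parts[1], parts[2]
        # Extended formats append ';"' and extra fields after the address.
        if address.endswith(';"'):
            address = address[:-2]
        index.setdefault(name, []).append((filename, address))
    return index


tags_index = parse_tags(SAMPLE_TAGS)
print(tags_index["PyList_New"])
```

A "goto definition" command is then little more than a dictionary lookup followed by opening the file and running the stored search pattern.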

@gvanrossum
Author

gvanrossum commented Aug 8, 2016 via email

@nanjekyejoannah
Collaborator

I am wondering about the status of this. I was looking for documentation on the CPython source code and found myself on this thread.

Being a new contributor to CPython, I am interested in helping, so that I can also deepen my understanding in the process.

So, what is the status? cc @gvanrossum @emilyemorehouse @willingc @brettcannon

I see that the last discussion was two, almost three, years ago, so I was wondering if this is still relevant.

@matrixise

Hello @nanjekyejoannah,

I suggest you read the devguide at https://devguide.python.org and try to find the "easy" issues on the bug tracker.

I could help you with these issues.

You can also use the Zulip channel and discuss with us on #core/help.

There is also the core-mentorship mailing list.

Have a nice day and see you later.

@nanjekyejoannah
Collaborator

@matrixise thanks for this info.

I also wanted to know the status on this.

@emilyemorehouse
Collaborator

Hi @nanjekyejoannah! Welcome, and thanks for your contributions to CPython 🎉

I definitely think this issue is still relevant! I haven't had the time to formalize much documentation on this; I have numerous notes that I need to translate from my shorthand into something others can understand 😓

@gvanrossum's and my approach was originally going to be to write out information in Dropbox Paper (markdown support with collaboration/comments) and then figure out what tool/framework would be best to publish it in. We could also use markdown files in a GitHub repository. Personally, I was leaning towards a static site and also have the url cpython.guide that I still have hopes to use one day. But again, that's irrelevant without something to publish.

I do have a couple of things that may be useful, though very rough:

I'd love to collaborate and synthesize the ideas in this thread, then divide and conquer. There are so many parts of CPython that a thorough guide will need the efforts of many people.

@willingc
Member

@nanjekyejoannah Thank you for rebooting this thread.

Adding @Mariatta @csabella who may have additional thoughts/content and @vstinner who has some handy docs.

One of the things that I did last year was come up with some Minimalist docs for JupyterHub to simplify the content from the main JupyterHub docs. I started doing something similar for the devguide while on a flight, Minimalist CPython but got distracted with other projects.

@csabella
Collaborator

@willingc Thank you for pointing out this project! I haven't even tried to look at the C code yet, so, no I don't have any content. You wrote:

I do want the documentation developed here to be written primarily, or at least 50%, by those who are not core developers.

Even though I'm a core dev (😱), I would classify myself as part of the other 50% for this project. But, to be honest, I wouldn't even know where to begin helping out at this point. This post regarding the scope here makes a lot of sense. It seems like there's so much that can possibly be done, it's hard to know where to start, although @emilyemorehouse and @nanjekyejoannah are headed in a good direction.

@nanjekyejoannah
Collaborator

@emilyemorehouse Well, IMO this draft: https://paper.dropbox.com/doc/CPython-Guide-m7BQyPth6AIDUdZ6EmBNM has a draft scope, or at least something pointing in the direction of what I think should be done. The point is that we need to start somewhere.

I suggest we build on that, taking advantage of the collaborative efforts of many people.

The next logical step would be for @emilyemorehouse to move that draft to something like a GitHub repository, Dropbox Paper, or a static site. You can panel-beat it into something we can start with. We can use any form that allows others to collaborate. I would lean more towards a GitHub repository, but maybe the core devs know what has worked better in similar situations.

I just think we need to find a way of starting.

@vstinner

I saw my name mentioned here. There are multiple existing documentations:

You are free to create a new site if you prefer. My collection of links, "Documentations of CPython internals":
https://pythondev.readthedocs.io/internals.html

@willingc
Member

Great suggestions @nanjekyejoannah. Do you and @emilyemorehouse want me to open a new repo here on the pyladies org for work to begin? The drafts could begin and once it's further along perhaps move to the python org. Thoughts?

@willingc
Member

Actually, I added @nanjekyejoannah @emilyemorehouse @csabella @Mariatta as maintainers on this repo. Feel free to use it for drafts or whatever else related to this.

@scotchka

scotchka commented Apr 15, 2019

another resource:
https://leanpub.com/insidethepythonvirtualmachine/read

and
https://docs.google.com/document/d/1zrRTahXojd1gUGxK16Iwcqs0LUivqXK659hn4h9tOVw/edit#heading=h.14npr8gc6oo6
