Jupyter notebook (.ipynb) is blowing the number of lines out of proportion #1072

Open
nfx opened this issue Mar 6, 2024 · 2 comments
Comments

@nfx

nfx commented Mar 6, 2024

Jupyter notebooks are JSON-serialized code and markdown cells, and this otherwise great tool produces incorrect code size estimates for them. See this example: tokei counts it as 496 lines of JSON code, but in fact it's 60 Python code lines and 19 markdown lines.

import requests
nb = requests.get("https://raw.githubusercontent.com/databrickslabs/mosaic/2ec5d9da032db0d8209e910d4378c959c8fc7ddc/docs/source/usage/grid-indexes.ipynb").json()
markdown_lines = sum(sum(len(line.split("\n")) for line in cell['source']) for cell in nb['cells'] if cell['cell_type'] == 'markdown')
code_lines = sum(sum(len(line.split("\n")) for line in cell['source']) for cell in nb['cells'] if cell['cell_type'] == 'code')
print(markdown_lines + code_lines)
76 # 15% of 496
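
For comparison, here is a minimal offline sketch of the same per-cell-type count. It assumes the usual notebook JSON layout, where each cell's `source` is either one string or a list of strings, and it ignores cell outputs entirely. The function name `count_notebook_lines` and the sample notebook are hypothetical, made up for illustration. It uses `splitlines()` rather than `split("\n")`, since `split("\n")` yields an extra empty entry for any string that ends in a newline, which can inflate the count.

```python
def count_notebook_lines(nb):
    """Count source lines per cell type in a parsed notebook dict (outputs are ignored)."""
    counts = {}
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # `source` may be a single string or a list of strings; normalize to one string.
        text = source if isinstance(source, str) else "".join(source)
        # splitlines() does not add a trailing empty entry for a final newline.
        cell_type = cell.get("cell_type", "unknown")
        counts[cell_type] = counts.get(cell_type, 0) + len(text.splitlines())
    return counts


# Hypothetical in-memory notebook, standing in for json.load() on a real .ipynb file.
sample = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title\n", "Some text"]},
        {
            "cell_type": "code",
            "source": ["a = 1\n", "print(a)\n"],
            "outputs": [{"output_type": "stream", "text": ["1\n"]}],
        },
        {"cell_type": "code", "source": "x = 1\ny = 2", "outputs": []},
    ]
}

print(count_notebook_lines(sample))  # {'markdown': 2, 'code': 4}
```

The `outputs` entry in the sample is never read, so rendered output does not contribute to the totals.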

Why should we care? There are 5M+ Jupyter notebook files on GitHub.

@XAMPPRocky
Owner

Thank you for your issue! Are you using an old version of tokei? Tokei has support for reading notebooks.

@nfx
Author

nfx commented Mar 8, 2024

@XAMPPRocky then that's coming from the GitHub badge: e.g., a couple of projects misreport millions of lines, most of which are notebook output.

P.S. Blazing fast tool for counting XX million lines of code 🎊
