Provide file statistics per programming language #171

Open
pulkomandy opened this issue May 15, 2020 · 3 comments · Fixed by #172
Labels
help wanted Extra attention is needed
Milestone

Comments

@pulkomandy
Contributor

  • I have a repo (github.com/haikuports/haikuports) where most files have the extension ".recipe"; in the current version of repostat they show up as "no extension".
  • Also, things like Makefile usually have no extension but could still be counted.

I think the max length for extensions was meant to avoid silly results for files that have no real extension but do have dots in the filename? I would instead collect all extensions (anything after a '.'; for files with no '.' at all, maybe use the full name, but that should probably be optional). Then, if an extension is used only once or only a few times, put it in an "other" group.
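A minimal sketch of that idea, not repostat's actual code. The names `extension_of`, `group_extensions`, `min_count`, and `use_basename` are hypothetical, and treating dotfiles such as `.gitignore` as having no extension is an assumption:

```python
from collections import Counter
from pathlib import PurePath


def extension_of(path, use_basename=True):
    """Return the text after the last '.', or the basename if there is none."""
    name = PurePath(path).name
    stem, dot, ext = name.rpartition(".")
    # Require text on both sides of the dot, so ".gitignore" and "foo."
    # are treated as having no extension (an assumption of this sketch).
    if dot and stem and ext:
        return ext
    return name if use_basename else ""


def group_extensions(paths, min_count=2, use_basename=True):
    """Count files per extension; rare extensions are merged into 'other'."""
    counts = Counter(extension_of(p, use_basename) for p in paths)
    grouped = Counter()
    for ext, n in counts.items():
        grouped[ext if n >= min_count else "other"] += n
    return grouped
```

With `min_count=2`, a repo holding two `.recipe` files, two Makefiles, one `.c` file, and one `.h` file would report `recipe`, `Makefile`, and an `other` group of two files.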

The idea about Makefiles also raises the question: are extensions what really matters here? I think the interesting data would be stats per programming language or something like that. That means grouping .c and .h for C, but maybe differentiating C from C++ in .h files. That can get heavier to compute, however, if we need to look at the file contents.
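To make the trade-off concrete, here is a rough sketch of per-language grouping. The mapping tables, the `guess_language` name, and the regex used to spot C++ in `.h` files are all illustrative assumptions; a real tool such as ohcount is far more thorough:

```python
import re

# Hypothetical, deliberately tiny lookup tables.
LANGUAGE_BY_EXTENSION = {"c": "C", "cpp": "C++", "py": "Python"}
LANGUAGE_BY_BASENAME = {"Makefile": "Make"}

# Crude hint that a header is C++ rather than C (assumption, not exhaustive).
CPP_HINT = re.compile(r"^\s*(class|namespace|template)\b|\bstd::", re.MULTILINE)


def guess_language(filename, content=""):
    """Guess a language from the file name, peeking at content only for .h."""
    base = filename.rsplit("/", 1)[-1]
    if base in LANGUAGE_BY_BASENAME:
        return LANGUAGE_BY_BASENAME[base]
    ext = base.rpartition(".")[2] if "." in base else ""
    if ext == "h":
        # Only this ambiguous case needs the (more expensive) content check.
        return "C++" if CPP_HINT.search(content) else "C"
    return LANGUAGE_BY_EXTENSION.get(ext, "other")
```

Only ambiguous extensions need the file to be read, which keeps the extra cost limited to headers in this sketch.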

There is ohcount which can even handle mixed languages in a single file (for example php/html/javascript/css): https://github.com/blackducksoftware/ohcount

@vifactor
Owner

For the first part of the issue, I'll look at the simplest thing I can do to have it soonish. Perhaps a fix to the max_ext_length functionality.

interesting data would be stats per programming language or something like that

This is indeed interesting. I know that pydriller uses lizard for something like that, and lizard, in contrast to ohcount, is a Python package. It would also be useful for counting LOC instead of total lines. But so far I do not see when it can be implemented.
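To illustrate the difference between total lines and LOC, here is a naive sketch. It is not how lizard or ohcount count; the `count_lines` name and the single-prefix comment rule are assumptions, and block comments or strings containing the prefix are ignored:

```python
def count_lines(text, comment_prefix="#"):
    """Return (total_lines, code_lines) for a source text.

    A line counts as code if it is non-blank and does not start with
    the comment prefix. Trailing comments on a code line still count
    as code.
    """
    total = 0
    code = 0
    for line in text.splitlines():
        total += 1
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            code += 1
    return total, code
```

For a four-line file with one comment line and one blank line, this reports 4 total lines but only 2 lines of code.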

vifactor added a commit that referenced this issue May 16, 2020
- For files with no extension basename is returned as extension
- In file types table extensions sorted by files count in group
@vifactor vifactor added this to the v3 milestone May 16, 2020
@vifactor vifactor added the help wanted Extra attention is needed label May 16, 2020
@vifactor
Owner

@pulkomandy, there is a PR which fixes the first part, I suppose. Please take a look if you have time.

@vifactor vifactor linked a pull request May 16, 2020 that will close this issue
@pulkomandy
Contributor Author

Yes, that solves my main problem I think, thanks :)

vifactor added a commit that referenced this issue May 16, 2020
- For files with no extension basename is returned as extension
- In file types table extensions sorted by files count in group
@vifactor vifactor reopened this May 16, 2020
@vifactor vifactor changed the title "file extensions" list is limited to 5 char per extension Provide file statistics per programming language May 16, 2020

2 participants