Traversing unrelated histories? #212

Open
GunArm opened this issue Aug 5, 2021 · 1 comment

GunArm commented Aug 5, 2021

Looking for a little info on how repostat traverses the commit tree and how it reacts to unrelated (unconnected) histories within a repo.

I'm trying to get cumulative statistics on a series of repos shared within a team. I made a dummy repo and added all the other repos as remotes, and fetched all their content so I have one repo with multiple unrelated histories. I tried to run repostat on it, but it seems to only be traversing the tree that a branch is checked out on. Is my inference about what is happening likely? Is there any way (flag etc) to force it to traverse disconnected histories/trees?

When I get some time I will try to make a junk merge between some random branches on each of the separate trees, just to connect them and see if it works that way.

@pulkomandy
Contributor

Currently repostat is designed to work with a mostly linear history. It will not work even with one big merge commit, because for merge commits only one of the parents is explored:

elif len(commit.parents) == 1:
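To illustrate the consequence, here is a minimal sketch of a walk that follows a single parent at each step. It is not repostat's code: the `parents` dictionary and `first_parent_walk` are hypothetical, and the sketch assumes the first parent is the one followed.

```python
def first_parent_walk(head, parents):
    """Return the commits visited when only the first parent is followed."""
    visited = []
    commit = head
    while commit is not None:
        visited.append(commit)
        commit_parents = parents.get(commit, [])
        # Assumption: a merge commit contributes only its first parent.
        commit = commit_parents[0] if commit_parents else None
    return visited


# Toy graph: "m" merges "a2" (first parent) and "b2" (second parent).
parents = {
    "m": ["a2", "b2"],
    "a2": ["a1"], "a1": [],
    "b2": ["b1"], "b1": [],
}
print(first_parent_walk("m", parents))  # ['m', 'a2', 'a1']
```

The commits `b2` and `b1` are never visited, which is why a junk merge joining unrelated histories would probably not add the other side's commits to the statistics.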

I think the best way to do what you want is to create one instance of GitHistory per repo to analyze (https://github.com/vifactor/repostat/blob/master/analysis/gitrepository.py#L16) and then aggregate the stats for all of them to produce a single report. But currently that's not possible; some code changes will be needed.
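As a rough idea of what "aggregate the stats" could look like outside of repostat, here is a sketch that counts commits per author in each repository with plain `git log` and sums the results. It does not use any repostat API; `commits_per_author`, `aggregate`, and the repository paths are made up for illustration.

```python
import subprocess
from collections import Counter


def commits_per_author(repo_path):
    """Count commits per author name reachable from HEAD in one repository."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%an"],
        capture_output=True, text=True, check=True,
    ).stdout
    return Counter(line for line in out.splitlines() if line)


def aggregate(per_repo_counts):
    """Sum several per-repository counters into a single counter."""
    total = Counter()
    for counts in per_repo_counts:
        total.update(counts)
    return total


# Hypothetical usage, one clone per repository:
# report = aggregate(commits_per_author(p) for p in ["repo-a", "repo-b"])
```

Keeping each repository as its own clone avoids the multiple-heads question entirely, at the cost of only producing whatever statistics you compute yourself rather than a full repostat report.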

Or maybe it's possible to do it inside the GitRepository class by modifying the creation of self.whole_history_df, self.linear_history_df, self._head_revision (not sure what you'd put there if there are multiple heads), and self._tags (probably doesn't make a lot of sense in your case either).

I'm not sure how easy it is to generate whole history and linear history for multiple starting points.

One condition for this to work is that the different heads don't end up merging into some common history; otherwise, the commits before that splitting point would be counted twice. But in your case of independent repositories, this should be fine.
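If the heads did share history, one way to avoid the double counting would be to walk from all heads at once and remember which commits were already seen. The sketch below shows the idea on a toy graph; the `parents` mapping and `reachable_commits` are hypothetical and not part of repostat.

```python
def reachable_commits(heads, parents):
    """Collect every commit reachable from any head, each one exactly once."""
    seen = set()
    stack = list(heads)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue  # already reached from another head or branch
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return seen


# Toy graph: histories "a" and "b" are unrelated; "c2" branches off "a2".
parents = {
    "a3": ["a2"], "a2": ["a1"], "a1": [],
    "b2": ["b1"], "b1": [],
    "c2": ["a2"],
}
print(len(reachable_commits(["a3", "b2"], parents)))  # 5
print(len(reachable_commits(["a3", "c2"], parents)))  # 4
```

In the second call, walking each head separately and adding the counts would give 3 + 3 = 6, because `a2` and `a1` are reachable from both heads; the shared `seen` set keeps the total at 4.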
