Question: How to get the last commit that affected a given file #89

Closed
fbezdeka opened this issue Nov 18, 2011 · 38 comments

@fbezdeka
Contributor

Hi @ALL.

I'm searching for a method that will return the last commit that affected a given file.

I looked through the API but I did not find a method that seems to do it.

First question: Was the solution overlooked?
Second question: How to get the last commit?

Thanks.

@nulltoken
Member

@Flonix,

Hey. It looks like you're after something like git log -1 --follow FILENAME

> First question: Was the solution overlooked?

No. This isn't implemented yet.

> Second question: How to get the last commit?

Easy peasy. It's a two-step process:

Step One:

  • Create failing tests you'd like to see pass

Step Two:

  • From the filename, retrieve the Sha1 of the matching Blob in the object database
  • ... some code...
  • Recursively walk the parent commits of the Head (beware of the merges)
    • Detect renames (same Sha, different names; beware of the file duplication use case)
    • ... more code ...
    • Identify changes (same name, different Shas)
    • ... find some new corner case tests to handle. Add them...
    • ... still coding ...
  • Find a nice API name to wrap this in :)
  • Push to a new branch and open a Pull Request
  • Unlock the achievement "Awesome contribution!" 🆒
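
The two comparisons in Step Two (same Sha under a different name, same name with a different Sha) can be sketched on plain dictionaries before touching the object database. This is only an illustration: the `PathChange` enum, the `ChangeClassifier` class and the flat path-to-Sha snapshots are hypothetical, not libgit2sharp types, and renames that also modify the content are not covered.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical result kinds for one path between a parent and a child commit.
public enum PathChange { Unchanged, Modified, Renamed, Added }

public static class ChangeClassifier
{
    // 'parent' and 'child' are flat snapshots mapping file path -> blob Sha.
    // Assumes 'path' exists in 'child' (the indexer throws otherwise).
    public static (PathChange Kind, string? OldPath) Classify(
        IReadOnlyDictionary<string, string> parent,
        IReadOnlyDictionary<string, string> child,
        string path)
    {
        string sha = child[path];

        // Same name in both snapshots: compare the Shas.
        if (parent.TryGetValue(path, out var oldSha))
            return (oldSha == sha ? PathChange.Unchanged : PathChange.Modified, null);

        // New name: an exact rename is the same Sha under a name that disappeared.
        foreach (var entry in parent)
        {
            bool sameContent = entry.Value == sha;
            bool oldNameGone = !child.ContainsKey(entry.Key);
            if (sameContent && oldNameGone)
                return (PathChange.Renamed, entry.Key);
        }

        // The Sha may still live under its old name (file duplication): report an add.
        return (PathChange.Added, null);
    }
}
```

Keeping the duplication case separate from the rename case is deliberate here: a copy leaves the old name in place, so following the history through it would be a guess.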

@fbezdeka
Contributor Author

@nulltoken,

That is exactly what I am searching for.
I had hoped that it was already implemented...

I will look at it in a few days. Maybe I'll be able to solve it.

Thanks so far...

@shytikov

Hey @Flonix, please take a look at our discussion with @nulltoken on StackOverflow. I plan to implement the approach I mentioned there and share a Gist with you.

@fbezdeka
Contributor Author

Hey @shytikov,

so we have the same intention ;-)
I'm currently finishing my Git book. I had some gaps regarding Git's internal Tree and Blob entries.

Please let me know when you start implementing.
I hope I have time to implement a (failing) test within the next few days.

Some very first thoughts about the API:

  • Parameter Commit startCommit: defines where to start walking the history
  • Parameter string filePath: defines which file to search for
  • Parameter int maxCommits: defines when to stop walking the history
  • Result: a list of (affected) Commits; the maximum length of the resulting list is maxCommits

Still missing: a good name for the method...
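
A rough sketch of how those parameters could fit together on top of the existing revision walk, with the filtering done in C# via LINQ. It assumes a LibGit2Sharp version whose `CommitFilter` exposes `IncludeReachableFrom` (older releases named it `Since`); `FileHistorySketch` and `CommitsAffecting` are placeholder names, and renames are not followed.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;
using LibGit2Sharp;

public static class FileHistorySketch
{
    // Placeholder name. Walks history from startCommit and returns at most
    // maxCommits commits in which the object at filePath differs from every parent.
    public static List<Commit> CommitsAffecting(
        IRepository repo, Commit startCommit, string filePath, int maxCommits)
    {
        var filter = new CommitFilter
        {
            IncludeReachableFrom = startCommit,
            SortBy = CommitSortStrategies.Topological | CommitSortStrategies.Time
        };

        return repo.Commits.QueryBy(filter)
            .Where(c => Touches(c, filePath))
            .Take(maxCommits) // lazy: enumeration stops once enough commits are found
            .ToList();
    }

    private static bool Touches(Commit commit, string path)
    {
        ObjectId? current = commit[path]?.Target.Id;
        var parents = commit.Parents.ToList();

        // Root commit: it touches the file if the file exists there.
        if (parents.Count == 0)
            return current != null;

        // A merge that takes the file unchanged from any parent is skipped.
        return parents.All(p => p[path]?.Target.Id != current);
    }
}
```

To continue a search past the first maxCommits results, one option would be to call the method again with a parent of the last returned commit as startCommit.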

@shytikov

@Flonix, @nulltoken

I spent the evening analyzing the code (it was a marvelous time, it really was!) and came up with the following questions for you:

  • To do everything right, we would need to implement this at the C level and commit it to the libgit2 repository, but that will take time (and means walking on the thin ice of C). Is it ideologically right to update the libgit2sharp library only?
  • Even if we decide to update libgit2sharp only, there is still a question: should we implement filtering by file name as part of the commit filter class (used in the QueryBy method of the Commits collection), which is closer to the libgit2 ideology, or expose it as a property (History or Commits, for example) of the IndexEntry object, which would be a more object-oriented approach?

I'm quite confused: updating the QueryBy method leads to ugly code (the IEnumerable implementation just passes calls to the C backend, and it will be hard to change its behavior to meet our needs), while updating IndexEntry would make libgit2sharp a not-so-thin proxy to the C backend...

Can you share your thoughts?

@shytikov

Ahhh!!!

Somebody said that no matter how cool your code is, you will be ashamed you wrote it six months later. That applies to messages in the issue tracker too, except that you feel the shame much faster :)

We have the RepositoryExtensions class!!! The perfect place for such code!

@fbezdeka
Contributor Author

@shytikov, @nulltoken

I think we should keep libgit2sharp as thin as possible.
But that does not rule out having to update/modify libgit2sharp as well...

I'm not sure whether libgit2 already has some kind of history walking support. I'm not yet very familiar with the libgit2 code.
So I have to take a look at it...

@shytikov

@Flonix, it has a revision walking API that lets you limit your commit collection by date, branch, tag and maybe something else, but not by file name.

As I understand from @nulltoken's answer, it's OK to 'pre-limit' your commit selection by revision walking and then search in this collection, with LINQ for example.

At the moment I don't think I can come up with C code that limits the commit selection by file name (I'd need to spend some time hacking on libgit2), but it's easy to add a new method ViewHistory to the RepositoryExtensions class.

Is this approach the libgit2sharp way?

@shytikov

@Flonix, please take a look at the first draft in this gist

The work is still in progress, since it does not match a file that was renamed and modified in the same commit. And it does not count the number of modifications.

@fbezdeka
Contributor Author

@shytikov,

some very simple thoughts after a very short look at your code:

  • Looks good
  • To speed things up and avoid walking the whole (maybe very long) history, you could limit the maximum number of commits in the result

(If there are more than n commits, you could start a further search at the last commit in the result, so you do not have to walk through the commits already found...)

Question 1: You are walking through all the commits, yes, but are they ordered?

Question 2: I'm not sure about merges...
This depends on question 1...
If they are ordered... which commit parent will be taken to continue the search?
If they are not ordered... we may have to order them ;-)

@shytikov

@Flonix, thanks!

A lot of open questions remain, and they mostly concern the C backend...

To speed up the code, two things need to be done:

  • Add a maxCommits variable;
  • Stop iterating once neither a matching SHA nor the Path is found (the file is not yet present in the repository at that point; it is added in the next commit);

Regarding ordering commits... This can easily be done using the following syntax:

 foreach (Commit c in repository.Commits.QueryBy(filter))
 { ... }

But this means we need to pass this filter to the method. And as I understand the libgit2sharp design, it would be better to implement this as part of the Filter: add two more fields to this class, IndexEntry (an object holding both the file's SHA and Path) and MaxCommits to limit their number, and make the results depend on their values.

But to do it right we need to modify the C code, because IQueryableCommitCollection is tied pretty tightly to native calls.

Regarding merges, I don't know... this is definitely worth checking :)

@shytikov

@Flonix, @nulltoken,

could you please advise me how to skip merges while retrieving the commit history for a file? It seems the SHA of the file changes even though the file was not actually changed. Is there a common approach to this? I don't want to run a binary comparison of file blobs...

@shytikov

@Flonix, one 'nice' surprise this morning: the code I wrote is invalid for files stored in sub-folders :) The Tree object holds files from the root folder only. To browse other folders we need to analyze the collection of trees attached to the current tree! 🆒
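
To make the sub-folder point concrete, here is a small sketch of resolving a path one directory at a time through nested trees. `ToyTree` and `TreeLookup` are hypothetical in-memory stand-ins, not libgit2sharp types; they only mirror the idea that each tree lists its own blobs plus its child trees.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical model of a Git tree: blobs directly in this folder, plus sub-folders.
public sealed class ToyTree
{
    public Dictionary<string, string> Blobs { get; } = new Dictionary<string, string>();
    public Dictionary<string, ToyTree> Subtrees { get; } = new Dictionary<string, ToyTree>();
}

public static class TreeLookup
{
    // Resolves a '/'-separated path to a blob Sha, or null if any segment is missing.
    public static string? FindBlobSha(ToyTree root, string path)
    {
        string[] parts = path.Split('/', StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length == 0)
            return null;

        // Descend through every directory segment; the last segment is the file name.
        ToyTree tree = root;
        for (int i = 0; i < parts.Length - 1; i++)
        {
            if (!tree.Subtrees.TryGetValue(parts[i], out var next))
                return null;
            tree = next;
        }

        return tree.Blobs.TryGetValue(parts[parts.Length - 1], out var sha) ? sha : null;
    }
}
```

Looking only at the root tree's blobs corresponds to the single-segment case here, which is why files in sub-folders were never matched.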

@fbezdeka
Contributor Author

@shytikov

Yes, that is true. There is one tree for each (sub)directory.
I looked into the libgit2 C code yesterday evening. There is a kind of revision walking implemented.
I will take a closer look at it on the weekend.

@shytikov

@Flonix, I'm also hacking around. So far I've found that it's possible to hide a set of commits from being returned to the user. But before that, the system has to determine the list of commits where the given file was not changed, and pass this information as an argument.

Looking forward to hearing from you on Monday.

@shytikov

@Flonix, I've raised this question on the libgit2 issue tracker.

@nulltoken
Member

@Flonix @shytikov Wow this thread is busy :)

In order to make sure we share a common understanding, I've set up a quick test repository @ https://github.com/nulltoken/follow-test. A wiki page describes some potential use cases. Feel free to add yours.

  • The first one will require some content analysis feature which is currently lacking.
  • The second and third ones should not require anything in particular beyond what is already exposed by libgit2/libgit2sharp.

Once the C# filtering code returns the correct commits, it would be nice to repackage it into the CommitCollection.QueryBy() method.

Eventually, from an API perspective, the filename should be an additional optional property of the Filter type. The test repo will be moved into the LibGit2Sharp repo under the Resources directory.

Ping me if you need any help.

@fbezdeka
Contributor Author

@shytikov

What is your current status?
Have you implemented something?

@shytikov

@Flonix yes, I'm hacking around Git internals, trying to understand how I might retrieve the needed information from a Git repository. Actually it has turned into great fun for me: https://github.com/toolchain/IronGit (don't be afraid: it's pre-alpha quality code for now).

As soon as I understand the optimal way to create a log that follows the file name, I will integrate it into libgit2sharp.

@nulltoken
Member

@yorah I'm not sure, but I think the Diff API might ease fulfilling this task... We might need the Renamed/Copied status, though. What's your opinion?
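
For what it's worth, a sketch of how a tree-to-tree diff could report a rename for a single commit. It assumes a LibGit2Sharp version where `repo.Diff.Compare<TreeChanges>` accepts a `CompareOptions` with `Similarity = SimilarityOptions.Renames` and where each entry carries `OldPath`, `Path` and `Status`; `RenameProbe` and `PreviousPathOf` are made-up names, and only the first parent is inspected.

```csharp
#nullable enable
using System.Linq;
using LibGit2Sharp;

public static class RenameProbe
{
    // Made-up helper: if 'commit' renamed some file to 'path' (relative to its
    // first parent), return the old path; otherwise return null.
    public static string? PreviousPathOf(IRepository repo, Commit commit, string path)
    {
        Commit? parent = commit.Parents.FirstOrDefault();
        if (parent == null)
            return null; // root commit: nothing to compare against

        // Ask the diff machinery to pair deleted and added entries as renames.
        var options = new CompareOptions { Similarity = SimilarityOptions.Renames };
        var changes = repo.Diff.Compare<TreeChanges>(parent.Tree, commit.Tree, options);

        return changes
            .Where(c => c.Status == ChangeKind.Renamed && c.Path == path)
            .Select(c => c.OldPath)
            .FirstOrDefault();
    }
}
```

A history walk could call this at each step and switch to the returned old path before moving on to the parent.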

@balneaves

New to this whole git thing, but I started working on this issue as a way to learn libgit2sharp. It's not finished/commented/etc., but it meets 2 of the 3 use cases in @nulltoken's follow-test repository

https://github.com/salerth/libgit2sharp/tree/follow

(I should add I have a console app there as a test harness at the moment... that'll go when I do some tests!)

@Folcon

Folcon commented Dec 10, 2012

Hi Everyone,

Has there been any development on this feature? Where are we at with this. If I have time this/next week, I won't mind putting something together.

Regards,
Folcon

@nulltoken
Member

Hey @Folcon,

> Has there been any development on this feature?

Although it surely will at some point, libgit2 doesn't implement this feature.

> Where are we at with this.

Nothing has been merged regarding this topic

> If I have time this/next week, I won't mind putting something together.

Amazing!

@balneaves

I got sidetracked by work projects so never finished this off. I ended up with it mostly working, including a diff percentage on files to detect renames.
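
One simple way to get such a percentage is to count shared lines. The sketch below is an assumption about what a similarity check could look like, not the code from the branch linked above and not Git's own heuristic; `RenameSimilarity` is an illustrative name, and the 50% default threshold simply mirrors the default Git uses for rename detection.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class RenameSimilarity
{
    // Percentage (0-100) of lines shared by both texts, relative to the longer one.
    // Lines are matched as a multiset, so order and position are ignored.
    public static int PercentSimilar(string oldText, string newText)
    {
        string[] oldLines = oldText.Split('\n');
        string[] newLines = newText.Split('\n');

        // Count how often each line occurs in the old text.
        var counts = new Dictionary<string, int>();
        foreach (string line in oldLines)
            counts[line] = counts.TryGetValue(line, out int n) ? n + 1 : 1;

        // Consume one occurrence for every matching line of the new text.
        int common = 0;
        foreach (string line in newLines)
        {
            if (counts.TryGetValue(line, out int n) && n > 0)
            {
                counts[line] = n - 1;
                common++;
            }
        }

        // Split always yields at least one element, so 'longer' is never zero.
        int longer = Math.Max(oldLines.Length, newLines.Length);
        return 100 * common / longer;
    }

    // Treats a deleted/added pair as a rename when enough lines survive.
    public static bool LooksLikeRename(string oldText, string newText, int thresholdPercent = 50)
        => PercentSimilar(oldText, newText) >= thresholdPercent;
}
```

A check like this only becomes necessary when a file is renamed and edited in the same commit; an untouched rename keeps the same blob Sha and can be matched exactly.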

@Folcon

Folcon commented Dec 11, 2012

@nulltoken Thanks for the updates.

@salerth I'm assuming it's here? https://github.com/salerth/libgit2sharp/tree/follow.

@balneaves

@Folcon Yes indeed, mostly in LibGit2Sharp/RepositoryExtensions.cs

@Folcon

Folcon commented Jan 7, 2013

Hi Everyone,

Sorry I appeared to disappear there, I've finally gotten some time this week and I'll be taking a look over things as I said earlier.

Kind Regards,
Folcon

@Folcon

Folcon commented Feb 10, 2013

Hi Everyone,

Ok so I spent some time looking over everything that's here.

@salerth Which use case are you missing? Your output for follow-test matches @nulltoken https://github.com/nulltoken/follow-test/wiki expected results.

It appears that you've done it, at least at first glance. I'm going to do some more tests to check that it works as I expect, but where did you observe it failing? Otherwise the most I can offer is to clean it up a bit ;)...

Kind Regards,
Folcon

@nulltoken
Member

Hey guys,

If you feel this is the right moment, how about rebasing it onto the latest vNext, migrating the test cases to a xUnit fixture (FollowFixture.cs maybe?) and opening a PR?

@Folcon

Folcon commented Feb 10, 2013

I have no issues doing so, I would like @salerth to chime in just in case there is a use case we've missed?

@balneaves

No, as far as I know it worked on all the use cases I could throw at it. All I had left on my todo list for this was the unit tests, then clean-up, ensuring it was in the right place, etc. I also only tried with fairly small repos so not sure on the performance on a much larger history.

Mostly I wanted someone to just review it as it was my first time with the internals of git so wasn't sure it was the best way to skin this particular cat.

@Folcon

Folcon commented Feb 11, 2013

Actually I'm wondering, is there a hook to update the repo object before doing a history lookup? Unless that happens already?

I'm hitting throw new LibGit2SharpException(String.Format("Can not find file named '{0}' in the current index.", filePath)); when the repo has recently been updated on the filesystem before the history command gets called. I'm not certain, but it may be caused by the repo object being out of date, if that's possible; that's the only explanation that comes to mind. I'll start packaging the code :)...

Basically the index is out of date, it does not contain entries for newly added objects, even though they appear in the commits.

@nmartin867

Did anyone implement the original feature that @Flonix was asking for? If not, I'd like to help. I need this as well.

@nulltoken
Member

> Did anyone implement the original feature that @Flonix was asking for? If not, I'd like to help. I need this as well.

@nmartin867 Not that I know of.

@nulltoken
Member

@Folcon It looks like we may benefit from native support coming out of @arrbee's work in libgit2. See this comment for more information.

@Aimeast
Contributor

Aimeast commented Mar 23, 2014

The following code gets the last commit that modified a GitObject, whether it is a Blob, a Tree or a GitLink:

using System;
using System.Collections.Generic;
using LibGit2Sharp;

using (var repo = new Repository(@"path\to\libgit2"))
{
    var path = @"src/blob.c";
    var commit = repo.Head.Tip;
    // Object the path points to at HEAD (commit[path] is null if the path is missing there)
    var gitObj = commit[path].Target;

    // Breadth-first walk over ancestors that still hold the same object at this path
    var set = new HashSet<string>();
    var queue = new Queue<Commit>();
    queue.Enqueue(commit);
    set.Add(commit.Sha);

    while (queue.Count > 0)
    {
        commit = queue.Dequeue();
        var go = false;
        foreach (var parent in commit.Parents)
        {
            var tree = parent[path];
            if (tree == null)
                continue;
            var eq = tree.Target.Sha == gitObj.Sha;
            if (eq && set.Add(parent.Sha))
                queue.Enqueue(parent);
            go = go || eq;
        }
        // No parent holds the same object: this commit introduced the current version
        if (!go)
            break;
    }

    // output is: 49781a0  Blame: minor cleanup
    Console.WriteLine("{0}  {1}", commit.Sha.Substring(0, 7), commit.MessageShort);
}

As discussed at libgit2/libgit2#495, we could also implement a last-affecting-commit lookup for a tree view, like the tree view on GitHub.

I'd propose implementing the two functions at the libgit2 level rather than in libgit2sharp, because libgit2 has an object pool that is more efficient than libgit2sharp's.

@nulltoken
Member

> I'd propose implementing the two functions at the libgit2 level rather than in libgit2sharp, because libgit2 has an object pool that is more efficient than libgit2sharp's.

👍

ThomasBarnekow added a commit to ThomasBarnekow/libgit2sharp that referenced this issue Feb 24, 2015
This commit basically implements the git log --follow command. To do that, it
implements the FileHistory, FileHistoryEntry, and FileHistoryExtensions classes.
The associated FileHistoryFixture implements a number of tests, covering the
essential cases (e.g., following renames, dealing with branches).

Related to topics libgit2#893 and libgit2#89
ThomasBarnekow added a commit to ThomasBarnekow/libgit2sharp that referenced this issue Mar 19, 2015
This commit basically implements the git log --follow command. To do that, it
implements the FileHistory, FileHistoryEntry, and FileHistoryExtensions classes.
The associated FileHistoryFixture implements a number of tests, covering the
essential cases (e.g., following renames, dealing with branches).

Related to topics libgit2#893 and libgit2#89
ThomasBarnekow added a commit to ThomasBarnekow/libgit2sharp that referenced this issue Apr 2, 2015
This commit basically implements the git log --follow command. To do that, it
implements the FileHistory, FileHistoryEntry, and FileHistoryExtensions classes.
The associated FileHistoryFixture implements a number of tests, covering the
essential cases (e.g., following renames, dealing with branches).

Related to topics libgit2#893 and libgit2#89
ThomasBarnekow added a commit to ThomasBarnekow/libgit2sharp that referenced this issue Apr 11, 2015
This commit basically implements the git log --follow command. To do that, it
implements the FileHistory, FileHistoryEntry, and FileHistoryExtensions classes.
The associated FileHistoryFixture implements a number of tests, covering the
essential cases (e.g., following renames, dealing with branches).

Related to topics libgit2#893 and libgit2#89
ThomasBarnekow added a commit to ThomasBarnekow/libgit2sharp that referenced this issue Apr 13, 2015
This commit basically implements the git log --follow <path> command. It adds
the following two methods to the IQueryableCommitLog interface:

IEnumerable<LogEntry> QueryBy(string path);
IEnumerable<LogEntry> QueryBy(string path, FollowFilter filter);

The corresponding implementations are added to the CommitLog class. The actual
functionality is implemented by the FileHistory class that is part of the
LibGit2Sharp.Core namespace.

Related to topics libgit2#893 and libgit2#89
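
With the `QueryBy(string path)` overload described in that commit message, the original question could be answered roughly as follows. This is a sketch, assuming each `LogEntry` exposes the matching `Commit`, that the default query starts at HEAD, and that entries come back newest first; `LastCommitLookup` and `LastCommitAffecting` are illustrative names.

```csharp
#nullable enable
using System.Linq;
using LibGit2Sharp;

public static class LastCommitLookup
{
    // Returns the first history entry for 'path' (assumed to be the most recent
    // commit affecting it), or null when the query yields no entries.
    public static Commit? LastCommitAffecting(IRepository repo, string path)
    {
        LogEntry? newest = repo.Commits.QueryBy(path).FirstOrDefault();
        return newest?.Commit;
    }
}
```

Called as `LastCommitLookup.LastCommitAffecting(repo, "src/blob.c")`, this plays the role of `git log -1 --follow src/blob.c` from the start of the thread.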
@nulltoken
Member

#963 should fix this

@nulltoken added this to the v0.22 milestone Apr 13, 2015