Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove author attribution in file headers #377

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

rgov
Copy link
Member

@rgov rgov commented Aug 2, 2020

This change removes author attribution from the tops of files. The intention is not to deny anyone recognition for the work, but I will let this sit for a little bit to see if anyone feels differently.

  • The AUTHORS file remains and contributors are invited to add themselves to it. (It does need an update.)

  • The git log already details exactly who contributed what to which file.

  • We no longer have to keep the file headers up-to-date, or feel compelled to add attribution every time a change is made.

  • The headers are not particularly accurate. Many were bulk-added to include me (or my alter ego, Ryan Gvostes), even though I'm sure I did not modify most of those files; other contributors whose names deserve to be up there are not.

Thoughts?

@rgov
Copy link
Member Author

rgov commented Aug 2, 2020

I am also happy to remove the "begin date" in the headers which doesn't serve much use. The earliest commit imported into GitHub is from 2008 and some of the headers do go back further to 2005. But it is mostly a point of historical curiosity than useful information in the source file. The git log provides much richer information here as well.

@msoos
Copy link
Member

msoos commented Aug 2, 2020

These were added because someone who was using STP was very adamant about them. I don't remember exactly who it was, but it seemed really important for them. Ah, I found it, here:

#199

For Debian apparently. Debian takes copyright super-seriously so it makes sense. Please check that before changing authors. Looks pretty serious.

@rgov
Copy link
Member Author

rgov commented Aug 2, 2020

Eh ok. I have never heard of another project that requires a list of authors in every header, ever. I am pretty sure 90% of the projects in Debian do not do this. LLVM does not, for example. I would be fine with a general "(c) STP contributors, see AUTHORS file" if we really need it.

I can understand the issue described in #199 where the AUTHORS file / LICENSE file did not match up with some headers we imported from other projects, as we've recently discussed. I also can understand that that could spook Debian, if it seems like the project is claiming credit for other peoples' work.

A copy of the license or a reference to the license file is common. This way contributors understand their modifications are licensed under the project license. Some projects require signing a Contributor License Agreement before a pull request will be accepted, which basically just makes me confirm I understand that the project uses an open source license. There are GitHub bots for that, if we want.

@msoos
Copy link
Member

msoos commented Aug 2, 2020 via email

@aytey
Copy link
Member

aytey commented Aug 2, 2020

So Michael Tautschnig is a prominent Debian contributor https://qa.debian.org/developer.php?login=mt@debian.org and packages CBMC for Debian -- however, even he has commits against CBMC that do not adhere to that rule: https://github.com/diffblue/cbmc/blob/develop/src/util/allocate_objects.cpp (file chosen totally at random).

@hellovijay
Copy link
Member

hellovijay commented Aug 2, 2020 via email

@rgov
Copy link
Member Author

rgov commented Aug 2, 2020

@hellovijay Understood. This pull request won't go forward if you are not satisfied.

Note that you are listed as the first author in the AUTHORS file and you should and will remain there.

The reasons for suggesting the removal are highlighted in my original post, such as having poor consistency throughout the project. A changelog / author block at the top of source files is uncommon nowadays with version control that tracks this information much better.

The issue came to mind as @andrewvaughanj is contributing to many files and the author list in each file will grow longer. Also my name is spelled wrong in almost every file :)

Perhaps this would be better:

(c) 2005-2012 Vijay Ganesh [or however you please]
(c) 2005- STP contributors (see AUTHORS file)

@rgov
Copy link
Member Author

rgov commented Aug 2, 2020

What is wrong with allowing everyone who has contributed to a file in STP's source code to add their names to the file header they contributed to? It is up to each author to make sure that their contributions get recognized in this way. This seems fair to me.

There's certainly nothing wrong with this, but note that this information is already in the git history and surfaced in the GitHub website (see the avatars across the top on this page for example).

Appending a name to the list though doesn't convey how much that person contributed. For example, I wouldn't want to put my name on a file that you wrote the vast majority of simply because I fixed a small bug.

@rgov
Copy link
Member Author

rgov commented Aug 2, 2020

Sorry to triple-post, but in the example file I linked to in the previous message, note there are 3 people credited for the file (including me, who did not contribute to that file), while there are 5 other contributors who are not credited. This is why it is a bit of a mess.

@hellovijay
Copy link
Member

hellovijay commented Aug 2, 2020 via email

@aytey
Copy link
Member

aytey commented Aug 3, 2020

On the overall copyright

I have a few contributions here and there to STP, and nowhere near the number of that of Trevor/Dan/Mate, however I feel that this:

(c) 2015-2020    Vijay Ganesh, David L. Dill, Mate Soos

Is a bit disingenuous to the other major contributors to STP during that five year window (especially people like Trevor and Dan). I've omitted Ryan here, because if I'm not careful, we'll end-up with an ever growing list, and I'm bound to forget someone ... so I'll start as I mean to go! Sorry, Ryan 😬

Anyway, in 2015, and on #199 you wrote:

I am quite happy to share the copyright. STP wouldn't be where it is without
the contributions of Trevor, Mate, Dew, Ryan, Dill, Khoo, and others.

If I were to highlight the most important contributors, I would say they
would be myself, Trevor Hansen, Mate Soos, and David L. Dill.

If we're agreeable, could we just retroactively assign the copyright to the maintainers starting in 2015 (which you and David would be one of)? That way, no-one's nose gets out of shape.

We can just ensure that the current AUTHORS file lists everyone who has contributed at all, but make it clear that between 2005 and 2015, the copyright was the sole ownership of Vijay Ganesh and David L. Dill.

Was there actually an official assignment of copyright of STP to Mate during that window? If we're retroactively assigning copyright to Mate, I would vote for retroactively assigning copyright to the "STP maintainers" (whoever they may be).

On 'Author' vs. 'Original Author'

I actually think the idea of "original author" is worse -- what happens if someone rewrites 99% of the file (but, say, leaves the header block + doesn't actually change the API); does that really mean that the original author applies here, and that the new contributor feels they can't add their name because they are not the "original author"?

When to update the copyright block

We also need to juggle Ryan's point about when you add copyright to the top of a file (I know that Mate and I had this with CMS, and Mate rightly removed me when I naively added myself thinking I was "doing the right thing").

Even if someone makes a 10 line change in a 1000 line file, surely they do actually retain "artistic copyright" over those 10 lines? The question we need to decide if that is enough to put the name at the top of the file.

Personal view

My personal view is that we should strip all of these copyrights (mainly because they're wrong in a lot of places, which is arguably even worse), ensure we maintain an up-to-date AUTHORS file and then require DCO + squashed and signed-off commits (the same approach as Boolector and CVC4) for contributions going forwards.

This means that it is clear that anyone contributing to STP has assigned their copyright over to the STP project and that they have the authority to do so.

In case it helps, here is CVC4's guide to contributing: https://github.com/CVC4/CVC4/blob/master/CONTRIBUTING.md

@rgov
Copy link
Member Author

rgov commented Aug 3, 2020

After sleeping on this, I would be fine with closing the pull request to avoid bruising anyone's feelings for attribution being removed, though I also recognize this shortchanges contributors who were not credited this way. I was raising a minor but thorny point that has the potential to grow into an outsized distraction.

I agree with @andrewvaughanj that the three-line copyright isn't ideal. I don't want to start making judgments about the relative value of contributions. Also any sort of "retroactive" modification of copyright strikes me as ripe for creating conflict (contributors already expressed their feelings about this before), while simultaneously probably not being valid (can we query STP?).

The exact dates don't really matter in my opinion, but the ones that make the most sense to me are either (a) when Vijay stopped working substantially on the project, and/or (b) when the project became open source.

My view is that "first author" in a file header is also not ideal, for a number of philosophical and practical reasons. A large file can represent small contributions from many people. Files get rewritten, split up, and combined. Ponder the Ship of Theseus. We have git history, merged pull requests, etc. that are more accurate and don't require human judgment.

% git shortlog -n -s -- appveyor.yml
    49  Mate Soos
     9  Felix Kutzner

@rgov rgov closed this Aug 4, 2020
@hellovijay
Copy link
Member

hellovijay commented Aug 4, 2020 via email

@TrevorHansen
Copy link
Member

TrevorHansen commented Aug 4, 2020

I feel like we should be able to agree on some of these changes. As a start, can anyone see issues if we:

  • As @rgov mentioned, remove the "begin date" from files.
  • Have some bot (or something) to confirm with contributors (before applying their patches) that they know STP is MIT licensed.
  • Fix the spelling of contributors' names.

@TrevorHansen TrevorHansen reopened this Aug 4, 2020
@aytey
Copy link
Member

aytey commented Aug 4, 2020

Authors

I (genuinely) think the best way to resolve this issue is to drop the authors bit from the top of each file and do this:

  1. If the file existed before 2015, then we add (c) 2005-2015 Vijay Ganesh, David L. Dill

  2. If the file has been changed after 2015, we add (c) 2015-present STP contributors (see AUTHORS file)

I'm actually fine to have a more sweeping change that just does 1. in every file (irrespective of its genus, apart from the ext files) and 2. as above.

If we do this, then we just need to make sure that Ryan's name is spelled correctly in the AUTHORS file.

After doing this everyone appears in the git log and in the AUTHORS file, and we have no issues going forward.

Begin date

Remove it ✅

Copyright agreement

I'm happy to take this as a separate action not as part of this PR -- I will set up DCO + write a contributors guide (and see if we can have a GitHub Action that confirms commits are signed-off).

@hellovijay
Copy link
Member

hellovijay commented Aug 4, 2020 via email

@TrevorHansen
Copy link
Member

I (genuinely) think the best way to resolve this issue is to drop the authors bit from the top of each file and do this:
If the file existed before 2015, then we add (c) 2005-2015 Vijay Ganesh, David L. Dill
If the file has been changed after 2015, we add (c) 2015-present STP contributors (see AUTHORS file)

I don't think it's reasonable to transfer copyright over from other contributors to Vijay and David without the other contributor's permission. In the past some contributors have insisted on maintaining their copyright.

Transferring copyright to David L. Dill seems a bit weird to me, too. Was his last contribution really in 2015?

To sort this out, I suspect we're going to need something more nuanced.

@aytey
Copy link
Member

aytey commented Aug 4, 2020

I don't think it's reasonable to transfer copyright over from other contributors to Vijay and David without the other contributor's permission. In the past some contributors have insisted on maintaining their copyright.

Okay, maybe it should just be all files that existed before 2015 get Vijay+David, and any file that was changed by anyone else (ever!) gets STP contributors (and the year it was first changed by someone who wasn't Vijay+David)?

@msoos
Copy link
Member

msoos commented Aug 5, 2020

I agree with Trevor here. For example, my first commit is from 2009 and so is Trevor's. Here is some food for thought:

$ git log | grep "Author:" | sed "s/Author: //" | sed "s/<.*//" | sort | uniq -c | sort -n
      1 Artur Dmowski 
      1 Cristian Cadar 
      1 Gleb Popov 
      1 John Regehr 
      1 Jonas Zaddach 
      1 Li-yao Xia 
      1 Lukas Prokop 
      1 Marek Chalupa 
      1 Ondřej Súkup 
      1 Philip Guo 
      1 Raphael Michel 
      1 Reini Urban 
      1 Samuel Neves 
      1 sneves 
      1 Will Glynn 
      2 Jonas Wagner 
      2 Max Zinkus 
      2 petercol 
      2 Xi Wang 
      3 Andrew Chi 
      3 Austin Clements 
      3 Carsten Sinz 
      3 Jerry James 
      3 junhee cho 
      3 Martin Nowack 
      3 TrevorHansen 
      3 vganesh2013 
      4 Alexander Jesner 
      5 Stephan Falke 
      6 Robbepop 
      7 Brian Foley 
      7 Jiri Slaby 
      8 Edward J. Schwartz 
      8 Matt Arsenault 
     10 Michael Katelman 
     15 Trevor hansen 
     17 Florian Merz 
     20 Stephen McCamant 
     22 Norbert Manthey 
     29 Jurriaan Bremer 
     29 Khoo Yit Phang 
     38 Felix Kutzner 
     54 Andrew V. Jones 
     66 Ryan Govostes 
    176 Dan Liew 
    272 Vijay Ganesh 
   1045 Mate Soos 
   1088 Trevor Hansen 

Note that number of commits is not equivalent to the amount of work. In particular, often the starting of a project is already a substantial amount of work without any commits. But it does show that a lot of people contributed to the project.

Between project inception to 2014 (inclusive) we have:

      1 Artur Dmowski 
      1 John Regehr 
      1 Jonas Zaddach 
      1 Philip Guo 
      1 Raphael Michel 
      1 Reini Urban 
      1 Samuel Neves 
      1 sneves 
      1 Will Glynn 
      2 Jerry James 
      2 Jonas Wagner 
      2 petercol 
      2 Xi Wang 
      3 junhee cho 
      3 vganesh2013 
      4 Alexander Jesner 
      5 Stephan Falke 
      8 Edward J. Schwartz 
      8 Matt Arsenault 
     10 Michael Katelman 
     17 Florian Merz 
     20 Stephen McCamant 
     29 Jurriaan Bremer 
     29 Khoo Yit Phang 
     31 Ryan Govostes 
    154 Dan Liew 
    271 Vijay Ganesh 
    272 Mate Soos 
    912 Trevor Hansen 

Sorry for going for the data-driven approach :) Again, there is lots of work that does not show in the logs. Another example is the work that goes into responding to customer queries and aligning within the team. None of these are trivial work and they don't (or barely) show up in the the above data.

Just my 2 cents :)

@rgov
Copy link
Member Author

rgov commented Aug 5, 2020

Perhaps we are overthinking this. We can have overlapping copyright assignments, see for instance OpenCV: https://github.com/opencv/opencv/blob/master/LICENSE

Copyright (C) 2000-2020, Intel Corporation, all rights reserved.
Copyright (C) 2009-2011, Willow Garage Inc., all rights reserved.
Copyright (C) 2009-2016, NVIDIA Corporation, all rights reserved.
Copyright (C) 2010-2013, Advanced Micro Devices, Inc., all rights reserved.
Copyright (C) 2015-2016, OpenCV Foundation, all rights reserved.
Copyright (C) 2015-2016, Itseez Inc., all rights reserved.
Copyright (C) 2019-2020, Xperience AI, all rights reserved.

@hellovijay
Copy link
Member

hellovijay commented Aug 6, 2020 via email

@aytey
Copy link
Member

aytey commented Aug 6, 2020

I've been trying to make a more polished attempt to process the commit history; I wrote a script here:

https://github.com/andrewvaughanj/parse_stp_history/blob/master/parse_stp_history.py

This script walks all commits from the start of the repo and tracks adds/deletes/renames. Importantly, the files that are part of the initial commit are marked as owned by Vija and David, and not by Michael Katelman.

Here are the current statistics:

  • We have 541 files in total

  • There are 28 files that have only ever been touched by Vijay (and therefore are also owned by David); these are mostly .cvc files

  • There are 119 files that have been directly modified by Vijay and other people

  • There are 312 files that existed before Fri Sep 18 00:14:22 2015 -0400 (the date of Vijay's last commit), but for which Vijay did not directly commit to (it is hard to say which of these don't have "inherited" copyright)

  • There are 82 files that were created after Fri Sep 18 00:14:22 2015 -0400

You can find the complete output here:

https://github.com/andrewvaughanj/parse_stp_history/blob/master/stp_history.txt

My suggestion here is:

  • The 28 exclusive files get (c) 2005-2015 Vijay Ganesh and David L. Dill

  • The 119+312 "mixed" files get (c) 2005-2015 Vijay Ganesh and David L. Dill and (c) 2008-2020 STP contributors

  • The 82 exclusively STP files get (c) 2008-2020 STP contributors

Does this seem reasonable to everyone?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants