Back-up repository data #76

Open
Zimmi48 opened this issue Jul 12, 2019 · 1 comment
Labels
automation meta To ask questions / discuss about the organization / process of coq-community.

Comments

@Zimmi48
Member

Zimmi48 commented Jul 12, 2019

Meta-issue

This issue is extracted from an off-topic discussion in #2.

@palmskog on 2019-05-03

Let's say Coq-community grows to dozens of projects with as many or more maintainers. It may happen that someone, adversarially or not, does something unwanted to a repository, such as removing it, moving it, corrupting it, etc.

Is there any periodic snapshotting being done of our repositories to restore from? I know various efforts try to archive open source code, but is there an easily accessible one with frequent updates we can use to restore repos from? Arguably we should document this somewhere.

@Zimmi48

If it is code that you are talking about, I could easily set up mirrors of the coq-community repositories on gitlab.com. That wouldn't be sufficient to preserve meta-data such as issues though.

@palmskog

I'm primarily concerned with the code and commit metadata, but obviously issues and wikis matter as well, even though GitHub seems to keep a lot of history on those. It should be possible to script periodic dumping and copying of metadata using GitHub's API, right? Maybe something to work on at an upcoming workshop. Is this being done for Coq repos, by the way?
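
As a rough illustration of what such a script might look like, here is a minimal sketch that pages through the GitHub REST API's issue listing and writes the result to a JSON file. The function names (`http_get_json`, `iter_issues`, `dump_issues`) and the output path are hypothetical. The sketch makes unauthenticated requests, which are heavily rate-limited, so a real job would presumably add a token. It also ignores issue comments, which live behind a separate endpoint. Note that this endpoint returns pull requests alongside issues.

```python
import json
import urllib.request
from typing import Callable, Iterator

API_ROOT = "https://api.github.com"


def http_get_json(url: str) -> list:
    """Fetch one page of JSON from the GitHub REST API (unauthenticated)."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_issues(owner: str, repo: str,
                fetch: Callable[[str], list] = http_get_json,
                per_page: int = 100) -> Iterator[dict]:
    """Yield every issue (open and closed) of a repository, page by page."""
    page = 1
    while True:
        url = (f"{API_ROOT}/repos/{owner}/{repo}/issues"
               f"?state=all&per_page={per_page}&page={page}")
        batch = fetch(url)
        if not batch:  # an empty page means we are past the last issue
            return
        yield from batch
        page += 1


def dump_issues(owner: str, repo: str, path: str,
                fetch: Callable[[str], list] = http_get_json) -> int:
    """Write all issues to `path` as JSON and return how many were saved."""
    issues = list(iter_issues(owner, repo, fetch))
    with open(path, "w", encoding="utf-8") as out:
        json.dump(issues, out, indent=2)
    return len(issues)
```

Run periodically (for example from a scheduled job), `dump_issues("coq-community", "<repo>", "issues.json")` would give a dated record of issue bodies that survives later edits or deletions on GitHub.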

I'm all for mirroring at GitLab, but does that cover the "snapshotting" part of the problem? For example, if a repo gets corrupted in some way, the mirror could soon contain only the corrupt version, depending on how it's set up.

@Zimmi48 on 2019-05-04

GitLab's mirroring feature includes options to mirror even force-pushes and deletions, or to only mirror normal pushes and never delete anything. In the latter case, all the information needed to recover from an accident is preserved. However, that could produce spurious alerts if people push topic branches and then force-push to them.
GitHub's wikis are also git repositories, so it is easy to set up a similar mirror.
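
For the wiki part, a minimal sketch of a self-hosted mirror (independent of GitLab's feature) could look like the following. It relies on GitHub exposing each wiki as a git repository at `<repo>.wiki.git`. The helper names and the destination layout are assumptions made for illustration. Note that `git clone --mirror` configures the clone so that `git remote update` overwrites local refs, so this variant follows force-pushes and is not a snapshot on its own.

```python
import subprocess
from pathlib import Path


def wiki_url(owner: str, repo: str) -> str:
    """GitHub serves a repository's wiki as a separate git repository."""
    return f"https://github.com/{owner}/{repo}.wiki.git"


def mirror_commands(url: str, dest: Path) -> list[list[str]]:
    """Return the git commands that create the mirror, or refresh an existing one."""
    if dest.exists():
        # Refresh: fetch all refs from the remote into the existing bare mirror.
        return [["git", "-C", str(dest), "remote", "update"]]
    # First run: create a bare clone that maps every remote ref.
    return [["git", "clone", "--mirror", url, str(dest)]]


def mirror(url: str, dest: Path) -> None:
    """Run the commands; raises CalledProcessError if git fails."""
    for command in mirror_commands(url, dest):
        subprocess.run(command, check=True)
```

The same `mirror` function works for the code repository itself by passing its clone URL instead of the wiki URL.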

GitHub does indeed keep a lot of history, in particular in its timeline, but it also allows repository administrators to delete previous edits, comments, issues, and repositories themselves. That's why I was asking whether we should restrict coq-community members' default privileges from admin to write.

Copying issue data using GitHub's API is possible and there are actually already a few services that do it for a fee (e.g. https://github.com/marketplace/backhub). I could also extend @coqbot to do it, but we would need to discuss the design (what to save, how to react to edits, deletions...).

@palmskog on 2019-05-12

For reference, one kind of situation I had in mind for backing up repos is this.

I'm fine with GitLab mirroring, even if it doesn't capture topic branches. But I think it should be complemented by repo tarballs, e.g., one every 30 days.

@Zimmi48 on 2019-05-13

Why tarballs?

A good point I take from your comment is that the more people have write access to coq-community repositories, the greater the risk that a repository is compromised if one user leaks their credentials one way or another.

@palmskog

At least with tarballs one would know for sure: this is what the repository looked like at some specific time. With mirrors, I think one would need deep knowledge of git semantics and implementation to say something similar. For example, can't someone just rewrite the reflog?
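
To make the tarball idea concrete, here is a minimal sketch using only Python's standard library. It packs a local clone (including its `.git` directory) into a UTC-dated archive and keeps only the most recent few. The function names, the naming scheme, and the one-backup-directory-per-repository layout are all assumptions for illustration, not an existing setup.

```python
import tarfile
import time
from pathlib import Path
from typing import List, Optional


def snapshot(repo_dir: Path, backup_dir: Path, now: Optional[float] = None) -> Path:
    """Pack repo_dir into a .tar.gz named after the repository and the UTC time."""
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime(now))
    backup_dir.mkdir(parents=True, exist_ok=True)
    archive = backup_dir / f"{repo_dir.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the repository name.
        tar.add(repo_dir, arcname=repo_dir.name)
    return archive


def prune(backup_dir: Path, keep: int) -> List[Path]:
    """Delete all but the `keep` newest archives; return the deleted paths.

    Assumes backup_dir holds archives of a single repository, so that the
    timestamp in the file name makes lexicographic order match age.
    """
    archives = sorted(backup_dir.glob("*.tar.gz"))
    stale = archives[:max(len(archives) - keep, 0)]
    for path in stale:
        path.unlink()
    return stale
```

Because each archive is a plain file, restoring does not depend on git behaving correctly: unpacking the tarball gives back the directory exactly as it was when `snapshot` ran. The backup directory should live outside the repository directory, otherwise the archive would try to include itself.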

@Zimmi48

I don't see what risk would remain if the mirror refuses any update that is not a fast-forward. Then, you can only add stuff on top, not delete it.
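
The fast-forward rule can be sketched on a toy commit graph: an update is accepted only if the tip the mirror already has is an ancestor of the incoming tip. The `parents` mapping and the function names below are hypothetical. With a real repository, `git merge-base --is-ancestor <old> <new>` performs the same ancestry test.

```python
from typing import Dict, List, Optional


def is_ancestor(parents: Dict[str, List[str]], ancestor: str, descendant: str) -> bool:
    """Return True if `ancestor` is reachable from `descendant` via parent links."""
    stack = [descendant]
    seen = set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return False


def accept_update(parents: Dict[str, List[str]],
                  mirrored_tip: Optional[str], incoming_tip: str) -> bool:
    """Fast-forward-only policy: accept new branches and descendants, reject rewrites."""
    if mirrored_tip is None:  # branch not yet known to the mirror
        return True
    return is_ancestor(parents, mirrored_tip, incoming_tip)


# Toy history: a <- b <- c on the mirror, and d is a rewrite that forks from b.
history = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"]}
print(accept_update(history, "b", "c"))  # normal push on top of b
print(accept_update(history, "c", "d"))  # force-push that would drop c
```

Under this policy the second update is refused, so commit `c` stays in the mirror even after it has been removed upstream.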

@palmskog

I see the point, but one of my points with tarballs is that it removes git from the trusted base (and I don't particularly trust git and definitely not its implementation). In any case, I don't have anything against mirrors.

@Zimmi48

OK now I see your point.

@Zimmi48 on 2019-07-12

FTR I have created the GitLab coq-community organization and the mirrors for all the current repositories, as a temporary solution while waiting for a better one.

@palmskog added the automation and meta labels on Jul 12, 2019
@palmskog
Member

@Zimmi48 I see that our GitLab organization currently does not mirror many repos (partly due to the considerable growth). Maybe we want to document the process for getting a repo mirrored and add it to some checklist in repo transfer issues?
