gitlab-mirrors alternative/replacement #105

Open
Salamek opened this issue Jun 7, 2018 · 18 comments

Comments

@Salamek

Salamek commented Jun 7, 2018

Hi, I hope posting this here will not make anyone angry.
I used this project for a while, but in the end it did not fit the job:

  • Cron-based mirroring is not the best fit for me; I need mirroring to start as soon as a push is done
  • The codebase is a somewhat odd mix of Bash and Python
  • There is no simple release -> update system

So I created my own project, which does the same job and more:

  • Supports pull mirroring (Git, SVN, Mercurial, ...)
  • Supports push mirroring (yes, GitLab 10.8 added this to GitLab CE, but it was already done, so why remove it...)
  • No cron; all tasks run in the background (Celery), triggered by a webhook
  • Built on Flask and Python 3
  • Supports multiple users
  • Uses GitLab OAuth2 for sign-in
  • Connects to the GitLab API to fetch groups and projects
  • Supports nested groups (solves "nested group support" #99 and "Allow mirroring on different groups" #73)
  • Installable via APT from my deb repository
  • Installable via Arch Linux pacman from my repository
  • It is a web app, so there is a nice UI to manage your mirrors (solves "ability to add mirrors via a mirror_list file" #62 and "add_remote.sh script should be created" #44)
  • Manages SSH fingerprints properly, so no ugly hacks to ignore fingerprint checks
  • Auto-creates the repository and a writable deploy key for pull mirrors in GitLab
  • Auto-creates the webhook and deploy key for push mirrors in GitLab

The only disadvantage is that it requires a working database server:

  • PostgreSQL (recommended, tested)
  • MySQL (Should be OK to use)
  • SQLite (works, but I'm unable to provide database migrations in future versions)

The project, with more info and some docs, is at https://github.com/Salamek/gitlab-tools

@samrocketman
Owner

Hi @Salamek, no anger from me. Thanks for sharing. That’s a pretty cool project.

Though if you were concerned about upsetting someone (me or anybody), I recommend a more neutral approach as to the challenges of the existing project rather than saying it’s “wrong”.

In any case, it is probably worth mentioning your project in my README after I try out your instructions and validate your project works.

@samrocketman
Owner

samrocketman commented Jun 7, 2018

How do you handle projects in which you have no control to set up webhooks? For example, let’s say you want to mirror an upstream project like the Linux Kernel.

@Salamek
Author

Salamek commented Jun 7, 2018

@samrocketman Hi, yes, you are right about that; I modified it to be less aggressive.
Mirrors where you have no control over setting up a webhook can be mirrored via celerybeat (something like cron integrated into the application). It is currently disabled in the config and not mentioned in the docs, but it is there.
I may add an option to set a cron-like syntax on a project so it is mirrored via celerybeat instead of by a hook.

Or you can click the "Trigger sync" button in the row of the project you want to sync in the mirror overview.

Or you can call that webhook from cron via curl 😄
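
A minimal sketch of the cron-plus-curl option mentioned above. The hook URL is a placeholder and the use of POST is an assumption; check the URL and method your own gitlab-tools instance shows for the mirror before relying on this.

```shell
#!/usr/bin/env bash
# Sketch only: HOOK_URL is a hypothetical placeholder, not a real
# gitlab-tools endpoint. Copy the actual hook URL from your own instance.
HOOK_URL="${HOOK_URL:-https://gitlab-tools.example.com/REPLACE/WITH/HOOK/PATH}"

# Trigger one sync by calling the hook.
# CURL can be overridden (for example for a dry run); it defaults to curl.
trigger_sync() {
    local url="$1"
    # -f: non-zero exit on HTTP errors; -sS: quiet, but still print errors.
    # POST is an assumption about what the hook expects.
    "${CURL:-curl}" -fsS -X POST --max-time 30 "${url}"
}

# Example crontab entry (every 15 minutes), assuming this file is saved as
# /usr/local/bin/trigger-sync.sh and ends with: trigger_sync "${HOOK_URL}"
#   */15 * * * * /usr/local/bin/trigger-sync.sh >> /var/log/trigger-sync.log 2>&1
```

Compared with a built-in scheduler, this keeps the schedule outside the application, so a failed call only shows up in the cron log.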

I created an issue for it, Salamek/gitlab-tools#4, and I will look into it ASAP.

@samrocketman
Owner

samrocketman commented Jun 7, 2018

I created gitlab-mirrors to specifically mirror readonly projects which is why it runs on cron. I imagine much of the user base of gitlab-mirrors uses it for this purpose or has upgraded to GitLab Enterprise for this feature.

Note: I imagine because I did not survey anybody but that’s basically the only use case gitlab-mirrors is meant to solve.

If you ever implement a cron-like mirror capability in your software, then I highly recommend using some kind of task/worker queue with thread locking and parallelism. cron definitely has limits and imagine trying to mirror 10,000 repositories.

The eventlet and greenlet libraries are pretty good for parallelism.

@Salamek
Author

Salamek commented Jun 7, 2018

@samrocketman I use Celery for background tasks in gitlab-tools, and Celery supports a scheduler (implementing it right now).

@Salamek
Author

Salamek commented Jun 11, 2018

@samrocketman Hi, issue Salamek/gitlab-tools#4 (periodic sync) is solved in version 1.0.13. PullMirror has a new optional field where the user can specify a cron expression to run the mirror periodically:
screenshot from 2018-06-11 13-05-03

@Salamek
Author

Salamek commented Jun 11, 2018

@samrocketman Also, version 1.0.14 solves #90.

@logicminds
Contributor

@Salamek thanks for sharing. When @samrocketman created this, his intention was probably to solve a problem in the simplest way possible. Pretty sure he never intended this project to last this long and be as popular as it is. Cron and bash are simple though, and way more reliable than having a daemon listening on a port. Getting all the Python stuff working is not ideal. However, it just works once set up and has been working for the last 4 years for me.

I welcome anything that can replace this and make it better. A few years ago I added a mirror list functionality with another cron job to easily sync new mirrors.

I don't think adding a database and webserver is warranted in your project. That adds many new security risks, maintenance, dependencies and an entire webstack vs just having bash, python and cron. KISS.

I am for remaking this tool, as it is showing signs of age. It should be in all ruby (or all python) though.

Maybe some of the ex-Githubbers will create such a tool.

@samrocketman
Owner

samrocketman commented Jun 11, 2018

👍 to trying to make gitlab-mirrors as simple as possible. Honestly, I thought GitLab would eventually release mirroring support built-in. They did eventually, but unfortunately only for the EE version. So instead, community efforts are kind of split on this repository mirroring topic.

If only mirroring was a part of GitLab CE 🤔 .

@samrocketman
Owner

Nowadays, I mostly use gitlab-mirrors to keep offline copies of my GitHub profile. ref: https://github.com/samrocketman/github-backups ; that one is in Ruby :). If I start using GitLab again I'll probably get an itch to overhaul this project.

@logicminds
Contributor

Btw, I work in a secure environment and standing up new web services is frowned upon. So I have special needs. Thus bash and cron are easy to get going and don't require regulatory approval. Personally, I think Gitlab should build this into Gitlab and make the worker queue distributed so we can designate workers that have internet access and be able to mirror from public git repos.

@logicminds
Contributor

M$ just bought GitHub. Expect more users moving to GitLab.

@Salamek
Author

Salamek commented Jun 11, 2018

@logicminds

  1. This project (gitlab-mirrors) is OK only when GitLab is used by a single user who also has control over the VM/server where it is installed. gitlab-tools is intended to be used by more than one GitLab user, each of whom can have their own mirrors, etc. Personally, I find a web UI much simpler to use than some bash script (whose usage I have already forgotten) hiding on some server.
    Yes, gitlab-mirrors can be used by multiple people, but everyone has to install && maintain their own copy...
  2. I use my private GitLab CE installation to run CI builds for my public projects hosted on GitHub; waiting for cron to start a mirror is just useless for me (that's why I need webhooks).
  3. I find running a few apt and psql commands far simpler than setting up this project, and having gitlab-tools installed as an apt package is a more maintainable and UPDATABLE solution. Personally, I don't like having third-party non-packaged software running on my server; it makes people skip updates and eventually forget they had that software installed in the first place.
  4. You don't like having a web UI? Well, theoretically you don't need to use it!
    We (You 😄 ) can just add new commands to the gitlab-tools CLI to manage mirrors from the command line!
    The only thing you will lose is webhook support (no web server)...

PS: It would be great if the GitLab team just added pull mirror functionality to GitLab CE; I was partially hoping that releasing gitlab-tools would "force" them to do that.

@logicminds
Contributor

  1. UI is always easier; however, I addressed this by making gitlab-mirrors consume a mirror list. So all that is needed is editing a YAML file in a GitLab repo, which is done through the GitLab web UI (edit file).
  2. Yeah, webhooks would be better. We have our cron syncing every hour. Not ideal, but I am patient.
  3. Yup. Packages are way better. Do you include Python 3? If not, you should, since it doesn't come with RHEL 6 or 7.
  4. We never login to our mirroring server. The mirror list is a repo in gitlab that gets synced to the server under a non-privileged user. To add new repos we just update the yaml list in the web UI, then one hour later the synced project ends up in our gitlab instance. It is extremely simple and awesome because we can wrap the same gitlab user controls around it.
  • Multi-threaded syncing would be nice when syncing the 50+ repos we have setup.
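
The mirror-list workflow from points 1 and 4 could be sketched roughly as below. This is not the actual gitlab-mirrors implementation: the flat `name: url` list format, the `sync_mirror_list` function, and the `ADD_MIRROR` command are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions (not taken from gitlab-mirrors itself):
#   - the list is a flat "name: url" mapping, one mirror per line
#   - ADD_MIRROR is whatever command registers a new mirror on your server;
#     it defaults to echo, which makes this a dry run
sync_mirror_list() {
    local list_file="$1" mirrors_dir="$2"
    local line name url
    while IFS= read -r line || [ -n "${line}" ]; do
        # Skip blank lines and comments.
        case "${line}" in
            ''|'#'*) continue ;;
        esac
        name="${line%%:*}"   # text before the first colon
        url="${line#*: }"    # text after the first ": "
        # A mirror that already has a directory is left alone.
        if [ -d "${mirrors_dir}/${name}" ]; then
            continue
        fi
        "${ADD_MIRROR:-echo}" "${name}" "${url}"
    done < "${list_file}"
}
```

Run from cron after the list repository has been pulled, this only acts on entries that do not yet have a local mirror directory, so repeated runs are harmless.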

Not saying your stuff isn't awesome as it sounds pretty sweet already. I just have unique needs that most people don't care about.

@Salamek
Author

Salamek commented Jun 11, 2018

@logicminds

  1. && 4. That is an interesting and working solution. But I would still go with a web UI (since I wanted gitlab-tools to have sync logs so people can see what went wrong with a mirror task - just like GitLab CI 😺 )

  3. I only support Python 3 for whole applications and Python 2/3 for libraries (like https://github.com/Salamek/cron-descriptor), so there is no Python 2 support in gitlab-tools. If someone badly needs Python 2 support, they will need to create an issue and buy me a beer (or 10), because that will need a lot of "useless" work 😈 Getting an OS with Python 3 in its repos seems like the cheaper solution.

@Salamek
Author

Salamek commented Jun 11, 2018

@samrocketman
BTW, issue #85 is solved in gitlab-tools 1.0.15.
And that's it for today...

@samrocketman
Owner

samrocketman commented Jun 11, 2018

@Salamek awesome. It's nice of you to account for needs from people who have opened issues with this project.

@logicminds

Multi-threaded syncing would be nice when syncing the 50+ repos we have setup.

Can be easily parallelized with xargs.

while read mirror
do
  if ! ./update_mirror.sh "${mirror}" >> ${git_mirrors_dir}/cron.log 2>&1 ;then
    red_echo "Error: ./update_mirror.sh ${mirror} (more information in ${git_mirrors_dir}/cron.log)" 1>&2
    STATUS=1
  fi
done <<< "$(ls -1 "${repo_dir}/${gitlab_namespace}")"

Example:

ls -1 "${repo_dir}/${gitlab_namespace}" | xargs -P0 -I '{}' ./update_mirror.sh '{}' >> "${git_mirrors_dir}/cron.log" 2>&1

The log will likely look pretty ugly due to the parallelism. Probably better to add a logging option to update_mirror.sh which would utilize mktemp for outputting the log and flock to coordinate writing to cron.log in order.
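
A rough sketch of the mktemp-plus-flock idea, written as a wrapper rather than as a logging option inside update_mirror.sh. The `run_and_log` function and the `.lock` file name are hypothetical, and `flock` here is the util-linux command. Each job's output lands in the log as one contiguous chunk, in the order the jobs finish.

```shell
#!/usr/bin/env bash
# Sketch only: run_and_log is a hypothetical wrapper, not part of
# gitlab-mirrors. Requires flock(1) from util-linux.
run_and_log() {
    local log_file="$1"; shift
    local tmp_log status=0
    tmp_log="$(mktemp)"
    # Capture this job's output privately so parallel jobs do not interleave.
    "$@" > "${tmp_log}" 2>&1 || status=$?
    # Take an exclusive lock on fd 9, append the whole chunk, then release
    # the lock when the group ends and fd 9 is closed.
    {
        flock 9
        cat "${tmp_log}" >> "${log_file}"
    } 9>> "${log_file}.lock"
    rm -f "${tmp_log}"
    return "${status}"
}

# Possible use with xargs (the function must be exported to child shells):
#   export -f run_and_log
#   ls -1 "${repo_dir}/${gitlab_namespace}" | xargs -P0 -I '{}' \
#     bash -c 'run_and_log "$1" ./update_mirror.sh "$2"' _ "${git_mirrors_dir}/cron.log" '{}'
```

The wrapper returns the wrapped command's exit status, so a caller can still detect failed mirrors.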

@samrocketman
Owner

@logicminds if you wanted to trigger a gitlab-mirror sync on merge you could make use of authorized_keys command to launch a script (instead of allowing a shell) when a specific SSH key connects. See the authorized_keys man page.
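
As a sketch of what such an authorized_keys entry might look like: the helper below only prints the line. `forced_command_entry`, the script path, and the key are placeholders; the `command="..."` and `no-*` options are the ones documented in the sshd man page.

```shell
#!/usr/bin/env bash
# Sketch only: the script path and key used with this helper are placeholders.
# Prints one authorized_keys line that forces a command for the given key.
forced_command_entry() {
    local script_path="$1" public_key="$2"
    # command="...": run this instead of whatever the client requests.
    # The no-* options disable forwarding and terminal allocation for the key.
    printf 'command="%s",no-port-forwarding,no-agent-forwarding,no-X11-forwarding,no-pty %s\n' \
        "${script_path}" "${public_key}"
}

# Possible use (hypothetical paths):
#   forced_command_entry /home/gitmirror/sync.sh "$(cat trigger_key.pub)" >> ~/.ssh/authorized_keys
```

Any connection made with that key then runs the sync script and nothing else, so the key can be handed to a CI job or hook without granting a shell.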


3 participants