tests spawn unlimited gpg-agents #1928

Closed
ingwinlu opened this issue Apr 19, 2018 · 36 comments

@ingwinlu
Contributor

ingwinlu commented Apr 19, 2018

Steps to Reproduce the Problem

  • build Elektra, for example in a Docker container, or check the v2 server
  • run the tests: make run_nokdbtests
  • ps -ef
  • run the tests again: make run_nokdbtests
  • ps -ef
  • ????
  • wonder where all your PIDs went

Expected Result

tests should stop gpg-agents after they are finished

Actual Result

each test run spawns more gpg-agents

System Information

  • Elektra Version: master

Further Log Files and Output

+ ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 05:57 pts/0    00:00:00 bash
root     11296     1  0 07:01 pts/0    00:00:00 sh -c /usr/bin/python2 /root/cppcms-1.2.0/tests/http_timeouts_test.py 
root     11297 11296  0 07:01 pts/0    00:00:00 /usr/bin/python2 /root/cppcms-1.2.0/tests/http_timeouts_test.py write 
root     28509     1  0 07:55 ?        00:00:00 gpg-agent --homedir /tmp/elektra-test.NmmZ2I/.gnupg --use-standard-soc
root     28519     1  0 07:55 ?        00:00:00 gpg-agent --homedir /tmp/elektra-test.6mb1t2/.gnupg --use-standard-soc
root     28539     1  0 07:55 ?        00:00:00 gpg-agent --homedir /tmp/elektra-test.5XdxDR/.gnupg --use-standard-soc
root     30656     1  0 08:00 pts/0    00:00:00 ps -ef
+ make run_nokdbtests
+ ps -ef
+ ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 05:57 pts/0    00:00:00 bash
root     11296     1  0 07:01 pts/0    00:00:00 sh -c /usr/bin/python2 /root/cppcms-1.2.0/tests/http_timeouts_test.py 
root     11297 11296  0 07:01 pts/0    00:00:00 /usr/bin/python2 /root/cppcms-1.2.0/tests/http_timeouts_test.py write 
root     28509     1  0 07:55 ?        00:00:00 gpg-agent --homedir /tmp/elektra-test.NmmZ2I/.gnupg --use-standard-soc
root     28519     1  0 07:55 ?        00:00:00 gpg-agent --homedir /tmp/elektra-test.6mb1t2/.gnupg --use-standard-soc
root     28539     1  0 07:55 ?        00:00:00 gpg-agent --homedir /tmp/elektra-test.5XdxDR/.gnupg --use-standard-soc
root     30778     1  0 08:02 ?        00:00:00 gpg-agent --homedir /tmp/elektra-test.GZbzqb/.gnupg --use-standard-soc
root     30788     1  0 08:02 ?        00:00:00 gpg-agent --homedir /tmp/elektra-test.PEjcKs/.gnupg --use-standard-soc
root     30808     1  0 08:02 ?        00:00:00 gpg-agent --homedir /tmp/elektra-test.d6yL2g/.gnupg --use-standard-soc
root     30923     1  0 08:02 pts/0    00:00:00 ps -ef
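
One way to make the leak visible is to count the agent processes before and after a test run. The sketch below is only an illustration: count_gpg_agents is a hypothetical helper, and it assumes the ps -ef column layout shown above, where the command name is the eighth field.

```sh
# Hypothetical helper: count gpg-agent processes in `ps -ef` style output
# read from stdin. Assumes the command name is the 8th whitespace-separated
# field, as in the listing above.
count_gpg_agents() {
    awk '$8 == "gpg-agent" { n++ } END { print n + 0 }'
}

# Possible usage around a test run (left commented out on purpose):
# before=$(ps -ef | count_gpg_agents)
# make run_nokdbtests
# after=$(ps -ef | count_gpg_agents)
# echo "new agents: $((after - before))"
```

If the count grows with every run, the agents started by the tests are not being cleaned up.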
@markus2330
Contributor

markus2330 commented Apr 19, 2018

Thank you for reporting the problem!

@petermax2 Is it possible that the gpg commands during the tests spawn up gpg-agents?

@petermax2
Member

Ooops I thought that gpg would always connect to the same agent. I will investigate.

@ingwinlu
Contributor Author

@markus2330 this is also the reason why there are so many gpg agents on v2 reported with your userid, as the docker container runs with 1000:1000.

but the problem is not restricted to docker: debian-stretch-minimal has > 250 of them as well

@ingwinlu
Contributor Author

some nodes are not affected because they are set up to spawn a gpg-agent for jenkins which gets used by the tests (probably, have to confirm)

@markus2330
Contributor

Thank you both for looking into this!

some nodes are not affected because they are set up to spawn a gpg-agent for jenkins which gets used by the tests (probably, have to confirm)

If we cannot find a way to kill the agents we start, we can simply require that the environment already has a gpg-agent (#1888).

@petermax2
Member

Maybe the gpg agent is not required to start at all and we can suppress it during the tests. But I have to have a look at it in the evening.

@ingwinlu
Contributor Author

Hm, usually GPG_AGENT_INFO should be set when an agent is started. In the past we cleaned out environment variables, which might have explained the multiple starts back then. No idea why it is still happening now, though...

@petermax2 the tests that require gpg-agent (found by renaming gpg-agent to gpg-agent.bak ;)):

  • testmod_fcrypt
  • testmod_crypto_openssl
  • testmod_crypto_gcrypt

@petermax2
Member

petermax2 commented Apr 19, 2018

testmod_crypto_botan should run exactly like testmod_crypto_gcrypt and testmod_crypto_openssl. Is the Botan test running on the server?

@ingwinlu
Contributor Author

@petermax2 probably yes. Botan was not installed in the environment where I tested. It is running here, however, and is probably also spawning agents.

@petermax2
Member

It's not that simple. I tried to invoke gpg with the --no-autostart argument during the unit tests, however gpg still starts the agent. --no-use-agent is a funny one. The man page reads:

--no-use-agent 
              This is dummy option. gpg2 always requires the agent.

If we cannot find a way to kill the agents we start, we can simply require that the environment already has a gpg-agent (#1888).

Could we give this a shot?

@petermax2
Member

Or have a cron-job like

pgrep gpg-agent | xargs -d "\n" kill

or something similar on the build servers/containers?

@ingwinlu
Contributor Author

I would have the test check whether an agent is available; if not, start one and retain its PID. In the test cleanup, stop the agent. Everything else is a hack.

@markus2330
Contributor

You are right, the only question is where the start and stop should happen. Doing this within our agents/dockers seems to be easier than in our unit tests written in C.

@petermax2
Member

petermax2 commented Apr 21, 2018

Here is what I learned so far:

It is possible to suppress the auto-start of the gpg-agent with the --no-autostart option, if it is used consistently with all gpg calls. However, without a gpg-agent, gpg2 cannot perform any operations that require the private key (i.e. decryption, signatures).

It is also possible to fork gpg-agent --server, but then gpg2 cannot connect to the agent. The environment variable GPG_AGENT_INFO is deprecated and is no longer considered by gpg2.

I will try to fork and execv gpg-agent --daemon. I just need a way to find out the PID of the started gpg-agent so that I can send it SIGTERM when the tests are done.

@petermax2
Member

Doing this within our agents/dockers seems to be easier than in our unit tests written in C.

Much easier, I guess :-)

@markus2330
Contributor

I think your decision was right to simply use the default-way of gpg to connect to agents.

As an alternative to starting/stopping gpg-agent, we can also disable "use-agent" in .gnupg/gpg.conf

@ingwinlu
Contributor Author

i have no problem with one agent autostarting (and even have it running). I have a problem with subsequent tests starting a new one

@petermax2
Member

I think your decision was right to simply use the default-way of gpg to connect to agents.

In a production environment it is the better option. On my machine crypto and fcrypt always connect to the same agent and the integration with my Yubikey works very well.

In our test environments we must keep a single instance of the agent up and running before starting the tests. I think the problem is that we clear the environment, as @ingwinlu mentioned before.

@ingwinlu
Contributor Author

I think the problem is that we clear the environment

we shouldn't anymore. but the issue persists

@markus2330
Contributor

If gpg-agent tries to communicate via the environment, it obviously cannot work: the next test run would never get the environment set by a previous test run.

I like the following two options best:

  1. we properly start/stop a gpg agent within the containers and document in TESTING.md that gpg agent needs to be running (see document required environment for running tests #1888).
  2. we disable startup of gpg agents (disable the "use-agent" in .gnupg/gpg.conf should work, did not test it though) and document this in TESTING.md (see document required environment for running tests #1888).

A setup, where daemons get started on-demand without a global way to know if the daemon has been started already (and env vars are not global but process-specific), seems to be broken. We should not try to fix this within the tests.
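
The point about process-specific environments can be illustrated with plain shell, independent of GnuPG. In this sketch AGENT_INFO_DEMO is a made-up variable name standing in for something like GPG_AGENT_INFO: a value exported during one run is invisible to the next, because each run is its own process.

```sh
# AGENT_INFO_DEMO is a made-up variable used only for this illustration.
unset AGENT_INFO_DEMO

# First "test run": a child process exports the variable for itself.
first_run=$(sh -c 'export AGENT_INFO_DEMO=/tmp/agent.sock; echo "$AGENT_INFO_DEMO"')

# Second "test run": a fresh child process does not inherit it.
second_run=$(sh -c 'echo "${AGENT_INFO_DEMO:-unset}"')

echo "first run sees:  $first_run"
echo "second run sees: $second_run"
```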

@ingwinlu
Contributor Author

https://stackoverflow.com/questions/27459869/how-to-stop-gpg-2-1-spawning-many-agents-for-unit-testing

The reason you're spawning lots of agents is the different home directory using the --homedir option, otherwise a single one would have been used. From GnuPG 2.1, all communication with the agent is performed through a socket in the GnuPG homedirectory.

@markus2330
Contributor

markus2330 commented Apr 21, 2018

We do not use the homedir option. And https://dev.gnupg.org/T3218 describes the workaround of stackoverflow as "a (very awkward) workaround".

Maybe simply starting the gpg-agent is the most future-proof variant (in a controlled way within our environment). It seems that in recent versions the startup of gpg-agent is not optional anymore (which makes my option 2 above nonsensical).

@ingwinlu
Contributor Author

ingwinlu commented Apr 21, 2018

We do not use the homedir option.

Yeah I have not found where it comes from but it matches the problem (see op), as all the agents were spawned with a different one.

@markus2330
Contributor

It was a good hint, I learned that startup of gpg-agent is not optional anymore.

Which makes it very clear that we need to start and stop it. And not try to avoid the starting.

@petermax2
Member

We do not use the homedir option.

Yeah I have not found where it comes from but it matches the problem (see op)

We don't use the --homedir option explicitly, but ps -ef reveals that gpg somehow sets it anyway.

@ingwinlu
Contributor Author

ingwinlu commented Apr 21, 2018

https://wiki.archlinux.org/index.php/GnuPG

$GNUPGHOME is used by GnuPG to point to the directory where its configuration files are stored. By default $GNUPGHOME is not set and your $HOME is used instead; thus, you will find a ~/.gnupg directory right after installation.
To change the default location, either run gpg this way $ gpg --homedir path/to/file or set the GNUPGHOME environment variable.

@petermax2 can you check if HOME is available in your testsuite? 

@ingwinlu
Contributor Author

ingwinlu commented Apr 21, 2018

also interesting https://www.gnupg.org/documentation/manuals/gnupg/Ephemeral-home-directories.html:

Create a temporary directory, create (or copy) a configuration that meets your needs, make gpg use this directory either using the environment variable GNUPGHOME, or the option --homedir. GPGME supports this too on a per-context basis, by modifying the engine info of contexts. Now execute whatever operation you like, import and export key material as necessary. Once finished, you can delete the directory. All GnuPG backend services that were started will detect this and shut down

Tested this in my container and it cleaned up the process automatically as promised.
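
The ephemeral-home approach from the manual can be sketched in shell. This is an illustration under assumptions, not what the Elektra test suite does: with_ephemeral_gnupghome and the gnupg-test.XXXXXX template are made-up names, and the sketch relies on the behaviour quoted above (backend services shut down once the directory is gone).

```sh
# Hypothetical helper: run a command with a throwaway GnuPG home directory
# and remove the directory afterwards, whether or not the command succeeds.
with_ephemeral_gnupghome() {
    tmp_home=$(mktemp -d "${TMPDIR:-/tmp}/gnupg-test.XXXXXX") || return 1
    chmod 700 "$tmp_home"
    # The subshell keeps GNUPGHOME out of the caller's environment.
    ( GNUPGHOME=$tmp_home; export GNUPGHOME; "$@" )
    status=$?
    # Deleting the home directory is the documented signal for the agent
    # that was started for it to shut down.
    rm -rf "$tmp_home"
    return "$status"
}
```

It could be used as with_ephemeral_gnupghome ctest -R crypto_openssl, so that every invocation gets its own home and leaves nothing behind.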

@petermax2
Member

@petermax2 can you check if HOME is available in your testsuite?

Yes, HOME is available:

HOME = /tmp/elektra-test.3vLR4L

@ingwinlu
Contributor Author

OK so something in the test suite is overriding HOME into a tmp directory (which is good). If that is still available during cleanup it should just be removed to stop the agent. That would be an ideal fix.

@petermax2
Member

If we simply set GNUPGHOME only one instance of gpg-agent is spawned. GNUPGHOME is not overwritten before the test starts.

With GNUPGHOME set, only a single gpg-agent is running after multiple test runs.

I think this is the simplest solution.
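
Why a fixed GNUPGHOME helps can be sketched with the lookup rule quoted earlier (GNUPGHOME if set, otherwise $HOME/.gnupg). The helper name effective_gnupg_home and the elektra-test.aaaaaa style paths are made up for illustration; only /tmp/x comes from this thread.

```sh
# Hypothetical helper: print the directory GnuPG would use as its home,
# following the rule quoted above (GNUPGHOME if set, otherwise $HOME/.gnupg).
effective_gnupg_home() {
    echo "${GNUPGHOME:-$HOME/.gnupg}"
}

# Each test run gets a fresh HOME, so the default home differs every time:
( unset GNUPGHOME; HOME=/tmp/elektra-test.aaaaaa; effective_gnupg_home )
( unset GNUPGHOME; HOME=/tmp/elektra-test.bbbbbb; effective_gnupg_home )

# With GNUPGHOME fixed, the changing HOME no longer matters:
( GNUPGHOME=/tmp/x; HOME=/tmp/elektra-test.cccccc; effective_gnupg_home )
```

Since the agent is reached through a socket in the home directory, a different home per run means a different agent per run, while a shared home means a shared agent.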

@ingwinlu
Contributor Author

Keep in mind that if you share your home directory you might not be able to run tests in parallel.
And you would still need to delete GNUPGHOME afterwards (you don't want a lingering gpg-agent answering calls for the logged-in user, right?).

And what would happen if the target system relies on GNUPGHOME? You would need to save the existing environment and restore it manually after the tests.

I would appreciate if we could take a step back and look at how those tests might influence user machines, not just the test server environment.
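
Saving and restoring an existing GNUPGHOME could look roughly like the sketch below. The helper names save_gnupghome and restore_gnupghome are hypothetical, and the sketch assumes the tests run in the same shell that changes the variable.

```sh
# Hypothetical helpers: remember the caller's GNUPGHOME before the tests
# change it, and put it back (or unset it again) afterwards.
save_gnupghome() {
    if [ "${GNUPGHOME+set}" = set ]; then
        saved_gnupghome_state=set
        saved_gnupghome_value=$GNUPGHOME
    else
        saved_gnupghome_state=unset
    fi
}

restore_gnupghome() {
    if [ "$saved_gnupghome_state" = set ]; then
        GNUPGHOME=$saved_gnupghome_value
        export GNUPGHOME
    else
        # The variable did not exist before the tests, so remove it again.
        unset GNUPGHOME
    fi
}
```

A test wrapper would call save_gnupghome first, point GNUPGHOME at a test directory, run the tests, and finish with restore_gnupghome.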

@petermax2
Member

you might not be able to run tests in parallel.

I ran the script:

#!/bin/bash
mkdir /tmp/x
export GNUPGHOME=/tmp/x
for run in {1..1000000}
do
	ctest -R crypto_openssl &
done

without any problems. GPG should handle locking, etc.
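
The script above starts its jobs in the background and does not collect their exit statuses. A bounded variant that waits for every job and counts failures might look like this; run_parallel is a hypothetical helper and the job count is up to the caller.

```sh
# Hypothetical helper: start N copies of a command in the background,
# wait for all of them, and print how many exited with a non-zero status.
run_parallel() {
    n=$1
    shift
    pids=""
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" &
        pids="$pids $!"
        i=$((i + 1))
    done
    failed=0
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$failed"
}
```

For example, run_parallel 8 ctest -R crypto_openssl with GNUPGHOME exported beforehand would print the number of failed runs after the ctest output.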

you don't want a lingering gpg-agent answering calls for the logged-in user, right?

This is the way gpg-agent was designed: it is running forever until the user session ends. It does not write out its PID to some place, there are no commands to quit it. It only reacts to SIGTERM.

I tried to fork the gpg-agent from within the unit test with the --server option, so we would have a PID to kill afterwards. But then gpg-agent does not open the required sockets at $GNUPGHOME, and the unit tests start another instance of the agent (which runs in --daemon mode). Also, there is no way of making gpg-agent open any sockets when in --server mode (I checked this in the source code of gpg-agent).

gpg-agent is hard to control and hardly documented. I was even reading the source code of gpg-agent. Our use case is not covered. The only option is SIGTERM.

@ingwinlu
Contributor Author

parallelism

I was thinking more about the case where you want separate gpg-agents that should not influence each other, i.e. you only want agent a to have the key of test a, and agent b to have the key of test b. If that is not needed, then a hardcoded tmp home is OK.

killing gpg-agent

When first investigating the issue I came across a website (linked above) that stated that the expected way to shut down a temp gpg-agent is to delete its gpg home directory.

So if you set GNUPGHOME to /tmp/elektra_tests/gpg and during test cleanup delete this tmp directory it should be fine.

@petermax2
Member

So if you set GNUPGHOME to /tmp/elektra_tests/gpg and during test cleanup delete this tmp directory it should be fine.

It works! I will integrate this fix into the crypto and fcrypt test cases. Thank you for the tip!

@petermax2 petermax2 added this to the 0.8.24 milestone May 17, 2018
@petermax2
Member

I have a working prototype. PR is coming tomorrow.

petermax2 added a commit to petermax2/libelektra that referenced this issue May 17, 2018
petermax2 added a commit to petermax2/libelektra that referenced this issue May 19, 2018
@petermax2
Member

Should be fixed with #2056. Please re-open if the problem still occurs.


3 participants