tests spawn unlimited gpg-agents #1928

ingwinlu · 2018-04-19T08:05:50Z

Steps to Reproduce the Problem

1. Build Elektra, for example in a Docker container, or check the v2 server.
2. Run the tests: make run_nokdbtests
3. ps -ef
4. Run the tests again: make run_nokdbtests
5. ps -ef
6. ????
7. Wonder where all your PIDs went.

Expected Result

tests should stop gpg-agents after they are finished

Actual Result

each test run spawns more gpg-agents

System Information

Elektra Version: master

Further Log Files and Output

+ ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 05:57 pts/0    00:00:00 bash
root     11296     1  0 07:01 pts/0    00:00:00 sh -c /usr/bin/python2 /root/cppcms-1.2.0/tests/http_timeouts_test.py 
root     11297 11296  0 07:01 pts/0    00:00:00 /usr/bin/python2 /root/cppcms-1.2.0/tests/http_timeouts_test.py write 
root     28509     1  0 07:55 ?        00:00:00 gpg-agent --homedir /tmp/elektra-test.NmmZ2I/.gnupg --use-standard-soc
root     28519     1  0 07:55 ?        00:00:00 gpg-agent --homedir /tmp/elektra-test.6mb1t2/.gnupg --use-standard-soc
root     28539     1  0 07:55 ?        00:00:00 gpg-agent --homedir /tmp/elektra-test.5XdxDR/.gnupg --use-standard-soc
root     30656     1  0 08:00 pts/0    00:00:00 ps -ef
+ make run_nokdbtests
+ ps -ef
+ ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 05:57 pts/0    00:00:00 bash
root     11296     1  0 07:01 pts/0    00:00:00 sh -c /usr/bin/python2 /root/cppcms-1.2.0/tests/http_timeouts_test.py 
root     11297 11296  0 07:01 pts/0    00:00:00 /usr/bin/python2 /root/cppcms-1.2.0/tests/http_timeouts_test.py write 
root     28509     1  0 07:55 ?        00:00:00 gpg-agent --homedir /tmp/elektra-test.NmmZ2I/.gnupg --use-standard-soc
root     28519     1  0 07:55 ?        00:00:00 gpg-agent --homedir /tmp/elektra-test.6mb1t2/.gnupg --use-standard-soc
root     28539     1  0 07:55 ?        00:00:00 gpg-agent --homedir /tmp/elektra-test.5XdxDR/.gnupg --use-standard-soc
root     30778     1  0 08:02 ?        00:00:00 gpg-agent --homedir /tmp/elektra-test.GZbzqb/.gnupg --use-standard-soc
root     30788     1  0 08:02 ?        00:00:00 gpg-agent --homedir /tmp/elektra-test.PEjcKs/.gnupg --use-standard-soc
root     30808     1  0 08:02 ?        00:00:00 gpg-agent --homedir /tmp/elektra-test.d6yL2g/.gnupg --use-standard-soc
root     30923     1  0 08:02 pts/0    00:00:00 ps -ef


markus2330 · 2018-04-19T08:28:35Z

Thank you for reporting the problem!

@petermax2 Is it possible that the gpg commands during the tests spawn up gpg-agents?

petermax2 · 2018-04-19T08:30:26Z

Ooops I thought that gpg would always connect to the same agent. I will investigate.

ingwinlu · 2018-04-19T08:40:41Z

@markus2330 this is also the reason why there are so many gpg agents on v2 reported with your userid, as the docker container runs with 1000:1000.

but the problem is not restricted to docker: debian-stretch-minimal has > 250 of them as well

ingwinlu · 2018-04-19T08:41:15Z

Some nodes are not affected because they are set up to spawn a gpg-agent for Jenkins, which then gets used by the tests (probably, have to confirm).

markus2330 · 2018-04-19T08:48:59Z

Thank you both for looking into this!

> some nodes are not affected because they are set up to spawn a gpg-agent for jenkins which gets used by the tests (probably, have to confirm)

If we cannot find a way to kill the agents we start, we can simply require that the environment already has a gpg-agent (#1888).

petermax2 · 2018-04-19T08:50:59Z

Maybe the gpg agent is not required to start at all and we can suppress it during the tests. But I have to have a look at it in the evening.

ingwinlu · 2018-04-19T09:50:48Z

Hm, usually GPG_AGENT_INFO should be set when an agent is started. In the past we cleaned out environment variables, which might have explained the multiple starts back then. No idea why it is still happening right now, though...

@petermax2 the tests that require gpg-agent (found by renaming gpg-agent to gpg-agent.bak ;)):

testmod_fcrypt
testmod_crypto_openssl
testmod_crypto_gcrypt

petermax2 · 2018-04-19T11:16:10Z

testmod_crypto_botan should run exactly like testmod_crypto_gcrypt and testmod_crypto_openssl. Is the Botan test running on the server?

ingwinlu · 2018-04-19T11:32:06Z

@petermax2 Probably yes. Botan was not installed in the environment where I tested. It is running here, however, and probably also spawning agents.

petermax2 · 2018-04-19T19:44:27Z

It's not that simple. I tried to invoke gpg with the --no-autostart argument during the unit tests, however gpg still starts the agent. --no-use-agent is a funny one. The man page reads:

--no-use-agent 
              This is dummy option. gpg2 always requires the agent.

> If we cannot find a way to kill the agents we start, we can simply require that the environment already has a gpg-agent (#1888).

Could we give this a shot?

petermax2 · 2018-04-19T19:47:19Z

Or have a cron-job like

pgrep gpg-agent | xargs -d "\n" kill

or something similar on the build servers/containers?

ingwinlu · 2018-04-19T20:13:00Z

I would have the test check whether an agent is available and, if not, start one and retain its PID. In the test cleanup, stop the agent. Everything else is a hack.

markus2330 · 2018-04-21T06:41:47Z

You are right, the only question is where the start and stop should happen. Doing this within our agents/dockers seems to be easier than in our unit tests written in C.

petermax2 · 2018-04-21T06:42:00Z

Here is what I learned so far:

It is possible to suppress the auto-start of the gpg-agent with the --no-autostart option, if it is used consistently with all gpg calls. However, without a gpg-agent, gpg2 cannot perform any operations that require the private key (i.e. decryption, signatures).

It is also possible to fork gpg-agent --server, but then gpg2 cannot connect to the agent. The environment variable GPG_AGENT_INFO is deprecated and is no longer considered by gpg2.

I will try to fork and execv gpg-agent --daemon. I just need a way to find out the PID of the started gpg-agent so that I can send it SIGTERM when the tests are done.

petermax2 · 2018-04-21T06:43:23Z

> Doing this within our agents/dockers seems to be easier than in our unit tests written in C.

Much easier, I guess :-)

markus2330 · 2018-04-21T06:44:59Z

I think your decision was right to simply use the default-way of gpg to connect to agents.

As an alternative to starting/stopping gpg-agent, we can also disable "use-agent" in .gnupg/gpg.conf.

ingwinlu · 2018-04-21T06:47:06Z

I have no problem with one agent autostarting (I even have one running). My problem is with subsequent tests starting a new one.

petermax2 · 2018-04-21T06:47:54Z

> I think your decision was right to simply use the default-way of gpg to connect to agents.

In a production environment it is the better option. On my machine crypto and fcrypt always connect to the same agent and the integration with my Yubikey works very well.

In our test environments we must keep a single instance of the agent up and running before starting the tests. I think the problem is that we clear the environment, as @ingwinlu mentioned before.

ingwinlu · 2018-04-21T06:48:22Z

> I think the problem is that we clear the environment

We shouldn't anymore, but the issue persists.

markus2330 · 2018-04-21T07:00:08Z

If gpg-agent tries to communicate via the environment, it obviously cannot work: the next test run would never get the environment set by an earlier test run.

I like the following two options best:

1. We properly start/stop a gpg-agent within the containers and document in TESTING.md that a gpg-agent needs to be running (see #1888, "document required environment for running tests").
2. We disable the startup of gpg-agents (disabling "use-agent" in .gnupg/gpg.conf should work, though I did not test it) and document this in TESTING.md (see #1888).

A setup where daemons get started on demand without a global way to know whether the daemon has already been started (and env vars are not global but process-specific) seems to be broken. We should not try to fix this within the tests.

ingwinlu · 2018-04-21T07:09:23Z

https://stackoverflow.com/questions/27459869/how-to-stop-gpg-2-1-spawning-many-agents-for-unit-testing

> The reason you're spawning lots of agents is the different home directory using the --homedir option, otherwise a single one would have been used. From GnuPG 2.1, all communication with the agent is performed through a socket in the GnuPG home directory.
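To check whether each lingering agent really belongs to a different home directory, the process list can be grouped by the --homedir argument. The following is only a rough sketch, assuming a POSIX shell with awk; count_agents_by_homedir is a hypothetical helper name and not part of Elektra or GnuPG.

```shell
#!/bin/sh
# Hypothetical helper: reads process listings on stdin (for example the
# output of "ps -ef" or "ps -eo args") and prints, for every gpg-agent
# line, the value following --homedir together with the number of agents
# that were started with that value.
count_agents_by_homedir() {
	awk '
		{
			is_agent = 0
			for (i = 1; i <= NF; i++) {
				# Match "gpg-agent" with or without a leading path.
				if ($i ~ /(^|\/)gpg-agent$/) is_agent = 1
				# Count the argument that follows --homedir.
				if (is_agent && $i == "--homedir" && i < NF) count[$(i + 1)]++
			}
		}
		END { for (dir in count) print count[dir], dir }
	'
}
```

A possible invocation is `ps -eo args | count_agents_by_homedir | sort`. If every line shows a count of 1 with a different temporary directory, as in the listing in the original report, each test run used its own home directory and therefore its own agent.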

markus2330 · 2018-04-21T07:17:09Z

We do not use the homedir option. And https://dev.gnupg.org/T3218 describes the workaround of stackoverflow as "a (very awkward) workaround".

Maybe simply starting the gpg-agent (in a controlled way within our environment) is the most future-proof variant. It seems that in recent versions the startup of gpg-agent is not optional anymore (which makes my option 2 above nonsensical).

ingwinlu · 2018-04-21T07:18:01Z

> We do not use the homedir option.

Yeah, I have not found where it comes from, but it matches the problem (see the original post), as all the agents were spawned with a different one.

markus2330 · 2018-04-21T07:19:01Z

It was a good hint, I learned that startup of gpg-agent is not optional anymore.

Which makes it very clear that we need to start and stop it. And not try to avoid the starting.

petermax2 · 2018-04-21T07:19:13Z

> We do not use the homedir option.

> Yeah I have not found where it comes from but it matches the problem (see op)

We don't use the --homedir option explicitly, but ps -ef reveals that gpg somehow sets it anyway.

ingwinlu · 2018-04-21T07:36:56Z

https://wiki.archlinux.org/index.php/GnuPG

> $GNUPGHOME is used by GnuPG to point to the directory where its configuration files are stored. By default $GNUPGHOME is not set and your $HOME is used instead; thus, you will find a ~/.gnupg directory right after installation.
> To change the default location, either run gpg this way $ gpg --homedir path/to/file or set the GNUPGHOME environment variable.

@petermax2 can you check if HOME is available in your testsuite?

ingwinlu · 2018-04-21T07:42:13Z

Also interesting: https://www.gnupg.org/documentation/manuals/gnupg/Ephemeral-home-directories.html

> Create a temporary directory, create (or copy) a configuration that meets your needs, make gpg use this directory either using the environment variable GNUPGHOME, or the option --homedir. GPGME supports this too on a per-context basis, by modifying the engine info of contexts. Now execute whatever operation you like, import and export key material as necessary. Once finished, you can delete the directory. All GnuPG backend services that were started will detect this and shut down.

Tested this in my container and it cleaned up the process automatically as promised.
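The ephemeral home directory approach quoted above could look roughly like this in a shell wrapper. This is a minimal sketch, assuming a POSIX shell and mktemp; run_with_ephemeral_gnupghome is a hypothetical helper name, and the actual Elektra tests are written in C, so this only illustrates the idea.

```shell
#!/bin/sh
# Hypothetical helper: runs the given command with GNUPGHOME pointing at a
# fresh temporary directory and removes that directory afterwards.
run_with_ephemeral_gnupghome() {
	# Create a private temporary directory to serve as the GnuPG home.
	tmp_home=$(mktemp -d "${TMPDIR:-/tmp}/gnupg-test.XXXXXX") || return 1

	# Export GNUPGHOME only inside a subshell so the caller's environment
	# is left untouched.
	(
		GNUPGHOME=$tmp_home
		export GNUPGHOME
		"$@"
	)
	status=$?

	# According to the GnuPG manual section quoted above, backend services
	# started for this directory detect its removal and shut down.
	rm -rf "$tmp_home"
	return "$status"
}
```

For example, `run_with_ephemeral_gnupghome ctest -R crypto_openssl` would run one test with its own throwaway home directory and pass the exit status of ctest back to the caller.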

petermax2 · 2018-04-21T07:46:46Z

> @petermax2 can you check if HOME is available in your testsuite?

Yes, HOME is available:

HOME = /tmp/elektra-test.3vLR4L

ingwinlu · 2018-04-21T07:48:08Z

OK so something in the test suite is overriding HOME into a tmp directory (which is good). If that is still available during cleanup it should just be removed to stop the agent. That would be an ideal fix.

petermax2 · 2018-04-21T07:54:42Z

If we simply set GNUPGHOME only one instance of gpg-agent is spawned. GNUPGHOME is not overwritten before the test starts.

With GNUPGHOME set, only a single gpg-agent is running after multiple test runs.

I think this is the simplest solution.

ingwinlu · 2018-04-21T08:43:57Z

Keep in mind that if you share your home directory you might not be able to run tests in parallel.
And you would still need to delete GNUPGHOME afterwards (you don't want a lingering gpg-agent answering calls for the logged-in user, right?).

And what happens if the target system relies on GNUPGHOME? Then you would need to save the existing environment and restore it manually after the tests.

I would appreciate if we could take a step back and look at how those tests might influence user machines, not just the test server environment.
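The concern about a pre-existing GNUPGHOME on a user machine could be handled by saving and restoring it around the tests. The following is a sketch only, assuming a POSIX shell; save_gnupghome and restore_gnupghome are hypothetical helper names. It distinguishes an unset variable from an empty one, because GnuPG falls back to the default location only when GNUPGHOME is not set.

```shell
#!/bin/sh
# Hypothetical helper: remember whether GNUPGHOME was set and, if so,
# which value it had.
save_gnupghome() {
	if [ "${GNUPGHOME+set}" = set ]; then
		saved_gnupghome_set=1
		saved_gnupghome=$GNUPGHOME
	else
		saved_gnupghome_set=0
		saved_gnupghome=
	fi
}

# Hypothetical helper: put GNUPGHOME back into the state recorded by
# save_gnupghome, including the "not set at all" state.
restore_gnupghome() {
	if [ "$saved_gnupghome_set" = 1 ]; then
		GNUPGHOME=$saved_gnupghome
		export GNUPGHOME
	else
		unset GNUPGHOME
	fi
}
```

A test wrapper would call save_gnupghome first, point GNUPGHOME at a temporary directory for the test run, and call restore_gnupghome in its cleanup step.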

petermax2 · 2018-05-13T15:59:03Z

> you might not be able to run tests in parallel.

I ran the script:

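#!/bin/bash
# Share one GnuPG home directory across all parallel test runs.
mkdir -p /tmp/x
export GNUPGHOME=/tmp/x
for run in {1..1000000}
do
	# Start each run in the background so that the runs overlap.
	ctest -R crypto_openssl &
done
# Wait for all background runs to finish.
wait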

without any problems. GPG should handle locking, etc.

> you don't want a lingering gpg-agent answering calls for the logged-in user, right?

This is the way gpg-agent was designed: it is running forever until the user session ends. It does not write out its PID to some place, there are no commands to quit it. It only reacts to SIGTERM.

I tried to fork the gpg-agent from within the unit test with the --server option, so that we would have a PID to kill afterwards. But then gpg-agent does not open the required sockets at $GNUPGHOME, and the unit tests start another instance of the agent (which runs in --daemon mode). There is also no way of making gpg-agent open any sockets in --server mode (I checked this in the source code of gpg-agent).

gpg-agent is hard to control and hardly documented. I was even reading the source code of gpg-agent. Our use case is not covered. The only option is SIGTERM.

ingwinlu · 2018-05-14T14:35:22Z

parallelism

I was thinking more of the case where you want separate gpg-agents that should not influence each other, i.e. you only want agent A to have the key of test A, and agent B to have the key of test B. If that is not needed, then a hardcoded temporary home is OK.

killing gpg-agent

When first investigating the issue I came across a website (linked above) that stated that the expected way to shut down a temp gpg-agent is to delete its gpg home directory.

So if you set GNUPGHOME to /tmp/elektra_tests/gpg and during test cleanup delete this tmp directory it should be fine.

petermax2 · 2018-05-17T19:42:16Z

> So if you set GNUPGHOME to /tmp/elektra_tests/gpg and during test cleanup delete this tmp directory it should be fine.

It works! I will integrate this fix into the crypto and fcrypt test cases. Thank you for the tip!

petermax2 · 2018-05-17T21:04:25Z

I have a working prototype. PR is coming tomorrow.


petermax2 · 2018-06-18T17:58:32Z

Should be fixed with #2056. Please re-open if the problem still occurs.

ingwinlu added the bug label Apr 19, 2018

markus2330 assigned petermax2 Apr 19, 2018

ingwinlu mentioned this issue Apr 27, 2018

dbusrecv: build server #1945

Closed

petermax2 mentioned this issue May 12, 2018

Tests behave oddly in certain environments (debian packages) #1973

Closed

petermax2 added the help wanted label May 13, 2018

petermax2 added work in progress and removed help wanted labels May 17, 2018

petermax2 added this to the 0.8.24 milestone May 17, 2018

petermax2 added a commit to petermax2/libelektra that referenced this issue May 17, 2018

crypto: clean up environment in tests

5c341fc

See ElektraInitiative#1928 for discussion.

petermax2 mentioned this issue May 17, 2018

crypto: stop gpg-agents after unit test #2008

Closed

9 tasks

petermax2 added a commit to petermax2/libelektra that referenced this issue May 19, 2018

crypto: clean up environment in tests (part II)

7ead198

See ElektraInitiative#1928 for discussion.

petermax2 closed this as completed Jun 18, 2018
