unable to run customized image #58

Open
ScottLiao920 opened this issue May 20, 2022 · 18 comments

Comments

@ScottLiao920

Description of the problem

I am trying to run a modified PostgreSQL (with customized executors and so on) inside an enclave. To do so, I pulled the ubuntu18.04 image from Docker Hub, built the modified PostgreSQL inside the Ubuntu container, and then ran docker commit to save the modified image.
After I signed the modified image, it failed to load: docker run fails with error code 6.

System information

My system configuration:
Ubuntu 18.04 with kernel 5.9.0
Docker version 20.10.16, build aa7e414
gramine built on branch v1.1

Steps to reproduce

  1. Pulled the ubuntu18.04 image from Docker Hub.
  2. Built the modified PostgreSQL inside the Ubuntu container, then ran docker commit on the modified image.
  3. Signed the image.
    I modified my config.yaml file with:
SGXDriver:
    Repository: "https://github.com/intel/linux-sgx-driver.git"
    Branch:     "sgx_driver_2.11"

Then I created the gsc-signed image using:

./gsc build --insecure-args myImage test/generic.manifest
./gsc sign-image myImage  enclave-key.pem
./gsc info-image gsc-myImage

and I got output from gsc info-image as

mr_enclave = "2454c58cafad79b1ded05a276bef96ccff8b77dbca61071928da014e6183d4e9"
mr_signer = "5416a28ebb3a9ebd0bef05431b2c4ea9eccaec008d7691ef772fa12c2d045bec"
isv_prod_id = 0
isv_svn = 0
date = "2022-05-20"
flags = "0400000000000000"
xfrms = "0300000000000000"
misc_select = "00000000"
debug = false
  4. Run the signed container
docker run --device=/dev/isgx \
   -v /var/run/aesmd/aesm.socket:/var/run/aesmd/aesm.socket \
   -it gsc-myImage

Expected results

Enter the container successfully, with everything protected by SGX.

Actual results

++ find /gramine/meson_build_output/lib -type d -path '*/site-packages'
+ export PYTHONPATH=:/gramine/meson_build_output/lib/python3.6/site-packages
+ PYTHONPATH=:/gramine/meson_build_output/lib/python3.6/site-packages
++ find /gramine/meson_build_output/lib -type d -path '*/pkgconfig'
+ export PKG_CONFIG_PATH=:/gramine/meson_build_output/lib/x86_64-linux-gnu/pkgconfig
+ PKG_CONFIG_PATH=:/gramine/meson_build_output/lib/x86_64-linux-gnu/pkgconfig
+ '[' -z '' ']'
+ gramine-sgx-get-token --sig /entrypoint.sig --output /entrypoint.token
Attributes:
    mr_enclave:  2454c58cafad79b1ded05a276bef96ccff8b77dbca61071928da014e6183d4e9
    mr_signer:   5416a28ebb3a9ebd0bef05431b2c4ea9eccaec008d7691ef772fa12c2d045bec
    isv_prod_id: 0
    isv_svn:     0
    attr.flags:  0000000000000004
    attr.xfrm:   0000000000000007
    mask.flags:  ffffffffffffffff
    mask.xfrm:   fffffffffff9ff1b
    misc_select: 00000000
    misc_mask:   ffffffff
    modulus:     63b8dd6ab325beb315c5828b811f983e...
    exponent:    3
    signature:   f4459c545e01a11c46f7f4a7c50b8dd0...
    date:        2022-05-20
Traceback (most recent call last):
  File "/gramine/meson_build_output/bin/gramine-sgx-get-token", line 20, in <module>
    main() # pylint: disable=no-value-for-parameter
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/gramine/meson_build_output/bin/gramine-sgx-get-token", line 16, in main
    token = get_token(sig, verbose=verbose)
  File "/gramine/meson_build_output/lib/python3.6/site-packages/graminelibos/sgx_get_token.py", line 151, in get_token
    token = connect_aesmd(sig['enclave_hash'], sig['modulus'], sig['attribute_flags'], xfrms)
  File "/gramine/meson_build_output/lib/python3.6/site-packages/graminelibos/sgx_get_token.py", line 88, in connect_aesmd
    raise Exception(f'Failed. (Error Code = {ret_msg.ret.error})')
Exception: Failed. (Error Code = 6)
@dimakuv
Contributor

dimakuv commented May 20, 2022

@ScottLiao920
Author

I've added the -d flag to ./gsc build. Now, when I docker run the debug image, I get:

[P1:T1:bash] trace: ---- shim_newfstatat(AT_FDCWD, "/usr/local/sbin/", 0xca798940, 0) = 0x0
[P1:T1:bash] trace: ---- shim_newfstatat(AT_FDCWD, "/usr/local/bin/", 0xca798940, 0) = 0x0
[P1:T1:bash] trace: ---- shim_newfstatat(AT_FDCWD, "/usr/sbin/", 0xca798940, 0) = 0x0
[P1:T1:bash] trace: ---- shim_newfstatat(AT_FDCWD, "/usr/bin/", 0xca798940, 0) = 0x0
[P1:T1:bash] trace: ---- shim_newfstatat(AT_FDCWD, "/sbin/", 0xca798940, 0) = 0x0
[P1:T1:bash] trace: ---- shim_newfstatat(AT_FDCWD, "/bin/", 0xca798940, 0) = 0x0
[P1:T1:bash] trace: ---- shim_newfstatat(2, "", 0xca798310, 4096) = 0x0
[P1:T1:bash] trace: ---- shim_ioctl(2, TCGETS, 0xca798280) ...
[P1:T1:bash] trace: ---- return from shim_ioctl(...) = -38
[P1:T1:bash] trace: ---- shim_write(2, 0xcc53b530, 0x22) ...
bash: : No such file or directory
[P1:T1:bash] trace: ---- return from shim_write(...) = 0x22
[P1:T1:bash] debug: ---- shim_exit_group (returning 127)
[P1:T1:bash] debug: clearing POSIX locks for pid 1
[P1:T1:bash] debug: sync client shutdown: closing handles
[P1:T1:bash] debug: sync client shutdown: waiting for confirmation
[P1:T1:bash] debug: sync client shutdown: finished
[P1:shim] debug: IPC worker: exiting worker thread
[P1:T1:bash] debug: process 1 exited with status 127
debug: DkProcessExit: Returning exit code 127

@dimakuv
Contributor

dimakuv commented May 20, 2022

Well, something tries to open the "" (empty) path, which obviously fails. Looks like you have some bash script in your Postgres workload? Are you sure the bash script is written correctly?

Anyway, the AESM error seems to be resolved.

@ScottLiao920
Author

Yeah... Tks for your help.
Just one little question: can gsc run in an interactive manner?

@dimakuv
Contributor

dimakuv commented May 20, 2022

I'm not sure what you mean by "interactive manner", but after you created a "graminized" Docker image (the one with the gsc- prefix), you can do whatever you want with this Docker image. Including running a corresponding Docker container interactively. See Step 8 in https://gramine.readthedocs.io/projects/gsc/en/latest/#example.

@ScottLiao920
Author

Tks a lot for your timely response!!!!

@ScottLiao920
Author

I'm not sure what you mean by "interactive manner", but after you created a "graminized" Docker image (the one with the gsc- prefix), you can do whatever you want with this Docker image. Including running a corresponding Docker container interactively. See Step 8 in https://gramine.readthedocs.io/projects/gsc/en/latest/#example.

Sorry to reopen this issue again. Something weird happened.
I tried to add -it --entrypoint /bin/bash in docker run, and then I performed several experiments to measure the additional overhead caused by GSC. However, I observed nearly no performance drop compared to experiments run outside of GSC (i.e. without protection from SGX). Intuitively, as the entire PostgreSQL runs inside the enclave, there should be some noticeable performance drop, but I failed to find any.
I think those experiments are not performed inside the enclave so I'm wondering whether there's something wrong with my setup.
To help you understand this better, here's my entire workflow:

  • Pull an Ubuntu image from Docker Hub, run the image and build my customized PostgreSQL inside.
  • docker commit the modified image
  • Use GSC to build & sign the image with the generic manifest.
loader.pal_internal_mem_size = "128M"

sgx.enclave_size = "4G"
sgx.thread_num = 8

sgx.trusted_files = [
  "file:/entrypoint.manifest",  # unused entry, only to test merging of manifests
]
  • run the image with
docker run --device=/dev/isgx \
   -v /var/run/aesmd/aesm.socket:/var/run/aesmd/aesm.socket \
   -it --entrypoint /bin/bash gsc-myImage
  • Use the command-line interface to issue SQL queries for PostgreSQL to run.

Could you please kindly point out the problem?

@dimakuv
Contributor

dimakuv commented May 23, 2022

tried to add -it --entrypoint /bin/bash in docker run, and then I performed several experiments to measure the additional overhead caused by GSC.

So at this point you enter the "graminized" Docker container based on the image gsc-myImage.

Then you run your experiments. Can you show the command lines -- how exactly you run the experiments? (I have a suspicion that you run native applications instead of gramine-sgx <your app>).
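
One way to check this suspicion, sketched under the assumption that the docker CLI is available on the host: `--entrypoint` replaces the entrypoint that GSC configured in the image, so printing the image's configured entrypoint shows what is being bypassed. The helper name `image_entrypoint` is hypothetical.

```shell
#!/bin/sh
# Print the ENTRYPOINT configured in a Docker image, as JSON.
# `docker inspect --format` takes a Go template; .Config.Entrypoint is the
# image's entrypoint array.
image_entrypoint() {
    docker inspect --format '{{json .Config.Entrypoint}}' "$1"
}

# Example (image name taken from this thread):
#   image_entrypoint gsc-myImage
# Passing `--entrypoint /bin/bash` to `docker run` replaces this value, so
# commands typed in that shell start as ordinary native processes.
```

If the printed entrypoint is a loader script rather than /bin/bash, then the interactive shell started with the override never went through that loader.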

@ScottLiao920
Author

You are right... I did run those native applications instead of graminized ones.
However, there're many command-line applications related to PostgreSQL (pg_ctl, initdb, psql, createdb, etc.). So shall I write manifest files for them separately, or shall I just use one Dockerfile for all the experiments and then use gsc to sign it?

@dimakuv
Contributor

dimakuv commented May 23, 2022

However, there're many command-line applications related to PostgreSQL (pg_ctl, initdb, psql, createdb, etc.).

Ideally, you would write manifest files for each one of them -- well, only for those that need to run inside the SGX enclaves. But this would also mean that GSC doesn't suit your purpose -- GSC is not tailored to run several applications in the same Docker image. GSC wraps only one application in the Docker image.

However, many users of GSC do a trick: they write a Bash/Python script that chooses which application to invoke, based on the command line argument or environment variable. Then this Bash/Python script is marked as the ENTRYPOINT in the Docker image. Thus, GSC wraps this Bash/Python script and all the applications that this script invokes.

I suggest you use this trick.
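
A minimal sketch of such a dispatcher script, assuming POSIX sh, hypothetical mode names (`init`, `start`, `create`, `query`), and hypothetical `PGDATA_DIR` and `DRY_RUN` variables; the PostgreSQL commands are the ones named earlier in this thread. It also assumes that runtime arguments reach the entrypoint because the image was built with `--insecure-args`, as in the original build command.

```shell
#!/bin/sh
# Hypothetical ENTRYPOINT script: picks one PostgreSQL tool per container run.
# PGDATA_DIR, DRY_RUN and the mode names are illustrative assumptions.
PGDATA_DIR="${PGDATA_DIR:-/var/lib/postgresql/data}"

run_mode() {
    mode="$1"
    shift
    # Rewrite the positional parameters into the command line to run.
    case "$mode" in
        init)   set -- initdb -D "$PGDATA_DIR" ;;
        start)  set -- pg_ctl -D "$PGDATA_DIR" start ;;
        create) set -- createdb "$@" ;;
        query)  set -- psql "$@" ;;
        *)      echo "unknown mode: $mode" >&2; return 64 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"          # show the command instead of running it
    else
        exec "$@"          # replace the script with the chosen tool
    fi
}

# Dispatch only when a mode was passed, e.g. `docker run ... gsc-myImage init`.
[ "$#" -eq 0 ] || run_mode "$@"
```

Each `docker run` starts a fresh container, so a server started in one run is not reachable from a psql launched in another; steps that depend on each other would need to live in the same script invocation.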

@ScottLiao920
Author

Thanks for your reply. Just a silly question: suppose I put those native commands in the bash script, mark it as the entrypoint in the Dockerfile, build & sign the image, and run the "graminized" container. Then all native commands are running inside an enclave, right?
Do please correct me if I am wrong.

@dimakuv
Contributor

dimakuv commented May 23, 2022

Yes, you are absolutely right.

Think of it this way: Gramine just needs an "entry" program as a hint. Then Gramine starts this entry program inside the SGX enclave, and all child programs that this entry program spawns are also automatically started by Gramine inside SGX enclaves.

If it helps, make an analogy with Docker containers: Docker runtime just needs an ENTRYPOINT in the Docker image -- this entrypoint is a hint which program to start inside the Docker container, and all child programs that this entrypoint program spawns are also running in the same Docker container. Same with Gramine.

@ScottLiao920
Author

ScottLiao920 commented May 24, 2022

Tks a lot. I am now trying to run all those experiments in a bash script and mark it as the entry point of my image. This bash script consists of several parts in this order:

  • initdb -d db_directory to initialize a folder for database cluster
  • pg_ctl -D db_directory start to start a database server, which will invoke several processes for the server backend
  • createdb db_name to create a new database instance
  • several psql dbname -f xxx.sql commands to load data and run experiment

However, when I run the container, all psql dbname -f xxx.sql commands fail to connect to the server started by the previous commands. Normally this is caused by an uninitialized server or improper socket settings, but in GSC I am not able to determine which one caused the problem. Based on the output in the terminal, initdb, pg_ctl, and createdb exit with code 1 and the psql commands exit with code 2. Could you please tell me the meaning of those exit codes? Also, is there any restriction from GSC on socket communication within the docker? As I got many warning: shim_socket: unknown socket domain 16 during execution.
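
To see which step fails first, the script can report each command's exit status as it goes. A minimal sketch, assuming POSIX sh; `run_step`, `run_experiment`, `PGDATA_DIR`, `DB_NAME`, and `load.sql` are hypothetical names, and `initdb -D` is used here to pass the data directory. For reference, psql documents exit status 2 as a bad connection to the server in a non-interactive session.

```shell
#!/bin/sh
PGDATA_DIR="${PGDATA_DIR:-/tmp/db_directory}"   # hypothetical location
DB_NAME="${DB_NAME:-db_name}"                   # hypothetical database name

# Run one command, report its exit status on stderr, and pass the status on.
run_step() {
    name="$1"
    shift
    status=0
    "$@" || status=$?
    echo "step '$name' exited with status $status" >&2
    return "$status"
}

# The four steps from the list above; stops at the first failing step.
run_experiment() {
    run_step initdb   initdb -D "$PGDATA_DIR"        || return
    run_step pg_ctl   pg_ctl -D "$PGDATA_DIR" start  || return
    run_step createdb createdb "$DB_NAME"            || return
    run_step psql     psql "$DB_NAME" -f load.sql
}
```

Calling `run_experiment` as the last line of the entrypoint script would then print one status line per step, which makes it easier to match a failure to the surrounding Gramine output.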

@dimakuv
Contributor

dimakuv commented May 24, 2022

Could you please tell me the meaning of those exit codes?

These are the exit codes reported by these PostgreSQL commands. These exit codes have nothing to do with Gramine (at least I think so). Please enable debug info, to get more logs from Gramine (for this, rebuild your Docker image with ./gsc build --debug ...) and analyze these logs.

Also, is there any restriction from GSC on socket communication within the docker?

No, GSC poses no restrictions on socket communication.

As I got many warning: shim_socket: unknown socket domain 16 during execution.

16 is the AF_NETLINK type of sockets. These kinds of sockets are not implemented in Gramine. I'm unsure if PostgreSQL really wants to use NETLINK sockets, or PostgreSQL just probes them and falls back to other sockets when NETLINK is not found.

Could you please take a deeper look at the debug logs of Gramine? Analyzing them may give you a better understanding of what exactly is failing. Also, do Postgres programs themselves print any information?
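
When the debug log is long, a frequency count of warning and error lines can show where to look first. A rough sketch, assuming the log was saved to a file (for example by piping the `docker run` output through `tee gramine.log`) and that lines look like the excerpts in this thread; `summarize_gramine_log` is a hypothetical helper.

```shell
#!/bin/sh
# Count distinct "warning:" / "error:" lines in a saved Gramine log,
# most frequent first. A leading "[P1:T1:bash] "-style prefix is stripped so
# the same message from different processes or threads is counted together.
summarize_gramine_log() {
    grep -E '(warning|error):' "$1" \
        | sed -E 's/^\[[^]]*\] //' \
        | sort | uniq -c | sort -rn
}

# Example:
#   summarize_gramine_log gramine.log
```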

@ScottLiao920
Author

Could you please take a deeper look at the debug logs of Gramine? Analyzing them may give you a better understanding of what exactly is failing

There're System error[P2:T2:su] trace: ---- return from shim_write(...) = 0xc error messages for initdb, pg_ctl, and createdb. As those commands are required to be issued be non-root user, they are wrapped by su - postgres -c .
Also, other than the socket warning, I got the following warnings:

  • warning: Disallowing access to file '/lib/x86_64-linux-gnu/security/pam_xxxx.so'; file is not trusted or allowed. where xxxx can be umask, limits, mail, etc.
  • warning: DkVirtualMemoryProtect is unimplemented in Linux-SGX PAL
  • warning: Disallowing access to file '/var/log/btmp'; file is not trusted or allowed. But I've included /var/log in my dockerfile VOLUME configuration and I think it should be included in trusted files.

Also, do Postgres programs themselves print any information?

Sadly, nothing except psql: could not connect to server: Connection refused Is the server running locally and accepting connections on Unix domain socket "/tmp/.s.PGSQL.5432". Also, no postgres logfile is generated and the db_directory is not initialized. So I believe the problem is related to the system error in the initdb, pg_ctl, and createdb commands.

@dimakuv
Contributor

dimakuv commented May 24, 2022

As those commands are required to be issued be non-root user, they are wrapped by su - postgres -c .

You meant "to be issued by a non-root user"? For this, we have special manifest options: https://gramine.readthedocs.io/en/latest/manifest-syntax.html#user-id-and-group-id. By default, Gramine uses uid = gid = 0, so the application inside Gramine thinks it is root. You can change this behavior by changing these manifest options. But it also looks like you changed Postgres behavior by an explicit Postgres command-line switch -c, so maybe this is not needed.
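
A small sketch for producing those manifest lines, assuming the option names `loader.uid` and `loader.gid` from the linked manifest documentation, and assuming the lookup runs inside the image where the `postgres` account exists; `manifest_ids_for_user` is a hypothetical helper.

```shell
#!/bin/sh
# Print manifest lines that set the in-enclave user and group IDs to those of
# an existing account. The option names loader.uid / loader.gid are assumed
# from the Gramine manifest documentation linked above.
manifest_ids_for_user() {
    user="$1"
    uid="$(id -u "$user")" || return
    gid="$(id -g "$user")" || return
    printf 'loader.uid = %s\nloader.gid = %s\n' "$uid" "$gid"
}

# Example (run inside the image, where the account exists):
#   manifest_ids_for_user postgres >> my.manifest
```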

warning: Disallowing access to file '/lib/x86_64-linux-gnu/security/pam_xxxx.so'; file is not trusted or allowed. where xxxx can be umask, limits, mail, etc.

This looks very serious. For some reason, GSC didn't mark these files as trusted. I don't know why -- are you installing some packages after GSC generated the final graminized Docker image? Why would GSC not include these files as trusted? It's surprising to me.

Anyway, you can mark these files as sgx.trusted_files = ["file:/lib/x86_64-linux-gnu/security/"] in your manifest file. Then GSC/Gramine will surely mark them as trusted.

  • warning: DkVirtualMemoryProtect is unimplemented in Linux-SGX PAL

You can ignore this. It is harmless.

  • warning: Disallowing access to file '/var/log/btmp'; file is not trusted or allowed. But I've included /var/log in my dockerfile VOLUME configuration and I think it should be included in trusted files.

You also need to include it in your manifest file. The easiest is to add sgx.allowed_files = ["file:/var/log/"].
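
Put together with the generic manifest quoted earlier in the thread, the result might look like the sketch below. It writes the manifest from a shell heredoc; the helper name `write_manifest` and the output path are hypothetical, and the `file:` URI prefix on each entry is assumed to be required by this Gramine version.

```shell
#!/bin/sh
# Write a manifest that extends the generic one from this thread with the
# PAM module directory (trusted) and /var/log (allowed).
write_manifest() {
    cat > "$1" <<'EOF'
loader.pal_internal_mem_size = "128M"

sgx.enclave_size = "4G"
sgx.thread_num = 8

sgx.trusted_files = [
  "file:/entrypoint.manifest",
  "file:/lib/x86_64-linux-gnu/security/",
]

sgx.allowed_files = [
  "file:/var/log/",
]
EOF
}

# Example:
#   write_manifest my.manifest
#   ./gsc build --insecure-args myImage my.manifest
```

Both directories go into a single array per key, since a TOML file cannot define `sgx.trusted_files` twice. The image has to be rebuilt and signed again after the manifest changes.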

@ScottLiao920
Author

ScottLiao920 commented May 24, 2022

  • warning: Disallowing access to file '/lib/x86_64-linux-gnu/security/pam_xxxx.so'; file is not trusted or allowed. where xxxx can be umask, limits, mail, etc.
  • warning: Disallowing access to file '/var/log/btmp'; file is not trusted or allowed. But I've included /var/log in my dockerfile VOLUME configuration and I think it should be included in trusted files.

These two warnings no longer appear after adding sgx.trusted_files = ["file:/lib/x86_64-linux-gnu/security/"] and sgx.allowed_files = ["file:/var/log/"] to my manifest file.

There're System error[P2:T2:su] trace: ---- return from shim_write(...) = 0xc error messages for initdb, pg_ctl, and createdb

This error still exists. Any idea for this system error?

@dimakuv
Contributor

dimakuv commented May 24, 2022

The System error is not an error from Gramine. This is something from Postgres, I guess.
