AESM Error while executing Confidential PyTorch Example in Docker #82

@asim29

Hi, I am trying to run the end-to-end confidential PyTorch example from this tutorial. The non-confidential part of the tutorial runs fine under gramine-sgx, but the confidential example fails with the following error:

root@xyz:pytorch-confidential# gramine-sgx ./pytorch pytorchexample.py
Gramine is starting. Parsing TOML manifest file, this may take some time...
error: Cannot connect to AESM service (tried sgx_aesm_socket_base and /var/run/aesmd/aesm.socket UNIX sockets).
Please check its status! (`service aesmd status` on Ubuntu)
error: load_enclave() failed with error: No such file or directory (ENOENT)

When I try to run service aesmd status I get the following output:

root@xyz:pytorch-confidential# service aesmd status
aesmd: unrecognized service
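
Since `service` does not know about aesmd here, a more direct check is whether the socket named in the Gramine error exists at all. The following is only a minimal sketch: the helper name `aesm_socket_status` is hypothetical, and the default path is simply the one quoted in the error message above.

```shell
#!/bin/sh
# Hypothetical helper: report whether a UNIX socket exists at the given path.
# The default path is the one named in the Gramine error message
# (an assumption that your setup uses the same location).
aesm_socket_status() {
    sock="${1:-/var/run/aesmd/aesm.socket}"
    if [ -S "$sock" ]; then
        echo "present"
    else
        echo "missing"
    fi
}

# Check the default location inside the container.
aesm_socket_status
```

If this reports the socket as missing, nothing inside the container is listening for AESM requests, whether or not the package is installed.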

I followed the tutorial and I can see that the sgx-aesm-service package is installed. The Dockerfile I am using to build the Gramine image is:

FROM ubuntu:20.04

ENV DEBIAN_FRONTEND=noninteractive
ENV LC_ALL=C.UTF-8 LANG=C.UTF-8

# Main Dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
        wget \
        gnupg \
        ca-certificates \
        software-properties-common \
        libnss-mdns \
        libnss-myhostname \
        git \
        curl \
        linux-headers-5.15.0-52-generic \
        openssh-client \
        screen \
        && apt-get clean && rm -rf /var/lib/apt/lists/*

# Gramine Dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential \
        autoconf \
        bison \
        gawk \
        nasm \
        ninja-build \
        pkg-config \
        python3 \
        python3-click \
        python3-jinja2 \
        python3-pip \
        python3-pyelftools 

RUN python3 -m pip install 'meson>=0.56' 'tomli>=1.1.0' 'tomli-w>=0.4.0'

# Intel SGX-related Dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
        libprotobuf-c-dev \
        protobuf-c-compiler \
        protobuf-compiler \
        python3-cryptography \
        python3-protobuf

# Intel SGX SDK/PSW
RUN ["/bin/bash", "-c", "set -o pipefail && echo 'deb [trusted=yes arch=amd64] https://download.01.org/intel-sgx/sgx_repo/ubuntu focal main' | tee /etc/apt/sources.list.d/intel-sgx.list"]
RUN ["/bin/bash", "-c", "set -o pipefail && wget -qO - https://download.01.org/intel-sgx/sgx_repo/ubuntu/intel-sgx-deb.key | apt-key add -"]

RUN apt-get update && apt-get install -y --no-install-recommends \
        libsgx-epid \
        libsgx-quote-ex \
        libsgx-dcap-ql \
        libsgx-quote-ex-dev \
        libsgx-qe3-logic \
        sgx-aesm-service

# DCAP 
RUN curl -fsSLo /usr/share/keyrings/intel-sgx-deb.asc https://download.01.org/intel-sgx/sgx_repo/ubuntu/intel-sgx-deb.key
RUN apt-get update && apt-get install -y --no-install-recommends \
        libsgx-dcap-ql-dev \
        libsgx-dcap-quote-verify-dev \
        libsgx-dcap-default-qpl \
        libsgx-dcap-default-qpl-dev \
        && apt-get clean && rm -rf /var/lib/apt/lists/*

# Build and Install Gramine
ENV HOMEDIR=/home
ENV GRAMINEDIR=${HOMEDIR}/gramine

WORKDIR ${GRAMINEDIR}
RUN git clone https://github.com/gramineproject/gramine.git ${GRAMINEDIR} 

RUN meson setup build/ --buildtype=release -Ddirect=enabled -Dsgx=enabled -Ddcap=enabled 
RUN ninja -C build/ 
RUN ninja -C build/ install 
RUN gramine-sgx-gen-private-key

# Install PyTorch
RUN pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

The manifest template (edited as shown in the tutorial):

# SPDX-License-Identifier: LGPL-3.0-or-later

# PyTorch manifest template

loader.entrypoint = "file:{{ gramine.libos }}"
libos.entrypoint = "{{ entrypoint }}"

loader.log_level = "{{ log_level }}"

loader.env.LD_LIBRARY_PATH = "/lib:/usr/lib:{{ arch_libdir }}:/usr/{{ arch_libdir }}"
loader.env.HOME = "{{ env.HOME }}"

# Restrict the maximum number of threads to prevent insufficient memory
# issue, observed on CentOS/RHEL.
loader.env.OMP_NUM_THREADS = "8"

loader.insecure__use_cmdline_argv = true

fs.mounts = [
  { path = "{{ entrypoint }}", uri = "file:{{ entrypoint }}" },
  { path = "/lib", uri = "file:{{ gramine.runtimedir() }}" },
  { path = "/usr/lib", uri = "file:/usr/lib" },
  { path = "{{ arch_libdir }}", uri = "file:{{ arch_libdir }}" },
  { path = "/usr/{{ arch_libdir }}", uri = "file:/usr/{{ arch_libdir }}" },
{% for path in python.get_sys_path(entrypoint) %}
  { path = "{{ path }}", uri = "file:{{ path }}" },
{% endfor %}

  { type = "tmpfs", path = "/tmp" },

  { path = "/classes.txt", uri = "file:classes.txt", type = "encrypted" },
  { path = "/input.jpg", uri = "file:input.jpg", type = "encrypted" },
  { path = "/alexnet-pretrained.pt", uri = "file:alexnet-pretrained.pt", type = "encrypted" },
  
  { path = "/result.txt", uri = "file:result.txt", type = "encrypted" },
]

sgx.enclave_size = "4G"
sgx.max_threads = 32
sgx.edmm_enable = {{ 'true' if env.get('EDMM', '0') == '1' else 'false' }}

sgx.trusted_files = [
  "file:{{ entrypoint }}",
  "file:{{ gramine.libos }}",
  "file:{{ gramine.runtimedir() }}/",
  "file:/usr/lib/",
  "file:{{ arch_libdir }}/",
  "file:/usr/{{ arch_libdir }}/",
{% for path in python.get_sys_path(entrypoint) %}
  "file:{{ path }}{{ '/' if path.is_dir() else '' }}",
{% endfor %}

  "file:pytorchexample.py",

]

sgx.allowed_files = [
  "file:ssl/ca.crt",
]

sys.enable_extra_runtime_domain_names_conf = true

sgx.remote_attestation = "dcap"

loader.env.LD_PRELOAD = "libsecret_prov_attest.so"
loader.env.SECRET_PROVISION_CONSTRUCTOR = "1"
loader.env.SECRET_PROVISION_SET_KEY = "default"
loader.env.SECRET_PROVISION_CA_CHAIN_PATH = "ssl/ca.crt"
loader.env.SECRET_PROVISION_SERVERS = "localhost:4433"

# Gramine optionally provides patched OpenMP runtime library that runs faster inside SGX enclaves
# (add `-Dlibgomp=enabled` when configuring the build). Uncomment the line below to use the patched
# library. PyTorch's SGX perf overhead decreases on some workloads from 25% to 8% with this patched
# library. Note that we need to preload the library because PyTorch's distribution renames
# libgomp.so to smth like libgomp-7c85b1e2.so.1, so it's not just a matter of searching in the
# Gramine's Runtime path first, but a matter of intercepting OpenMP functions.
# loader.env.LD_PRELOAD = "/lib/libgomp.so.1"

I launch the provisioning server before running the gramine-sgx command, and the top command shows it running in the background.

I am unsure why the service command cannot find the aesmd service. I can see that the container does indeed contain the following files:

  • /lib/systemd/system/aesmd.service
  • /etc/aesmd.conf
  • /opt/intel/sgx-aesm-service/aesm/aesm_service
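
One thing that may be relevant: the unit file under /lib/systemd/system is only used by systemd, and a container started with a plain `docker run` usually does not run systemd as its init process. On Ubuntu, `service` then falls back to looking for an init script under /etc/init.d, which would explain the "unrecognized service" message. A minimal sketch to check this, assuming the conventional /run/systemd/system marker directory (the helper name `init_system` is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: report whether systemd is the running init system.
# /run/systemd/system exists only when the system was booted with systemd
# (the same check that sd_booted(3) performs).
init_system() {
    marker="${1:-/run/systemd/system}"
    if [ -d "$marker" ]; then
        echo "systemd"
    else
        echo "other"
    fi
}

# Check the container this script runs in.
init_system
```

If this prints `other`, the aesmd.service unit is never started automatically, so the AESM daemon would have to be provided some other way.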

The aesmd.conf file looks like this:

#Line with comments only

	  #empty line with comment
#proxy type    = direct #direct type means no proxy used
#proxy type    = default #system default proxy
#proxy type    = manual #aesm proxy should be specified for manual proxy type
#aesm proxy    = http://proxy_url:proxy_port
#whitelist url = http://sample_while_list_url/
#default quoting type = ecdsa_256
#default quoting type = epid_linkable
#default quoting type = epid_unlinkable
#qpl log level = error
#qpl log level = info
cat: n: No such file or directory

Have I done something wrong in the installation process, or is something extra required to make this work within a Docker container?

I appreciate any help you can provide.

Best,
Asim.
