35 changes: 33 additions & 2 deletions core/scripts/generate-token.sh
@@ -9,7 +9,38 @@ export KEYCLOAK_PASSWORD=changeme # The password for Keycloak admin login
export KEYCLOAK_CLIENT_ID=my-client-id # The client ID to be created in Keycloak

export KEYCLOAK_CLIENT_SECRET=$(bash "${SCRIPT_DIR}/keycloak-fetch-client-secret.sh" "${BASE_URL}" "${KEYCLOAK_ADMIN_USERNAME}" "${KEYCLOAK_PASSWORD}" "${KEYCLOAK_CLIENT_ID}" | awk -F': ' '/Client secret:/ {print $2}')

# Set token lifespan on the client (in seconds)
# 3600 = 1 hour, 86400 = 24 hours, 604800 = 7 days
TOKEN_LIFESPAN=${TOKEN_LIFESPAN:-3600} # default 1 hour, override via env var

# Get admin token first
ADMIN_TOKEN=$(curl -k -s -X POST \
https://${BASE_URL}/realms/master/protocol/openid-connect/token \
-H "Content-Type: application/x-www-form-urlencoded" \
-d "grant_type=password" \
-d "client_id=admin-cli" \
-d "username=${KEYCLOAK_ADMIN_USERNAME}" \
-d "password=${KEYCLOAK_PASSWORD}" | jq -r '.access_token')

# Get the client UUID
CLIENT_UUID=$(curl -k -s \
"https://${BASE_URL}/admin/realms/master/clients?clientId=${KEYCLOAK_CLIENT_ID}" \
-H "Authorization: Bearer ${ADMIN_TOKEN}" | jq -r '.[0].id')

# Update token lifespan for the client. Keycloak's PUT replaces the whole
# client representation, so merge the new attribute into the current config.
CLIENT_JSON=$(curl -k -s \
"https://${BASE_URL}/admin/realms/master/clients/${CLIENT_UUID}" \
-H "Authorization: Bearer ${ADMIN_TOKEN}")
echo "${CLIENT_JSON}" | jq ".attributes[\"access.token.lifespan\"] = \"${TOKEN_LIFESPAN}\"" | \
curl -k -s -X PUT \
"https://${BASE_URL}/admin/realms/master/clients/${CLIENT_UUID}" \
-H "Authorization: Bearer ${ADMIN_TOKEN}" \
-H "Content-Type: application/json" \
-d @-

export TOKEN=$(curl -k -s -X POST \
https://$BASE_URL/token \
-H 'Content-Type: application/x-www-form-urlencoded' \
-d "grant_type=client_credentials&client_id=${KEYCLOAK_CLIENT_ID}&client_secret=${KEYCLOAK_CLIENT_SECRET}" \
| jq -r .access_token)

echo "BASE_URL=${BASE_URL}"
echo "TOKEN=${TOKEN}"
echo "TOKEN_LIFESPAN=${TOKEN_LIFESPAN} seconds ($(( TOKEN_LIFESPAN / 60 )) minutes)"
@@ -256,17 +256,22 @@ chmod +x generate-token.sh
After the script completes successfully, confirm that the token is available in your shell:

```bash
echo $BASE_URL
echo $TOKEN
```

If a valid token is returned (long JWT string), the environment is ready for inference testing.
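Beyond eyeballing the string, you can decode the JWT payload and inspect its `iat`/`exp` claims to confirm the configured lifespan took effect. This is a sketch that assumes `TOKEN` holds the token produced by `generate-token.sh`; the fallback token below is fabricated purely so the snippet runs standalone.

```shell
# Decode the JWT payload (second dot-separated segment) and report lifespan.
# Falls back to a fabricated sample token so the sketch runs standalone.
if [ -z "${TOKEN:-}" ]; then
  header=$(printf '{"alg":"none"}' | base64 | tr -d '=')
  payload=$(printf '{"iat":1700000000,"exp":1700003600}' | base64 | tr -d '=')
  TOKEN="${header}.${payload}."
fi

seg=$(printf '%s' "$TOKEN" | cut -d. -f2 | tr '_-' '/+')  # base64url -> base64
pad="===="
claims=$(printf '%s%s' "$seg" "${pad:0:$(( (4 - ${#seg} % 4) % 4 ))}" | base64 -d)
echo "$claims"
iat=$(echo "$claims" | jq -r .iat)
exp=$(echo "$claims" | jq -r .exp)
echo "lifespan: $(( exp - iat )) seconds"
```

For a token issued with the default `TOKEN_LIFESPAN`, the reported lifespan should match 3600 seconds.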

Set the DNS used to deploy Enterprise Inference:
```bash
export BASE_URL=https://api.example.com
```

Change the model as needed. Note that with Keycloak/APISIX, the model name is included in the URL path. This must match one of the routes from the command `kubectl get apisixroutes`. Run **ONE** of the following commands depending on the hardware platform Enterprise Inference is deployed on.

**Run a test query for Gaudi:**
> Note: `${BASE_URL}` must already be set to your deployment DNS (exported above)

```bash
curl -k ${BASE_URL}/Llama-3.1-8B-Instruct/v1/completions \
-X POST \
-d '{"model": "meta-llama/Llama-3.1-8B-Instruct", "prompt": "What is Deep Learning?", "max_tokens": 25, "temperature": 0}' \
-H 'Content-Type: application/json' \
```
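The hunk above is cut off before the trailing headers; a complete request, including the bearer token the gateway expects, can be sketched as follows. It assumes `TOKEN` was exported by `generate-token.sh`; when `BASE_URL` is unset the command is printed as a dry run instead of being executed.

```shell
# Sketch of the full completion request with the Authorization header.
# Prints the command (dry run) when BASE_URL is unset, so it is safe anywhere.
cmd=(curl -k "${BASE_URL:-https://api.example.com}/Llama-3.1-8B-Instruct/v1/completions"
  -X POST
  -H 'Content-Type: application/json'
  -H "Authorization: Bearer ${TOKEN:-<token>}"
  -d '{"model": "meta-llama/Llama-3.1-8B-Instruct", "prompt": "What is Deep Learning?", "max_tokens": 25, "temperature": 0}')
if [ -n "${BASE_URL:-}" ]; then
  "${cmd[@]}"
else
  printf '%s ' "${cmd[@]}"; echo
fi
```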
@@ -218,14 +218,16 @@ Expected States:

### 4. Test the Inference

Set the DNS used to deploy Enterprise Inference:
```bash
export BASE_URL=https://api.example.com
```

Reference the `litellm_master_key` value in `core/inventory/metadata/vault.yml` for the master key. Change the model as needed. Run **ONE** of the following commands depending on the hardware platform Enterprise Inference is deployed on.
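As a sketch of pulling the key into an environment variable — the file layout shown here is hypothetical, and the real `vault.yml` may be ansible-vault encrypted (run `ansible-vault view` on it first if so):

```shell
# Hypothetical vault.yml layout, for illustration only.
cat > /tmp/vault-sample.yml <<'EOF'
litellm_master_key: sk-example-master-key
EOF

# Extract the key; point this at core/inventory/metadata/vault.yml in practice.
MASTER_KEY=$(awk -F': *' '/^litellm_master_key:/ {print $2}' /tmp/vault-sample.yml)
echo "MASTER_KEY=${MASTER_KEY}"
```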

**Run a test query for Gaudi:**
```bash
curl -k ${BASE_URL}/v1/completions \
-X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <<master-key>>" \
```

@@ -239,7 +241,7 @@

**Run a test query for CPU:**
```bash
curl -k ${BASE_URL}/v1/completions \
-X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <<master-key>>" \
```
44 changes: 29 additions & 15 deletions third_party/Dell/ubuntu-22.04/iac/README.md
@@ -46,7 +46,7 @@ This script mounts or unmounts the **Ubuntu 22.04.5 live server ISO** using the
```bash
export IDRAC_IP=100.67.x.x
export IDRAC_USER=root
export IDRAC_PASS=your-idrac-password
```
**Specify Custom ISO URL**

@@ -102,10 +102,10 @@ Example (terraform.tfvars):
```bash
idrac_endpoint = "https://100.67.x.x"
idrac_user = "root"
idrac_password = "your-idrac-password"
idrac_ssl_insecure = true
ubuntu_username = "your-username"
ubuntu_password = "your-password"
```

### Apply Terraform
@@ -136,8 +136,8 @@ chmod +x deploy-enterprise-inference.sh

```bash
sudo ./deploy-enterprise-inference.sh \
-u your-username \
-p your-password \
-t hf_xxxxxxxxxxxxx \
-g gaudi3 \
-a cluster-url \
```

@@ -149,7 +149,7 @@
|--------|----------|----------|-------------|
| `-u, --username` | Yes (deploy & uninstall) | (none) | Enterprise Inference owner username. Must match the invoking (sudo) user. |
| `-t, --token` | Yes (deploy only) | (none) | Hugging Face access token used to validate and download selected models. |
| `-p, --password` | No | (none) | User sudo password used for Ansible become operations. |
| `-g, --gpu-type` | No | `gaudi3` | Deployment target type: `gaudi3` or `cpu`. |
| `-m, --models` | No | `""` (interactive mode) | Choose a model ID from the [Pre-Integrated Models List](#pre-integrated-models-list), based on your deployment type (Gaudi or CPU). If not provided, deployment runs interactively. |
| `-b, --branch` | No | `release-1.4.0` | Git branch of the Enterprise-Inference repository to clone. |
@@ -167,8 +167,8 @@ sudo ./deploy-enterprise-inference.sh \
The deployment script is resume-safe. If a failure occurs, simply rerun the script with the -r flag:
```bash
sudo ./deploy-enterprise-inference.sh \
-u your-username \
-p your-password \
-t hf_XXXXXXXXXXXX \
-g gaudi3 \
-a cluster-url \
```

@@ -272,27 +272,41 @@ Expected:

### 5. API Health Check
Validate that the inference gateway is reachable.

If deployed with **Keycloak (APISIX)**, first obtain a token:
```bash
source <path-to-Enterprise-Inference>/core/scripts/generate-token.sh
```
`TOKEN` will be set automatically.

If deployed with **GenAI Gateway**, use the litellm_master_key from Enterprise-Inference/core/inventory/metadata/vault.yml:
```bash
export TOKEN=<your-genai-api-key>
```
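Whichever path sets it, a quick defensive check that `TOKEN` is non-empty avoids confusing 401 responses later. This is just a sketch:

```shell
# Defensive sketch: verify TOKEN before running any gateway calls.
check_token() {
  if [ -z "${TOKEN:-}" ]; then
    echo "TOKEN is empty - re-run generate-token.sh or re-export your key" >&2
    return 1
  fi
  echo "TOKEN is set (${#TOKEN} characters)"
}
TOKEN=example-token check_token  # → TOKEN is set (13 characters)
```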

Then run the health check to get the number of healthy and unhealthy model endpoints:
```bash
curl -k -s -L https://api.example.com/health \
-H "Authorization: Bearer $TOKEN" | jq '{healthy: .healthy_count, unhealthy: .unhealthy_count}'
```
Expected output shape (counts depend on the deployed models):
{"healthy": <healthy_count>, "unhealthy": <unhealthy_count>}
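To see what the `jq` filter extracts, here it is applied to a sample response. The field names are assumed from the filter itself and may not match every gateway version:

```shell
# Sample /health payload (shape assumed); the filter mirrors the health check.
sample='{"healthy_count": 3, "unhealthy_count": 1, "healthy_endpoints": [], "unhealthy_endpoints": []}'
printf '%s' "$sample" | jq '{healthy: .healthy_count, unhealthy: .unhealthy_count}'
```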

---

### 6. Test Model Inference

if EI is deployed with apisix, follow [Testing EI model with apisix](../EI/single-node/user-guide-apisix.md#5-test-the-inference) for generating token and testing the inference
If EI is deployed with Keycloak/APISIX, follow [Testing EI model with Keycloak/APISIX](../EI/single-node/user-guide-apisix.md#5-test-the-inference) for generating token and testing the inference.

if EI is deployed with genai, follow [Testing EI model with genai](../EI/single-node/user-guide-genai.md#5-test-the-inference) for generating api-key and testing the inference
If EI is deployed with GenAI Gateway, follow [Testing EI model with GenAI Gateway](../EI/single-node/user-guide-genai.md#5-test-the-inference) for acquiring the API key and testing the inference.

---

## Additional Information

### Pre-Integrated Models List

Enterprise Inference provides a set of pre-integrated and validated models optimized for performance and stability. These models can be deployed directly using the Enterprise Inference catalog.

> **Note**: this list is accurate as of `release-1.4.0`.

**Pre-Integrated Gaudi Models**

@@ -11,7 +11,7 @@
# Options:
# -u, --username Enterprise Inference owner username (required)
# -t, --token Hugging Face token (required)
# -p, --password User sudo password for Ansible
# -g, --gpu-type GPU type: 'gaudi3' or 'cpu' (default: gaudi3)
# -m, --models Model IDs to deploy, comma-separated (default: "5")
# -b, --branch Git branch to clone (default: dell-deploy)
@@ -212,7 +212,7 @@ Required Options (uninstall):
-u, --username Enterprise Inference owner username

Optional Options:
-p, --password User sudo password for Ansible
-g, --gpu-type GPU type: 'gaudi3' or 'cpu' (default: gaudi3)
-m, --models Model IDs to deploy, comma-separated (default: empty)
-b, --branch Git branch to clone (default: dell-deploy)
1 change: 0 additions & 1 deletion third_party/Dell/ubuntu-22.04/iac/verify-installation.sh
@@ -86,7 +86,6 @@ echo ""
echo "4. Network connectivity check..."
echo " Note: Try to SSH to the server if you know the IP:"
echo " ssh user@<server-ip>"
echo " Password: Linux123!"

echo ""
echo "=========================================="