## Instructions to run the end-to-end demo
## Chapters
[I. Installation of KServe & its dependencies](#installation-of-kserve--its-dependencies)
[II. Setting up local MinIO S3 storage](#setting-up-local-minio-s3-storage)
[III. Setting up your OpenShift AI workbench](#setting-up-your-openshift-ai-workbench)
[IV. Train model and evaluate](#train-model-and-evaluate)
[V. Convert model to Caikit format and save to S3 storage](#convert-model-to-caikit-format-and-save-to-s3-storage)
[VI. Deploy model onto Caikit-TGIS Serving Runtime](#deploy-model-onto-caikit-tgis-serving-runtime)
[VII. Model inference](#model-inference)
**Prerequisites**
* To support training and inference, your cluster needs a node with 4 GPUs and sufficient CPUs and memory. Instructions to add GPU support to RHOAI can be found [here](https://docs.google.com/document/d/1T2oc-KZRMboUVuUSGDZnt3VRZ5s885aDRJGYGMkn_Wo/edit#heading=h.9xmhoufikqid).
* You have cluster administrator permissions
* You have installed the OpenShift CLI (`oc`)
* You have installed the `Red Hat OpenShift Service Mesh Operator`
* You have installed the `Red Hat OpenShift Serverless Operator`
* You have installed the `Red Hat OpenShift AI Operator` and created a **DataScienceCluster** object
### Installation of KServe & its dependencies
Instructions adapted from [Manually installing KServe](https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_self-managed/2-latest/html/serving_models/serving-large-models_serving-large-models#manually-installing-kserve_serving-large-models)
1. Clone this repository
```
git clone https://github.com/trustyai-explainability/trustyai-detoxify-sft.git
```
2. Login to your OpenShift cluster as a cluster administrator
```
oc login --token=<token>
```
3. Create the required namespace for Red Hat OpenShift Service Mesh
```
oc create ns istio-system
```
4. Create a `ServiceMeshControlPlane` object
```
oc apply -f manifests/kserve/smcp.yaml -n istio-system
```
5. Sanity check to verify creation of the service mesh instance
```
oc get pods -n istio-system
```
Expected output:
```
NAME READY STATUS RESTARTS AGE
istio-egressgateway-7c46668687-fzsqj 1/1 Running 0 22h
istio-ingressgateway-77f94d8f85-fhsp9 1/1 Running 0 22h
istiod-data-science-smcp-cc8cfd9b8-2rkg4 1/1 Running 0 22h
```
6. Create the required namespace for a `KnativeServing` instance
```
oc create ns knative-serving
```
7. Create a `ServiceMeshMember` object
```
oc apply -f manifests/kserve/default-smm.yaml -n knative-serving
```
8. Create and define a `KnativeServing` object
```
oc apply -f manifests/kserve/knativeserving-istio.yaml -n knative-serving
```
9. Sanity check to validate creation of the Knative Serving instance
```
oc get pods -n knative-serving
```
Expected output:
```
NAME READY STATUS RESTARTS AGE
activator-7586f6f744-nvdlb 2/2 Running 0 22h
activator-7586f6f744-sd77w 2/2 Running 0 22h
autoscaler-764fdf5d45-p2v98 2/2 Running 0 22h
autoscaler-764fdf5d45-x7dc6 2/2 Running 0 22h
autoscaler-hpa-7c7c4cd96d-2lkzg 1/1 Running 0 22h
autoscaler-hpa-7c7c4cd96d-gks9j 1/1 Running 0 22h
controller-5fdfc9567c-6cj9d 1/1 Running 0 22h
controller-5fdfc9567c-bf5x7 1/1 Running 0 22h
domain-mapping-56ccd85968-2hjvp 1/1 Running 0 22h
domain-mapping-56ccd85968-lg6mw 1/1 Running 0 22h
domainmapping-webhook-769b88695c-gp2hk 1/1 Running 0 22h
domainmapping-webhook-769b88695c-npn8g 1/1 Running 0 22h
net-istio-controller-7dfc6f668c-jb4xk 1/1 Running 0 22h
net-istio-controller-7dfc6f668c-jxs5p 1/1 Running 0 22h
net-istio-webhook-66d8f75d6f-bgd5r 1/1 Running 0 22h
net-istio-webhook-66d8f75d6f-hld75 1/1 Running 0 22h
webhook-7d49878bc4-8xjbr 1/1 Running 0 22h
webhook-7d49878bc4-s4xx4 1/1 Running 0 22h
```
10. From the web console, install KServe by going to **Operators -> Installed Operators** and clicking on the **Red Hat OpenShift AI Operator**
11. Click on the **DSC Initialization** tab and click on the **default-dsci** object
12. Click on the **YAML** tab and, in the `spec` section, change `serviceMesh.managementState` to `Unmanaged`
```
spec:
serviceMesh:
managementState: Unmanaged
```
13. Click **Save**
14. Click on the **Data Science Cluster** tab and click on the **default-dsc** object
15. Click on the **YAML** tab and, in the `spec` section, change `components.kserve.managementState` and `components.kserve.serving.managementState` to `Managed`
```
spec:
components:
kserve:
managementState: Managed
serving:
managementState: Managed
```
16. Click **Save**
### Setting up local MinIO S3 storage
1. Create a namespace for your project called `detoxify-sft`
```
oc create namespace detoxify-sft
```
2. Set up your local MinIO S3 storage in your newly created namespace
```
oc apply -f manifests/minio/setup-s3.yaml -n detoxify-sft
```
3. Run the following sanity checks
```
oc get pods -n detoxify-sft | grep "minio"
```
Expected output:
```
NAME READY STATUS RESTARTS AGE
minio-7586f6f744-nvdl 1/1 Running 0 22h
```
```
oc get route -n detoxify-sft | grep "minio"
```
Expected output:
```
NAME STATUS LOCATION SERVICE
minio-api Accepted https://minio-api... minio-service
minio-ui Accepted https://minio-ui... minio-service
```
4. Get the MinIO UI location URL and open it in a web browser
```
oc get route minio-ui -n detoxify-sft
```
5. Login using the credentials in `manifests/minio/setup-s3.yaml`
**user**: `minio`
**password**: `minio123`
6. Click on **Create a Bucket**, choose a name for your bucket, and click **Create Bucket**. Alternatively, you can create the bucket from code, as sketched below.
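If you prefer to script the bucket creation, a minimal `boto3` sketch follows. It assumes the `minio-api` route URL from step 3 and the default `minio`/`minio123` credentials; the endpoint URL and the bucket name `detoxify-sft-data` are placeholders, so substitute your own values.
```
import boto3

# Assumptions: the endpoint is the minio-api route from `oc get route minio-api -n detoxify-sft`
# and the credentials are the defaults from manifests/minio/setup-s3.yaml.
s3 = boto3.client(
    "s3",
    endpoint_url="https://minio-api-detoxify-sft.apps.example.com",  # placeholder route URL
    aws_access_key_id="minio",
    aws_secret_access_key="minio123",
)
s3.create_bucket(Bucket="detoxify-sft-data")  # example bucket name
print(s3.list_buckets()["Buckets"])           # sanity check: the new bucket should be listed
```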
### Setting up your OpenShift AI workbench
1. Go to Red Hat OpenShift AI from the web console
2. Click on **Data Science Projects** and then click on **Create data science project**
3. Give your project a name and then click **Create**
4. Click on the **Workbenches** tab and create a workbench with a PyTorch notebook image, set the container size to **Large**, and select a single NVIDIA GPU. Click on **Create workbench**
5. Click on **Add data connection** to create a matching data connection for MinIO
6. Fill out the required fields and then click on **Add data connection** (a quick sanity check for these values from inside the workbench is sketched after this list)
7. Once your workbench status changes from **Starting** to **Running**, click on **Open** to open JupyterHub in a web browser
8. In your JupyterHub environment, launch a terminal and clone this project
```
git clone https://github.com/trustyai-explainability/trustyai-detoxify-sft.git
```
9. Go into the `notebooks` directory
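Once the workbench is running, the data connection is injected into the notebook pod as environment variables. The sketch below checks that they are present; the variable names are the ones OpenShift AI data connections conventionally inject, so verify them against your own pod if anything looks off.
```
import os

# Environment variables conventionally injected by an OpenShift AI data connection.
for var in (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_S3_ENDPOINT",
    "AWS_DEFAULT_REGION",
    "AWS_S3_BUCKET",
):
    # Avoid printing secrets; just confirm each variable is set.
    print(f"{var}: {'set' if os.environ.get(var) else 'MISSING'}")
```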
### Train model and evaluate
1. Open the `01-sft.ipynb` file
2. Run each cell in the notebook
3. Once the model is trained and uploaded to the Hugging Face Hub, open the `02-eval.ipynb` file and run each cell to compare the model trained on raw input-output pairs with the one trained on detoxified prompts (an illustrative version of such a comparison is sketched after this list)
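For orientation, a comparison along these lines can be done with Hugging Face's `evaluate` library and its `toxicity` measurement. This is a sketch, not necessarily what `02-eval.ipynb` does, and the two output lists are placeholders for generations from each model.
```
import evaluate

# The "toxicity" measurement scores text with a hate-speech classifier
# (facebook/roberta-hate-speech-dynabench-r4-target by default).
toxicity = evaluate.load("toxicity", module_type="measurement")

raw_outputs = ["..."]    # placeholder: completions from the model trained on raw pairs
detox_outputs = ["..."]  # placeholder: completions from the model trained on detoxified prompts

raw_scores = toxicity.compute(predictions=raw_outputs)["toxicity"]
detox_scores = toxicity.compute(predictions=detox_outputs)["toxicity"]
print(f"mean toxicity (raw):        {sum(raw_scores) / len(raw_scores):.4f}")
print(f"mean toxicity (detoxified): {sum(detox_scores) / len(detox_scores):.4f}")
```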
### Convert model to Caikit format and save to S3 storage
1. Open the `03-save_convert_model.ipynb` notebook and run each cell to convert the model to Caikit format and save it to a MinIO bucket (a rough outline of both steps follows)
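As a rough outline of what the notebook does: the conversion commonly follows the `caikit_nlp` bootstrap-and-save pattern used by the caikit-tgis-serving conversion utility, after which the resulting directory is copied to S3 with `boto3`. Treat every name and path below as an assumption; the notebook's own code is authoritative.
```
import os

import boto3
import caikit_nlp

# Assumed conversion step: bootstrap the fine-tuned HF checkpoint and save it in
# Caikit format. Both paths are hypothetical.
model = caikit_nlp.text_generation.TextGeneration.bootstrap("opt-350m-detox")
model.save("opt-350m-caikit")

# Upload the converted artifacts to the bucket behind the data connection, under
# the models/opt-350m-caikit prefix expected at deployment time.
s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["AWS_S3_ENDPOINT"],
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
)
for root, _, files in os.walk("opt-350m-caikit"):
    for name in files:
        local_path = os.path.join(root, name)
        s3.upload_file(local_path, os.environ["AWS_S3_BUCKET"], f"models/{local_path}")
```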
### Deploy model onto Caikit-TGIS Serving Runtime
1. In the OpenShift AI dashboard, navigate to the project details page and click the **Models** tab
2. In the **Single-model serving platform** tile, click **Deploy model** and provide the following values:
**Model Name**: `opt-350m-caikit`
**Serving Runtime**: `Caikit-TGIS Serving Runtime`
**Model framework**: `caikit`
**Existing data connection**: `My Storage`
**Path**: `models/opt-350m-caikit`
3. Click **Deploy**
4. Increase the `initialDelaySeconds` on the runtime's probes so the model server has enough time to load the model
```
oc patch template caikit-tgis-serving-template --type=merge -p '{"spec":{"containers":[{"readinessProbe":{"initialDelaySeconds":300},"livenessProbe":{"initialDelaySeconds":300}}]}}'
```
5. Wait for the model **Status** to show a green checkmark
### Model inference
1. Return to the JupyterHub environment to test out the deployed model
2. Click on `03-inference_request.ipynb` and run each cell to make an inference request to the detoxified model (a minimal REST sketch follows)
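For reference, a minimal REST request from a notebook cell could look like the sketch below. The `/api/v1/task/text-generation` path is the REST endpoint the Caikit-TGIS runtime exposes for single-model serving; the endpoint URL is a placeholder, so substitute the inference endpoint shown for your model in the OpenShift AI dashboard.
```
import requests

# Placeholder: copy the real inference endpoint from the dashboard's Models tab.
endpoint = "https://opt-350m-caikit-detoxify-sft.apps.example.com"

response = requests.post(
    f"{endpoint}/api/v1/task/text-generation",
    json={"model_id": "opt-350m-caikit", "inputs": "Tell me about your day."},
    verify=False,  # only if the cluster uses a self-signed certificate
)
response.raise_for_status()
print(response.json()["generated_text"])
```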