soutrik committed on
Commit
40bd351
1 Parent(s): 4c0bcea

Add README.md

  1. README.md +1 -277
README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- title: My Gradio App Mnist Classifier
  emoji: 🚀
  colorFrom: blue
  colorTo: green
@@ -8,279 +8,3 @@ sdk_version: "5.7.1"
  app_file: app.py
  pinned: false
  ---
-
- # aws_ec2_automation
- This README documents the repository's GitHub Actions (GHA) pipeline in detail.
-
- ---
-
- # GitHub Actions Pipeline Documentation
-
- ## Name: Deploy PyTorch Training with EC2 Runner and Docker Compose
-
- This pipeline automates the following tasks:
- 1. Starts an EC2 instance as a self-hosted GitHub runner.
- 2. Deploys a PyTorch training pipeline using Docker Compose.
- 3. Builds, tags, and pushes Docker images to Amazon ECR.
- 4. Stops the EC2 instance after the job is completed.
-
- ---
-
- ### Workflow Triggers
-
- ```yaml
- on:
-   push:
-     branches:
-       - main
- ```
-
- - **Trigger**: This workflow runs whenever a push is made to the `main` branch.
-
- ---
-
- ## Jobs Overview
-
- ### 1. **start-runner**
- Starts an EC2 instance and registers it as a self-hosted GitHub Actions runner.
-
- #### Steps:
- 1. **Configure AWS Credentials**:
-    ```yaml
-    - name: Configure AWS credentials
-      uses: aws-actions/configure-aws-credentials@v4
-      with:
-        aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
-        aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
-        aws-region: ${{ secrets.AWS_REGION }}
-    ```
-    - Authenticates with AWS using access keys and the region specified in the secrets.
-    - Required for creating and managing the EC2 instance.
-
- 2. **Start EC2 Runner**:
-    ```yaml
-    - name: Start EC2 runner
-      id: start-ec2-runner
-      uses: machulav/ec2-github-runner@v2
-      with:
-        mode: start
-        github-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
-        ec2-image-id: ami-044b0717aadbc9dfa
-        ec2-instance-type: t2.xlarge
-        subnet-id: subnet-024811dee81325f1c
-        security-group-id: sg-0646c2a337a355a31
-    ```
-    - Starts an EC2 instance with the specified AMI, instance type, subnet, and security group.
-    - Outputs:
-      - `label`: A unique label for the EC2 runner.
-      - `ec2-instance-id`: The ID of the created EC2 instance.
-
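- Later jobs read these values through `needs.start-runner.outputs`, which only works if the job re-exports the step outputs at job level. The job header is not shown in this README, so the following is a sketch under assumptions: `runs-on: ubuntu-latest` is assumed, while the step id `start-ec2-runner` is taken from the snippet above.
-
- ```yaml
- jobs:
-   start-runner:
-     runs-on: ubuntu-latest  # assumption: a GitHub-hosted runner launches the EC2 instance
-     outputs:
-       # Re-export the step outputs so that dependent jobs can read them
-       label: ${{ steps.start-ec2-runner.outputs.label }}
-       ec2-instance-id: ${{ steps.start-ec2-runner.outputs.ec2-instance-id }}
-     # steps: the two steps listed above
- ```
-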
- ---
-
- ### 2. **deploy**
- Deploys the PyTorch training pipeline using the EC2 runner started in the previous step.
-
- #### Dependencies:
- ```yaml
- needs: start-runner
- runs-on: ${{ needs.start-runner.outputs.label }}
- ```
- - **Depends on** the `start-runner` job and runs on the newly created EC2 instance.
-
- #### Steps:
- 1. **Checkout Repository**:
-    ```yaml
-    - name: Checkout repository
-      uses: actions/checkout@v4
-    ```
-    - Clones the current repository to the runner.
-
- 2. **Set Up Docker Buildx**:
-    ```yaml
-    - name: Set up Docker Buildx
-      uses: docker/setup-buildx-action@v3
-    ```
-    - Configures Docker Buildx for building multi-platform Docker images.
-
- 3. **Configure AWS Credentials**:
-    ```yaml
-    - name: Configure AWS credentials
-      uses: aws-actions/configure-aws-credentials@v4
-      with:
-        aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
-        aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
-        aws-region: ${{ secrets.AWS_REGION }}
-    ```
-    - Reconfigures AWS credentials for Docker ECR authentication and resource management.
-
- 4. **Log in to Amazon ECR**:
-    ```yaml
-    - name: Log in to Amazon ECR
-      id: login-ecr
-      uses: aws-actions/amazon-ecr-login@v2
-    ```
-    - Logs into Amazon ECR for pushing and pulling Docker images.
-
- 5. **Create `.env` File**:
-    ```yaml
-    - name: Create .env file
-      run: |
-        echo "AWS_ACCESS_KEY_ID=${{ secrets.AWS_ACCESS_KEY_ID }}" >> .env
-        echo "AWS_SECRET_ACCESS_KEY=${{ secrets.AWS_SECRET_ACCESS_KEY }}" >> .env
-        echo "AWS_REGION=${{ secrets.AWS_REGION }}" >> .env
-    ```
-    - Generates a `.env` file for the application with AWS credentials and region.
-
- 6. **Run Docker Compose for Train and Eval Services**:
-    ```yaml
-    - name: Run Docker Compose for train and eval service
-      run: |
-        docker-compose build
-        docker-compose up --build
-        docker-compose logs --follow
-        docker-compose down --remove-orphans
-    ```
-    - **Build**: Builds all services defined in the `docker-compose.yml` file.
-    - **Up**: Runs all services, including training and evaluation.
-    - **Logs**: Outputs logs for debugging purposes.
-    - **Down**: Stops all services and removes orphaned containers.
-
- 7. **Build, Tag, and Push Docker Image to Amazon ECR**:
-    ```yaml
-    - name: Build, tag, and push Docker image to Amazon ECR
-      env:
-        REGISTRY: ${{ steps.login-ecr.outputs.registry }}
-        REPOSITORY: soutrik71/mnist
-        IMAGE_TAG: ${{ github.sha }}
-      run: |
-        docker build -t $REGISTRY/$REPOSITORY:$IMAGE_TAG .
-        docker push $REGISTRY/$REPOSITORY:$IMAGE_TAG
-        docker tag $REGISTRY/$REPOSITORY:$IMAGE_TAG $REGISTRY/$REPOSITORY:latest
-        docker push $REGISTRY/$REPOSITORY:latest
-    ```
-    - **Build**: Creates a Docker image tagged with the registry, repository, and commit SHA.
-    - **Push**: Pushes the SHA-tagged image to Amazon ECR.
-    - **Tag**: Re-tags the same image as `latest` and pushes that tag as well.
-
- 8. **Pull and Verify Docker Image from ECR**:
-    ```yaml
-    - name: Pull Docker image from ECR and verify
-      env:
-        REGISTRY: ${{ steps.login-ecr.outputs.registry }}
-        REPOSITORY: soutrik71/mnist
-        IMAGE_TAG: ${{ github.sha }}
-      run: |
-        docker pull $REGISTRY/$REPOSITORY:$IMAGE_TAG
-        docker images | grep "$REGISTRY/$REPOSITORY"
-    ```
-    - **Pull**: Pulls the built image from ECR.
-    - **Verify**: Ensures the image exists locally.
-
- 9. **Clean Up Environment**:
-    ```yaml
-    - name: Clean up environment
-      run: |
-        rm -f .env
-        docker system prune -af
-    ```
-    - Deletes the `.env` file and removes unused Docker resources.
-
- ---
-
- ### 3. **stop-runner**
- Stops and terminates the EC2 runner created in the `start-runner` job.
-
- #### Dependencies:
- ```yaml
- needs:
-   - start-runner
-   - deploy
- ```
-
- #### Steps:
- 1. **Configure AWS Credentials**:
-    ```yaml
-    - name: Configure AWS credentials
-      uses: aws-actions/configure-aws-credentials@v4
-      with:
-        aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
-        aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
-        aws-region: ${{ secrets.AWS_REGION }}
-    ```
-
- 2. **Stop EC2 Runner**:
-    ```yaml
-    - name: Stop EC2 runner
-      uses: machulav/ec2-github-runner@v2
-      with:
-        mode: stop
-        github-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
-        label: ${{ needs.start-runner.outputs.label }}
-        ec2-instance-id: ${{ needs.start-runner.outputs.ec2-instance-id }}
-    ```
-    - Stops the EC2 runner instance created in the first job.
-
- 3. **Validate EC2 Termination**:
-    ```yaml
-    - name: Validate EC2 termination
-      run: aws ec2 describe-instances --instance-ids ${{ needs.start-runner.outputs.ec2-instance-id }}
-    ```
-    - Describes the instance so that its terminated state can be confirmed in the job log.
-
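- Note that `describe-instances` reports the instance state but does not fail the step if the instance is still running. If a stricter check is wanted, one option is the AWS CLI waiter. This is a sketch of an alternative step, not what the pipeline above does; the step name is hypothetical.
-
- ```yaml
- - name: Wait for EC2 termination  # hypothetical step name
-   # Polls until the instance reaches the terminated state; the step fails
-   # if that does not happen within the waiter's timeout.
-   run: aws ec2 wait instance-terminated --instance-ids ${{ needs.start-runner.outputs.ec2-instance-id }}
- ```
-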
- ---
-
- ### Key Highlights
- 1. **Sequential Execution**:
-    - The `start-runner`, `deploy`, and `stop-runner` jobs are executed sequentially.
-
- 2. **Error Handling**:
-    - The `stop-runner` job runs even if previous jobs fail (`if: ${{ always() }}`).
-
- 3. **Efficiency**:
-    - Docker layer caching speeds up builds.
-    - Cleanup steps maintain a clean environment.
-
- 4. **Security**:
-    - Secrets are masked and removed after use.
-    - Proper resource cleanup ensures cost efficiency.
-
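- The `if: ${{ always() }}` condition mentioned under Error Handling is set at job level, next to `needs`. A sketch of the `stop-runner` job header, assuming it runs on a GitHub-hosted runner (the `runs-on` value is not shown in this README):
-
- ```yaml
- stop-runner:
-   needs:
-     - start-runner
-     - deploy
-   runs-on: ubuntu-latest  # assumption: not stated in this README
-   if: ${{ always() }}     # run even when start-runner or deploy failed
-   # steps: the three steps listed under stop-runner
- ```
-
- Without this condition, a failed `deploy` job would cause `stop-runner` to be skipped and the EC2 instance would keep running.
-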
247
- ---
248
-
249
- This pipeline ensures robust deployment with error handling, logging, and cleanup mechanisms. So far we have discussed the GitHub Actions pipeline , the basic structure of the pipeline, and the steps involved in the pipeline.
250
- Next we will have an interdependent pipeline where the output of one job will be used as input for the next job.
251
-
252
- ---
253
- ## Advanced Pipeline with
254
- * Sequential Flow: Each job has clear dependencies, ensuring no step runs out of order.
255
- * Code Checkout: Explicit repository checkout in each job ensures consistent source code.
256
- * Secure Credential Handling: Sensitive credentials are masked and stored securely.
257
- * Resource Cleanup: Includes Docker clean-up and EC2 instance termination validation.
258
- * Logging: Added detailed logs to improve debugging and monitoring.
259
-
260
-
261
- Step 1: Start EC2 Runner
262
- Purpose: Initializes a self-hosted EC2 runner for running subsequent jobs.
263
- Key Actions:
264
- Configures AWS credentials.
265
- Launches an EC2 instance using specified AMI, instance type, and networking configurations.
266
- Outputs the runner label and instance ID for downstream jobs.
267
- Step 2: Test PyTorch Code Using Docker Compose
268
- Purpose: Tests the PyTorch training and evaluation services.
269
- Key Actions:
270
- Checks out the repository.
271
- Sets up Docker Buildx for advanced build capabilities.
272
- Configures AWS credentials and creates a masked .env file for secure credential sharing.
273
- Runs all services (train, eval) using Docker Compose, monitors logs, and cleans up containers.
274
- Step 3: Build, Tag, and Push Docker Image
275
- Purpose: Builds a Docker image, tags it, and pushes it to Amazon ECR after successful tests.
276
- Key Actions:
277
- Checks out the repository again to ensure consistency.
278
- Logs into Amazon ECR using AWS credentials.
279
- Builds and tags the Docker image with latest and SHA-based tags.
280
- Pushes the image to Amazon ECR and verifies by pulling it back.
281
- Step 4: Stop and Delete EC2 Runner
282
- Purpose: Stops and terminates the EC2 instance to ensure cost efficiency and cleanup.
283
- Key Actions:
284
- Configures AWS credentials.
285
- Stops the EC2 instance using the label and instance ID from start-runner.
286
- Validates the termination state of the EC2 instance to ensure proper cleanup.
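-
- The four steps above map onto four jobs chained with `needs`. The skeleton below is only a sketch of that dependency structure: the job ids `test` and `build-and-push` are hypothetical, `runs-on: ubuntu-latest` is assumed for the start and stop jobs, and the step lists are omitted.
-
- ```yaml
- jobs:
-   start-runner:
-     runs-on: ubuntu-latest  # assumption
-     outputs:
-       label: ${{ steps.start-ec2-runner.outputs.label }}
-       ec2-instance-id: ${{ steps.start-ec2-runner.outputs.ec2-instance-id }}
-     # steps omitted
-
-   test:  # hypothetical job id for Step 2
-     needs: start-runner
-     runs-on: ${{ needs.start-runner.outputs.label }}
-     # steps omitted
-
-   build-and-push:  # hypothetical job id for Step 3
-     # start-runner is listed explicitly because the needs context
-     # only exposes outputs of direct dependencies
-     needs: [start-runner, test]
-     runs-on: ${{ needs.start-runner.outputs.label }}
-     # steps omitted
-
-   stop-runner:
-     needs: [start-runner, test, build-and-push]
-     runs-on: ubuntu-latest  # assumption
-     if: ${{ always() }}     # tear the instance down even after a failure
-     # steps omitted
- ```
-
- Because `build-and-push` depends on `test`, the image is pushed only after the tests succeed, while `stop-runner` still runs in every case.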
 
  ---
+ title: My Gradio App MNIST Classifier
  emoji: 🚀
  colorFrom: blue
  colorTo: green

  app_file: app.py
  pinned: false
  ---