soutrik committed
Commit • 40bd351
1 Parent(s): 4c0bcea
Add README.md
README.md CHANGED
@@ -1,5 +1,5 @@
|
|
1 |
---
|
2 |
-
title: My Gradio App
|
3 |
emoji: 🚀
|
4 |
colorFrom: blue
|
5 |
colorTo: green
|
@@ -8,279 +8,3 @@ sdk_version: "5.7.1"
|
|
8 |
app_file: app.py
|
9 |
pinned: false
|
10 |
---
|
11 |
-
|
12 |
-
# aws_ec2_automation
|
13 |
-
This document explains the GitHub Actions (GHA) pipeline in detail.
|
14 |
-
|
15 |
-
---
|
16 |
-
|
17 |
-
# GitHub Actions Pipeline Documentation
|
18 |
-
|
19 |
-
## Name: Deploy PyTorch Training with EC2 Runner and Docker Compose
|
20 |
-
|
21 |
-
This pipeline automates the following tasks:
|
22 |
-
1. Starts an EC2 instance as a self-hosted GitHub runner.
|
23 |
-
2. Deploys a PyTorch training pipeline using Docker Compose.
|
24 |
-
3. Builds, tags, and pushes Docker images to Amazon ECR.
|
25 |
-
4. Stops the EC2 instance after the job is completed.
|
26 |
-
|
27 |
-
---
|
28 |
-
|
29 |
-
### Workflow Triggers
|
30 |
-
|
31 |
-
```yaml
|
32 |
-
on:
|
33 |
-
  push:
|
34 |
-
    branches:
|
35 |
-
      - main
|
36 |
-
```
|
37 |
-
|
38 |
-
- **Trigger**: This workflow runs whenever a push is made to the `main` branch.
|
39 |
-
|
40 |
-
---
|
41 |
-
|
42 |
-
## Jobs Overview
|
43 |
-
|
44 |
-
### 1. **start-runner**
|
45 |
-
Launches an EC2 instance and registers it as a self-hosted GitHub Actions runner.
|
46 |
-
|
47 |
-
#### Steps:
|
48 |
-
1. **Configure AWS Credentials**:
|
49 |
-
```yaml
|
50 |
-
- name: Configure AWS credentials
|
51 |
-
  uses: aws-actions/configure-aws-credentials@v4
|
52 |
-
  with:
|
53 |
-
    aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
|
54 |
-
    aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
|
55 |
-
    aws-region: ${{ secrets.AWS_REGION }}
|
56 |
-
```
|
57 |
-
- Authenticates with AWS using access keys and the region specified in the secrets.
|
58 |
-
- Required for creating and managing the EC2 instance.
|
59 |
-
|
60 |
-
2. **Start EC2 Runner**:
|
61 |
-
```yaml
|
62 |
-
- name: Start EC2 runner
|
63 |
-
  id: start-ec2-runner
|
64 |
-
  uses: machulav/ec2-github-runner@v2
|
65 |
-
  with:
|
66 |
-
    mode: start
|
67 |
-
    github-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
|
68 |
-
    ec2-image-id: ami-044b0717aadbc9dfa
|
69 |
-
    ec2-instance-type: t2.xlarge
|
70 |
-
    subnet-id: subnet-024811dee81325f1c
|
71 |
-
    security-group-id: sg-0646c2a337a355a31
|
72 |
-
```
|
73 |
-
- Starts an EC2 instance with the specified AMI, instance type, subnet, and security group.
|
74 |
-
- Outputs:
|
75 |
-
- `label`: A unique label for the EC2 runner.
|
76 |
-
- `ec2-instance-id`: The ID of the created EC2 instance.
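For later jobs to read these values through `needs.start-runner.outputs`, the job has to re-export the step outputs at job level. A minimal sketch, assuming the job runs on a GitHub-hosted `ubuntu-latest` runner (the runner choice is an assumption; this README does not show it):

```yaml
jobs:
  start-runner:
    runs-on: ubuntu-latest  # assumption: any GitHub-hosted runner
    outputs:
      # Re-export the outputs of the step with id `start-ec2-runner`
      # so that downstream jobs can read them via `needs.start-runner.outputs`
      label: ${{ steps.start-ec2-runner.outputs.label }}
      ec2-instance-id: ${{ steps.start-ec2-runner.outputs.ec2-instance-id }}
    # steps: the two steps shown above go here
```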
|
77 |
-
|
78 |
-
---
|
79 |
-
|
80 |
-
### 2. **deploy**
|
81 |
-
Deploys the PyTorch training pipeline using the EC2 runner started in the previous step.
|
82 |
-
|
83 |
-
#### Dependencies:
|
84 |
-
```yaml
|
85 |
-
needs: start-runner
|
86 |
-
runs-on: ${{ needs.start-runner.outputs.label }}
|
87 |
-
```
|
88 |
-
- **Depends on** the `start-runner` job and runs on the newly created EC2 instance.
|
89 |
-
|
90 |
-
#### Steps:
|
91 |
-
1. **Checkout Repository**:
|
92 |
-
```yaml
|
93 |
-
- name: Checkout repository
|
94 |
-
  uses: actions/checkout@v4
|
95 |
-
```
|
96 |
-
- Clones the current repository to the runner.
|
97 |
-
|
98 |
-
2. **Set Up Docker Buildx**:
|
99 |
-
```yaml
|
100 |
-
- name: Set up Docker Buildx
|
101 |
-
  uses: docker/setup-buildx-action@v3
|
102 |
-
```
|
103 |
-
- Configures Docker Buildx for building multi-platform Docker images.
|
104 |
-
|
105 |
-
3. **Configure AWS Credentials**:
|
106 |
-
```yaml
|
107 |
-
- name: Configure AWS credentials
|
108 |
-
  uses: aws-actions/configure-aws-credentials@v4
|
109 |
-
  with:
|
110 |
-
    aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
|
111 |
-
    aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
|
112 |
-
    aws-region: ${{ secrets.AWS_REGION }}
|
113 |
-
```
|
114 |
-
- Reconfigures AWS credentials for Docker ECR authentication and resource management.
|
115 |
-
|
116 |
-
4. **Log in to Amazon ECR**:
|
117 |
-
```yaml
|
118 |
-
- name: Log in to Amazon ECR
|
119 |
-
  id: login-ecr
|
120 |
-
  uses: aws-actions/amazon-ecr-login@v2
|
121 |
-
```
|
122 |
-
- Logs into Amazon ECR for pushing and pulling Docker images.
|
123 |
-
|
124 |
-
5. **Create `.env` File**:
|
125 |
-
```yaml
|
126 |
-
- name: Create .env file
|
127 |
-
  run: |
|
128 |
-
    echo "AWS_ACCESS_KEY_ID=${{ secrets.AWS_ACCESS_KEY_ID }}" >> .env
|
129 |
-
    echo "AWS_SECRET_ACCESS_KEY=${{ secrets.AWS_SECRET_ACCESS_KEY }}" >> .env
|
130 |
-
    echo "AWS_REGION=${{ secrets.AWS_REGION }}" >> .env
|
131 |
-
```
|
132 |
-
- Generates a `.env` file for the application with AWS credentials and region.
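The same file can be written without interpolating secrets directly into the script text. A minimal sketch, assuming the same three secrets as above; mapping them through `env:` and using a single `>` redirect are choices made for this illustration, not part of the original workflow:

```yaml
- name: Create .env file
  env:
    # Secrets are exposed to the shell as environment variables
    AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
    AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    AWS_REGION: ${{ secrets.AWS_REGION }}
  run: |
    # Group the writes and redirect once; `>` overwrites any existing .env
    {
      echo "AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}"
      echo "AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}"
      echo "AWS_REGION=${AWS_REGION}"
    } > .env
```

Because `>` truncates the file first, rerunning the job on a reused workspace does not leave duplicate entries, which the `>>` form would.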
|
133 |
-
|
134 |
-
6. **Run Docker Compose for Train and Eval Services**:
|
135 |
-
```yaml
|
136 |
-
- name: Run Docker Compose for train and eval service
|
137 |
-
  run: |
|
138 |
-
    docker-compose build
|
139 |
-
    docker-compose up --build
|
140 |
-
    docker-compose logs --follow
|
141 |
-
    docker-compose down --remove-orphans
|
142 |
-
```
|
143 |
-
- **Build**: Builds all services defined in the `docker-compose.yml` file.
|
144 |
-
- **Up**: Runs all services, including training and evaluation.
|
145 |
-
- **Logs**: Outputs logs for debugging purposes.
|
146 |
-
- **Down**: Stops all services and removes orphaned containers.
|
147 |
-
|
148 |
-
7. **Build, Tag, and Push Docker Image to Amazon ECR**:
|
149 |
-
```yaml
|
150 |
-
- name: Build, tag, and push Docker image to Amazon ECR
|
151 |
-
  env:
|
152 |
-
    REGISTRY: ${{ steps.login-ecr.outputs.registry }}
|
153 |
-
    REPOSITORY: soutrik71/mnist
|
154 |
-
    IMAGE_TAG: ${{ github.sha }}
|
155 |
-
  run: |
|
156 |
-
    docker build -t $REGISTRY/$REPOSITORY:$IMAGE_TAG .
|
157 |
-
    docker push $REGISTRY/$REPOSITORY:$IMAGE_TAG
|
158 |
-
    docker tag $REGISTRY/$REPOSITORY:$IMAGE_TAG $REGISTRY/$REPOSITORY:latest
|
159 |
-
    docker push $REGISTRY/$REPOSITORY:latest
|
160 |
-
```
|
161 |
-
- **Build**: Creates a Docker image with the repository and tag.
|
162 |
-
- **Push**: Pushes the image to Amazon ECR.
|
163 |
-
- **Tag**: Updates the `latest` tag.
|
164 |
-
|
165 |
-
8. **Pull and Verify Docker Image from ECR**:
|
166 |
-
```yaml
|
167 |
-
- name: Pull Docker image from ECR and verify
|
168 |
-
  env:
|
169 |
-
    REGISTRY: ${{ steps.login-ecr.outputs.registry }}
|
170 |
-
    REPOSITORY: soutrik71/mnist
|
171 |
-
    IMAGE_TAG: ${{ github.sha }}
|
172 |
-
  run: |
|
173 |
-
    docker pull $REGISTRY/$REPOSITORY:$IMAGE_TAG
|
174 |
-
    docker images | grep "$REGISTRY/$REPOSITORY"
|
175 |
-
```
|
176 |
-
- **Pull**: Pulls the built image from ECR.
|
177 |
-
- **Verify**: Ensures the image exists locally.
|
178 |
-
|
179 |
-
9. **Clean Up Environment**:
|
180 |
-
```yaml
|
181 |
-
- name: Clean up environment
|
182 |
-
  run: |
|
183 |
-
    rm -f .env
|
184 |
-
    docker system prune -af
|
185 |
-
```
|
186 |
-
- Deletes the `.env` file and removes unused Docker resources.
|
187 |
-
|
188 |
-
---
|
189 |
-
|
190 |
-
### 3. **stop-runner**
|
191 |
-
Stops and terminates the EC2 runner created in the `start-runner` job.
|
192 |
-
|
193 |
-
#### Dependencies:
|
194 |
-
```yaml
|
195 |
-
needs:
|
196 |
-
- start-runner
|
197 |
-
- deploy
|
198 |
-
```
|
199 |
-
|
200 |
-
#### Steps:
|
201 |
-
1. **Configure AWS Credentials**:
|
202 |
-
```yaml
|
203 |
-
- name: Configure AWS credentials
|
204 |
-
  uses: aws-actions/configure-aws-credentials@v4
|
205 |
-
  with:
|
206 |
-
    aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
|
207 |
-
    aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
|
208 |
-
    aws-region: ${{ secrets.AWS_REGION }}
|
209 |
-
```
|
210 |
-
|
211 |
-
2. **Stop EC2 Runner**:
|
212 |
-
```yaml
|
213 |
-
- name: Stop EC2 runner
|
214 |
-
  uses: machulav/ec2-github-runner@v2
|
215 |
-
  with:
|
216 |
-
    mode: stop
|
217 |
-
    github-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
|
218 |
-
    label: ${{ needs.start-runner.outputs.label }}
|
219 |
-
    ec2-instance-id: ${{ needs.start-runner.outputs.ec2-instance-id }}
|
220 |
-
```
|
221 |
-
- Stops the EC2 runner instance created in the first job.
|
222 |
-
|
223 |
-
3. **Validate EC2 Termination**:
|
224 |
-
```yaml
|
225 |
-
- name: Validate EC2 termination
|
226 |
-
  run: aws ec2 describe-instances --instance-ids ${{ needs.start-runner.outputs.ec2-instance-id }}
|
227 |
-
```
|
228 |
-
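- Queries the instance's details so its state can be checked in the job log. `describe-instances` only reports the state; it does not wait for termination or fail if the instance is still running.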
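If the step should fail when the instance never reaches the terminated state, the AWS CLI waiter can be used before printing the state. A hedged sketch, not the original workflow; the `INSTANCE_ID` variable name is introduced here for illustration:

```yaml
- name: Validate EC2 termination
  env:
    # Hypothetical variable name, filled from the start-runner job output
    INSTANCE_ID: ${{ needs.start-runner.outputs.ec2-instance-id }}
  run: |
    # Poll until the instance is terminated; exits non-zero if the waiter gives up
    aws ec2 wait instance-terminated --instance-ids "$INSTANCE_ID"
    # Print only the final state name for the job log
    aws ec2 describe-instances --instance-ids "$INSTANCE_ID" \
      --query 'Reservations[].Instances[].State.Name' --output text
```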
|
229 |
-
|
230 |
-
---
|
231 |
-
|
232 |
-
### Key Highlights
|
233 |
-
1. **Sequential Execution**:
|
234 |
-
- The `start-runner`, `deploy`, and `stop-runner` jobs are executed sequentially.
|
235 |
-
|
236 |
-
2. **Error Handling**:
|
237 |
-
- The `stop-runner` job runs even if previous jobs fail (`if: ${{ always() }}`).
|
238 |
-
|
239 |
-
3. **Efficiency**:
|
240 |
-
- Docker layer caching speeds up builds.
|
241 |
-
- Cleanup steps maintain a clean environment.
|
242 |
-
|
243 |
-
4. **Security**:
|
244 |
-
- Secrets are masked and removed after use.
|
245 |
-
- Proper resource cleanup ensures cost efficiency.
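The error-handling point above relies on a job-level condition that the `stop-runner` snippet earlier does not show. A sketch of how the job header might look, assuming a GitHub-hosted `ubuntu-latest` runner (an assumption; the README does not state where this job runs):

```yaml
stop-runner:
  needs:
    - start-runner  # provides the label and instance ID outputs
    - deploy        # ensures the runner is stopped only after deploy finishes
  runs-on: ubuntu-latest  # assumption: not specified in this README
  # Run even if start-runner or deploy failed or was cancelled,
  # so the EC2 instance is not left running
  if: ${{ always() }}
```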
|
246 |
-
|
247 |
-
---
|
248 |
-
|
249 |
-
This pipeline provides deployment with error handling, logging, and cleanup. So far we have covered the GitHub Actions pipeline, its basic structure, and the steps involved.
|
250 |
-
Next is an interdependent pipeline, where the output of one job is used as input to the next.
|
251 |
-
|
252 |
-
---
|
253 |
-
## Advanced Pipeline with
|
254 |
-
* Sequential Flow: Each job has clear dependencies, ensuring no step runs out of order.
|
255 |
-
* Code Checkout: Explicit repository checkout in each job ensures consistent source code.
|
256 |
-
* Secure Credential Handling: Sensitive credentials are masked and stored securely.
|
257 |
-
* Resource Cleanup: Includes Docker clean-up and EC2 instance termination validation.
|
258 |
-
* Logging: Added detailed logs to improve debugging and monitoring.
|
259 |
-
|
260 |
-
|
261 |
-
### Step 1: Start EC2 Runner
|
262 |
-
- **Purpose**: Initializes a self-hosted EC2 runner for running subsequent jobs.
|
263 |
-
- **Key Actions**:
|
264 |
-
  - Configures AWS credentials.
|
265 |
-
  - Launches an EC2 instance using the specified AMI, instance type, and networking configuration.
|
266 |
-
  - Outputs the runner label and instance ID for downstream jobs.
|
267 |
-
### Step 2: Test PyTorch Code Using Docker Compose
|
268 |
-
- **Purpose**: Tests the PyTorch training and evaluation services.
|
269 |
-
- **Key Actions**:
|
270 |
-
  - Checks out the repository.
|
271 |
-
  - Sets up Docker Buildx for advanced build capabilities.
|
272 |
-
  - Configures AWS credentials and creates a masked `.env` file for secure credential sharing.
|
273 |
-
  - Runs all services (train, eval) using Docker Compose, monitors logs, and cleans up containers.
|
274 |
-
### Step 3: Build, Tag, and Push Docker Image
|
275 |
-
- **Purpose**: Builds a Docker image, tags it, and pushes it to Amazon ECR after successful tests.
|
276 |
-
- **Key Actions**:
|
277 |
-
  - Checks out the repository again to ensure consistency.
|
278 |
-
  - Logs into Amazon ECR using AWS credentials.
|
279 |
-
  - Builds and tags the Docker image with `latest` and SHA-based tags.
|
280 |
-
  - Pushes the image to Amazon ECR and verifies it by pulling it back.
|
281 |
-
### Step 4: Stop and Delete EC2 Runner
|
282 |
-
- **Purpose**: Stops and terminates the EC2 instance to ensure cost efficiency and cleanup.
|
283 |
-
- **Key Actions**:
|
284 |
-
  - Configures AWS credentials.
|
285 |
-
  - Stops the EC2 instance using the label and instance ID from `start-runner`.
|
286 |
-
  - Validates the termination state of the EC2 instance to ensure proper cleanup.
|
|
|
1 |
---
|
2 |
+
title: My Gradio App MNIST Classifier
|
3 |
emoji: 🚀
|
4 |
colorFrom: blue
|
5 |
colorTo: green
|
|
|
8 |
app_file: app.py
|
9 |
pinned: false
|
10 |
---