Spaces:
Runtime error
Runtime error
title: My Gradio App Mnist Classifier | |
emoji: 🚀 | |
colorFrom: blue | |
colorTo: green | |
sdk: gradio | |
sdk_version: "5.7.1" | |
app_file: app.py | |
pinned: false | |
# aws_ec2_automation | |
Here’s a detailed explanation of the GitHub Actions (GHA) pipeline in **raw Markdown format**: | |
--- | |
# GitHub Actions Pipeline Documentation | |
## Name: Deploy PyTorch Training with EC2 Runner and Docker Compose | |
This pipeline automates the following tasks: | |
1. Starts an EC2 instance as a self-hosted GitHub runner. | |
2. Deploys a PyTorch training pipeline using Docker Compose. | |
3. Builds, tags, and pushes Docker images to Amazon ECR. | |
4. Stops the EC2 instance after the job is completed. | |
--- | |
### Workflow Triggers | |
```yaml | |
on: | |
push: | |
branches: | |
- main | |
``` | |
- **Trigger**: This workflow runs whenever a push is made to the `main` branch. | |
--- | |
## Jobs Overview | |
### 1. **start-runner** | |
Starts a self-hosted EC2 runner using the GitHub Actions Runner. | |
#### Steps: | |
1. **Configure AWS Credentials**: | |
```yaml | |
- name: Configure AWS credentials | |
uses: aws-actions/configure-aws-credentials@v4 | |
with: | |
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }} | |
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }} | |
aws-region: ${{ secrets.AWS_REGION }} | |
``` | |
- Authenticates with AWS using access keys and the region specified in the secrets. | |
- Required for creating and managing the EC2 instance. | |
2. **Start EC2 Runner**: | |
```yaml | |
- name: Start EC2 runner | |
id: start-ec2-runner | |
uses: machulav/ec2-github-runner@v2 | |
with: | |
mode: start | |
github-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }} | |
ec2-image-id: ami-044b0717aadbc9dfa | |
ec2-instance-type: t2.xlarge | |
subnet-id: subnet-024811dee81325f1c | |
security-group-id: sg-0646c2a337a355a31 | |
``` | |
- Starts an EC2 instance with the specified AMI, instance type, subnet, and security group. | |
- Outputs: | |
- `label`: A unique label for the EC2 runner. | |
- `ec2-instance-id`: The ID of the created EC2 instance. | |
--- | |
### 2. **deploy** | |
Deploys the PyTorch training pipeline using the EC2 runner started in the previous step. | |
#### Dependencies: | |
```yaml | |
needs: start-runner | |
runs-on: ${{ needs.start-runner.outputs.label }} | |
``` | |
- **Depends on** the `start-runner` job and runs on the newly created EC2 instance. | |
#### Steps: | |
1. **Checkout Repository**: | |
```yaml | |
- name: Checkout repository | |
uses: actions/checkout@v4 | |
``` | |
- Clones the current repository to the runner. | |
2. **Set Up Docker Buildx**: | |
```yaml | |
- name: Set up Docker Buildx | |
uses: docker/setup-buildx-action@v3 | |
``` | |
- Configures Docker Buildx for building multi-platform Docker images. | |
3. **Configure AWS Credentials**: | |
```yaml | |
- name: Configure AWS credentials | |
uses: aws-actions/configure-aws-credentials@v4 | |
with: | |
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }} | |
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }} | |
aws-region: ${{ secrets.AWS_REGION }} | |
``` | |
- Reconfigures AWS credentials for Docker ECR authentication and resource management. | |
4. **Log in to Amazon ECR**: | |
```yaml | |
- name: Log in to Amazon ECR | |
id: login-ecr | |
uses: aws-actions/amazon-ecr-login@v2 | |
``` | |
- Logs into Amazon ECR for pushing and pulling Docker images. | |
5. **Create `.env` File**: | |
```yaml | |
- name: Create .env file | |
run: | | |
echo "AWS_ACCESS_KEY_ID=${{ secrets.AWS_ACCESS_KEY_ID }}" >> .env | |
echo "AWS_SECRET_ACCESS_KEY=${{ secrets.AWS_SECRET_ACCESS_KEY }}" >> .env | |
echo "AWS_REGION=${{ secrets.AWS_REGION }}" >> .env | |
``` | |
- Generates a `.env` file for the application with AWS credentials and region. | |
6. **Run Docker Compose for Train and Eval Services**: | |
```yaml | |
- name: Run Docker Compose for train and eval service | |
run: | | |
docker-compose build | |
docker-compose up --build | |
docker-compose logs --follow | |
docker-compose down --remove-orphans | |
``` | |
- **Build**: Builds all services defined in the `docker-compose.yml` file. | |
- **Up**: Runs all services, including training and evaluation. | |
- **Logs**: Outputs logs for debugging purposes. | |
- **Down**: Stops all services and removes orphaned containers. | |
7. **Build, Tag, and Push Docker Image to Amazon ECR**: | |
```yaml | |
- name: Build, tag, and push Docker image to Amazon ECR | |
env: | |
REGISTRY: ${{ steps.login-ecr.outputs.registry }} | |
REPOSITORY: soutrik71/mnist | |
IMAGE_TAG: ${{ github.sha }} | |
run: | | |
docker build -t $REGISTRY/$REPOSITORY:$IMAGE_TAG . | |
docker push $REGISTRY/$REPOSITORY:$IMAGE_TAG | |
docker tag $REGISTRY/$REPOSITORY:$IMAGE_TAG $REGISTRY/$REPOSITORY:latest | |
docker push $REGISTRY/$REPOSITORY:latest | |
``` | |
- **Build**: Creates a Docker image with the repository and tag. | |
- **Push**: Pushes the image to Amazon ECR. | |
- **Tag**: Updates the `latest` tag. | |
8. **Pull and Verify Docker Image from ECR**: | |
```yaml | |
- name: Pull Docker image from ECR and verify | |
env: | |
REGISTRY: ${{ steps.login-ecr.outputs.registry }} | |
REPOSITORY: soutrik71/mnist | |
IMAGE_TAG: ${{ github.sha }} | |
run: | | |
docker pull $REGISTRY/$REPOSITORY:$IMAGE_TAG | |
docker images | grep "$REGISTRY/$REPOSITORY" | |
``` | |
- **Pull**: Pulls the built image from ECR. | |
- **Verify**: Ensures the image exists locally. | |
9. **Clean Up Environment**: | |
```yaml | |
- name: Clean up environment | |
run: | | |
rm -f .env | |
docker system prune -af | |
``` | |
- Deletes the `.env` file and removes unused Docker resources. | |
--- | |
### 3. **stop-runner** | |
Stops and terminates the EC2 runner created in the `start-runner` job. | |
#### Dependencies: | |
```yaml | |
needs: | |
- start-runner | |
- deploy | |
``` | |
#### Steps: | |
1. **Configure AWS Credentials**: | |
```yaml | |
- name: Configure AWS credentials | |
uses: aws-actions/configure-aws-credentials@v4 | |
with: | |
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }} | |
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }} | |
aws-region: ${{ secrets.AWS_REGION }} | |
``` | |
2. **Stop EC2 Runner**: | |
```yaml | |
- name: Stop EC2 runner | |
uses: machulav/ec2-github-runner@v2 | |
with: | |
mode: stop | |
github-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }} | |
label: ${{ needs.start-runner.outputs.label }} | |
ec2-instance-id: ${{ needs.start-runner.outputs.ec2-instance-id }} | |
``` | |
- Stops the EC2 runner instance created in the first job. | |
3. **Validate EC2 Termination**: | |
```yaml | |
- name: Validate EC2 termination | |
run: aws ec2 describe-instances --instance-ids ${{ needs.start-runner.outputs.ec2-instance-id }} | |
``` | |
- Ensures the EC2 instance has been properly terminated. | |
--- | |
### Key Highlights | |
1. **Sequential Execution**: | |
- The `start-runner`, `deploy`, and `stop-runner` jobs are executed sequentially. | |
2. **Error Handling**: | |
- The `stop-runner` job runs even if previous jobs fail (`if: ${{ always() }}`). | |
3. **Efficiency**: | |
- Docker layer caching speeds up builds. | |
- Cleanup steps maintain a clean environment. | |
4. **Security**: | |
- Secrets are masked and removed after use. | |
- Proper resource cleanup ensures cost efficiency. | |
--- | |
This pipeline ensures robust deployment with error handling, logging, and cleanup mechanisms. So far we have discussed the GitHub Actions pipeline , the basic structure of the pipeline, and the steps involved in the pipeline. | |
Next we will have an interdependent pipeline where the output of one job will be used as input for the next job. | |
--- | |
## Advanced Pipeline with | |
* Sequential Flow: Each job has clear dependencies, ensuring no step runs out of order. | |
* Code Checkout: Explicit repository checkout in each job ensures consistent source code. | |
* Secure Credential Handling: Sensitive credentials are masked and stored securely. | |
* Resource Cleanup: Includes Docker clean-up and EC2 instance termination validation. | |
* Logging: Added detailed logs to improve debugging and monitoring. | |
Step 1: Start EC2 Runner | |
Purpose: Initializes a self-hosted EC2 runner for running subsequent jobs. | |
Key Actions: | |
Configures AWS credentials. | |
Launches an EC2 instance using specified AMI, instance type, and networking configurations. | |
Outputs the runner label and instance ID for downstream jobs. | |
Step 2: Test PyTorch Code Using Docker Compose | |
Purpose: Tests the PyTorch training and evaluation services. | |
Key Actions: | |
Checks out the repository. | |
Sets up Docker Buildx for advanced build capabilities. | |
Configures AWS credentials and creates a masked .env file for secure credential sharing. | |
Runs all services (train, eval) using Docker Compose, monitors logs, and cleans up containers. | |
Step 3: Build, Tag, and Push Docker Image | |
Purpose: Builds a Docker image, tags it, and pushes it to Amazon ECR after successful tests. | |
Key Actions: | |
Checks out the repository again to ensure consistency. | |
Logs into Amazon ECR using AWS credentials. | |
Builds and tags the Docker image with latest and SHA-based tags. | |
Pushes the image to Amazon ECR and verifies by pulling it back. | |
Step 4: Stop and Delete EC2 Runner | |
Purpose: Stops and terminates the EC2 instance to ensure cost efficiency and cleanup. | |
Key Actions: | |
Configures AWS credentials. | |
Stops the EC2 instance using the label and instance ID from start-runner. | |
Validates the termination state of the EC2 instance to ensure proper cleanup. |