How to Expand Your AI Music Generations from 30 Seconds to Several Minutes

Community Article Published December 13, 2024

Imagine creating a symphony from a simple 30-second audio snippet or turning a brief melody into an entire song. With AI-powered tools like Facebook's MusicGen, this is now possible. In this tutorial, you'll learn how to build an API that takes a short audio file, extends it to several minutes of cohesive music, and post-processes the result for cleaner, more consistent sound.


What You’ll Learn

  • Uploading and Processing Audio: Handle multiple formats such as MP3, FLAC, and WAV.
  • AI-Powered Music Expansion: Extend tracks seamlessly using Facebook’s MusicGen.
  • Ensuring Cohesion: Use the same description (prompt) for the initial and extended audio for better consistency.
  • Post-Processing for Audio Quality: Clean up the generated audio with normalization and filters.
  • Deployment Options: Deploy locally or on RunPod for scalable GPU hosting.

Why Use the Same Prompt for Expansion?

The prompt (or description) plays a crucial role in generating consistent music. When expanding a track, using the same prompt ensures:

  1. Musical Cohesion: The extended segments match the theme, mood, and style of the original audio.
  2. Natural Transitions: Overlapping and blending become smoother with similar soundscapes.
  3. Creative Integrity: Avoids jarring changes in tone or genre between the original and generated sections.
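
In code, keeping cohesion is as simple as passing the same descriptions=[description] list to the initial generate_continuation call and to every extension call that follows. Here is a tiny, self-contained sketch of the idea; the facebook/musicgen-small checkpoint, the 10-second duration, and the silent placeholder prompt are illustrative stand-ins for your real setup:

import torch
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained("facebook/musicgen-small")  # small checkpoint for a quick test
model.set_generation_params(duration=10)

description = "Calm piano with ambient strings"      # one prompt, reused for every segment
prompt = torch.zeros(1, 1, 5 * model.sample_rate)    # 5-second placeholder clip; use your own audio

# The same description on the first pass and on every extension keeps theme, mood, and style aligned
first = model.generate_continuation(prompt, model.sample_rate, descriptions=[description])
tail = first[:, :, -2 * model.sample_rate:]           # last 2 seconds become the next prompt
second = model.generate_continuation(tail, model.sample_rate, descriptions=[description])

The full API below applies exactly this pattern, with the user's uploaded audio as the first prompt.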

Full Code Implementation

Below is the full implementation for your Music Generation API:

from fastapi import FastAPI, HTTPException, UploadFile, File, Form
import uvicorn
import torch
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write
from tempfile import NamedTemporaryFile, gettempdir
import logging
import os
import soundfile as sf
from pydub import AudioSegment
from pydub.effects import normalize, high_pass_filter, low_pass_filter
import threading

# Initialize logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

app = FastAPI()

# Lock for thread-safe model access
model_lock = threading.Lock()

# Load the MusicGen model once at startup
def get_musicgen_model():
    logger.info("Loading MusicGen model...")
    model = MusicGen.get_pretrained("facebook/musicgen-large")
    model.set_generation_params(use_sampling=True, top_k=250)
    return model

# Global model instance
model = get_musicgen_model()

@app.post("/extend-audio/")
async def extend_audio(
    total_duration: int = Form(..., gt=0, le=300, description="Desired total duration of the audio in seconds (1-300)."),
    description: str = Form(...),
    segment_duration: int = Form(30, description="Duration of generated segments in seconds (default: 30)."),
    overlap: int = Form(7, description="Overlap duration in seconds for smoother transitions (default: 7)."),
    file: UploadFile = File(...),
):
    try:
        logger.info(f"Extending audio: {total_duration}s with description '{description}'")

        # Save the uploaded file temporarily, keeping its extension so the format can be detected
        suffix = os.path.splitext(file.filename or "")[1]
        with NamedTemporaryFile(delete=False, suffix=suffix) as temp_file:
            temp_file.write(await file.read())
            input_audio_path = temp_file.name

        # Read the audio, converting with pydub if libsndfile can't decode it
        def to_prompt_tensor(audio_array):
            # MusicGen expects a mono waveform shaped (batch, channels, time);
            # downmix stereo (frames, channels) arrays before building the tensor
            if audio_array.ndim == 2:
                audio_array = audio_array.mean(axis=1)
            return torch.tensor(audio_array).float().unsqueeze(0).unsqueeze(0)

        try:
            input_audio, sample_rate = sf.read(input_audio_path)
            input_audio = to_prompt_tensor(input_audio)
        except RuntimeError:
            logger.info("Converting unsupported format to WAV...")
            audio = AudioSegment.from_file(input_audio_path)
            wav_temp_path = f"{gettempdir()}/converted_audio.wav"
            audio.export(wav_temp_path, format="wav")
            input_audio, sample_rate = sf.read(wav_temp_path)
            input_audio = to_prompt_tensor(input_audio)
            os.remove(wav_temp_path)
        finally:
            os.remove(input_audio_path)

        # Generate the first segment (prompt + continuation) in a thread-safe manner
        with model_lock:
            model.set_generation_params(use_sampling=True, top_k=250, duration=segment_duration)
            segment = model.generate_continuation(
                input_audio, sample_rate, descriptions=[description], progress=True
            )

        # Generated audio is at the model's sample rate, not the upload's
        out_sr = model.sample_rate
        remaining = total_duration - segment_duration

        # Generate additional segments, feeding the last `overlap` seconds back in as the prompt
        while remaining > 0:
            last_sec = segment[:, :, -overlap * out_sr:]
            with model_lock:
                next_segment = model.generate_continuation(
                    last_sec, out_sr, descriptions=[description], progress=True
                )
            # Trim the overlapping tail before appending so that region isn't duplicated
            segment = torch.cat([segment[:, :, :-overlap * out_sr], next_segment], dim=2)
            remaining -= (segment_duration - overlap)

        # Save the final audio; audio_write adds the .wav extension and normalizes loudness
        final_audio = segment.detach().cpu().float()[0]
        output_stem = f"extended_audio_{torch.randint(0, 100000, (1,)).item()}"
        output_path = audio_write(output_stem, final_audio, out_sr, strategy="loudness")

        return {"file_path": str(output_path)}

    except Exception as e:
        logger.exception(f"Audio generation failed: {e}")
        raise HTTPException(status_code=500, detail="Audio generation failed.")


if __name__ == "__main__":
    # Lets you start the server with `python main.py`, as described below
    uvicorn.run(app, host="0.0.0.0", port=8000)
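
The endpoint already imports pydub's normalize, high_pass_filter, and low_pass_filter for the post-processing step mentioned earlier, but they are easiest to apply in a small helper of your own. A minimal sketch is shown below; the post_process name and the cutoff frequencies are illustrative choices, not fixed values:

from pydub import AudioSegment
from pydub.effects import normalize, high_pass_filter, low_pass_filter

def post_process(wav_path: str, low_cut_hz: int = 40, high_cut_hz: int = 15000) -> str:
    # Load the generated file, trim rumble and hiss, then normalize the level
    audio = AudioSegment.from_wav(wav_path)
    audio = high_pass_filter(audio, low_cut_hz)   # remove sub-bass rumble below low_cut_hz
    audio = low_pass_filter(audio, high_cut_hz)   # soften harsh high-frequency artifacts
    audio = normalize(audio)                      # raise the peak level close to 0 dBFS
    audio.export(wav_path, format="wav")
    return wav_path

Calling post_process(str(output_path)) just before the return statement in /extend-audio/ keeps the response format unchanged while cleaning up the file on disk.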

Key Features of the API

  1. Multi-Format Upload
    Handles MP3, FLAC, WAV, and more by converting to WAV when necessary.

  2. Seamless Expansion
    Generates additional segments with overlapping transitions for cohesion.

  3. Customizable Output
    Set segment duration, overlap, and total length.

  4. Post-Processing
    Normalizes loudness when writing the output file; the pydub sketch above adds optional frequency filtering.

  5. Thread-Safe Model Access
    A global lock serializes access to the shared MusicGen model, so concurrent requests don't interfere with each other.


How to Use the API Locally

  1. Install Dependencies

    pip install fastapi uvicorn torch soundfile pydub audiocraft
    # pydub also needs ffmpeg installed on your system to convert MP3/FLAC uploads
    
  2. Run the Server

    python main.py
    
  3. Send a Test Request
    Use curl to send a POST request:

    curl -X POST "http://127.0.0.1:8000/extend-audio/" \
    -F "total_duration=120" \
    -F "description='Calm piano with ambient strings'" \
    -F "file=@path_to_audio.wav"
    

Deployment on RunPod

Why RunPod?

RunPod is an excellent platform for GPU-powered deployments. It offers affordable, scalable GPU hosting for AI models like MusicGen.

Steps to Deploy

  1. Create a GPU Instance
    Visit RunPod and set up a GPU environment.

  2. Prepare a Dockerfile

    FROM python:3.9-slim
    
    # ffmpeg is required by pydub to decode MP3/FLAC uploads
    RUN apt-get update && apt-get install -y --no-install-recommends ffmpeg && \
        rm -rf /var/lib/apt/lists/*
    
    WORKDIR /app
    
    # A minimal requirements.txt is sketched after these steps
    COPY requirements.txt requirements.txt
    RUN pip install --no-cache-dir -r requirements.txt
    
    COPY . .
    
    CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
    
  3. Build and Run the Docker Image

    docker build -t musicgen-api .
    docker run --gpus all -p 8000:8000 musicgen-api
    
  4. Access the API
    Use the public IP provided by RunPod to interact with your API.
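
The Dockerfile copies a requirements.txt that isn't shown above; a minimal one that mirrors the pip install command from the local setup could be:

fastapi
uvicorn
torch
soundfile
pydub
audiocraft

Pin exact versions once you have a combination that works on your GPU image, so rebuilds stay reproducible.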


Best Practices for Expanding Audio

  1. Re-use the Same Prompt
    Consistency in prompts ensures the generated audio aligns seamlessly with the original.

  2. Adjust Overlap for Smooth Transitions
    Experiment with overlap values (default: 7 seconds) to minimize artifacts during transitions.

  3. Pre-Process Input Audio
    Ensure your input audio is clean and normalized for the best output quality.

  4. Monitor Model Parameters
    Fine-tune MusicGen's sampling parameters, such as top_k and temperature, to balance creativity and coherence; see the sketch after this list.
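
All of these knobs are set through set_generation_params. Here is a short sketch of the parameters MusicGen exposes; the values shown are illustrative starting points rather than recommendations from this article:

from audiocraft.models import MusicGen

model = MusicGen.get_pretrained("facebook/musicgen-large")

model.set_generation_params(
    use_sampling=True,   # sample instead of greedy decoding (as in the API above)
    top_k=250,           # consider only the 250 most likely tokens at each step
    top_p=0.0,           # 0.0 disables nucleus sampling; e.g. 0.9 enables it
    temperature=1.0,     # above 1.0 is more adventurous, below 1.0 more conservative
    cfg_coef=3.0,        # classifier-free guidance: higher sticks closer to the prompt
    duration=30,         # seconds generated per call
)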


Final Thoughts

This API empowers creators, musicians, and developers to extend their short audio tracks into beautiful, cohesive compositions. Whether you're producing a full soundtrack or exploring AI's creative potential, this tutorial equips you with the tools to get started.

For more AI innovations, check out my projects on Hugging Face. Feel free to connect with me on LinkedIn and share your creations! 🎶✨