cfahlgren1 HF staff
metadata
id: alpaca-to-chatml
title: Alpaca to Conversation Format
slug: alpaca-to-chatml
description: Convert Alpaca format to conversational format
code: |
  -- Convert Alpaca format to Conversation format
  WITH 
  source_view AS (
    SELECT * FROM train  -- Change 'train' to your desired view name here
  )
  SELECT 
    [
      struct_pack(
        "from" := 'user',
        "value" := CASE 
                    WHEN input IS NOT NULL AND input != '' 
                    THEN instruction || chr(10) || chr(10) || input  -- chr(10) is a newline; '\n' is not an escape in plain DuckDB strings
                    ELSE instruction
                  END
      ),
      struct_pack(
        "from" := 'assistant',
        "value" := output
      )
    ] AS conversation
  FROM source_view
  WHERE instruction IS NOT NULL 
    AND output IS NOT NULL;

Converting Alpaca to ChatML Conversation Format

-- Convert Alpaca format to Conversation format
WITH 
source_view AS (
  SELECT * FROM train  -- Change 'train' to your desired view name here
)
SELECT 
  [
    struct_pack(
      "from" := 'user',
      "value" := CASE 
                   WHEN input IS NOT NULL AND input != '' 
                   THEN instruction || chr(10) || chr(10) || input  -- chr(10) is a newline; '\n' is not an escape in plain DuckDB strings
                   ELSE instruction
                 END
    ),
    struct_pack(
      "from" := 'assistant',
      "value" := output
    )
  ] AS conversation
FROM source_view
WHERE instruction IS NOT NULL 
  AND output IS NOT NULL;
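To check the mapping outside DuckDB, you can mirror the query's per-row logic in plain Python. This is a minimal sketch. It assumes each Alpaca row is a dict with instruction, input, and output keys, with missing values as None. The helper name alpaca_to_conversation is hypothetical and is not part of the snippet. The intended separator between instruction and input is a blank line, which means two newline characters.

```python
from typing import Optional


def alpaca_to_conversation(row: dict) -> Optional[list]:
    """Hypothetical mirror of the SQL query for a single Alpaca row.

    Returns None for rows that the query's WHERE clause would drop.
    """
    instruction = row.get("instruction")
    user_input = row.get("input")
    output = row.get("output")

    # WHERE instruction IS NOT NULL AND output IS NOT NULL
    if instruction is None or output is None:
        return None

    # CASE: append the input after a blank line only when it is non-empty
    if user_input is not None and user_input != "":
        user_value = instruction + "\n\n" + user_input
    else:
        user_value = instruction

    # Two-message conversation: user turn, then assistant turn
    return [
        {"from": "user", "value": user_value},
        {"from": "assistant", "value": output},
    ]


# Illustrative rows (not taken from a real dataset)
rows = [
    {"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"},
    {"instruction": "Name a primary color.", "input": "", "output": "Red"},
    {"instruction": "Orphan row", "input": "", "output": None},
]
conversations = [c for c in (alpaca_to_conversation(r) for r in rows) if c is not None]
```

The row with a missing output is dropped, just as the WHERE clause drops it, so conversations ends up with two entries.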

Why?

Differences between Alpaca and ChatML Conversation Format:

  1. Alpaca Format:

    • The Alpaca format usually has three columns: instruction, input, and output. The input column holds optional context and is often empty.
  2. ChatML Conversation Format:

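    • The ChatML Conversation format, as used here, is a JSON structure that holds a list of messages.
    • Each message has a from field whose value is system, user, or assistant. (The from/value key names follow the ShareGPT convention; many chat templates expect role and content instead.)
    • The value field contains the message content.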
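To make the message structure concrete, the sketch below builds one record in this layout and checks it against the rules listed above. Those rules are: a non-empty list of messages, each with a from of system, user, or assistant and a string value. The helper is_valid_conversation is hypothetical, and the example record is illustrative rather than taken from a real dataset. Serializing one JSON object per line (JSON Lines) is assumed here as a common export choice, not something the snippet prescribes.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}


def is_valid_conversation(messages) -> bool:
    """Hypothetical check: a non-empty list of {"from", "value"} messages."""
    if not isinstance(messages, list) or not messages:
        return False
    for message in messages:
        if not isinstance(message, dict):
            return False
        role = message.get("from")
        # Check the type first so unhashable values cannot break the set lookup
        if not isinstance(role, str) or role not in ALLOWED_ROLES:
            return False
        if not isinstance(message.get("value"), str):
            return False
    return True


record = {
    "conversation": [
        {"from": "user", "value": "Give three tips for staying healthy."},
        {"from": "assistant", "value": "Eat well, exercise, and sleep enough."},
    ]
}

# One JSON object per line when written out as JSON Lines
line = json.dumps(record)
```

A message with a role outside the allowed set, such as the ShareGPT-style human, fails the check. So does an empty list.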

Example

yahma/alpaca-cleaned

You can run this query in the SQL Console on the Hugging Face Hub.

[Image: Alpaca to ChatML]

Final Dataset