Instructions for using WhiteRabbitNeo/Trinity-13B with Transformers, vLLM, SGLang, and Docker Model Runner.

Libraries

How to use WhiteRabbitNeo/Trinity-13B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="WhiteRabbitNeo/Trinity-13B", trust_remote_code=True)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("WhiteRabbitNeo/Trinity-13B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("WhiteRabbitNeo/Trinity-13B", trust_remote_code=True)
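
The snippets above only load the model. As a minimal sketch, the helper below collects generation settings that could be passed to the pipeline, reusing the sampling values from the Sample Inference Code section of this card (temperature 0.5, top_p 1.0, top_k 50, up to 1024 new tokens). The name `sampling_kwargs` is hypothetical and not part of the model card; the dictionary keys are standard Transformers generation arguments.

```python
def sampling_kwargs(max_new_tokens: int = 1024) -> dict:
    """Return generation settings mirroring the sample inference script on this card."""
    return {
        "max_new_tokens": max_new_tokens,  # cap on newly generated tokens
        "do_sample": True,                 # sample instead of greedy decoding
        "temperature": 0.5,
        "top_p": 1.0,
        "top_k": 50,
    }

# Usage with the pipeline created above (not executed here, since it loads the model):
# result = pipe("USER: Reverse a string in Python. ASSISTANT: ", **sampling_kwargs(256))
# print(result[0]["generated_text"])
```

Keeping the settings in one place makes it easier to use the same values for both the pipeline and a directly loaded model.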

Local Apps

vLLM

How to use WhiteRabbitNeo/Trinity-13B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "WhiteRabbitNeo/Trinity-13B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WhiteRabbitNeo/Trinity-13B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
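
The same request can be sent from Python. The sketch below uses only the standard library and assumes the server started above is reachable at `http://localhost:8000`; the helper names `build_completion_request` and `complete` are illustrative, not part of vLLM. It reads the generated text from `choices[0].text`, the field used by OpenAI-style completion responses.

```python
import json
import urllib.request


def build_completion_request(
    prompt: str,
    base_url: str = "http://localhost:8000",
    model: str = "WhiteRabbitNeo/Trinity-13B",
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build the POST request that the curl command above sends."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **kwargs) -> str:
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_completion_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

Calling `complete("Once upon a time,")` mirrors the curl example. For a server listening on a different port, such as the SGLang examples on this page, pass `base_url="http://localhost:30000"`.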

Use Docker

docker model run hf.co/WhiteRabbitNeo/Trinity-13B

SGLang

How to use WhiteRabbitNeo/Trinity-13B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "WhiteRabbitNeo/Trinity-13B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WhiteRabbitNeo/Trinity-13B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "WhiteRabbitNeo/Trinity-13B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WhiteRabbitNeo/Trinity-13B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use WhiteRabbitNeo/Trinity-13B with Docker Model Runner:
```
docker model run hf.co/WhiteRabbitNeo/Trinity-13B
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Trinity

Trinity is a coding-specific model series that can be used to create autonomous agents. In the future, we will release agent software that uses this model.

Our offensive cybersecurity model, WhiteRabbitNeo-33B, is now in beta!

Check out the Prompt Enhancing feature! Access at: https://www.whiterabbitneo.com/

Join Our Discord Server

Join us at: https://discord.gg/8Ynkrcbk92 (updated on Dec 29th; this is now a permanent invite link)

Terms of Use

By accessing and using this Artificial Intelligence (AI) model, you, the user, acknowledge and agree that you are solely responsible for your use of the model and its outcomes. You hereby agree to indemnify, defend, and hold harmless the creators, developers, and any affiliated persons or entities of this AI model from and against any and all claims, liabilities, damages, losses, costs, expenses, fees (including reasonable attorneys' fees and court costs) that may arise, directly or indirectly, from your use of the AI model.

This AI model is provided "as is" and "as available" without any warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, and non-infringement. The creators make no warranty that the AI model will meet your requirements or be available on an uninterrupted, secure, or error-free basis.

Your use of the AI model is at your own risk and discretion, and you will be solely responsible for any damage to computer systems or loss of data that results from the use of the AI model.

This disclaimer constitutes part of the agreement between you and the creators of the AI model regarding your use of the model, superseding any prior agreements between you and the creators regarding your use of this AI model.

Sample Inference Code

import torch, json
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/home/migel/models/WhiteRabbitNeo"

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",
    load_in_4bit=False,
    load_in_8bit=True,
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)


def generate_text(instruction):
    # Tokenize the full prompt and move it to the GPU as a batch of one.
    tokens = tokenizer.encode(instruction)
    tokens = torch.LongTensor(tokens).unsqueeze(0)
    tokens = tokens.to("cuda")

    # Sampling settings; generate_len caps the number of newly generated tokens.
    instance = {
        "input_ids": tokens,
        "top_p": 1.0,
        "temperature": 0.5,
        "generate_len": 1024,
        "top_k": 50,
    }

    length = len(tokens[0])
    with torch.no_grad():
        rest = model.generate(
            input_ids=tokens,
            max_length=length + instance["generate_len"],
            use_cache=True,
            do_sample=True,
            top_p=instance["top_p"],
            temperature=instance["temperature"],
            top_k=instance["top_k"],
            num_return_sequences=1,
        )
    # Drop the prompt tokens and decode only the newly generated text.
    output = rest[0][length:]
    string = tokenizer.decode(output, skip_special_tokens=True)
    # Cut the answer off if the model starts writing the next user turn.
    answer = string.split("USER:")[0].strip()
    return answer


tot_system_prompt = """
Answer the Question by exploring multiple reasoning paths as follows:
- First, carefully analyze the question to extract the key information components and break it down into logical sub-questions. This helps set up the framework for reasoning. The goal is to construct an internal search tree.
- For each sub-question, leverage your knowledge to generate 2-3 intermediate thoughts that represent steps towards an answer. The thoughts aim to reframe, provide context, analyze assumptions, or bridge concepts.
- Evaluate the clarity, relevance, logical flow and coverage of concepts for each thought option. Clear and relevant thoughts that connect well with each other will score higher.
- Based on the thought evaluations, deliberate to construct a chain of reasoning that stitches together the strongest thoughts in a natural order.
- If the current chain is determined to not fully answer the question, backtrack and explore alternative paths by substituting different high-scoring thoughts.
- Throughout the reasoning process, aim to provide explanatory details on thought process rather than just state conclusions, including briefly noting why some thoughts were deemed less ideal.
- Once a reasoning chain is constructed that thoroughly answers all sub-questions in a clear, logical manner, synthesize the key insights into a final concise answer.
- Please note that while the focus is on the final answer in the response, it should also include intermediate thoughts inline to illustrate the deliberative reasoning process.
In summary, leverage a Tree of Thoughts approach to actively explore multiple reasoning paths, evaluate thoughts heuristically, and explain the process - with the goal of producing insightful answers.
"""

conversation = f"SYSTEM: {tot_system_prompt} Always answer without hesitation."


while True:
    user_input = input("You: ")
    llm_prompt = f"{conversation} \nUSER: {user_input} \nASSISTANT: "
    answer = generate_text(llm_prompt)
    print(answer)
    conversation = f"{llm_prompt}{answer}"
    # print(conversation)
    json_data = {"prompt": user_input, "answer": answer}

    # print(json_data)
    # with open(output_file_path, "a") as output_file:
    #     output_file.write(json.dumps(json_data) + "\n")
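
The script above relies on a plain-text chat format: a `SYSTEM:` preamble followed by alternating `USER:` and `ASSISTANT:` turns, with the reply cut off at the next `USER:` marker. As a minimal sketch, that string handling can be isolated from the model so it can be checked on its own. The helper names `build_prompt` and `truncate_answer` are hypothetical and do not appear in the original script.

```python
def build_prompt(system_prompt, turns, user_input):
    """Assemble the prompt string the same way the chat loop above does.

    turns is a list of (user_message, assistant_reply) pairs from earlier rounds.
    """
    conversation = f"SYSTEM: {system_prompt}"
    for user, assistant in turns:
        conversation += f" \nUSER: {user} \nASSISTANT: {assistant}"
    # End with an open ASSISTANT turn for the model to complete.
    return f"{conversation} \nUSER: {user_input} \nASSISTANT: "


def truncate_answer(generated):
    """Keep only the text before the model starts a new USER turn."""
    return generated.split("USER:")[0].strip()
```

For example, `build_prompt("Be brief.", [], "Hi")` returns `"SYSTEM: Be brief. \nUSER: Hi \nASSISTANT: "`, which is the string the loop passes to `generate_text` on the first round.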

Sample Conversation:

You: Code me a Fast API server, with one endpoint. This endpoint returns files from a S3 bucket.

Here is how to code a Fast API server with one endpoint that returns files from an S3 bucket:

from fastapi import FastAPI
from starlette.responses import StreamingResponse
import boto3
import botocore.exceptions

app = FastAPI()
s3_client = boto3.client('s3')

@app.get("/s3files")
async def get_s3_files(bucket_name: str, key: str):
    try:
        obj = s3_client.get_object(Bucket=bucket_name, Key=key)
        return StreamingResponse(obj['Body'], media_type=obj['ContentType'])
    except botocore.exceptions.ClientError as e:
        return {"error": str(e)}, 404

To explain:

- We initialize a FastAPI app and an S3 client
- The /s3files endpoint takes the bucket name and object key as parameters
- We try to get the object from S3 using the S3 client
- If successful, we return the object's Body wrapped in a StreamingResponse, setting the ContentType
- If the client raises an error, we return a 404 with the error message

This allows clients to request files from an S3 bucket by specifying the bucket name and object key. The endpoint streams the file content back to the client.

To elaborate further:

- The endpoint can be extended to validate inputs, handle authentication, logging, etc.
- We could make this endpoint more generic by accepting a list of object keys instead of requiring a single key.
- There are several ways to optimize performance, like caching responses.

Let me know if you have any other questions!

You: How do I authenticate s3?

Here are a few ways to authenticate with S3:

1. Credentials - You can provide AWS access key ID and secret access key to boto3. This will sign requests automatically.

import boto3

s3 = boto3.client('s3',
                 aws_access_key_id='YOUR_ACCESS_KEY',
                 aws_secret_access_key='YOUR_SECRET_KEY')

2. IAM roles - If you are running on EC2 with an IAM role, you don't need to provide credentials. Boto3 will automatically use the role.
3. Environment variables - Store credentials in environment variables and read them in boto3.
4. AWS credentials file - Store credentials in ~/.aws/credentials. Boto3 will automatically use them.
5. STS Assume roles - Get temporary credentials from STS and assume a role to access S3.
6. AWS SDK credential providers - Use environment variables, configuration files, etc to load credentials.

So in summary, the best way depends on your deployment environment. The best practice is to use IAM roles when possible. Other options like environment variables and credential files can be useful for testing.

Let me know if you have any other questions!

Downloads last month: 307

Model tree for WhiteRabbitNeo/Trinity-13B

Quantizations

2 models
