Optimized Inference Engines
Inference engines are designed to deliver optimal LLM inference for their respective use cases. Each engine has access to a carefully curated model selection and intelligently routes queries to the best-suited LLM for each prompt, maximizing response quality while optimizing for cost and latency.
Supported Engines
- Chat Preview
- Code Preview
Chat Engine
The chat engine is optimized for general-purpose chat interactions such as chatbots and support assistants. It intelligently routes each query to one of the models below:
Model selection:
- GPT-4-Turbo
- Claude 3 Sonnet
- Claude 3 Haiku
Code Engine
The code engine is optimized for coding-related use cases such as code generation, coding copilots, and code explanation. It intelligently routes each query to one of the models below:
Model selection:
- GPT-4-Turbo
- Claude 3 Sonnet
- Claude 3 Haiku
Usage
An engine is a collection of LLMs paired with a routing function that identifies the optimal model for each query; you can treat an engine as a kind of ‘meta LLM’. Engines are called through the OpenAI SDK by pointing the client at the Neutrino base URL and passing the engine name in place of a model name:
from openai import OpenAI
client = OpenAI(
base_url="https://router.neutrinoapp.com/api/engines",
api_key="<Neutrino-API-key>"
)
response = client.chat.completions.create(
# Instead of a specific model, set this to the Neutrino engine of choice
model="chat-preview", # options: "chat-preview", "code-preview"
    messages=[
{"role": "system", "content": "You are a helpful AI assistant. Your job is to be helpful and respond to user requests."},
{"role": "user", "content": "What is a Neutrino?"},
],
)
print(f"Optimal model: {response.model}")
print(response.choices[0].message.content)
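Since the engine, rather than the caller, picks the model, it can be convenient to return the routed model alongside the response text. Below is a minimal sketch of such a helper; the ask function and the NEUTRINO_API_KEY environment variable are illustrative conventions, not part of the Neutrino API:

import os
from openai import OpenAI

# Assumes the API key is exported as NEUTRINO_API_KEY (an illustrative convention)
client = OpenAI(
    base_url="https://router.neutrinoapp.com/api/engines",
    api_key=os.environ["NEUTRINO_API_KEY"],
)

def ask(engine: str, prompt: str) -> tuple[str, str]:
    """Send a single-turn prompt to a Neutrino engine and return (routed_model, text)."""
    response = client.chat.completions.create(
        model=engine,  # "chat-preview" or "code-preview"
        messages=[{"role": "user", "content": prompt}],
    )
    return response.model, response.choices[0].message.content

model, answer = ask("code-preview", "Write a Python function that reverses a string.")
print(f"Optimal model: {model}")
print(answer)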
Streaming Responses
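Engines also support streaming via the standard stream=True parameter: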
from openai import OpenAI
client = OpenAI(
base_url="https://router.neutrinoapp.com/api/engines",
api_key="<Neutrino-API-key>"
)
response = client.chat.completions.create(
# Instead of a specific model, set this to the Neutrino engine of choice
model="chat-preview", # options: "chat-preview", "code-preview"
    messages=[
{"role": "system", "content": "You are a helpful AI assistant. Your job is to be helpful and respond to user requests."},
{"role": "user", "content": "Does a Neutrino have mass?"},
],
stream=True
)
for i, chunk in enumerate(response):
    if i == 0:
        print(f"Optimal model: {chunk.model}")
    # The final chunk carries a finish reason but no content, so guard against None
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
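If your application is asynchronous, the same streaming call should work with the SDK's async client. The following is a sketch under the assumption that the router behaves like the standard OpenAI API when used with AsyncOpenAI, which the examples above do not explicitly confirm:

import asyncio
from openai import AsyncOpenAI

async def main() -> None:
    client = AsyncOpenAI(
        base_url="https://router.neutrinoapp.com/api/engines",
        api_key="<Neutrino-API-key>",
    )
    stream = await client.chat.completions.create(
        model="chat-preview",
        messages=[{"role": "user", "content": "Does a Neutrino have mass?"}],
        stream=True,
    )
    async for chunk in stream:
        # Skip chunks without content (e.g. the final chunk, which only carries a finish reason)
        if chunk.choices[0].delta.content is not None:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()

asyncio.run(main())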