Usage
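
The router exposes an OpenAI-compatible API: point the official OpenAI Python client at the router's base_url and authenticate with your Neutrino API key. Passing a list of model names sends the request to each of those models.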

from openai import OpenAI

client = OpenAI(
    base_url="https://router.neutrinoapp.com/api/llm-router",
    api_key="<Neutrino-API-key>"
)

response = client.chat.completions.create(
    # Instead of a specific model, set this to a list of models
    model=["mixtral-8x7b-instruct", "claude-instant-1", "gpt-3.5-turbo"],
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant. Your job is to be helpful and respond to user requests."},
        {"role": "user", "content": "Tell me a joke"},
    ]
)

for choice in response.choices:
    model = choice.model
    content = choice.message.content
    print(f"{model}:\n{content}\n\n")

# Output:
# gpt-3.5-turbo:
# Why don't scientists trust atoms?
# Because they make up everything!

# claude-instant-1:
# Here's a silly joke for you:
# What do you call a dog magician? A labracadabrador!

# mixtral-8x7b-instruct:
# Of course, I'd be happy to share a light-hearted, clean joke with you. Here it is:
# Why don't scientists trust atoms?
# Because they make up everything!
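
Since each choice carries the model that produced it, you can also index the replies by model name. A minimal sketch (the responses_by_model name is just for illustration):

# Map each model name to the text it returned
responses_by_model = {choice.model: choice.message.content for choice in response.choices}

# e.g. prefer GPT-3.5's answer, falling back to the first reply that came back
answer = responses_by_model.get("gpt-3.5-turbo", next(iter(responses_by_model.values())))
print(answer)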

Streaming Responses
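
With stream=True, chunks from all of the selected models arrive interleaved on a single stream; each chunk identifies its source via chunk.model, so you can accumulate text per model as shown below.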

from openai import OpenAI

client = OpenAI(
    base_url="https://router.neutrinoapp.com/api/llm-router",
    api_key="<Neutrino-API-key>"
)

response = client.chat.completions.create(
    # Instead of a specific model, set this to a list of models
    model=["mixtral-8x7b-instruct", "claude-instant-1", "gpt-3.5-turbo"],
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant. Your job is to be helpful and respond to user requests."},
        {"role": "user", "content": "Tell me a joke"},
    ],
    stream=True
)

running_model_responses = {}

for chunk in response:
    model = chunk.model
    delta_content = chunk.choices[0].delta.content
    # The delta content can be None (e.g. on a model's final chunk), so skip empty deltas
    if not delta_content:
        continue
    if model not in running_model_responses:
        running_model_responses[model] = ""
    running_model_responses[model] += delta_content

for model, content in running_model_responses.items():
    print(f"{model}:\n{content}\n-----")