An OpenAI-compatible server. On your laptop. Right now.

VirexaLLM ships a local HTTP server at http://localhost:1775/v1 that speaks the OpenAI API. Point any SDK, IDE, or agent at it and run Llama, Mistral, Phi-3, Gemma, or Qwen without a cloud account.
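Because the server speaks the standard OpenAI request shape, plain HTTP is enough to talk to it. A minimal stdlib sketch, assuming the usual `/chat/completions` route and a placeholder model name (use whatever model you have downloaded):

```python
import json
import urllib.request

VIREXA_URL = "http://localhost:1775/v1/chat/completions"

def build_chat_payload(model: str, prompt: str) -> dict:
    # Identical shape to a request against api.openai.com.
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def chat(model: str, prompt: str) -> str:
    req = urllib.request.Request(
        VIREXA_URL,
        data=json.dumps(build_chat_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    # Requires the VirexaLLM server to be running locally.
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# With the server up:
#   print(chat("llama-3.1-8b", "Say hello in five words."))
```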

The local API server, by the numbers

localhost:1775

Local Endpoint

OpenAI-compatible server running on your own machine

1 line

Integration Change

Repoint your base URL — keep every SDK and tool

0 bytes

Leave the Device

Prompts, files, and completions stay on disk

30+ tok/s

Typical Throughput

On Apple Silicon with a 7B Q4_K_M quant

Works with the SDKs and IDEs you already use

Change the base URL. Keep every line of code.

OpenAI SDK · Anthropic SDK · Vercel AI SDK · LangChain · LlamaIndex · continue.dev · Cursor · Zed · Aider · Open WebUI · curl / HTTP · Node.js · Python · Go · Rust · Swift · LiteLLM-compatible · Ollama-compatible

What the local server gives you

Production-grade primitives for any app that already speaks the OpenAI format.

OpenAI-Compatible API

Drop-in replacement for api.openai.com served from localhost. Same request shape, same SDKs, your own models underneath.

Streaming & Tool Calling

Full support for streaming, function calling, and JSON mode on every model that exposes them — implemented locally.
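Streamed responses arrive as OpenAI-style server-sent events, one `data:` line per token chunk. A minimal sketch of reassembling them, assuming the standard chunk shape (`choices[0].delta.content`, terminated by `data: [DONE]`):

```python
import json

def delta_from_sse_line(line: str) -> str:
    """Extract the incremental text from one OpenAI-style SSE line.

    Streamed chunks look like:
      data: {"choices":[{"delta":{"content":"Hel"}}]}
    and the stream ends with:
      data: [DONE]
    """
    if not line.startswith("data: "):
        return ""
    body = line[len("data: "):].strip()
    if body == "[DONE]":
        return ""
    chunk = json.loads(body)
    return chunk["choices"][0]["delta"].get("content", "")

# Reassembling a short streamed completion:
stream = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
text = "".join(delta_from_sse_line(ln) for ln in stream)
# text == "Hello"
```

Any SDK streaming handler you already have does exactly this for you; the point is that the wire format is unchanged.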

Hot-Swap Models

Switch between Llama 3, Mistral, Phi-3, and Qwen per request by setting the model field. The runtime loads and caches on demand.

Zero Config

Start the server from the tray icon. No Python env, no Docker, no GGUF hunting — the app handles model download and quant selection.

Header-Level Controls

Override temperature, context length, thread count, or force a CPU-only run per request with simple X-Virexa headers.
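A sketch of attaching per-request overrides. Note: the exact header names below are hypothetical illustrations of the `X-Virexa-*` mechanism; check the VirexaLLM docs for the real names.

```python
import json
import urllib.request

# Hypothetical header names; only the X-Virexa- prefix comes from the docs.
overrides = {
    "X-Virexa-Temperature": "0.2",
    "X-Virexa-Context-Length": "8192",
    "X-Virexa-Threads": "8",
    "X-Virexa-Force-CPU": "true",
}

def chat_with_overrides(prompt: str) -> urllib.request.Request:
    payload = {"model": "llama-3.1-8b",  # placeholder model name
               "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        "http://localhost:1775/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", **overrides},
    )
```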

Per-Request Traces

Every call is captured with prompt, completion, tokens/sec, memory high-water mark, and model hash — all inspectable in the desktop app.

Drop-in OpenAI compatibility

Keep the SDK, keep the request shape, keep your streaming handlers. Set OPENAI_BASE_URL to http://localhost:1775/v1 and suddenly your app is running on a model sitting on your own disk.
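The one-line change, sketched with the official Python SDK (the SDK reads `OPENAI_BASE_URL` from the environment; the model name is a placeholder for one you've downloaded):

```python
import os

# One-line integration change: repoint the SDK at the local server.
os.environ["OPENAI_BASE_URL"] = "http://localhost:1775/v1"
os.environ["OPENAI_API_KEY"] = "local"  # the SDK requires a value; a local server can ignore it

# With the official SDK installed (pip install openai), the rest of your
# code is unchanged:
#
#   from openai import OpenAI
#   client = OpenAI()  # picks up OPENAI_BASE_URL automatically
#   resp = client.chat.completions.create(
#       model="llama-3.1-8b",  # placeholder for a downloaded model
#       messages=[{"role": "user", "content": "Hello from localhost"}],
#   )
#   print(resp.choices[0].message.content)
```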

OpenAI-compatible local endpoint

Three steps to local inference

1

Install the app

Signed installer for macOS, Windows, or Linux. Under 100 MB, no dependencies, no Docker.

2

Pick a model

Llama 3.1 8B, Mistral 7B, Phi-3-mini, Gemma 2, Qwen 2.5 — one-click download with the right quantization.

3

Point your app

Set OPENAI_BASE_URL=http://localhost:1775/v1. Streaming, tool use, and JSON mode work on day one.

One binary. Every open model family.

Llama, Mistral, Phi-3, Gemma, Qwen, DeepSeek, TinyLlama, Mixtral-small — all loaded by the same local runtime, all exposed through the same OpenAI-shaped endpoint. Switch families in one API call.
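A family switch really is just the `model` field. A minimal sketch, with placeholder model names (use the names the desktop app shows for your downloads):

```python
def chat_payload(model: str, prompt: str) -> dict:
    # Same request shape for every family; only "model" changes.
    return {"model": model,
            "messages": [{"role": "user", "content": prompt}]}

# One call on Mistral, the next on Qwen; the runtime loads and caches
# each model on demand.
summarize = chat_payload("mistral-7b", "Summarize this changelog.")
translate = chat_payload("qwen-2.5-7b", "Translate it to French.")
```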

Hot-swap between open model families

Built for the way you already build

No proprietary SDK. No Python env to babysit. If your app talks to OpenAI today, it can talk to VirexaLLM in five seconds.

OpenAI SDK

Python, Node, Go, .NET — every official SDK works unchanged when you repoint the base URL.

LangChain & LlamaIndex

Use VirexaLLM as your LLM provider. Callbacks, streaming, and tool APIs behave exactly like the cloud.
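A sketch of the LangChain side, assuming `langchain-openai` is installed; the model name is a placeholder and the key is a dummy value the local server can ignore:

```python
# Everything ChatOpenAI needs to target the local server.
LOCAL_CONFIG = {
    "base_url": "http://localhost:1775/v1",
    "api_key": "local",          # required by the client, not checked locally
    "model": "llama-3.1-8b",     # placeholder for a downloaded model
}

# With langchain-openai installed:
#
#   from langchain_openai import ChatOpenAI
#   llm = ChatOpenAI(**LOCAL_CONFIG)
#   print(llm.invoke("Name three uses for a local LLM.").content)
```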

Cursor, continue.dev, Zed

Configure the custom OpenAI endpoint in any modern AI IDE and keep coding with a local model.
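For continue.dev, the endpoint goes in `config.json`. A sketch following continue's OpenAI-provider convention (`provider`/`apiBase` fields); the title, model name, and key are placeholders:

```json
{
  "models": [
    {
      "title": "VirexaLLM (local)",
      "provider": "openai",
      "model": "llama-3.1-8b",
      "apiBase": "http://localhost:1775/v1",
      "apiKey": "local"
    }
  ]
}
```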

Frequently asked questions

How much code do I have to rewrite?
One line. Change your OpenAI base URL to http://localhost:1775/v1 and keep the same SDK, request shape, and streaming code you already ship.
Do I need an internet connection?
Only to download a model the first time. After that, VirexaLLM runs fully offline — on a plane, in a SCIF, or behind a strict customer firewall.
Which features work across all models?
Streaming, tool calling, JSON mode, and vision (where the model supports it) are normalized so your code behaves the same on Llama, Mistral, or Qwen.
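Tool calling uses the same `tools` array as the cloud API. A sketch with an illustrative tool schema and a placeholder model name (JSON mode is the same `response_format={"type": "json_object"}` flag as the cloud):

```python
def tool_call_payload(prompt: str) -> dict:
    """An OpenAI-shaped tool-calling request; the local server takes
    the same structure. The get_weather schema is illustrative."""
    return {
        "model": "llama-3.1-8b",  # placeholder
        "messages": [{"role": "user", "content": prompt}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    }

payload = tool_call_payload("What's the weather in Oslo?")
```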
What happens when I switch models?
The previous model is unloaded from RAM and the new one is memory-mapped. A typical 7B Q4_K_M hot-swap completes in a few seconds.
How do I pick a model?
Specify the model name in the request — or omit it and let VirexaLLM route to the smallest local model that fits your latency and context-length requirements.

Your laptop is the server now

Download VirexaLLM and run Llama, Mistral, Phi-3, Gemma, or Qwen locally in minutes. Free desktop app for macOS, Windows, and Linux — your prompts never leave the device.