An OpenAI-compatible server. On your laptop. Right now.
VirexaLLM ships a local HTTP server at http://localhost:1775/v1 that speaks the OpenAI API. Point any SDK, IDE, or agent at it and run Llama, Mistral, Phi-3, Gemma, or Qwen without a cloud account.
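A first request can be this small — a minimal sketch using only the Python standard library. The route mirrors OpenAI's POST /v1/chat/completions; the model id is an assumption, so use whatever id the app shows for your downloaded model.

```python
import json
from urllib import request

# Minimal chat request against the local endpoint. "llama-3.1-8b" is an
# assumed model id — substitute the id listed in the VirexaLLM app.
payload = {
    "model": "llama-3.1-8b",
    "messages": [{"role": "user", "content": "Say hello in five words."}],
}
req = request.Request(
    "http://localhost:1775/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# with request.urlopen(req) as resp:   # uncomment with the server running
#     print(json.load(resp)["choices"][0]["message"]["content"])
```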
The local API server, by the numbers
localhost:1775
Local Endpoint
OpenAI-compatible server running on your own machine
1 line
Integration Change
Repoint your base URL — keep every SDK and tool
0 bytes
Leave the Device
Prompts, files, and completions stay on disk
30+ tok/s
Typical Throughput
On Apple Silicon with a 7B Q4_K_M quant
Works with the SDKs and IDEs you already use
Change the base URL. Keep every line of code.
What the local server gives you
Production-grade primitives for any app that already speaks the OpenAI format.
OpenAI-Compatible API
Drop-in replacement for api.openai.com served from localhost. Same request shape, same SDKs, your own models underneath.
Streaming & Tool Calling
Full support for streaming, function calling, and JSON mode on every model that supports them — all implemented locally.
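Streaming and tool calling use the standard OpenAI request shape. A sketch of the body, assuming the usual "function" tool schema; get_weather is a made-up example tool, not part of VirexaLLM:

```python
# OpenAI-format request body combining streaming with a tool definition.
# The model id and the get_weather tool are illustrative assumptions.
body = {
    "model": "mistral-7b",
    "stream": True,  # server replies with SSE chunks, like the cloud API
    "messages": [{"role": "user", "content": "Weather in Oslo?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}
```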
Hot-Swap Models
Switch between Llama 3, Mistral, Phi-3, and Qwen per request by setting the model field. The runtime loads and caches on demand.
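Hot-swapping really is just the model field — two back-to-back requests, two model families, one endpoint. A sketch (model ids are assumptions; use the ids the app lists):

```python
# Two requests that differ only in the model field; the runtime loads and
# caches each model on first use.
def chat_body(model, prompt):
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

first = chat_body("llama-3.1-8b", "Draft a haiku.")
second = chat_body("phi-3-mini", "Draft a haiku.")
```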
Zero Config
Start the server from the tray icon. No Python env, no Docker, no GGUF hunting — the app handles model download and quant selection.
Header-Level Controls
Override temperature, context length, thread count, or force a CPU-only run per request with simple X-Virexa headers.
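The pattern is plain HTTP headers on an otherwise normal OpenAI-format request. The exact header names below are illustrative guesses — the in-app docs are authoritative:

```python
# Per-request overrides as X-Virexa-* headers. Every name here is an
# assumed example, not a confirmed header.
headers = {
    "Content-Type": "application/json",
    "X-Virexa-Temperature": "0.2",      # assumed name
    "X-Virexa-Context-Length": "8192",  # assumed name
    "X-Virexa-Threads": "6",            # assumed name
    "X-Virexa-Cpu-Only": "true",        # assumed name
}
```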
Per-Request Traces
Every call captured with prompt, completion, tokens/sec, memory high-water, and model hash — inspectable in the desktop app.
Drop-in OpenAI compatibility
Keep the SDK, keep the request shape, keep your streaming handlers. Set OPENAI_BASE_URL to http://localhost:1775/v1 and suddenly your app is running on a model sitting on your own disk.
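In practice the repoint is two environment variables — the base URL, plus a placeholder key because the official SDK insists on one. A sketch (the model id is an assumption; the SDK call is shown commented so this runs without the package installed):

```python
import os

# Repoint the official OpenAI SDK at the local server via its standard
# environment variables. No code changes beyond this.
os.environ["OPENAI_BASE_URL"] = "http://localhost:1775/v1"
os.environ.setdefault("OPENAI_API_KEY", "not-needed-locally")

# from openai import OpenAI            # official SDK reads both variables
# client = OpenAI()
# resp = client.chat.completions.create(
#     model="gemma-2-9b",              # assumed model id
#     messages=[{"role": "user", "content": "ping"}],
# )
```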

Three steps to local inference
Install the app
Signed installer for macOS, Windows, or Linux. Under 100 MB, no dependencies, no Docker.
Pick a model
Llama 3.1 8B, Mistral 7B, Phi-3-mini, Gemma 2, Qwen 2.5 — one-click download with the right quantization.
Point your app
Set OPENAI_BASE_URL=http://localhost:1775/v1. Streaming, tool use, and JSON mode work on day one.
One binary. Every open model family.
Llama, Mistral, Phi-3, Gemma, Qwen, DeepSeek, TinyLlama, Mixtral-small — all loaded by the same local runtime, all exposed through the same OpenAI-shaped endpoint. Switch families in one API call.

Built for the way you already build
No proprietary SDK. No Python env to babysit. If your app talks to OpenAI today, it can talk to VirexaLLM in five seconds.
OpenAI SDK
Python, Node, Go, .NET — every official SDK works unchanged when you repoint the base URL.
LangChain & LlamaIndex
Use VirexaLLM as your LLM provider. Callbacks, streaming, and tool APIs behave exactly as they do against the cloud.
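Hypothetical LangChain wiring, assuming the langchain-openai package: since ChatOpenAI accepts a base_url, no custom provider class is needed. Kept as a plain kwargs dict here so the sketch runs even without LangChain installed; the model id is an assumption.

```python
# Provider settings for pointing LangChain's ChatOpenAI at the local server.
local_llm_kwargs = {
    "base_url": "http://localhost:1775/v1",
    "api_key": "not-needed-locally",
    "model": "qwen-2.5-7b",  # assumed model id
}
# from langchain_openai import ChatOpenAI
# llm = ChatOpenAI(**local_llm_kwargs)
# llm.invoke("Summarize this repo in one sentence.")
```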
Cursor, continue.dev, Zed
Configure the custom OpenAI endpoint in any modern AI IDE and keep coding with a local model.
Frequently asked questions
How much code do I have to rewrite?
Do I need an internet connection?
Which features work across all models?
What happens when I switch models?
How do I pick a model?
Your laptop is the server now
Download VirexaLLM and run Llama, Mistral, Phi-3, Gemma, or Qwen locally in minutes. Free desktop app for macOS, Windows, and Linux — your prompts never leave the device.