Open models and the tooling you already use
Run Llama, Mistral, Phi-3, Gemma, Qwen, DeepSeek, and more on your own machine — and serve them to Cursor, continue.dev, LangChain, the Vercel AI SDK, or any app that speaks the OpenAI format.
Works with the models and tools you care about
30+ Curated Models: open-weight families quantized and ready to run
8 Model Families: Llama, Mistral, Phi-3, Gemma, Qwen, DeepSeek, Mixtral, TinyLlama
OpenAI-Compatible API: drop-in for any SDK or tool you already use
BYOW (Bring Your Own Weights): load any GGUF, or add a compatible Ollama registry
Every open model family, one runtime
VirexaLLM speaks GGUF (and MLX on Apple Silicon), so every open-weight family runs under the same engine — with per-model hot-swap, consistent sampling, and a single local API.
Model families and supported tooling
Chat & Instruct Models
Lightweight / On-device
Reasoning & Coding
Multilingual & Specialist
Quantizations Supported
IDEs & AI Tools
SDKs & Frameworks
Languages
Three steps to every open model
Install the app, pick a model, point your tool at localhost.
Install
Download VirexaLLM for macOS, Windows, or Linux — under 100 MB, signed, no dependencies.
Pick a model
Llama 3.1, Mistral 7B, Phi-3-mini, Gemma 2, Qwen 2.5, DeepSeek-R1 — all one click away.
Point your tool
Set OPENAI_BASE_URL=http://localhost:1775/v1 in your SDK, IDE, or agent — and go.
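Step 3 can be sketched in a few lines of Python. Only the port and base URL come from the steps above; the endpoint path follows the OpenAI chat-completions format, and the model name "llama-3.1" is a placeholder for whichever model you picked in step 2:

```python
import json
import os
import urllib.request

# Point any OpenAI-format client at the local server (step 3 above).
# "llama-3.1" is a placeholder; substitute the model you loaded in step 2.
base_url = os.environ.get("OPENAI_BASE_URL", "http://localhost:1775/v1")

payload = {
    "model": "llama-3.1",
    "messages": [{"role": "user", "content": "Hello from localhost"}],
}
request = urllib.request.Request(
    f"{base_url}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# With VirexaLLM running, urllib.request.urlopen(request) returns an
# OpenAI-format chat completion; here we only print the target URL.
print(request.full_url)
```

The same env-var repoint works unchanged for any SDK that reads OPENAI_BASE_URL.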
Integration security
Everything signed, verified, and local by default.
Consistent behavior across every model
Streaming, tool calling, vision (where supported), and JSON mode behave the same regardless of the model family. Write your code once — point it at Llama today, Qwen tomorrow, your own fine-tune next week.
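To make "write your code once" concrete, here is a minimal sketch: one request builder in the OpenAI chat-completions shape, reused across families. The model names are illustrative, not a list of exact identifiers:

```python
def chat_request(model: str, prompt: str) -> dict:
    """Build one OpenAI-format chat request; only the model name varies."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,                              # same streaming flag everywhere
        "response_format": {"type": "json_object"},  # same JSON mode everywhere
    }

# Identical code path for different families: swap only the model string.
llama = chat_request("llama-3.1", "Summarize this repo as JSON")
qwen = chat_request("qwen-2.5", "Summarize this repo as JSON")
assert llama.keys() == qwen.keys()  # same shape, different model
```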
Works with any language or framework
If it speaks HTTP or the OpenAI format, it works with VirexaLLM running on your machine.
OpenAI SDK Drop-in
Python, Node, Go, .NET — repoint the base URL to localhost and keep every call you already wrote.
LangChain & LlamaIndex
Use VirexaLLM as your local LLM provider. Callbacks, tools, and streaming all behave just as they do against a cloud provider.
curl & Raw HTTP
Any language that speaks HTTP can call the local server. Vercel AI SDK, Rust, shell scripts — all welcome.
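For raw HTTP, streamed responses arrive as server-sent events in the OpenAI format. A stdlib-only sketch of parsing them; the sample lines below are illustrative, shaped like OpenAI chat-completion chunks rather than captured from a real server:

```python
import json

def parse_sse_chunks(lines):
    """Yield the content deltas from OpenAI-format 'data:' event lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        body = line[len("data:"):].strip()
        if body == "[DONE]":  # OpenAI-format end-of-stream marker
            break
        chunk = json.loads(body)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# Illustrative stream, shaped like OpenAI chat-completion chunks.
sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(parse_sse_chunks(sample)))  # prints "Hello"
```

The same parsing logic ports directly to shell, Rust, or any other language with a JSON library.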
Frequently asked questions
How quickly can I swap models?
Can I bring my own weights?
What about Ollama-style registries?
Does streaming work across all models?
Which SDKs are supported?
Your laptop is the server now
Download VirexaLLM and run Llama, Mistral, Phi-3, Gemma, or Qwen locally in minutes. Free desktop app for macOS, Windows, and Linux — your prompts never leave the device.