Open models and the tooling you already use

Run Llama, Mistral, Phi-3, Gemma, Qwen, DeepSeek, and more on your own machine — and serve them to Cursor, continue.dev, LangChain, the Vercel AI SDK, or any app that speaks the OpenAI format.

Works with the models and tools you care about

30+

Curated Models

Open-weight families quantized and ready to run

8

Model Families

Llama, Mistral, Phi-3, Gemma, Qwen, DeepSeek, Mixtral, TinyLlama

OpenAI

Compatible API

Drop-in for any SDK or tool you already use

BYOW

Bring Your Own Weights

Load any GGUF — or add a compatible Ollama registry

Every open model family, one runtime

VirexaLLM speaks GGUF (and MLX on Apple Silicon), so every open-weight family runs under the same engine — with per-model hot-swap, consistent sampling, and a single local API.

Model families and supported tooling

Chat & Instruct Models

Llama 3.1 8B
Llama 3.2 3B
Mistral 7B
Mixtral 8x7B
Phi-3-mini
Phi-3-medium

Lightweight / On-device

TinyLlama 1.1B
Phi-3-mini 3.8B
Gemma 2 2B
Qwen 2.5 1.5B
SmolLM 1.7B
StableLM 3B

Reasoning & Coding

DeepSeek-R1-Distill
DeepSeek Coder
Qwen 2.5 Coder
Llama 3.1 Instruct
CodeGemma
StarCoder2

Multilingual & Specialist

Qwen 2.5 7B
Gemma 2 9B
Mistral Nemo
Aya Expanse
BGE Embeddings
Nomic Embed

Quantizations & Formats

Q4_K_M
Q5_K_M
Q6_K
Q8_0
FP16
GGUF / MLX

IDEs & AI Tools

Cursor
continue.dev
Zed AI
Aider
Open WebUI
Raycast AI

SDKs & Frameworks

OpenAI SDK
LangChain
LlamaIndex
Vercel AI SDK
LiteLLM-compatible
Ollama-compatible

Languages

Python
Node.js / TypeScript
Go
Rust
Swift
.NET

Three steps to every open model

Install the app, pick a model, point your tool at localhost.

1

Install

Download VirexaLLM for macOS, Windows, or Linux — under 100 MB, signed, no dependencies.

2

Pick a model

Llama 3.1, Mistral 7B, Phi-3-mini, Gemma 2, Qwen 2.5, DeepSeek-R1 — all one click away.

3

Point your tool

Set OPENAI_BASE_URL=http://localhost:1775/v1 in your SDK, IDE, or agent — and go.
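As a minimal sketch of step 3 (stdlib Python only; the model name `llama-3.1-8b` is an assumed identifier — use whatever your local install lists):

```python
import json
import os
import urllib.request

# The only configuration step: the OpenAI-format base URL from step 3.
BASE_URL = os.environ.get("OPENAI_BASE_URL", "http://localhost:1775/v1")

# A standard OpenAI-format chat completion request.
payload = {
    "model": "llama-3.1-8b",  # assumed identifier; any loaded model works
    "messages": [{"role": "user", "content": "Say hello from localhost."}],
}
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# With VirexaLLM running, uncomment to send it:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(req.full_url)
```

Any OpenAI-compatible SDK works the same way — it only needs that base URL.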

Integration security

Everything signed, verified, and local by default.

Code-Signed Binaries
Signed Model Weights
Local-Only By Default
Air-Gap Mode
TLS for Admin Console
Reproducible Builds

Consistent behavior across every model

Streaming, tool calling, vision (where supported), and JSON mode behave the same regardless of the model family. Write your code once — point it at Llama today, Qwen tomorrow, your own fine-tune next week.
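One way to picture that portability (a hypothetical sketch — the helper below is not part of VirexaLLM, and the payload shape follows the OpenAI chat format): the only thing that changes between model families is the `model` string.

```python
def chat_request(model: str, prompt: str, json_mode: bool = False) -> dict:
    """Build an OpenAI-format chat payload; only the model name varies."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if json_mode:
        # JSON mode is expressed identically regardless of model family.
        payload["response_format"] = {"type": "json_object"}
    return payload

# Llama today, Qwen tomorrow -- same shape, one field changed.
llama = chat_request("llama-3.1-8b", "List three fruits as JSON.", json_mode=True)
qwen = chat_request("qwen-2.5-7b", "List three fruits as JSON.", json_mode=True)
```

Swapping your fine-tune in next week is the same one-field change.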

Works with any language or framework

If it speaks HTTP or the OpenAI format, it works with VirexaLLM running on your machine.

OpenAI SDK Drop-in

Python, Node, Go, .NET — repoint the base URL to localhost and keep every call you already wrote.

LangChain & LlamaIndex

Use VirexaLLM as your local LLM provider. Callbacks, tools, and streaming all behave like the cloud.

curl & Raw HTTP

Any language that speaks HTTP can call the local server. Vercel AI SDK, Rust, shell scripts — all welcome.
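In the OpenAI streaming format, chunks arrive as server-sent events — one `data:` line per token delta, terminated by `data: [DONE]`. A small stdlib parser (a sketch of that standard wire format, not VirexaLLM-specific code) is all any language needs:

```python
import json

def collect_stream(lines):
    """Assemble streamed text from OpenAI-format SSE lines."""
    text = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        text.append(delta.get("content", ""))
    return "".join(text)

# Example lines in the shape a chat/completions stream emits:
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(collect_stream(sample))  # -> Hello
```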

Frequently asked questions

How quickly can I swap models?
Hot-swap by name in the API request, or pick a different one in the chat UI. The previous model unloads and the next one is memory-mapped in seconds.
Can I bring my own weights?
Yes. Drop any compatible GGUF into the models folder and VirexaLLM will serve it. Managed fleets can restrict this via signed policies.
What about Ollama-style registries?
VirexaLLM can pull from Ollama-compatible model registries. Your existing Ollama model library works without re-downloading.
Does streaming work across all models?
Yes. Streaming, tool calls (where the model supports them), and JSON mode are normalized so your code behaves identically across families.
Which SDKs are supported?
Any SDK that speaks the OpenAI API format — the official OpenAI SDKs, LangChain, LlamaIndex, the Vercel AI SDK, Cursor, continue.dev, and raw HTTP in any language.

Your laptop is the server now

Download VirexaLLM and run Llama, Mistral, Phi-3, Gemma, or Qwen locally in minutes. Free desktop app for macOS, Windows, and Linux — your prompts never leave the device.