Open models and the tooling you already use

Run Llama, Mistral, Phi-3, Gemma, Qwen, DeepSeek, and more on your own machine — and serve them to Cursor, continue.dev, LangChain, the Vercel AI SDK, or any app that speaks the OpenAI format.

Works with the models and tools you care about

30+

Curated Models

Open-weight families quantized and ready to run

8

Model Families

Llama, Mistral, Phi-3, Gemma, Qwen, DeepSeek, Mixtral, TinyLlama

OpenAI

Compatible API

Drop-in for any SDK or tool you already use

BYOW

Bring Your Own Weights

Load any GGUF — or add a compatible Ollama registry

Every open model family, one runtime

VirexaLLM speaks GGUF (and MLX on Apple Silicon), so every open-weight family runs under the same engine — with per-model hot-swap, consistent sampling, and a single local API.

Model families and supported tooling

Chat & Instruct Models

Llama 3.1 8B
Llama 3.2 3B
Mistral 7B
Mixtral 8x7B
Phi-3-mini
Phi-3-medium

Lightweight / On-device

TinyLlama 1.1B
Phi-3-mini 3.8B
Gemma 2 2B
Qwen 2.5 1.5B
SmolLM 1.7B
StableLM 3B

Reasoning & Coding

DeepSeek-R1-Distill
DeepSeek Coder
Qwen 2.5 Coder
Llama 3.1 Instruct
CodeGemma
StarCoder2

Multilingual & Specialist

Qwen 2.5 7B
Gemma 2 9B
Mistral Nemo
Aya Expanse
BGE Embeddings
Nomic Embed

Quantizations & Formats

Q4_K_M
Q5_K_M
Q6_K
Q8_0
FP16
GGUF / MLX

IDEs & AI Tools

Cursor
continue.dev
Zed AI
Aider
Open WebUI
Raycast AI

SDKs & Frameworks

OpenAI SDK
LangChain
LlamaIndex
Vercel AI SDK
LiteLLM-compatible
Ollama-compatible

Languages

Python
Node.js / TypeScript
Go
Rust
Swift
.NET

Three steps to every open model

Install the app, pick a model, point your tool at localhost.

1

Install

Download VirexaLLM for macOS, Windows, or Linux — under 100 MB, signed, no dependencies.

2

Pick a model

Llama 3.1, Mistral 7B, Phi-3-mini, Gemma 2, Qwen 2.5, DeepSeek-R1 — all one click away.

3

Point your tool

Set OPENAI_BASE_URL=http://localhost:1775/v1 in your SDK, IDE, or agent — and go.
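As a minimal sketch of step 3 (stdlib Python only; the model name `llama-3.1-8b` is an assumed identifier — use whatever your local install lists):

```python
import json
import os
import urllib.request

# The only configuration step: the OpenAI-format base URL from step 3.
BASE_URL = os.environ.get("OPENAI_BASE_URL", "http://localhost:1775/v1")

# A standard OpenAI-format chat completion request.
payload = {
    "model": "llama-3.1-8b",  # assumed identifier; any loaded model works
    "messages": [{"role": "user", "content": "Say hello from localhost."}],
}
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# With VirexaLLM running, uncomment to send it:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(req.full_url)
```

Any OpenAI-compatible SDK works the same way — it only needs that base URL.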

Integration security

Everything signed, verified, and local by default.

Code-Signed Binaries
Signed Model Weights
Local-Only By Default
Air-Gap Mode
TLS for Admin Console
Reproducible Builds

Consistent behavior across every model

Streaming, tool calling, vision (where supported), and JSON mode behave the same regardless of the model family. Write your code once — point it at Llama today, Qwen tomorrow, your own fine-tune next week.
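One way to picture that portability (a hypothetical sketch — the helper below is not part of VirexaLLM, and the payload shape follows the OpenAI chat format): the only thing that changes between model families is the `model` string.

```python
def chat_request(model: str, prompt: str, json_mode: bool = False) -> dict:
    """Build an OpenAI-format chat payload; only the model name varies."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if json_mode:
        # JSON mode is expressed identically regardless of model family.
        payload["response_format"] = {"type": "json_object"}
    return payload

# Llama today, Qwen tomorrow -- same shape, one field changed.
llama = chat_request("llama-3.1-8b", "List three fruits as JSON.", json_mode=True)
qwen = chat_request("qwen-2.5-7b", "List three fruits as JSON.", json_mode=True)
```

Swapping your fine-tune in next week is the same one-field change.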

Works with any language or framework

If it speaks HTTP or the OpenAI format, it works with VirexaLLM running on your machine.

OpenAI SDK Drop-in

Python, Node, Go, .NET — repoint the base URL to localhost and keep every call you already wrote.

LangChain & LlamaIndex

Use VirexaLLM as your local LLM provider. Callbacks, tools, and streaming all behave like the cloud.

curl & Raw HTTP

Any language that speaks HTTP can call the local server. Vercel AI SDK, Rust, shell scripts — all welcome.
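In the OpenAI streaming format, chunks arrive as server-sent events — one `data:` line per token delta, terminated by `data: [DONE]`. A small stdlib parser (a sketch of that standard wire format, not VirexaLLM-specific code) is all any language needs:

```python
import json

def collect_stream(lines):
    """Assemble streamed text from OpenAI-format SSE lines."""
    text = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        text.append(delta.get("content", ""))
    return "".join(text)

# Example lines in the shape a chat/completions stream emits:
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(collect_stream(sample))  # -> Hello
```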

Frequently asked questions

How quickly can I swap models?
Hot-swap by name in the API request, or pick a different one in the chat UI. The previous model unloads and the next one is memory-mapped in seconds.
Can I bring my own weights?
Yes. Drop any compatible GGUF into the models folder and VirexaLLM will serve it. Managed fleets can restrict this via signed policies.
What about Ollama-style registries?
VirexaLLM can pull from Ollama-compatible model registries. Your existing Ollama model library works without re-downloading.
Does streaming work across all models?
Yes. Streaming, tool calls (where the model supports them), and JSON mode are normalized so your code behaves identically across families.
Which SDKs are supported?
Any SDK that speaks the OpenAI API format — the official OpenAI SDKs, LangChain, LlamaIndex, the Vercel AI SDK, Cursor, continue.dev, and raw HTTP in any language.

Your laptop is the server now

Download VirexaLLM and run Llama, Mistral, Phi-3, Gemma, or Qwen locally in minutes. Free desktop app for macOS, Windows, and Linux — your prompts never leave the device.