A curated model library. Templates. Hot-swap.
Browse open-weight models, download the right quantization in one click, and switch between them per request. Save prompt templates, compare models side by side, and wire local inference into your workflow.
The local-model workflow, without the tinkering
1-click
Model Downloads
Curated registry with quantizations picked for your hardware
<2 sec
Hot-Swap Time
Swap a 7B Q4_K_M between requests with zero server restarts
Parallel
Model Comparison
Run the same prompt against multiple local models side by side
Reusable
Prompt Templates
Save, version, and share prompts across devices
Model library & workflows
Everything you need to pick models, run them, and build on them locally.
Curated Model Library
Browse Llama, Mistral, Phi-3, Gemma, Qwen, DeepSeek-R1, and Mixtral. Each model comes pre-quantized (Q4_K_M, Q5_K_M, Q8_0) with hardware-fit indicators.
One-Click Downloads
Pick a model, hit Install. VirexaLLM resolves the right quant, verifies the weights, and makes it available to every app on your machine.
Hot-Swap Models
Change models per request by name, or flip the active one in the chat UI. The runtime unloads the old weights and memory-maps the new ones in seconds.
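Per-request model selection can be sketched against the local OpenAI-compatible endpoint at http://localhost:1775/v1; the model names here are illustrative, not the registry's actual identifiers:

```python
import json

# Local endpoint named elsewhere on this page; model names are hypothetical.
BASE_URL = "http://localhost:1775/v1/chat/completions"

def chat_payload(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completion body.

    The "model" field is all it takes to hot-swap: two consecutive
    requests can name two different local models, no restart between them.
    """
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

first = chat_payload("llama-3.1-8b-q4_k_m", "Summarize this changelog.")
second = chat_payload("mistral-7b-q4_k_m", "Summarize this changelog.")
print(json.dumps(first))
```

The runtime sees the new name on the second request and swaps weights before answering.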
Prompt Templates
Save system prompts, chat presets, and variable slots. Reuse them across conversations, the API, or automation hooks.
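The "variable slots" idea can be sketched with the standard library's string.Template; the template names and slot structure here are illustrative, not VirexaLLM's actual storage format:

```python
from string import Template

# Hypothetical template pack -- a real one would be saved and synced by the app.
TEMPLATES = {
    "summarize": Template(
        "Summarize the following $doc_type in $length bullet points:\n$body"
    ),
}

def render(name: str, **slots) -> str:
    """Fill a saved template's variable slots and return the final prompt."""
    return TEMPLATES[name].substitute(**slots)

prompt = render("summarize", doc_type="meeting transcript", length="5", body="...")
```

The same rendered string can then go to any conversation, the API, or an automation hook.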
Parallel Model Comparison
Send the same prompt to two or three local models at once. Diff the outputs, pick the winner, pin it as your default.
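Fanning one prompt out to several models is a small concurrency pattern; this sketch stubs the network call so it runs without a server, and the model names are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

MODELS = ["llama-3.1-8b", "qwen-2.5-7b", "mistral-7b"]  # hypothetical names

def ask(model: str, prompt: str) -> tuple[str, str]:
    # Stub standing in for a POST to http://localhost:1775/v1/chat/completions.
    # A real client would send {"model": model, ...} and return the completion.
    return model, f"[{model}] answer to: {prompt}"

prompt = "Explain quantization in one sentence."
with ThreadPoolExecutor(max_workers=len(MODELS)) as pool:
    # Same prompt, three models, answers collected side by side.
    results = dict(pool.map(lambda m: ask(m, prompt), MODELS))
```

Diff the values of `results`, pick the winner, and pin it as the default.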
Automation Hooks
Trigger local inference from a keyboard shortcut, a shell command, or a webhook. Build agents that run entirely on the device.
Download once, run on every app
Pick Llama 3.1 8B Q4_K_M, hit Install, and the same weights power the chat UI, your IDE via continue.dev, and any script hitting http://localhost:1775/v1. No duplication, no re-downloads.
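A script hitting that endpoint needs nothing beyond the standard library; this is a minimal sketch, and the model name is an assumption, not a confirmed registry identifier:

```python
import json
import urllib.request

def build_request(prompt: str,
                  model: str = "llama-3.1-8b-q4_k_m") -> urllib.request.Request:
    """Build a POST to the local OpenAI-compatible chat endpoint."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode()
    return urllib.request.Request(
        "http://localhost:1775/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = build_request("Write a haiku about GGUF.")
# With VirexaLLM running, send it like this:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```

The chat UI, the IDE plugin, and this script all hit the same server, so they share the one cached copy of the weights.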

From install to inference
Three moves the desktop app makes each time you pick a new model.
Choose
Browse the curated library. VirexaLLM marks models that fit your CPU, GPU, and RAM budget.
Download
Signed GGUF weights are pulled from the registry, verified against a hash, and cached on disk so you never download the same model twice.
Run
Chat in the desktop UI or call the local API — the model is memory-mapped and ready in seconds.
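The "verified against a hash" part of the Download step boils down to a digest comparison; here is a sketch with hashlib, assuming the expected digest ships in the registry metadata:

```python
import hashlib

def verify_weights(data: bytes, expected_sha256: str) -> bool:
    """Accept downloaded weights only if their SHA-256 matches the registry's."""
    return hashlib.sha256(data).hexdigest() == expected_sha256

blob = b"fake-gguf-bytes"                       # stand-in for a downloaded file
good = hashlib.sha256(blob).hexdigest()
assert verify_weights(blob, good)               # matching digest: accept
assert not verify_weights(blob, "0" * 64)       # mismatch: reject the download
```

A production downloader would hash the file in chunks while streaming rather than holding multi-gigabyte weights in memory.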
Workflows teams turn on day one
Every one of these runs entirely on your laptop, no cloud account required.
Coding with Cursor or continue.dev
Point your IDE at the local server and keep writing code with a 7B model when you're offline or behind a customer firewall.
Confidential Summaries
Drop a PDF into the chat UI and summarize it with Phi-3-mini without the file ever touching the internet.
Model Bake-Offs
Run Llama 3.1 8B vs Qwen 2.5 7B vs Mistral 7B on your eval set. Pick the local winner for your use case.
Offline Agent Loops
Chain tool calls locally with DeepSeek-R1 or Llama 3.1 — useful for field devices and air-gapped research.
Tiny-Model Classification
Run TinyLlama or Phi-3-mini on CPU for cheap, fast intent detection and routing — no GPU required.
Prompt Template Library
Ship a shared template pack across your team's laptops so everyone uses the same system prompts.
Compare two local models in one click
Split the chat window and send the same prompt to Llama 3.1 8B and Qwen 2.5 7B. Watch tokens stream side by side. Pick the winner for your task and pin it as the default in your template.
Automation that runs on your machine
Wire local inference into the tools you already use — without standing up a server.
Keyboard Shortcuts
Bind a prompt template to a hotkey. Summarize, translate, or rephrase from any app — all offline.
Shell & Scripts
Hit localhost:1775/v1 with curl or the OpenAI SDK from Python, Node, or Go — local agents that never phone home.
Local Webhooks
Fire prompts on file changes, clipboard events, or custom triggers. Your laptop is the backend.
Frequently asked questions
How do I pick the right quantization?
Can I hot-swap models mid-session?
Where are models stored?
Can I bring my own GGUF?
Can I automate local inference?
Your laptop is the server now
Download VirexaLLM and run Llama, Mistral, Phi-3, Gemma, or Qwen locally in minutes. Free desktop app for macOS, Windows, and Linux — your prompts never leave the device.