Zero API fees after hardware. OpenAI-compatible API. GGUF and HuggingFace model support. Bundled with Aura Workshop as a one-click sidecar.
Cloud API providers charge per token, forever. With Aura Inference Engine, your only cost is the hardware you already own. Run millions of tokens per day at zero marginal cost.
Aura Inference Engine supports the most popular open-source model formats. Download from HuggingFace or point to your own GGUF files.
Quantized models in GGUF format. Support for Q4, Q5, Q6, Q8, and FP16 quantization levels. Optimized for CPU and GPU inference.
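As a rule of thumb, a model's weights occupy roughly (parameter count × bits per weight) / 8 bytes, which is why quantization level drives hardware requirements. The sketch below is a rough estimator under assumed bit widths (Q4 ≈ 4.5 bits per weight and so on, ignoring quantization metadata and KV-cache memory); it is an illustration, not an official sizing tool.

```python
# Rough GGUF weight-size estimator. The bit widths below are assumptions
# (e.g. Q4_0 is about 4.5 bits/weight in practice); metadata and KV-cache
# overhead are ignored.
BITS_PER_WEIGHT = {"Q4": 4.5, "Q5": 5.5, "Q6": 6.5, "Q8": 8.5, "FP16": 16.0}

def estimated_weight_gb(params_billion: float, quant: str) -> float:
    """Approximate size of the model weights in GB."""
    bits = BITS_PER_WEIGHT[quant]
    return params_billion * bits / 8  # 8 bits/weight => ~1 GB per 1B params

# A 7B model at Q4 needs roughly 4 GB of memory; at FP16, roughly 14 GB.
print(f"7B Q4   ~ {estimated_weight_gb(7, 'Q4'):.1f} GB")
print(f"7B FP16 ~ {estimated_weight_gb(7, 'FP16'):.1f} GB")
```

This is why a 16GB machine comfortably runs 7B-13B models at Q4/Q5 but not at FP16.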
Automatically scans ~/.cache/huggingface/hub/ for downloaded models. Browse and launch directly from the Aura AI model browser.
Dedicated model directory at ~/.cache/aura-inference/models/ for your GGUF files. Drop in a model and it appears in the browser.
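Discovery here amounts to scanning a directory for `.gguf` files. The sketch below mimics that behavior with a hypothetical `discover_gguf` helper (an illustration, not the engine's actual implementation), demonstrated against a temporary directory standing in for `~/.cache/aura-inference/models/`.

```python
from pathlib import Path
import tempfile

def discover_gguf(models_dir: Path) -> list[str]:
    """Return the GGUF model filenames found under models_dir (recursive)."""
    return sorted(p.name for p in models_dir.rglob("*.gguf"))

# Demo against a throwaway directory standing in for the real models folder.
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "llama-3-8b.Q4_K_M.gguf").touch()
    (root / "notes.txt").touch()  # ignored: not a GGUF file
    print(discover_gguf(root))   # → ['llama-3-8b.Q4_K_M.gguf']
```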
Full support for Llama 2, Llama 3, Code Llama, and their derivatives. Any model in GGUF format works out of the box.
Optimized for code generation models like CodeGemma, DeepSeek Coder, StarCoder, and WizardCoder.
Run multilingual models for content in any language. No external API needed for translation or multilingual tasks.
From MacBook Air to dedicated GPU servers, Aura Inference scales to your hardware.
| Configuration | RAM | Recommended Models | Performance |
|---|---|---|---|
| Apple Silicon (M1/M2/M3/M4) | 16GB+ | 7B-13B Q4/Q5 models | Excellent — Metal GPU acceleration, unified memory |
| NVIDIA GPU (8GB+ VRAM) | 16GB+ | 7B-70B depending on VRAM | Excellent — CUDA acceleration for maximum throughput |
| CPU Only (x86/ARM) | 8GB+ | 3B-7B Q4 models | Good — AVX2/NEON optimized, slower than GPU |
| AMD GPU (ROCm) | 16GB+ | 7B-13B models | Good — ROCm support for compatible AMD GPUs |
Aura Inference Engine ships as a sidecar binary inside Aura Workshop. No separate installation. No Docker. No Python environments. Just open the app, go to the Aura AI tab, and click Start.
Aura Inference Engine exposes an OpenAI-compatible REST API. Any tool, library, or application that works with the OpenAI API works with Aura Inference. Just change the base URL.
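Concretely, a chat completion is just a POST to `/v1/chat/completions` on the local server. The sketch below builds such a request with only the Python standard library; the base URL and model name are placeholders, so substitute the address and model shown in the Aura AI tab.

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request (not yet sent)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Placeholder address: use whatever host/port your local server reports.
req = build_chat_request("http://localhost:8080/v1", "llama-3-8b", "Hello!")
print(req.full_url)  # http://localhost:8080/v1/chat/completions
# To send it: urllib.request.urlopen(req). Or skip all of this and point
# any OpenAI SDK at the same base URL instead of api.openai.com.
```

The same base-URL swap works for existing tools: anything that accepts a custom OpenAI endpoint needs no other change.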
Download Aura Workshop. Open the Aura AI tab. Click Start. That's it.