A curated, searchable list of tools for running AI models locally -- inference engines, UIs, quantization, fine-tuning, RAG, and more.
Get up and running with large language models locally. Simple CLI with model management.
LLM inference in C/C++. The foundational project for GGUF-based local inference.
High-throughput and memory-efficient inference and serving engine for LLMs.
NVIDIA's library for optimizing LLM inference on NVIDIA GPUs.
Array framework for machine learning on Apple silicon, by Apple.
Universal LLM deployment engine with ML compilation.
Fast inference library for running LLMs locally on modern consumer GPUs.
Self-hosted, drop-in replacement for the OpenAI REST API. No GPU required.
Distribute and run LLMs with a single file. By Mozilla.
Minimalist ML framework for Rust with a focus on performance.
Python bindings for llama.cpp with OpenAI-compatible API server.
Toolkit for compressing, deploying, and serving LLMs with high throughput.
Fast serving framework for large language and vision models.
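Several of the engines above (vLLM, LocalAI, llama.cpp's server, and the llama-cpp-python server) expose an OpenAI-compatible REST API, so one client works across all of them. A minimal stdlib-only sketch of building such a request — the base URL and model name are assumptions, adjust them to whichever server you run:

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible POST to /v1/chat/completions.

    base_url (e.g. "http://localhost:8000") and model are assumptions --
    set them to match your local server and loaded model.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# With a server running, send it and read the reply:
# req = build_chat_request("http://localhost:8000", "llama-3-8b", "Hello!")
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Because the wire format is shared, swapping engines usually means changing only the port and model name.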
Feature-rich, self-hosted web UI for LLMs. Supports Ollama and OpenAI-compatible APIs.
Desktop app to discover, download, and run local LLMs.
Open-source large language model chatbot ecosystem. Run models entirely offline.
Open-source alternative to ChatGPT that runs 100% offline.
Gradio-based web UI for running large language models.
Easy-to-use AI text generation with GGUF support.
LLM frontend for power users with advanced character features.
All-in-one desktop and Docker AI app with built-in RAG and agents.
Desktop client for ChatGPT, Claude, and local models.
Enhanced ChatGPT clone supporting many AI providers.
Modern-design ChatGPT/LLM UI with Ollama support.
The largest open-source ML model repository.
Binary format for fast loading and saving of models used by llama.cpp.
Tensor library for ML. Foundation behind GGUF and llama.cpp.
Curated models optimized and packaged for Ollama.
GGUF, GPTQ, and AWQ versions of popular models.
Safe way to store and distribute tensors by Hugging Face.
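The safetensors layout is simple enough to parse by hand: an 8-byte little-endian length, then a JSON header mapping tensor names to dtype, shape, and byte offsets, then the raw tensor data. A sketch that reads the header from an in-memory example (the tensor name and sizes are made up for illustration):

```python
import json
import struct

def read_safetensors_header(raw: bytes) -> dict:
    """Parse a .safetensors header: 8-byte LE length N, then N bytes of JSON."""
    (n,) = struct.unpack("<Q", raw[:8])
    return json.loads(raw[8 : 8 + n].decode("utf-8"))

# Build a tiny fake file in memory instead of loading a real checkpoint:
header = {"w": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, 16]}}
blob = json.dumps(header).encode("utf-8")
fake_file = struct.pack("<Q", len(blob)) + blob + b"\x00" * 16
print(read_safetensors_header(fake_file)["w"]["shape"])  # -> [2, 2]
```

Since the header is plain JSON, you can inspect a checkpoint's tensors without deserializing any weights — part of why the format is considered safe.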
Accurate post-training quantization for generative pre-trained transformers.
Easy-to-use LLM quantization package with user-friendly APIs.
Activation-aware weight quantization for efficient LLM compression.
Lightweight CUDA wrapper for 8-bit and 4-bit quantization.
Efficient fine-tuning of quantized LLMs.
Half-Quadratic Quantization -- fast and accurate.
Extreme LLM compression with incoherence processing.
State-of-the-art additive quantization for 2-bit compression.
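The methods above differ in how they pick scales and handle outliers, but the core idea is shared: map floats to small integers via a scale factor. A minimal sketch of symmetric absmax int8 quantization — a simplification of what these libraries do, with hand-picked example weights:

```python
def quantize_int8(values):
    """Symmetric absmax quantization: x_q = round(x / scale), scale = max|x| / 127."""
    scale = max(abs(v) for v in values) / 127.0
    return [round(v / scale) for v in values], scale

def dequantize(q, scale):
    """Recover approximate floats from the int8 codes."""
    return [x * scale for x in q]

weights = [0.42, -1.27, 0.03, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored weight is within half a quantization step (scale / 2)
# of the original; 4-bit schemes use the same idea with only 16 levels.
```

Real schemes quantize per-group or per-channel and keep outlier weights in higher precision, which is where methods like AWQ and HQQ earn their accuracy.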
Fine-tune LLMs 2-5x faster with 80% less memory.
Streamlined tool for fine-tuning LLMs with many methods.
Unified framework for fine-tuning 100+ LLMs with a web UI.
Parameter-Efficient Fine-Tuning by Hugging Face.
Transformer Reinforcement Learning: RLHF, DPO, PPO.
PyTorch-native library for LLM fine-tuning.
Pretrain, fine-tune, and deploy 20+ LLMs. By Lightning AI.
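Most of the tools in this group rely on LoRA-style adapters: the frozen base weight W is augmented with a trainable low-rank update, W' = W + (alpha / r) * B @ A, so only the tiny A and B matrices are trained. A dependency-free sketch with made-up 2x2 numbers:

```python
def matmul(a, b):
    """Plain nested-list matrix multiply."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight, d x k
B = [[1.0], [2.0]]              # trainable, d x r (here r = 1)
A = [[0.5, 0.5]]                # trainable, r x k
alpha, r = 2.0, 1

delta = matmul(B, A)            # rank-<=r update, d x k
W_adapted = [
    [w + (alpha / r) * d for w, d in zip(w_row, d_row)]
    for w_row, d_row in zip(W, delta)
]
```

With r much smaller than d and k, the adapter holds r * (d + k) parameters instead of d * k, which is why fine-tuning a 7B model can fit on a single consumer GPU.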
Interact with your documents using LLMs, 100% privately.
Your second brain, powered by generative AI.
Chat with your documents on your local device.
Personal AI assistant for notes, documents, and images.
Open-source RAG engine based on deep document understanding.
Clean, customizable RAG UI for chatting with your documents.
LLM orchestration for RAG, agents, and search pipelines.
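All of the RAG tools above follow the same loop: embed the documents, retrieve the ones most similar to the query, and stuff them into the prompt. A toy stdlib-only sketch using bag-of-words cosine similarity in place of a real embedding model (the documents and query are invented for illustration):

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str]) -> str:
    """Return the document most similar to the query."""
    qv = Counter(query.lower().split())
    return max(docs, key=lambda d: cosine(qv, Counter(d.lower().split())))

docs = [
    "llama.cpp runs GGUF models on the CPU",
    "Piper turns text into speech on a Raspberry Pi",
]
best = retrieve("which tool converts text into speech", docs)
prompt = f"Answer using this context:\n{best}\n\nQuestion: ..."
```

Production systems replace the word counts with dense embeddings and a vector index, but the retrieve-then-prompt structure is the same.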
High-performance inference of OpenAI's Whisper in C/C++.
Fast, local neural text-to-speech. Optimized for Raspberry Pi.
Deep learning toolkit for text-to-speech with many voices.
Text-prompted generative audio model by Suno.
Whisper reimplementation using CTranslate2, up to 4x faster.
Offline speech recognition supporting 20+ languages.
Human-level TTS through style diffusion.
Modular Stable Diffusion GUI with node-based workflow.
The most popular Stable Diffusion web UI.
Offline, open-source Midjourney-like experience.
Professional creative engine with polished node-based UI.
Optimized A1111 fork with better memory management.
State-of-the-art text-to-image model by Black Forest Labs.
Training scripts for LoRA, DreamBooth, and textual inversion.
Comprehensive GPU benchmarks for local LLM inference.
Apple Silicon performance examples and benchmarks.
AMD's open-source GPU computing platform for ML.
NVIDIA's toolkit for AI on RTX GPUs.