Model Library
Ollama and llama.cpp
Highlights of HuggingFace Models Usable by Ollama and llama.cpp
Discover high-quality GGUF models from HuggingFace, packaged for the Ollama and llama.cpp inference engines. The models highlighted here were selected for their performance, compatibility, and active community support.
🏆 Top Performing Models
Leading open-weight language models, including Llama 3.2, Qwen 2.5, Phi-4, and Gemma 3, that perform strongly across a wide range of tasks.
⚡ Optimized for Speed
GGUF-quantized models built for fast inference and low memory usage while preserving output quality.
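How much memory a quantized model needs can be estimated from its parameter count and the quantization's average bits per weight. A rough back-of-the-envelope sketch (the bits-per-weight figures are approximate averages for llama.cpp's block formats, and the 3B parameter count is an illustrative example):

```python
# Approximate average bits per weight for common GGUF formats.
# Q4_0 packs 32 weights per block with one 16-bit scale:
# (16 + 32*4) / 32 = 4.5 bits/weight; Q8_0 works the same way.
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,     # (16 + 32*8) / 32
    "Q4_0": 4.5,     # (16 + 32*4) / 32
    "IQ4_XS": 4.25,  # approximate published average
}

def estimated_size_gb(n_params: float, quant: str) -> float:
    """Rough model file size in gigabytes (1 GB = 1e9 bytes)."""
    total_bits = BITS_PER_WEIGHT[quant] * n_params
    return total_bits / 8 / 1e9

# A 3-billion-parameter model at different quantization levels:
for quant in ("F16", "Q8_0", "Q4_0", "IQ4_XS"):
    print(f"{quant:>7}: ~{estimated_size_gb(3e9, quant):.1f} GB")
```

Actual file sizes differ somewhat because embedding and output layers are often quantized at different levels, but the estimate shows why a Q4_0 model fits in roughly a quarter of the memory of its F16 original.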
🎯 Specialized Models
Task-specific models for coding (CodeLlama, DeepSeek-Coder), reasoning (DeepSeek-R1), and embeddings (BGE, Nomic).
🌍 Multilingual Support
Models with multilingual support, including the Qwen series for Chinese, multilingual Llama variants, and more.
💡 Getting Started
These models are ready to use with:
- Ollama: Simple installation and management
- llama.cpp: High-performance C++ inference
- Multiple quantization levels: Q4_0, Q8_0, IQ4_XS, and others, trading file size and speed against output quality
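A minimal sketch of both paths (the model name and HuggingFace repository below are illustrative examples, not recommendations; any GGUF repository with a quantization tag works the same way):

```shell
# Ollama: run a model from the Ollama library (downloads on first use)
ollama run llama3.2

# Ollama can also pull a specific GGUF quantization straight from a
# HuggingFace repo using the hf.co/{username}/{repository}:{quant} form
ollama run hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF:Q4_0

# llama.cpp: point llama-cli at a downloaded GGUF file
# (-m model path, -p prompt, -n number of tokens to generate)
llama-cli -m ./Llama-3.2-3B-Instruct-Q4_0.gguf -p "Hello" -n 64
```

With Ollama the quantization is chosen by tag at pull time; with llama.cpp you download the specific GGUF file you want and pass its path directly.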
Community-Driven: These models represent some of the best work of the HuggingFace community and are regularly updated and maintained by researchers and developers worldwide.