Model Library
Ollama and llama.cpp
Highlights of HuggingFace Models Usable by Ollama and llama.cpp
Discover high-quality GGUF models from HuggingFace, packaged for the Ollama and llama.cpp inference engines. The models highlighted here were selected for their performance, compatibility, and active community support.
🏆 Top Performing Models
Leading open-weight language models, including Llama 3.2, Qwen 2.5, Phi-4, and Gemma 3, that perform strongly across a wide range of tasks.
⚡ Optimized for Speed
GGUF-quantized models built for fast inference and low memory usage while preserving output quality.
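How much memory a quantized model needs can be estimated from its parameter count and the quantization's average bits per weight. A rough back-of-the-envelope sketch (the bits-per-weight figures are approximate averages for llama.cpp's block formats, and the 3B parameter count is an illustrative example):

```python
# Approximate average bits per weight for common GGUF formats.
# Q4_0 packs 32 weights per block with one 16-bit scale:
# (16 + 32*4) / 32 = 4.5 bits/weight; Q8_0 works the same way.
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,     # (16 + 32*8) / 32
    "Q4_0": 4.5,     # (16 + 32*4) / 32
    "IQ4_XS": 4.25,  # approximate published average
}

def estimated_size_gb(n_params: float, quant: str) -> float:
    """Rough model file size in gigabytes (1 GB = 1e9 bytes)."""
    total_bits = BITS_PER_WEIGHT[quant] * n_params
    return total_bits / 8 / 1e9

# A 3-billion-parameter model at different quantization levels:
for quant in ("F16", "Q8_0", "Q4_0", "IQ4_XS"):
    print(f"{quant:>7}: ~{estimated_size_gb(3e9, quant):.1f} GB")
```

Actual file sizes differ somewhat because embedding and output layers are often quantized at different levels, but the estimate shows why a Q4_0 model fits in roughly a quarter of the memory of its F16 original.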
🎯 Specialized Models
Task-specific models for coding (CodeLlama, DeepSeek-Coder), reasoning (DeepSeek-R1), and embeddings (BGE, Nomic).
🌍 Multilingual Support
Models with multilingual support, including the Qwen series for Chinese, multilingual Llama variants, and more.
💡 Getting Started
These models are ready to use with:
- Ollama: Simple installation and management
- llama.cpp: High-performance C++ inference
- Multiple quantization levels: Q4_0, Q8_0, IQ4_XS, and others, trading file size and speed against output quality
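A minimal sketch of both paths (the model name and HuggingFace repository below are illustrative examples, not recommendations; any GGUF repository with a quantization tag works the same way):

```shell
# Ollama: run a model from the Ollama library (downloads on first use)
ollama run llama3.2

# Ollama can also pull a specific GGUF quantization straight from a
# HuggingFace repo using the hf.co/{username}/{repository}:{quant} form
ollama run hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF:Q4_0

# llama.cpp: point llama-cli at a downloaded GGUF file
# (-m model path, -p prompt, -n number of tokens to generate)
llama-cli -m ./Llama-3.2-3B-Instruct-Q4_0.gguf -p "Hello" -n 64
```

With Ollama the quantization is chosen by tag at pull time; with llama.cpp you download the specific GGUF file you want and pass its path directly.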
Community-Driven: These models represent some of the best work of the HuggingFace community and are regularly updated and maintained by researchers and developers worldwide.