Getting Started with ggufy
ggufy is a powerful tool that helps you discover, download, and manage GGUF (GPT-Generated Unified Format) models from HuggingFace and other sources. This guide will teach you the essential commands and workflows.
What are GGUF Models?
GGUF is a binary file format designed for fast loading and saving of models and for ease of reading. Models in GGUF format are optimized for inference and can be used with inference engines such as llama.cpp and Ollama.
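To make the format concrete, here is a minimal sketch of how a GGUF file begins, based on the public GGUF specification: a 4-byte magic (`GGUF`), a version number, a tensor count, and a metadata key-value count, all little-endian. The fake header built below is for demonstration only.

```python
import struct

def read_gguf_header(data: bytes):
    """Parse the fixed-size GGUF header: magic, version, tensor count, metadata KV count."""
    magic, version = struct.unpack_from("<4sI", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    tensor_count, kv_count = struct.unpack_from("<QQ", data, 8)
    return version, tensor_count, kv_count

# Build a minimal fake header for demonstration (version 3, 0 tensors, 0 KV pairs).
header = b"GGUF" + struct.pack("<IQQ", 3, 0, 0)
print(read_gguf_header(header))  # (3, 0, 0)
```

In a real file, the header is followed by the metadata key-value pairs and tensor descriptors, which is what tools like ggufy and llama.cpp read to report model details.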
Installation
# Clone the ggufy repository
git clone https://github.com/nbiish/ggufy.git
cd ggufy
# Install dependencies (if any)
# Follow the setup instructions in the README
Core Commands
Search Models
ggufy search "model name"
Search for GGUF models on HuggingFace by name or keywords.
Download Models
ggufy download username/model-name
Download a specific GGUF model from HuggingFace.
List Local Models
ggufy list
Show all GGUF models you have downloaded locally.
Model Information
ggufy info username/model-name
Get detailed information about a specific model.
Finding Models on HuggingFace
HuggingFace is the primary repository for GGUF models. Here's how to effectively search and discover models:
Search Strategies
- Use specific tags: Search for "gguf" + model type (e.g., "gguf llama")
- Filter by category: Look for models tagged with "text-generation", "question-answering", etc.
- Check model cards: Read the model descriptions and requirements
- Verify the format and quantization: confirm the repository actually contains .gguf files and offers a quantization level suited to your hardware
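The strategies above boil down to matching on keywords and tags. As a toy illustration (the model names and tags here are hypothetical stand-ins; real searches go through the HuggingFace Hub), a search can be sketched as:

```python
# Hypothetical catalog entries used only to illustrate keyword + tag filtering.
MODELS = [
    ("meta-llama/Llama-3.2-3B-Instruct-GGUF", {"gguf", "text-generation"}),
    ("Qwen/Qwen2.5-7B-Instruct", {"text-generation"}),
    ("bartowski/Phi-4-GGUF", {"gguf", "text-generation"}),
]

def search(keyword: str, required_tags: set) -> list:
    """Keep models whose name contains the keyword and that carry every required tag."""
    return [name for name, tags in MODELS
            if keyword.lower() in name.lower() and required_tags <= tags]

print(search("llama", {"gguf"}))  # ['meta-llama/Llama-3.2-3B-Instruct-GGUF']
```

Filtering on the "gguf" tag first, then narrowing by keyword, mirrors the recommended order: it guarantees every hit is in the right format before you compare variants.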
Popular Model Categories
🏠 Starter Models
Perfect for beginners - Llama 3.2, Qwen 2.5, Phi-4
🧠 Reasoning Models
Advanced reasoning - DeepSeek R1, specialized logic models
🌍 Multilingual
Multiple languages - Qwen series, multilingual Llama variants
⚡ Lightweight
Fast inference - Gemma series, small parameter models
Complete Workflow Example
Here's a typical workflow for discovering and using GGUF models:
Search for Models
ggufy search "llama 3.2"
This will show you available Llama 3.2 GGUF variants.
Download Your Choice
ggufy download meta-llama/Llama-3.2-3B-Instruct-GGUF
Download the specific model variant you want to use.
Verify Download
ggufy list
Confirm the model is available locally.
Use with Inference Engine
# Example with llama.cpp
./main -m models/meta-llama/Llama-3.2-3B-Instruct-GGUF/llama-3.2-3b-instruct-q4_0.gguf \
-p "Your prompt here"
Load the model into your preferred inference engine.
Advanced Tips
🔍 Model Discovery
Use HuggingFace's advanced search with filters like:
gguf AND llama AND size:3B
gguf AND quantization:Q4_0
gguf AND task:text-generation
📊 Model Comparison
Compare different quantizations of the same model:
- Q4_0: Fast, smaller size, good quality
- Q8_0: Higher quality, larger size
- IQ4_XS: Balanced performance and size
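A quick way to compare these quantizations is to estimate file size from parameter count. The bits-per-weight figures below are rules of thumb (block scales add overhead, so they exceed the nominal bit width; exact sizes vary by scheme):

```python
# Approximate bits per weight for common GGUF quantizations (rule-of-thumb values).
BITS_PER_WEIGHT = {"Q4_0": 4.5, "Q8_0": 8.5, "IQ4_XS": 4.25, "F16": 16.0}

def approx_file_size_gb(n_params_billions: float, quant: str) -> float:
    """Rough on-disk size: parameter count x bits per weight, converted to gigabytes."""
    bits = n_params_billions * 1e9 * BITS_PER_WEIGHT[quant]
    return bits / 8 / 1e9

print(round(approx_file_size_gb(3, "Q4_0"), 2))  # 1.69
print(round(approx_file_size_gb(3, "Q8_0"), 2))  # 3.19
```

So a 3B model is roughly 1.7 GB at Q4_0 versus about 3.2 GB at Q8_0, which is the size/quality trade-off the bullet points describe.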
⚙️ Performance Optimization
Choose models based on your hardware:
- CPU: Smaller models (1B-3B parameters)
- GPU: Larger models (7B+ parameters)
- Mobile: Quantized lightweight models
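To check whether a given model fits your hardware, a rough rule is that a loaded model needs its file size plus some runtime overhead (KV cache, buffers). The 1.2x overhead factor below is an illustrative assumption, not a measured constant:

```python
def fits_in_memory(file_size_gb: float, available_gb: float, overhead: float = 1.2) -> bool:
    """A loaded model needs roughly its file size plus runtime overhead (assumed 1.2x)."""
    return file_size_gb * overhead <= available_gb

print(fits_in_memory(1.7, 8))   # True  - a 3B Q4_0 model fits comfortably in 8 GB
print(fits_in_memory(40.0, 8))  # False - a 70B-class model does not
```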
Helpful Resources
ggufy Tool README
Reference content integrated from the ggufy repository README.
Signature Verification
Use the following information to verify commits and releases signed by Nbiish Kenwabikise.
GPG Key Fingerprint
67B8 55EC 8DB1 20B5 6BA6 9420 68F4 E3D4 B068 32C0
Public GPG Key
-----BEGIN PGP PUBLIC KEY BLOCK-----
mDMEaSDidxYJKwYBBAHaRw8BAQdAleSVYK68WzKRo2r+PJGlR9syEvhOAUjOdqr6
IaF9iq20O05iaWlzaCBLZW53YWJpa2lzZSAoVmVyaWZ5IGJ1aWx0IGJ5IG1lLikg
PG5iaWlzaEB1bWljaC5lZHU+iJMEExYKADsWIQRnuFXsjbEgtWumlCBo9OPUsGgy
wAUCaSDidwIbAwULCQgHAgIiAgYVCgkICwIEFgIDAQIeBwIXgAAKCRBo9OPUsGgy
wInWAPoCj0pLTE6r6ygVVYyV/gAvWhkq9hJSIOmSzai5ShouzgD7BlNyATMXm9eQ
1tuoyRXGK20LxgGPiWY27qp3Nern2Ay4OARpIOJ3EgorBgEEAZdVAQUBAQdAfH65
xt1hTpOUwDAORGakDdmhah2C5y/dYz2hAjkSQhsDAQgHiHgEGBYKACAWIQRnuFXs
jbEgtWumlCBo9OPUsGgywAUCaSDidwIbDAAKCRBo9OPUsGgywJwZAP9xTZ5dz++5
hQ3iXeejEL7waV/Kk+rss6y7q1BSoFnYQgEA9lmtIHKZq7QFe+YJ9RB4wI87MgYY
XbD0SILj9/tY6gQ=
=/n30
-----END PGP PUBLIC KEY BLOCK-----
Join the Community
Connect with other GGUF model enthusiasts, share your discoveries, and get help with model selection and optimization.