Run LLMs Locally
This guide is for anyone who wants privacy, offline access, or just the satisfaction of running their own AI. We'll walk you through setting up Ollama and point you to similar tools like llama.cpp.
Why Run AI Locally?
When you use Claude, ChatGPT, or other cloud AI:
- Your conversations go to a server
- You need an internet connection
- There are usage limits
- The service could change or disappear
Running AI locally means:
- Complete privacy: nothing leaves your computer
- Offline access: use AI without internet
- No limits: run as many queries as you want
- Learning: understand how these systems actually work
What You'll Need
Hardware Requirements
Minimum (for small models):
- 8GB RAM
- Any modern CPU
- Works on most laptops from the last 5 years
Recommended (for better models):
- 16GB+ RAM
- Modern CPU with good single-thread performance
- SSD for storage
For best results:
- 32GB+ RAM, or
- A gaming GPU with 8GB+ VRAM (NVIDIA works best)
Don't have powerful hardware? Start with smaller models. They're surprisingly capable.
The Easiest Path: Ollama
Ollama is the simplest way to run AI locally. It handles all the technical details.
Installing Ollama
Mac:
- Go to ollama.ai
- Download the Mac app
- Install and run it
Windows:
- Go to ollama.ai
- Download the Windows installer
- Run the installer
Linux:
curl -fsSL https://ollama.ai/install.sh | sh
Your First Local Model
Open a terminal (Command Prompt on Windows, Terminal on Mac or Linux) and type:
ollama run llama3.2
Ollama will download the model (about 2GB for llama3.2) and start it. Then you can chat:
>>> Hello! What can you help me with?
That's it. You're running AI on your own computer.
Popular Models to Try
| Model | Size | Good For |
|---|---|---|
| llama3.2 | 2GB | General chat, fast responses |
| llama3.1:8b | 4.5GB | Better quality, still fast |
| mistral | 4GB | Concise, efficient responses |
| codellama | 4GB | Programming help |
| phi3 | 2GB | Lightweight, good for testing |
Try different models:
ollama run mistral
ollama run codellama
Managing Models
List installed models:
ollama list
Remove a model:
ollama rm modelname
Pull a model without running:
ollama pull modelname
Using a Chat Interface
The command line works, but you might want a nicer interface.
Open WebUI
A beautiful web interface for Ollama:
- Make sure Ollama is running
- Install Docker Desktop (docker.com)
- Run:
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main
- Open your browser to
http://localhost:3000
Now you have a ChatGPT-like interface for your local models.
Other Options
- LM Studio (lmstudio.ai) - Standalone app with nice UI
- GPT4All (gpt4all.io) - Simple desktop app
- Jan (jan.ai) - Clean, modern interface
Understanding Model Sizes
Models come in different sizes, measured in parameters (billions):
- 1-3B parameters: Fast, runs on anything, limited capability
- 7-8B parameters: Good balance, needs decent RAM
- 13B parameters: Better quality, needs 16GB+ RAM
- 70B parameters: Near cloud-quality, needs 64GB+ RAM or good GPU
For most people, 7-8B models are the sweet spot.
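If you're unsure what your machine can handle, a rough rule of thumb (an assumption, not an exact figure, since it varies by quantization format and context length) is about half a gigabyte of RAM per billion parameters for the 4-bit quantized models Ollama typically downloads, plus some overhead. A quick Python sketch of the arithmetic:
# Back-of-envelope RAM estimate for a 4-bit quantized model.
# The ~0.5 GB per billion parameters and 25% overhead figures are
# rough assumptions; real usage varies with quantization and context.
def estimate_ram_gb(params_billion: float) -> float:
    weights_gb = params_billion * 0.5  # ~0.5 GB per billion parameters at 4-bit
    return weights_gb * 1.25           # ~25% extra for context and runtime overhead

for size_b in (3, 8, 13, 70):
    print(f"{size_b}B parameters: roughly {estimate_ram_gb(size_b):.0f} GB")
Add headroom for your operating system and other applications, which is why the guidelines above run higher than the raw model footprint.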
What Local AI Can Do
Works well:
- General conversation
- Writing help
- Code assistance
- Brainstorming
- Summarization
- Simple analysis
Limitations:
- No internet access (the model can't look up current information)
- No image generation (usually)
- Smaller knowledge base than cloud models
- Slower than cloud (usually)
- Can't match GPT-4 quality (yet)
Privacy Considerations
Local AI is truly private:
- Conversations never leave your computer
- No logging by companies
- No training on your data
- No one knows what you're asking
This matters for:
- Sensitive business information
- Personal matters
- Journaling or therapy-like conversations
- Anything you wouldn't want stored
Troubleshooting
"Not enough memory" Try a smaller model, or close other applications.
"Model downloads slowly" Models are large. Be patient, or download during off-hours.
"Responses are slow" Normal for larger models. Try a smaller model, or consider a GPU upgrade.
"Ollama won't start" Make sure you don't have another instance running. Restart your computer.
Going Deeper
Custom System Prompts
Create specialized versions of models:
ollama create myassistant -f ./Modelfile
Where Modelfile is a plain text file containing your customizations, as in the example below.
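A minimal Modelfile might look like this (the base model, system prompt, and temperature here are illustrative choices, not required values):
FROM llama3.2
SYSTEM """You are a patient writing coach. Give short, concrete suggestions."""
PARAMETER temperature 0.7
Once created, run it like any other model: ollama run myassistant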
API Access
Ollama exposes an HTTP API at http://localhost:11434:
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Hello!"
}'
Use this to integrate local AI into scripts or applications.
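As a minimal sketch of such an integration (assuming Ollama is running locally and llama3.2 has been pulled), here's a Python script that sends one prompt and prints the reply. Setting "stream" to false requests a single JSON object instead of a stream of partial responses:
# Send one prompt to the local Ollama API and print the reply.
# Uses only the Python standard library.
import json
import urllib.request

payload = {
    "model": "llama3.2",
    "prompt": "Explain what a local LLM is in one sentence.",
    "stream": False,  # one complete JSON reply instead of a stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])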
Fine-tuning
For the truly ambitious: train models on your own data. This requires significant technical knowledge and hardware.
Comparison to Cloud AI
| Aspect | Local | Cloud |
|---|---|---|
| Privacy | Complete | Limited |
| Cost | Free after setup | Subscription/usage |
| Quality | Good, not best | State-of-the-art |
| Speed | Depends on hardware | Fast |
| Offline | Yes | No |
| Updates | Manual | Automatic |
Many people use both: cloud AI for best quality, local AI for privacy-sensitive tasks.
Next Steps
Want to use AI for a major creative project? Try Write a Book with AI for an ambitious undertaking.
Or explore Build a Course Curriculum to design your own learning path.
Or browse all Projects.