Large Language Models (LLMs) have revolutionized the way we interact with artificial intelligence, offering capabilities ranging from text generation and translation to complex reasoning. Traditionally, accessing these powerful models required connecting to cloud-based services. However, advancements in software like Ollama now allow users to run these sophisticated AI models directly on their local computers, providing a private, offline, and secure alternative. This local operation eliminates the need for constant internet access and protects sensitive data by ensuring it never leaves the user's machine. Furthermore, unlike many online AI services that impose usage limits, running LLMs locally with Ollama often comes without such restrictions.
Before embarking on the installation and configuration process, it is essential to ensure your Windows 10 system meets certain prerequisites. These foundational elements will ensure a smooth and successful setup of both Ollama and AnythingLLM.
Ensuring these prerequisites are met will lay the groundwork for a successful journey into running local LLMs.
Ollama provides a simplified interface for downloading, managing, and running open-source LLMs on your local machine. Unlike cloud-based solutions, Ollama ensures your data never leaves your computer, offering enhanced privacy and security while eliminating subscription costs.
AnythingLLM provides the user interface for interacting with the LLMs managed by Ollama. It lets you chat with local models, work with your documents, and build more complex AI workflows, with flexible options for which models are used. It is available as a desktop application or a Docker container; this guide focuses on the desktop version, which is simpler for most users.
To use LLMs managed by Ollama within AnythingLLM, you need to configure the connection.
In AnythingLLM, open the settings or preferences section (look for "Settings", "Preferences", or a settings icon), then locate the "LLM Preference" or "LLM Configuration" section.
In the LLM preferences, select "Ollama" from the list of available providers.
Enter the base URL of your Ollama server: http://127.0.0.1:11434 (this is the default; if you changed the port Ollama listens on, adjust the URL accordingly).
Make sure to save the changes in the AnythingLLM settings.
AnythingLLM might automatically attempt to connect. If successful, the manual input field might be hidden. If it fails, double-check the URL and that the Ollama server is running (check the system tray icon or use `curl http://localhost:11434`).
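As a quick sanity check, you can query the Ollama server from a PowerShell window before configuring AnythingLLM. This is a minimal sketch assuming Ollama is running on the default port (curl.exe ships with recent Windows 10 builds; call it explicitly, since plain `curl` is a PowerShell alias for Invoke-WebRequest):

```powershell
# Check that the Ollama server is responding (it should reply "Ollama is running").
curl.exe http://localhost:11434

# List the models Ollama currently has pulled locally (returns JSON).
curl.exe http://localhost:11434/api/tags
```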
AnythingLLM can use Ollama both as the chat LLM and as the embedding provider. Make sure to select Ollama and enter the base URL in both relevant sections (LLM provider and embedding provider) for full functionality.
Ollama uses a command-line interface (CLI) to manage models.
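For example, the commands below illustrate the typical workflow for pulling and managing models from a command prompt or PowerShell window (a sketch using the standard Ollama CLI; model names such as `llama3.2` are only examples, so substitute any model from the Ollama library):

```powershell
# Download a model from the Ollama library (llama3.2 is an example name).
ollama pull llama3.2

# List the models installed locally, with their sizes on disk.
ollama list

# Chat with a model directly in the terminal (type /bye or press Ctrl+D to exit).
ollama run llama3.2

# Remove a model you no longer need, to free disk space.
ollama rm llama3.2
```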
Now you can instruct AnythingLLM to use the downloaded models.
Ensure "Ollama" is selected as the LLM provider in AnythingLLM's settings (as in Step 3).
In the LLM configuration, choose a "Chat Model". This might be a dropdown or a text field. Select or enter the exact name of the model you downloaded (e.g., `llama3.2`, `mistral`).
You'll also need to configure the "Embedder" or "Embedding Model" if you want to use Ollama for document processing. Select "Ollama" and choose a suitable embedding model (models intended for embeddings typically have "embed" in their name).
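If you plan to use Ollama for embeddings, pull an embedding model first so it appears in AnythingLLM's model list. A minimal example (nomic-embed-text is one embedding model from the Ollama library; any other embedding model is pulled the same way):

```powershell
# Pull an embedding model so AnythingLLM can use Ollama for document embeddings.
ollama pull nomic-embed-text
```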
Navigate to the main chat interface. You can now type prompts and receive responses powered by the LLM running locally via Ollama.
If you've uploaded documents, the LLM can use them as context (Retrieval-Augmented Generation - RAG). AnythingLLM finds relevant snippets using the embedding model and passes them to the LLM.
Explore different chat modes like "Conversation" (retains history) and "Query" (simple Q&A).
Ollama automatically checks for updates. A notification will appear on the system tray icon. Click the icon and select "Restart to update". You can also manually update by downloading the latest installer from ollama.com.
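To confirm which version is installed after an update, you can check from the command line:

```powershell
# Print the installed Ollama version.
ollama -v
```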
Go to the AnythingLLM download page (anythingllm.com/desktop) and download the latest installer. Re-run the installer; this will overwrite the existing application while preserving your data.
If using Docker, stop the container, pull the latest image (`docker pull mintplexlabs/anythingllm`), and restart the container.
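A minimal sketch of that update sequence, assuming the container was named `anythingllm` when it was created (adjust the name and re-use whatever run options you originally chose):

```powershell
# Stop and remove the running container (data kept in mounted volumes is preserved,
# assuming your storage directory was mounted as a volume).
docker stop anythingllm
docker rm anythingllm

# Pull the latest AnythingLLM image.
docker pull mintplexlabs/anythingllm

# Re-create the container with the same options you used originally
# (port mappings, volume mounts, etc.) so it starts from the new image.
```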
Use the AnythingLLM interface to create, manage, and delete workspaces and documents. Refer to the AnythingLLM documentation for details.
Create workspaces to organize documents and chats. Look for a button or menu option to create a new workspace.
Upload documents (PDF, TXT, DOCX, etc.) to your workspace. AnythingLLM will process them. Manage documents within the workspace (view, select, delete).
In the chat interface, type questions. AnythingLLM uses the LLM (via Ollama) and document content to generate responses. Experiment with different question types.
If you have multiple LLMs, switch between them in AnythingLLM's settings.
Be clear and specific in your prompts to guide the LLM.
AnythingLLM offers agent flows for automated workflows. Explore the "Agent Skills" page for more.
These requirements cover operating system, processor, RAM, disk space, and optionally a GPU.
| Requirement | Ollama (Minimum) | Ollama (Recommended) | AnythingLLM (Minimum) | AnythingLLM (Recommended) |
|---|---|---|---|---|
| Operating System | Windows 10+ (64-bit) | Windows 10+ (64-bit) | Windows 10+ (64-bit) | Windows 10+ (64-bit) |
| Processor | 2 cores | 4 cores or better | 2 cores | - |
| RAM | 8 GB | 16 GB or more | 2 GB | 4 GB |
| Disk Space | 1 GB + model space | 5 GB + model space | 5 GB | 1 GB for install |
| GPU (Optional) | NVIDIA (Compute 5.0+) or AMD | NVIDIA (Compute 6.0+) or AMD | - | - |
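If you have an NVIDIA GPU and want Ollama to use it, you can confirm the installed driver and GPU model from the command line (a quick check; the compute capability of a given GPU model can then be looked up in NVIDIA's CUDA documentation):

```powershell
# Show the installed NVIDIA driver version, GPU model, and current memory usage.
nvidia-smi
```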
RAM requirements for Ollama depend heavily on the size of the model you run: the memory needed scales with the model's parameter count. Quantization (storing weights at reduced precision) can substantially reduce memory usage.
| Parameter Size | Approximate Minimum RAM (16-bit) | Notes |
|---|---|---|
| 1B-3B | 4-8 GB | Smaller models. |
| 7B | 14-16 GB | Common general-purpose size. |
| 13B | 26-32 GB | Better performance, more RAM. |
| 30B-70B | 60-140+ GB | High-performance, requires substantial RAM and often a GPU. |
These are rough estimates. Actual usage varies.
For a more precise estimate, use the formula:

M = (P × (Q/8)) × 1.2

Where:

- M is the approximate memory required, in GB
- P is the model's parameter count, in billions
- Q is the number of bits used per parameter (the quantization level, e.g. 16, 8, or 4)
- the factor 1.2 adds roughly 20% overhead for context and other runtime data

Example: a 70B model with 4-bit quantization needs about (70 × (4/8)) × 1.2 = 42 GB of RAM.
Start with smaller models and gradually increase if your system allows.
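One practical way to see what you are working with: recent versions of the Ollama CLI can display a downloaded model's details, including its parameter count and quantization level (a sketch; the exact fields shown depend on your Ollama version, and `llama3.2` is just an example name):

```powershell
# Show details for a downloaded model, such as parameter count,
# context length, and quantization level.
ollama show llama3.2
```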
This guide has walked through the complete process of running LLMs locally on Windows 10 using Ollama and AnythingLLM. You should now have both applications installed and configured, and be ready to download and use LLMs.
Local LLMs offer significant advantages: privacy, data security, offline access, and no usage restrictions.
Explore the Ollama model library and experiment. AnythingLLM provides an intuitive interface, especially for working with documents.
Consult the official documentation for Ollama (https://ollama.com) and AnythingLLM (https://anythingllm.com) for advanced features and support.