Local LLMs for Zed and Obsidian

Got no WiFi or cash to pay for LLMs? Learn how to use local LLMs with Ollama and Claude Code.

For coding and writing, I rely on Zed and Obsidian, respectively. LLMs can be helpful as coding or writing/editing assistants. While Claude Code with Sonnet and Opus is excellent, each token either costs money or eats up your limited quota, which can vanish rapidly. Sometimes a local LLM is fine—or your only option when the WiFi is gone. In those cases, I use Ollama with local LLMs I download beforehand. You do not need a high-end machine, as I run everything on a 2020 MacBook Air M1 with only 16 GB RAM. Here is what you need to do to replicate my setup.

Install Ollama

curl -fsSL https://ollama.com/install.sh | sh
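
The install script above is aimed at Linux; on macOS (which is what I use), the desktop app from ollama.com or Homebrew's ollama formula does the same job. Either way, two quick sanity checks afterwards:

ollama --version   # confirm the CLI is installed and on the PATH
ollama serve       # start the local server, if the Ollama app is not already running it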

Download model(s)

Pick the model(s) you wish to have offline from the list of supported models, or based on the intended use case and your machine’s specs. You can also use a site that estimates which LLMs fit on your machine to pick the best one for your specific hardware, though such recommendations are not always accurate.

For my M1, Qwen3 (qwen3:14b) is decent: it runs comfortably on my machine and it supports tool usage in Claude Code. Note that tool usage is minimally functional, not fabulous. DeepSeek (deepseek-r1:8b) is perhaps better at reasoning, and Google’s Gemma (gemma3:12b) or Microsoft’s Phi-4 (phi4:14b) beat Qwen3 at writing prose and editing, but none of them supports tools, so they do not integrate with Claude Code. Since I prefer an identical setup for offline and online usage, Qwen3 is the obvious choice. Keeping a few models downloaded on my machine means I can switch as the need arises.

ollama pull qwen3:14b
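
If you want a couple of the alternatives mentioned above on disk as well, pull them the same way and check what is installed with ollama list:

ollama pull deepseek-r1:8b   # smaller, reasoning-focused
ollama pull gemma3:12b       # better prose, but no tool support
ollama list                  # show every model downloaded locally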

Such a small LLM is mostly good for quick back-and-forth Q&As, code autocomplete, or short summarization, not for multi-file refactorings, crafting prose with nuance, or scientific reasoning. On better hardware, you can run larger models, which improve the quality considerably. Models with 32 billion weights or more are noticeably more capable, but they need at least 20 GB of unified memory with the default quantization. For GPT-4 class models, you need to accommodate around 70 billion weights (approx. 40 GB), and current frontier models run to a few hundred billion weights or more (well over 100 GB).
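
Whether a given model fits comes down to how much unified memory your machine has; on macOS you can check it directly (the value is in bytes):

sysctl -n hw.memsize   # total unified memory in bytes; 17179869184 means 16 GB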

You can see the quantization with ollama show qwen3:14b | grep quantization. In my case, it shows Q4_K_M, which means 4-bit quantization with the M (medium) variant of the K-quants method, which mixes precisions so that important weights (e.g. in attention layers) keep higher precision. This is the community default on Ollama. Q4_K_M works out to roughly 4.5 bits/weight. A model with 14 billion weights, such as Qwen3, therefore needs 14 billion weights × 4.5 bits/weight ÷ 8 bits/byte ≈ 7.9 GB just for the weights. A good rule of thumb is to add 10% overhead, so around 9 GB, which fits inside the 16 GB available on the MacBook Air M1.
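
The same estimate as a one-liner, with the ~10% overhead folded in (4.5 bits/weight and 10% are rough rules of thumb, not exact figures):

python3 -c "print(round(14e9 * 4.5 / 8 / 1e9 * 1.1, 1), 'GB')"   # prints 8.7 GB, i.e. the "around 9 GB" above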

Launch Claude Code with a local model

ollama launch claude --model qwen3:14b

Depending on the model size and context window, the responses within Claude Code can be slow, with basic instructions taking a minute or so. Inside Ollama itself, it is much faster, though. From ollama run qwen3:14b --verbose, I see that it runs at 6–8 tokens per second. A smaller model (e.g. deepseek-r1:8b) clocks in at 10–12 tokens per second.
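
The tokens-per-second figures come straight from Ollama's timing summary; run the model with --verbose and read the eval rate entry it prints after each response:

ollama run qwen3:14b --verbose
# the summary after each reply includes a line like "eval rate: 7.2 tokens/s"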

Configure Zed

First, install the Claude agent ACP adapter:

npm i -g @agentclientprotocol/claude-agent-acp

Or grab the pre-built binaries straight off GitHub without the need for Node.js.

Then paste the following into Zed’s settings.json:

"agent_servers": {
  "claude-acp": {
    "type": "registry"
  },
  "Ollama/Qwen": {
    "type": "custom",
    "command": "/opt/homebrew/bin/claude-agent-acp",
    "args": [],
    "env": {
      "ANTHROPIC_BASE_URL": "http://localhost:11434",
      "ANTHROPIC_AUTH_TOKEN": "ollama",
      "ANTHROPIC_MODEL": "qwen3:14b"
    }
  }
},

The first agent server, claude-acp, is the original Claude Code from the ACP configuration. Ollama/Qwen is the entry for Qwen3 through Ollama’s Claude Code integration. The path in command must match the output of which claude-agent-acp.
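
To find that value, ask the shell where the adapter landed; with a global npm install under Homebrew's Node on Apple Silicon it typically ends up in /opt/homebrew/bin:

which claude-agent-acp
# e.g. /opt/homebrew/bin/claude-agent-acp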

Query Obsidian

A local LLM like Qwen3 is not powerful enough for complex questions, but you can ask it to summarize, write, or edit single files in your Obsidian vault. All files are plain Markdown, so that is supported out of the box.

I have a function in .zshrc, so I can run lclaude instead of claude from the vault’s root folder:

lclaude() { ollama launch claude --model qwen3:14b "$@"; }
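
A typical session then starts from the vault’s root (the path here is just a placeholder for your own vault):

cd ~/Obsidian/vault   # hypothetical vault location
lclaude               # opens Claude Code backed by qwen3:14b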

The experience is the same, but of course with less powerful models. Still, even a six-year-old laptop can handle LLMs locally with relative ease.