Free Gemini 2.0, Local DeepSeek-R1, and an Evolved Obsidian

A long time ago, I noticed an open-source project called Khoj. At the time, the keywords I tagged in my mind were: Obsidian plugin support, Agent, and local knowledge base management. However, during that period, I was building knowledge graphs for tech companies and needed to call upon many more external tools, so I could only add Khoj to my To-Do List (another reason was that it didn't support the Gemini API at the time).

Recently, while comparing and studying DeepSeek-R1, GPT o1/o3, Gemini 2.0, and Claude 3.5, I decided to deploy Khoj.

First, let's look at the introduction on Khoj's homepage. This page is the interface for the online version. It's available for free but with a very low quota; the paid version is $30 per month, which is quite expensive.

Khoj Pricing

However, for me, self-hosting is the way to go. The project is open-source and can be installed directly via Python's pip or with Docker. Since containerized deployment is more convenient for the backend database, the code-execution sandbox, and the search engine, I chose Docker.

On Windows, this requires a WSL environment. By comparison, a Mac is much more convenient (the same old story: I remain bearish on Windows, which is gradually falling behind in the AI era).

Starting the installation process:

  1. Download Docker Desktop.
  2. Install docker-compose in the terminal: brew install docker-compose
  3. Download Khoj's docker-compose.yml: mkdir ~/.khoj && cd ~/.khoj && wget https://raw.githubusercontent.com/khoj-ai/khoj/master/docker-compose.yml

The above code comes from Khoj's official help documentation: https://docs.khoj.dev/get-started/setup?os=macos

The difference is that the official documentation suggests installing Docker via brew, but since Docker Desktop was already installed on my system, I skipped that step. For anyone who hasn't installed Docker yet, I recommend downloading the Desktop version directly from the official website for peace of mind.

However, since pulling images from Docker Hub can be slow or unreliable from mainland China, some network configuration may be necessary.
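One common workaround (nothing Khoj-specific, and the mirror URL below is a placeholder you would replace with a mirror or proxy you trust) is adding a registry mirror to Docker's daemon.json, editable under Docker Desktop's Settings → Docker Engine:

```json
{
  "registry-mirrors": ["https://mirror.example.com"]
}
```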

After downloading the docker-compose.yml file, you need to modify some configurations. For simplicity, I've screenshotted the official documentation.

Config documentation

There are several ways to proceed: one is using APIs, which currently support OpenAI, Claude, and Gemini (finally!). For me, I un-commented the GEMINI_API_KEY line without hesitation and filled in my API key from Google AI Studio.
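For reference, after editing, the relevant lines of docker-compose.yml look roughly like this (the surrounding structure is abbreviated here, and service/key names may differ slightly in your downloaded file):

```yaml
services:
  server:
    environment:
      # Un-comment this line and paste the key from Google AI Studio
      - GEMINI_API_KEY=your-key-here
```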

I must continue to praise Gemini 2.0, not just because I think it's the best model currently available, but also because Google generously provides developers with a free quota: 1500 calls per day, at no more than 10 requests per minute. Basically, as long as you aren't "distilling" their models, that's sufficient for most use cases. If it still isn't enough, you can fall back to Gemini 1.5, which even today is perfectly fine for work, just a bit slower.
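If you script against the Gemini API directly, a minimal client-side throttle helps you stay under the 10-requests-per-minute cap. This is my own sketch, not Khoj code or part of any Google SDK:

```python
import time


class RateLimiter:
    """Client-side throttle: spaces calls to stay under a requests-per-minute cap."""

    def __init__(self, rpm: int):
        self.min_interval = 60.0 / rpm  # seconds between calls; 6.0 for rpm=10
        self._last = None

    def wait(self):
        """Block just long enough so consecutive calls are >= min_interval apart."""
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


# Call limiter.wait() before each API request to respect the 10 RPM quota.
limiter = RateLimiter(rpm=10)
```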

Gemini Config 1

Gemini Config 2

In my initial setup, I only changed the username/password and the Gemini API Key.

Next, run: docker-compose up

The first time you pull images and initialize, it takes some time. Once the command line returns the following prompt, it's ready for use.

Docker run log

A small "gotcha": don't start by opening the application homepage (localhost:42110) in your browser. Instead, go to the admin page at http://localhost:42110/server/admin/, which will ask for the username and password you configured earlier.

On the management page, you need to add a model. Select "Chat Model" from the left sidebar, then click the "+" icon in the top right corner.

Admin model setup 1

Admin model setup 2

On my model settings page, I configured Gemini 2.0 Flash. Fill it out as shown above: the model name must be exact, the model category should be "Google", and the API should be "Google Gemini".

Don't forget to click "Save" in the bottom right corner.

The model setup is now complete. Next, go to the homepage at localhost:42110. It looks like this:

Homepage interface

At this point, you still can't chat. You need to configure the model in the UI. Select "Settings" from the left sidebar.

Settings menu

Under the "Chat" section in the Models column, select gemini-2.0-flash-exp.

Select Gemini Model

Back on the homepage, I hit a hurdle. As a routine test, I entered "introduce yourself" in the chat box, but got no response. Thinking I had misconfigured something, I tried many options and spent half an hour troubleshooting. Finally, I typed "/", selected the "general" command, and clicked the Khoj (Agent) icon above the input box; only then did I get output.

Command and Agent toggle

The output looks like this.

General output

Another confusing part: this is clearly the output of the "Khoj" Agent, which seems to be using Gemini 1.5.

Khoj Agent info

To use Gemini 2.0, you need to add another Agent. In the "Agents" section of the left sidebar, I added an Agent named "test". After activating it, the feeling of Gemini 2.0's rapid token output finally arrived.

Agent creation

Gemini 2.0 output

That completes the basic setup (actually, you don't need the "/" command; just activating an agent allows you to chat directly).

With that working, I moved to my second goal: using local DeepSeek-R1. The method, of course, is using Ollama (installation is simple; just download it from the website).

Given laptop performance constraints, I chose the Llama-3 8B distilled version: deepseek-r1:8b-llama-distill-q8_0.

Pulling the model is easy: ollama pull deepseek-r1:8b-llama-distill-q8_0. The weight file is 8.5GB, so download time depends on your internet speed.

Once downloaded, stop the previous docker-compose (either by pressing Ctrl+C in the terminal or running docker-compose down). Modify the docker-compose.yml file by un-commenting the OPENAI_BASE_URL line.
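The un-commented line points Khoj at Ollama's OpenAI-compatible endpoint. From inside the container, the host machine's Ollama server is typically reached via host.docker.internal rather than localhost (this excerpt is abbreviated and based on the Khoj docs; verify the exact value in your own file):

```yaml
services:
  server:
    environment:
      # Ollama serves an OpenAI-compatible API on port 11434
      - OPENAI_BASE_URL=http://host.docker.internal:11434/v1
```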

Then run docker-compose up again and go to the admin page to configure the model.

First, add an API.

API setup

The name and key can be anything; the base URL should be filled as shown above.

Then add the model. The model name is 8b-llama-distill-q8_0, the model type must be "openai", and the API should be "ollama" (the API setup step might be unnecessary, but I did it to be safe).

Model setup with Ollama

In the chat page, I also added an agent named "deepseek-llama". When testing with a math problem, the output process was very close to the online full version of R1, and the answer was correct.

DeepSeek Math Test

Beyond chat and Agents, Khoj offers several features: search (web and local knowledge base), image generation (requires config), voice generation (requires config), and code generation.

Since image and voice generation require extra setup time, I only tested code generation and execution. Following the previous math problem, it provided Python code to calculate it and ran it to get the result (verified by backend logs).

Code and Execution

Of course, outputting charts directly would be even more intuitive.

Chart output

At this point, both calling Gemini 2.0 online and using local DeepSeek-R1 were successful. I'm satisfied with Khoj's completeness; it's starting to feel like a comprehensive solution.

Another feature I value highly is the Obsidian plugin. You can find it by searching community plugins in Obsidian; with over 34K downloads, it's quite popular.

Obsidian Plugin Search

In settings, just fill in the local address and port in the "URL" field.

Unfortunately, there's no Agent selection in the plugin, so I assume it defaults to Khoj's Gemini 1.5 Agent. I couldn't find where to change this yet.

Plugin settings

After installation, you can open a chat window in Obsidian's right sidebar. Conversations now include search functionality (configured in Khoj admin). In other chats, it even returned results from my Obsidian documents. Building a personal knowledge base is becoming increasingly important.

The plugin's features still need refinement, but after installation, Obsidian documents gradually sync to the local database as a Khoj knowledge base. This process is seamless and provides a good experience.

Since both Khoj and its plugin are open-source, the difficulty of developing more features is lower, and the possibilities are greater. This concludes my initial trial of the new tool.

Postscript

Over the past few years, I've tried hundreds of new tools like this. Being a perfectionist, I always want to find the perfect solution: simple, direct, All-in-One.

Every trial starts with dissatisfaction with my current tools and workflows, but after almost every trial, I return to my familiar stack. Take the ever-growing "Second Brain" space: I've used Obsidian for over five years. In that time, I've tried nearly every note-taking tool I could find, from the paid Notion to new projects with only a few dozen stars, and I've even written a few simple note tools myself. Yet almost all my documents remain in Obsidian. Despite the frustration, I keep believing that someone out there shares my ideas but has more talent and diligence, and will write the plugins I need.

Khoj's appearance reinforces this belief.

Because the excellent and free Gemini 2.0 API lowers the entry barrier to zero, and because DeepSeek-R1 elevates the power of local open-weight models, Khoj's architectural vision becomes viable.

The real source of this belief might be the open-source ecosystem itself. Obsidian strikes a clever balance: the app is closed-source and free (with paid options), while the plugins are completely open. Gemini follows a similar path: the model is closed, but developers get enough free quota to experiment. Many AI tools like Khoj do the same: the core code is open-source, with polished commercial versions on top.

Pure software development and service have always been difficult, and maintaining open source is even harder. However, as I've always said, today's AI grew out of the open-source ecosystem. Any branch that leaves the open environment will quickly wither from a lack of nutrients.
