From "Second Brain" to "Augmented Brain": What Kind of AI Tools Do We Need?


After Google's NotebookLM went viral for generating near-human, half-hour podcasts, Meta released NotebookLlama in its open-source Llama Recipes repository.

NotebookLlama https://github.com/meta-llama/llama-recipes/tree/main/recipes/quickstart/NotebookLlama

Simply put, it works in four steps: 1. use Llama-3.2-1B-Instruct to preprocess the PDF document; 2. use Llama-3.1-70B-Instruct to generate the podcast transcript; 3. use Llama-3.1-8B-Instruct to refine the transcript; 4. use two TTS models, parler-tts-mini-v1 and Suno's Bark, to generate the audio.
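The four steps above can be sketched as a simple chain of stages. This is only a minimal illustration of the shape of such a pipeline, not NotebookLlama's actual code: the class name `PodcastPipeline`, the stage names, and the stub functions are all hypothetical, and the stubs stand in for real model and TTS calls.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures (assumptions for illustration only).
TextStage = Callable[[str], str]      # text in, text out
SpeechStage = Callable[[str], bytes]  # transcript in, audio bytes out


@dataclass
class PodcastPipeline:
    clean_pdf_text: TextStage      # step 1: a small model cleans raw PDF text
    write_transcript: TextStage    # step 2: a large model drafts the transcript
    rewrite_transcript: TextStage  # step 3: a mid-size model refines the draft
    synthesize: SpeechStage        # step 4: TTS turns the transcript into audio

    def run(self, raw_pdf_text: str) -> bytes:
        cleaned = self.clean_pdf_text(raw_pdf_text)
        draft = self.write_transcript(cleaned)
        final = self.rewrite_transcript(draft)
        return self.synthesize(final)


# Stub stages so the sketch runs without any model; real stages would call LLMs/TTS.
def stub_clean(raw: str) -> str:
    return " ".join(raw.split())  # collapse stray whitespace from PDF extraction


def stub_write(cleaned: str) -> str:
    return f"Speaker 1: Today we discuss: {cleaned}"


def stub_rewrite(draft: str) -> str:
    return draft + "\nSpeaker 2: Tell me more!"


def stub_tts(transcript: str) -> bytes:
    return transcript.encode("utf-8")  # placeholder "audio"


pipeline = PodcastPipeline(stub_clean, stub_write, stub_rewrite, stub_tts)
audio = pipeline.run("  Second   brain\n vs augmented brain ")
print(audio.decode("utf-8"))
```

Because each stage is just a callable, any one of them can be swapped for a different local model without touching the rest, which is much of the appeal of a locally deployable alternative.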

Its output quality is, of course, still far behind NotebookLM's, but it at least provides a complete, open-source, and locally deployable alternative.

The original is great, but alternatives often have a charm of their own: those who like plug-and-play choose the original, while those who enjoy tinkering go for the alternative. At last, after more than a decade of smartphone dominance, the varied joys of the PC era are back.

Admittedly, over the past six months NotebookLM has become one of my most frequently used applications: more than once it has helped me "devour" a thousand-page professional book within a week, generating supporting material at various levels of granularity to reinforce my memory.

However, I still believe these tools are far from my ideal workflow.

Many people call NotebookLM a "Second Brain," but I find the term outdated. A "Second Brain" is mostly about helping to organize memory, acting as an extension of personal memory. Although most so-called AI tools today haven't even done this step well, the broader promise of Generative AI lies in creating scenarios, training intelligence, or directly "augmenting the brain."

Of course, the transition from "Second Brain" to "Augmented Brain" is a gradual progression, and the same holds at the tool level. Although no tool is truly perfect yet, the various attempts so far have given us a lot of inspiration.

The most important inspiration is this: I need to write my own set of tools. There are plenty of existing tools to learn from, and Generative AI has made building one's own tools increasingly feasible, both technically and in terms of process.

Yes, writing my own tools is my most important realization of the past month. I'm grateful for every tool I've used since Notion in 2017.

I. Tools of the "Second Brain" Era

(To keep things brief, I give only a short evaluation of each tool, based on my own experience as a user.)

What is a "Second Brain"? My understanding is this: with the explosion of information, devices, and apps over the past decade, two pain points stand out. The brain cannot keep up with the volume of information, and our data is fragmented and scattered across many corners. We need a set of tools and processes to aggregate and process information, and a personalized "Second Brain" serves this purpose.

  1. Obsidian. This is the most representative example: information collection, organization, and association.


Obsidian is the tool I have used most over the past five years, and it has become the standard for my team. It isn't open-source, but it feels as if it were:

  • It uses Markdown as its primary file format, which makes it easy to integrate and edit multimedia content such as text, images, video, and even code;
  • Its links and backlinks are powerful: not only can documents be linked to each other, but content inside documents can be interlinked as well, which matches the way the brain thinks;
  • The core program is not open-source, but a rich community supplies a large number of plugins: they can execute programs, automate document organization, and integrate Generative AI;
  • Team collaboration can be managed through code repositories, which makes it easy for team members to share knowledge;
  • Its most powerful plugin, Excalidraw, is what I believe most resembles the future form of Generative AI tools (search "obsidian excalidraw" if interested).
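To make the link and backlink idea concrete, here is a minimal sketch of how a backlink index can be derived from Markdown notes that use Obsidian-style `[[wikilinks]]`. It is not Obsidian's implementation: the function name `build_backlinks` and the sample notes are made up for illustration, and the regular expression only covers the common `[[Note]]`, `[[Note#Heading]]`, and `[[Note|alias]]` forms.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Captures the target note name, then skips an optional "#heading" and "|alias".
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def build_backlinks(notes: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each note name to the set of notes that link to it."""
    backlinks: Dict[str, Set[str]] = defaultdict(set)
    for source, text in notes.items():
        for match in WIKILINK.finditer(text):
            target = match.group(1).strip()
            if target != source:  # ignore links from a note to itself
                backlinks[target].add(source)
    return dict(backlinks)


# Hypothetical vault: note name -> Markdown body.
notes = {
    "Obsidian": "Compare with [[Notion]] and [[tldraw|the canvas tool]].",
    "Notion": "See [[Obsidian#Plugins]] for the plugin story.",
    "tldraw": "Inspired by [[Obsidian]] and Excalidraw.",
}

backlinks = build_backlinks(notes)
for target in sorted(backlinks):
    print(target, "<-", sorted(backlinks[target]))
```

Forward links are written by hand; backlinks are computed. That asymmetry is why plain Markdown files are enough to support an associative, brain-like note graph.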

Yes, Obsidian has many flaws: an editor that feels slightly dated in the AI era and a file-handling system that isn't very efficient, which makes it harder to use as the knowledge base grows. Over the past five years I've looked at dozens of tools to replace Obsidian, but I always end up back here.

  2. Notion. This is the most famous tool today. I was lucky enough to start using it in 2017.
  • Notion is currently the Markdown-based tool with the largest user base;
  • Notion was the first to introduce templates at scale (e.g., schedule management, reading plans, project management). While the tool itself isn't open-source, the templates are, which lets users share and collaborate, a return to the original form of the internet;
  • Notion was the first to integrate GPT for writing, summarizing, and editing;
  • As a commercial tool, it has a beautiful interface;
  • Notion's downsides are equally obvious: compared to Obsidian, the editor is too rigid, and the lack of sufficient plugin support means fewer features. Also, while it offers a paid team version, $10/month per member just for information sharing feels expensive to me.

So, after migrating my team's work to Obsidian and integrating Generative AI, I abandoned Notion.

  3. Logseq, Heptabase, ClickUp, AppFlowy... In my search for an Obsidian replacement, I tried almost every similar tool available. Each had a highlight, but none was as powerful as Obsidian.

  4. Miro, FigJam, Diagram... Coming from a programming background, I've always been a fan of flowcharts and mind maps. When rich-media whiteboards emerged, I used Miro and FigJam, but because they are not open-source I soon saw their ceiling: limited extensibility and limited ability to execute code dynamically.

  5. tldraw. If you like the concept of an infinite canvas, you'll love Excalidraw. If you also like boundless freedom and extensibility, tldraw has the potential to become a favorite.

  • Everything starts with a "blank canvas"; from there you can write, draw mind maps, paste screenshots and videos, share during a meeting, edit collaboratively, and integrate any Generative AI function;
  • The core program is simple, just a canvas, and every application is "brainstormed" on top of it;
  • The advantage is that it has no limits; the disadvantage is that it is so minimal that persisting content takes a lot of work.
  6. AFFiNE, Anytype. They are similar yet different, and both borrow strengths from all the tools discussed above. While many features are still on the roadmap, they leave plenty of room for imagination. These two tools have one major advantage: because they were launched or rewritten around the time ChatGPT emerged or later, their architectural design is excellent. They are open-source, concise, and well-documented, which makes them very suitable for customization.

  7. My progress: I am integrating code from tldraw, AFFiNE, and Anytype to build my own tool. The difficulty isn't the code; it is working out, again and again, what I actually want to do.

II. What Do We Need?

I still hold a neutral view of ChatGPT's "sudden emergence": it should have been just one tool on the road to "artificial intelligence," but it raised false expectations, followed by disappointment over inherent flaws like hallucinations. It did, however, attract enough capital and talent to the field.

ChatGPT's biggest problem, though, is that it has left many users unsure of what they want, while also "desensitizing" them to its truly great features.

  1. Do we need smart search? When ChatGPT came out, everyone thought it would replace search, then they were disappointed by "hallucinations." Actually, we need it to perform two functions: summarize existing knowledge and actively search/organize the latest information. Search is the most important way to obtain valid information; autonomous processing of that information is the most important function of an "Augmented Brain." With ChatGPT "quietly" launching search functions, the real shift has begun.

  2. Do we need it to write code? To some extent, laziness is a major driver of technological progress. An "agent" that turns spoken instructions into code that completes a task is something almost everyone would love.

  3. Do we need it to write documents or make PPTs? Maybe for now. But I maintain that documents and PPTs are made for humans to read. In the AI era, we might not need them at all.

  4. Do we need it to draw or generate animations? As a productivity tool, absolutely. As a photographer, I still hold the view that camera manufacturers should think seriously: digital photography as an art might disappear within five years.

  5. Do we need it to talk to us? Yes, both for efficiency and because people are growing increasingly lonely.

So, whether it's ChatGPT, Claude, Gemini, or NotebookLM, none of them alone satisfies everything we want. But Generative AI offers a massive possibility: we have a single interface, and humans return to their nature, communicating through language, expressions, body language, and even writing and drawing. Keyboards and mice will likely disappear. Writing and drawing? They are rooted in culture and will stay, just as music remains even though its medium has shifted from tapes to streaming.

The ideal tool is still blowing in the wind. But look at Claude's Artifacts, ChatGPT's Canvas, and Google's NotebookLM: don't they all start by giving us a blank sheet of paper (a canvas) and then keep adding material through model calls?

From 0 to N, not 1 to N. Because "1 to N" is standardized replication, while "0 to N" is human nature: everyone is different, and every "brain" is unique.

III. What Are We Missing?

  1. Better interaction, in both hardware and software. For multimodal models, context limits are a major constraint. Even though Gemini 1.5 Pro can handle a 2M-token context, processing high-definition video is still a struggle. We need hardware upgrades (on-device), improved model processing power, and major algorithm optimizations.

  2. Data storage systems, or rather, filesystems. Generative AI has made vector databases famous. But letting a model understand, store, retrieve, and generate information in a personalized way is hard to achieve with databases or models alone. AI filesystems are a massive opportunity.
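For readers unfamiliar with what a vector database actually does, the sketch below shows the core operation: store one vector per file and return the files most similar to a query. Everything here is an assumption for illustration. `TinyVectorStore` is a made-up name, the file paths are invented, and `embed` is a toy bag-of-words stand-in for a real embedding model.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """In-memory index that maps file paths to vectors (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, Counter]] = []

    def add(self, path: str, text: str) -> None:
        self._items.append((path, embed(text)))

    def search(self, query: str, k: int = 3) -> List[Tuple[str, float]]:
        q = embed(query)
        scored = sorted(
            ((cosine(q, vec), path) for path, vec in self._items), reverse=True
        )
        return [(path, score) for score, path in scored[:k]]


store = TinyVectorStore()
store.add("notes/obsidian.md", "markdown notes with links and backlinks")
store.add("notes/tldraw.md", "infinite canvas whiteboard for drawing")
store.add("notes/notebooklm.md", "generate podcast audio from pdf documents")

results = store.search("infinite canvas", k=1)
print(results)
```

The index only answers "which files look similar to this query?" It knows nothing about who wrote a file, when, why, or how files relate to each other, which is the kind of gap an AI-oriented filesystem would have to fill.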

  3. "Augmentation." The emergence of Generative AI has given us a massive chance to reflect on and rebuild ourselves: Can this workflow be completed by a model? If so, why does the workflow still need to exist? Where do we fit in? Rebuilding everything from scratch has gradually eased my growing inner friction.

To be continued: I've accidentally written over four thousand words again. Every choice in an article attempts to "align granularity," but in that constantly extending network structure, every alignment feels like a massive regret.
