The Lure and the Trap of AI Agents


This was intended to be a very short article, just to quickly organize the conclusions in my head.

Thanks to a certain product, "Agents" have become popular again. Putting aside capital considerations, the concept itself is incredibly attractive: having AI work for you is everyone's dream. The falling barrier to building Agents on top of large models makes this goal seem almost within reach.

However, this might just be a "beautiful trap."

  1. No matter how powerful an Agent is, its capabilities come from the underlying base model: language models handle understanding and interaction, code generation, instruction building, and the various "use" capabilities such as computer use and browser use, while multimodal models handle images, video, and even sound. Without the base model, an Agent is useless. This is why Agents have become more useful as models have improved, and it is also the major bottleneck for every so-called general-purpose Agent application built on top of large models.

  2. If an Agent can complete a certain task, why do we still need that task? For example, if an Agent can generate Word documents and PowerPoint decks, lowering the barrier to the "novice" level everyone hopes for, then why do we still need Word or PowerPoint at all? We are entering a phase in which "various protocols" get rebuilt.

  3. If Agents are used as productivity tools, they must be integrated into specific production environments, which means staying in sync with the people who work there and with private data. This presents both massive technical hurdles and significant management obstacles. Solving them takes not just effort but also a great deal of trust, room to experiment, and mutual understanding and tolerance.

  4. If Agents are viewed as consumer-facing tools, it is worth noting that successful consumer products in the past always seemed to carry some form of emotional connection. They solved the first 99 kilometers of a problem with the best technology or algorithms of the time and left the final kilometer to humans or to human-to-human connection. I believe that in the future, though not yet, a further "technological compression" will solve 99.99 kilometers of the problem and leave the final 0.01 kilometer to humans, to human connections, and even to emotional bonds between humans and machines.

For now, though, there is still a great deal of work to do toward that goal, even as the field has clearly entered a hype cycle.
