Sep 3, 2025

Hardware--, Software++, Intelligence##

The "Hardware Failure Week" ordeal continues: my phone lost Wi-Fi on Monday, automatically rebooted upon unfolding on Tuesday, the alarm went silent on Wednesday, and the speakers failed completely by Thursday. Meanwhile, on my Windows laptop, the "Windows Module Installer" has been running a system update for three days with CPU usage exceeding 30%, giving me a perfect chance to explain to my daughter what speeds were like in the 386 and 486 era: opening a program takes three to five minutes, and the screensaver renders at less than one frame per second, creating a rather unique effect...

Consequently, replacing the phone became the only option. Naturally, a foldable screen remains an irreplaceable requirement for me. Fortunately, the Samsung Fold 6, which I criticized at its launch, has entered its habitual rapid price-drop mode in retail. Thus, the brand choice was easy: hardware-wise, I just need a foldable eight-inch screen and a system with as little manufacturer intervention as possible (in the Galaxy AI era, Samsung's OS has increasingly moved toward stock Android).

However, I still feel uneasy. Considering the expected lifespan of a new phone is about two years, I've spent a lot of money on things I didn't want: three cameras with barely any upgrades, the S-Pen, excessive RAM, and SSD storage...

So, to minimize my negative emotions, I opted for the 256GB version.

For the first time in over a decade, I felt "justified" in not only avoiding the top-tier model but specifically buying the base model. In fact, if there were options like "choose the number of cameras" or "removable battery," I would definitely keep the highest specs for the chip, RAM, and screen while cutting everything else. Not only would it save money, but it would also give me a sense of pride in "contributing to Mother Earth."

Indeed, counting all my old phones and pads, the number of cameras has reached a terrifying level of redundancy. Why can't manufacturers provide modular camera units, allowing users to freely choose high or low configurations, macro or telephoto lenses?

Yes, in today's mature consumer electronics supply chain, the cost of offering custom configurations might be higher for manufacturers than providing a default full set. Yet, there must be a new balance to be found between visible economic costs, invisible environmental costs, and emotional value.

Perhaps this rebalancing has already begun. There are rumors of an upcoming "lite" version of Samsung's Fold 6, Apple keeps bringing back the iPhone SE, and this time the base iPhone 16 actually looks more attractive than the Pro at its price point.

Yes, we still need faster SoCs and RAM with higher bandwidth and capacity. But physical dimensions cannot increase, camera quality has long since plateaued, and screen resolution has basically peaked. Storage capacity? Do we really lack space now? What we lack is battery life.

Hardware--.

Just a day ago, Kyutai-lab open-sourced its voice dialogue model, Moshi. Yes, you can now deploy it locally on a laptop, chatting with your Mac without worrying about data leaks. You can even swap the underlying language model if you're unsatisfied with Moshi's defaults to improve the "voice assistant's" intelligence.

Of course, this model can also be deployed directly on a phone. Not just the latest models—even those from three to five years ago have sufficient specs.

So, less than six months after Meta released Llama 3, phones can run text models, generate images, hold voice conversations, and likely, by the end of this year, generate and run code directly, especially in an Android environment. This is no longer an unreachable ideal.

But what kind of software do we still need? We need a lot:

1. We need automatic summarization of the most important daily emails and messages: not just categorization, but picking out what needs attention, potentially with intelligent auto-replies. Apple and Google's Gmail are working on this, but it's not yet good enough.
2. We need intelligent scheduling and reminders...
3. We need smarter well-being and "peace of mind" settings.
4. We need automation for tedious, repetitive, and time-consuming "grinding" tasks in games.
5. We need an all-in-one integration of music, video, images, and text.
6. We need a dynamic balance mechanism with electronic devices and social networks, where stepping away doesn't cause anxiety about potential losses, and deep immersion doesn't lead to guilt. This is also a crucial part of point 3.

...

None of the above requires newer hardware; it is purely a matter of software. Yet it is hard to find apps that have all of this ready, the way we could in the mobile internet era, because "Software++" is a process of refinement between technology and humans.

Over twenty years ago, Microsoft released a new programming language called C# (pronounced C-Sharp). It remains one of my favorite languages (alongside R). The "#" does not mean a hashtag; it represents "++" and "++" again—essentially, C++ incremented twice.

Unfortunately, due to Microsoft's failed internet and mobile strategies, C# and .NET were gradually sidelined, banished to the "cold palace" of that era.

Looking back twenty years, if the first generation of AI was represented by "Deep Blue" and "Deeper Blue," the second generation was AlphaGo and AlphaZero based on neural networks. That was Intelligence++.

Today, in the "Intelligence#" era initiated by GPT, I recently wrote about whether generative AI has entered "garbage time." This doesn't mean it's "cold"; on the contrary, it's because the overall situation is settled, making it seem "boring" to many.

But imagine: one year after GPT-4, we have the peer-level open-source Llama 3.1. Soon after, we have SOTA-level open-source text-to-image Flux. Within 24 hours, the open-source dialogue model Moshi. Just days ago, the open-source AI IDE Void, Alibaba's Qwen2.5, and even open-source solutions for OpenAI's newly released "Strawberry" o1...

We can still look forward to Claude 3.5 Opus, Gemini 1.5 Ultra, better versions of Sora, open-source alternatives, and breakthroughs in text-to-3D.

In the "Intelligence##" era, we have the best ecosystem ever, with closed-source and open-source advancing side-by-side. We have roughly found the gateway to AGI. "Intelligence##" will continue to advance in giant strides, though likely not in the ways we are familiar with, because intelligence has already crossed the "singularity" in terms of the time dimension.