Ten Views on AI in 2024

I have always tried to strike a balance: expressing my views candidly without crossing the boundaries of work and compliance. The previous post tried one approach; this one tries another.

1. How will Large Language Models (LLMs) continue to progress?

Pure language models alone have limited room left for progress. With the rise of multimodality, models with planning capabilities are increasingly needed, and OpenAI, Google, and the other model companies are all working in this direction.

2. Will the Transformer architecture be replaced?

In the long run, yes. But this is unlikely within the next six months; more likely, the next major generation of models will feature new architectures coexisting with the Transformer.

3. Will model parameters see another order-of-magnitude increase?

What is certain is that mainstream models in 2024 will all be multi-modal. There is still room for the number of model parameters to increase several times over, and for the volume of training data to increase by orders of magnitude.

4. At what level are text-to-image, text-to-video, and text-to-3D models currently?

I once drew a diagram: the level of text-to-image models is roughly equivalent to where ChatGPT was when it was first released a year ago; text-to-video maturity is a bit lower, and text-to-3D is lower still. Of course, I am using text-to-image and text-to-video here to represent 2D visual models, and text-to-3D to represent 3D visual models—they don't necessarily have to be "text-to." Collectively, they are visual models.

5. Will visual models bring a ChatGPT-level shock?

In my view, the changes brought by visual models in 2024 will be far greater than those brought by ChatGPT, even though the marginal "imagination space" they add is shrinking: everyone already envisioned these possibilities in 2023, and they are now simply being implemented step by step. Still, never underestimate the power of change when science fiction becomes reality. ChatGPT merely opened up infinite room for imagination; visual models in 2024 will permanently transform many industries.

6. Will 3D models and the Metaverse become a reality soon?

Realistically, the probability of this happening in 2024 is not high. Models will progress quickly and 3D output will keep improving, but until it can "deceive" the human eye, this remains quantitative rather than qualitative change.

7. Is there a lot of space for AI PCs?

I said long ago that the best AI PC already exists: the MacBook with Apple's M-series chips. I discussed this in a post written upon the release of Apple's M3, and I won't repeat it here.

8. What about AI phones?

If we simplify the classification of AI use, there are roughly three types: industrial use, such as the various kinds of recognition we have long been accustomed to, or enterprises optimizing their services and products; enthusiast DIY, such as deploying one's own models, which was actually the biggest growth area of 2023; and consumer applications, where the biggest cloud service is ChatGPT, yet many scenarios and needs still call for running privately on an individual's phone. The biggest application-level improvement AI brings in this round is foolproof operation.

Furthermore, in every technological revolution, hardware leads and software follows. Fifteen years ago, how many people believed smartphones would achieve absolute dominance? Today the answer to the analogous question is similar: AI will certainly land on some kind of hardware we can reach anytime, anywhere. It might not be the phone, but right now the phone looks like the smoothest landing path.

9. How to compete in the application space?

I believe the traditional software development model has come to an end. The identities of the "applier" (or perhaps more appropriately, the creator) and the "coder" will gradually merge. Creators who cannot use models to complete their work, and programmers who need others to propose requirements before they can develop, will both be rapidly phased out.

The best applications will certainly be developed by the best creators, and the best programmers will remain those who understand applications best.

In this situation, how do we compete? You are welcome to discuss this offline.

10. Will this round of AI follow the pattern of the past: quickly falling silent after hitting a bottleneck, then waiting for the next explosion?

In any single direction, such as LLMs or image generation models, this could happen. But the biggest difference this time is that, with neural networks and the Transformer architecture as the foundation, breakthroughs are being achieved in all directions simultaneously. Increasingly, people are realizing that the Transformer itself represents a potential method we've found to "explain the human world to machines."

Thus we have seen the breakthrough in language models drive breakthroughs in visual models, then 3D, and many other kinds of models. Conversely, breakthroughs in visual models open up further space for the progress of language models.

As noted above, the Transformer will certainly hit bottlenecks, but the momentum brought by these simultaneous breakthroughs shows no sign of fading for most of 2024.
