When the idea of using this title for my customary outlook piece suddenly came to me, I re-watched "Inception."
Everything starts with an idea, and then it never stops spinning, like a top. This is essentially the whole principle behind Generative AI: repeatedly relying on learned correlations to predict the next word, and thereby generating an entire article; generating images from random input through diffusion; generating videos from temporal correlation; generating 3D content from spatial correlation...
We use this "dreaming" method to help algorithms understand human language, human vision, human hearing, human thought, and ultimately the entire human world.
I don't know if the appeal of "AI" would have dropped significantly over the past year if it had been understood or explained this way from the beginning. However, I know that as more people understand that the essence of Generative AI is "dreaming," a world called "Reverie" is developing at an incredible speed, and a future belonging to the "Reverists" (Dream Weavers) is beginning.
In "Inception," the original term for the dream weaver is "Architect." Interestingly, this is the same title given to the Big Boss who created the virtual world in "The Matrix."
Mapping the real world completely into a digital world is one of the most natural ideas we have as human beings, and many people have been working towards this, including myself.
However, as I started to combine certain tools and use them for generation, a sense of déjà vu about being a "Reverist" began to emerge. The maze-design segment in "Inception" mentions that many "impossibilities" can be constructed in a dream, yet the scenes must be as realistic as possible, without simply copying the real world, for the "subject" to believe them. That is why the creator of the dream is called an "Architect," a title that suggests a strong foundation in reality.
This is the result when viewed from a human perspective. However, our current Generative AI is not designed this way. Or rather, the so-called Transformer architecture does not make the algorithm cater to humans; instead, the algorithm creates a completely different space from its own standpoint:
The process of mapping from the human world into this space is called encoding, and the process of mapping from this space back into a human-readable form is called decoding. Simply put (not strictly correct in a technical sense, but helpful for understanding), early ChatGPT used tokens as the basic elements of that space and represented the differences between tokens with vectors of more than ten thousand dimensions (later reduced to somewhat over a thousand). Diffusion models for image generation represent that space with latents, and translating a latent back into an image that a computer can display is likewise decoding. The computer handles input and output in forms humans can accept, so it acts as a bridge between carbon-based and silicon-based life.
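The encode/decode picture above can be sketched in a few lines of Python. This is only a toy under stated assumptions: the five-word vocabulary, the 4-dimensional random vectors, and the function names `encode`, `decode`, and `nearest_token` are all made up for illustration. Real models use learned subword tokenizers and learned embeddings with far more dimensions.

```python
import random

# Hypothetical toy vocabulary; real models learn subword tokens from data.
VOCAB = ["dream", "world", "human", "model", "space"]
TOKEN_TO_ID = {token: index for index, token in enumerate(VOCAB)}

DIM = 4  # toy dimension, chosen only to keep the printout readable
rng = random.Random(0)
# One random vector per token; a stand-in for a learned embedding table.
EMBEDDINGS = [[rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in VOCAB]


def encode(text):
    """Map human-readable text to a list of vectors in the model's space."""
    # Words outside the toy vocabulary raise KeyError.
    return [EMBEDDINGS[TOKEN_TO_ID[word]] for word in text.split()]


def nearest_token(vector):
    """Return the vocabulary token whose vector is closest to `vector`."""
    def squared_distance(index):
        return sum((a - b) ** 2 for a, b in zip(EMBEDDINGS[index], vector))

    return VOCAB[min(range(len(VOCAB)), key=squared_distance)]


def decode(vectors):
    """Map vectors from the model's space back to human-readable text."""
    return " ".join(nearest_token(vector) for vector in vectors)


if __name__ == "__main__":
    latent = encode("human dream world")
    print(latent)          # numbers that mean little to a human reader
    print(decode(latent))  # the same content, back in human form
```

The list of numbers printed in the middle is the point: it is the same sentence, but in a form only the model side of the mapping can use directly.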
Therefore, the biggest difference from the dreams in "Inception" is that those "dreams" still belong to a form understandable by humans—a human space. The space encoded by AI models belongs to a space understandable by the model but not by humans.
Perhaps we can still say that space is a mapping of the human world, but if we don't stand in our subjective view but rather in the model's view, we can say that space is the human world through the eyes of AI.
Take a question I have liked asking friends lately: What is a Transformer? From a human perspective, it is a mutual mapping between the human world and the model world; from the AI model's perspective, it is its understanding of the human world.
This answer is actually incomplete, as encode and decode still reflect my strong subjectivity. From the AI model's perspective, it has a space initially generated by understanding the human world; but since "it" understands it, "it" can also continuously generate "new" content—that is its space, its world.
I discussed this topic with GPT for a long time, and finally, we agreed to call this world and space "Reverie."
Therefore, the so-called "Alignment" is the management and intervention of the content within "Reverie" from our subjective perspective.
This is the only part of this article involving some superficial technical explanation.
However, the purpose of creating AI is not just to manage and intervene. We hope it can not only help us write articles but also do things that any specific individual might not be able to achieve: write code, generate images, videos, and music. We even hope it provides a continuous stream of imagination, constantly surprising us.
Perhaps everyone who hopes to continuously dig "surprises" out of "Reverie" can be called a "Reverist" (Dream Weaver). Perhaps in 2024, "Reverists" will not be a minority but will grow at an unpredictable speed.
Maybe everyone engaged in some form of content production can be called a "creator," and perhaps over 99% of creators will become "Reverists."
Therefore, I have named the result of this stage of experimentation—a video collection—【Reverist 造梦师】.
Models and tools, combined with human creativity and thinking, are rapidly changing the human world together with us, amid the endless transformations of a fast-expanding "Reverie."
In 2024, as we see more and more models and tools upgrading from version 0.1 to 1.0, or from Gen1 to Gen2, the explosion begins. This is likely not a firework display but an acceleration: 0.1 to 1.0 took six months, 1.0 to 2.0 took three months, and 2.0 to 3.0 (if it survives) might take one month...
We have already seen that with design tools like Figma and Canva, even for complex interfaces, designers don't need to hand over finished designs to developers; they just click "Ask AI" or "Gen," and an entire interface is automatically generated.
Similarly, we have seen that a programmer without a design background can hand-draw a few crude sketches, upload them to a design tool, click "Ask AI" or "Gen," and the interface is likewise automatically generated.
We have already seen that not only are more videos being generated by AI, but making a video increasingly requires some form of AI generation.
We will see anyone able to generate music that meets their requirements; background music for videos will no longer require buying licenses.
We have seen that while the power of text remains strong, a form that turns text directly into short video dramas may quickly capture more short-video traffic.
We have seen that everyone can have multiple digital avatars—singing, dancing, and speaking "English" with the fluency of eighteen different countries.
We have seen more and more student assignments and papers being completed under the guidance of GPT.
We have seen a large number of new protein structures and material structures being discovered rapidly.
We have seen that the amount of information visible at any given moment far exceeds the total produced during entire periods of history.
...
All of this is what we want to see after Alignment. What about those parts belonging to "Reverie" that haven't been aligned yet?
What if my input is just a single photo, but it can evolve into an endless loop of video?
"Reverie" can be generated infinitely. Thus, a "Reverist" is both happy and tormented because they are working with a "colleague" who never tires and has infinite productivity.
Therefore, we need tools with complete and powerful functions: writing text, writing code, producing images, producing videos, making background music, one-click synthesis. It should read all information on the internet or in private knowledge bases and provide its own suggestions (Reverie). It should be user-friendly (frictionless): if it can be done via voice, never use a keyboard; if it can be described via text, never draw a flowchart; if it can use drag-and-drop for simple flowcharts, never require code.
It should offer professional camera + professional Photoshop functionality with the smooth operation of taking a photo on a smartphone.
Perhaps we no longer need a single category of "programmers": one type becomes a "Reverist" (building tools for oneself is always a pleasure); another type builds tools for "Reverists." But wait, can a programmer who is not a "Reverist" build a good tool?
Therefore, there is only one class of tools: those built by "Reverists." Midjourney, Runway, and Pika all, without exception, have a certain "premium feel." This premium feel must come from an application perspective rather than a programming perspective. The qualitative leap in user-friendliness (simplification) enabled by Generative AI has brought professional tools within reach of ordinary users.
The application perspective in today's era likely corresponds to a digital or data perspective. A year ago, when someone asked me which was most important among computing power, algorithms, and data, I had an answer but found it inconvenient to respond. Now, I can say it clearly: Data—data in the eyes of a "Reverist."
Please forgive me for not being able to delve deeper into this point; I have a line between personal interest and work, and "crossing the line" is not my habit.
Please also forgive the brevity of the following section. I am not only shifting language styles but also keeping it brief, as long-windedness is also not my habit.
Rather than saying that "Reverie" is constantly influencing and changing the human world, it is more accurate to say that, for a long time to come, it is the "Reverists" who will be undergoing rapid self-transformation.
If we want AI to help us produce work in professional fields, "Reverists" need to supply a large amount of domain-specific information, just as the quality of a generated video is largely determined by the quality of the source photo. One must carefully study the mapping between input information and "Reverie," design workflows, and tune parameters so that the results converge reliably, because every professional field ultimately rests on an underlying logic. Thus, we saw a series of tools and methods emerge in 2023: LangChain, function calling, RAG, and AI agents. For most people, what these represent doesn't matter, because once the right workflow and parameters are found, they can be packaged into simplified "one-click" operations.
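As one illustration of feeding domain-specific information into a workflow, here is a minimal retrieval sketch in the spirit of RAG, written in plain Python. Everything in it is an assumption made for illustration: the three sample notes, the word-count similarity, and the names `retrieve` and `build_prompt`. Production systems typically use learned embeddings and a vector store instead, and the resulting prompt would then be sent to a model.

```python
import math
from collections import Counter

# Hypothetical domain notes; in practice this would be a private knowledge base.
DOCUMENTS = [
    "Source photos with sharp focus and even lighting produce steadier generated video.",
    "Function calling lets a model request a named tool with structured arguments.",
    "Lower sampling temperature makes generated text more repeatable.",
]


def bag_of_words(text):
    """Count lowercase words, ignoring surrounding punctuation."""
    return Counter(word.strip(".,?!").lower() for word in text.split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, top_k=1):
    """Return the `top_k` documents most similar to the question."""
    query = bag_of_words(question)
    ranked = sorted(
        documents,
        key=lambda doc: cosine(query, bag_of_words(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents, top_k=1):
    """Prepend the retrieved context to the question before generation."""
    context = "\n".join(retrieve(question, documents, top_k))
    return f"Context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("How does sampling temperature affect generated text?", DOCUMENTS))
```

Here `top_k` is the kind of parameter a "Reverist" would tune; once a setting works, the whole function can sit behind a single button.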
However, this experimental process is full of challenges and uncertainty. Yet, the "Reverists" involved must be happy, because this is one of the possible 🔑 (keys) to AGI within our current understanding. It's not that the technical tools and methods mentioned above are the keys to AGI, but that the process of experimentation itself might be where the answer lies.
It is easy to understand that the first and largest stage for "Reverists" is video content production and gaming, because "Reverie" is a natural fit. When the real world can no longer satisfy us, dreams provide the most suitable space—that infinite possibility under infinite imagination. As described in "Inception": once you experience it, the real world is no longer enough.
We can easily create independent space after independent space for ourselves—perhaps one space for immersive English learning, another for traveling the globe...
We can easily create a digital assistant for ourselves to block over 90% of meaningless external interference: repetitive information bombardment, boring meetings, and discussions...
We can live our serial lives as parallel worlds: countless dreams, and one alignable parallel universe after another...
We can rely entirely on the open-source ecosystem to build our own digital worlds, or we can build closed loops on cloud platforms like Google or Microsoft. We can also choose to be completely taken over by tools like ChatGPT, waiting for its continuous progress to extend our infinite possibilities. We can still install a bunch of apps, use Notion to write novels, let Midjourney draw, let Runway and Pika make videos, try to learn Unity (with its decreasing entry barrier) to make games, use Coqui to mimic our own voices, and use Wav2Lip to match expressions and lip movements...
We can also challenge ourselves by using 3D printing to make a personalized robot or an autonomous toy car...
We can use 3D generation to constantly recreate the physical world we inhabit and continuously modify it within "Reverie"...
We can continuously test new drugs, new materials, and new technologies within "Reverie"...
In November 2022, a laboratory tool called ChatGPT unexpectedly opened a door. Outside that door is the world I call the "Reverie Dreamland," and it also opened another possibility for us—one I call the "Reverist Dream Weaver."
In 2024, the biggest mistake we could make is to view what we call "AI" from a subjective perspective.
In 2024, the lesson we must learn is to ask: if I am a "Reverist" standing in the perspective of "Reverie," what do we humans look like?
Perhaps this is the most likely direction for the evolution of "AI" in our eyes.