Gemini 2.0 Case Study: From Search to Processing to a Multimodal Slide Deck on a Single Page, Plus How GPT-o1 and Gemini-Thinking Interpret the Workflow

I have never hidden my love for Gemini 2.0. Ever since AI Studio added Gemini 1.5 Pro, it has been the tool I can least do without in my serious work. Now the release of tldraw computer has given me many more reasons to love Gemini 2.0.

The best AI application built on Gemini 2.0 has arrived, and it is completely free.

I can already see the vision of "doing all your work on a single page" taking shape, and it should become reality in 2025.

A typical workflow looks something like this: search → structured data → further iteration by visiting sub-links → structured data → preliminary processing → outputs in various modalities (slides, images, generated voice).
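Conceptually, each arrow in that workflow hands one step's output to the next step. Below is a minimal Python sketch of the idea. It assumes every step can be modeled as a plain function. The stage functions (`search`, `structure`, `summarize`, `to_slides`) are hypothetical placeholders that return canned data; they are not part of tldraw computer or any real API.

```python
from functools import reduce
from typing import Any, Callable

# A stage takes the previous stage's output and returns its own output.
Stage = Callable[[Any], Any]


def run_pipeline(stages: list[Stage], initial: Any) -> Any:
    """Feed `initial` through each stage in order, like nodes wired together on a canvas."""
    return reduce(lambda data, stage: stage(data), stages, initial)


# Hypothetical placeholder stages (not a real API).
def search(query: str) -> list[str]:
    # Placeholder: a real version would call a search engine.
    return [f"https://example.com/{query}/1", f"https://example.com/{query}/2"]


def structure(urls: list[str]) -> list[dict]:
    # Placeholder: a real version would fetch each page, e.g. through a reader service.
    return [{"url": u, "text": f"content of {u}"} for u in urls]


def summarize(docs: list[dict]) -> str:
    # Placeholder: a real version would ask an LLM for a summary.
    return f"{len(docs)} documents summarized"


def to_slides(summary: str) -> list[str]:
    # Placeholder: wraps the whole summary in a single slide.
    return [f"Slide: {summary}"]


if __name__ == "__main__":
    print(run_pipeline([search, structure, summarize, to_slides], "nvidia"))
    # → ['Slide: 2 documents summarized']
```

Modeling the chain this way keeps each step swappable. Replacing a placeholder with a real search or LLM call does not change `run_pipeline`.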

So I built a "small yet fully-featured" workflow in tldraw computer.

The entire process needs only three "instructions", the components marked in blue in the menu bar.

For data acquisition I used Jina Reader, which jina.ai launched in the first half of this year. Given nothing more than a link, it returns the web page in a structured format, which makes it extremely convenient.
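Jina Reader is commonly accessed by prefixing the page URL with `https://r.jina.ai/`. Below is a minimal sketch that uses only the Python standard library. It assumes that prefix-style endpoint and unauthenticated access, so check jina.ai's documentation for API keys, rate limits, and response options. `reader_url` and `fetch_structured` are hypothetical helper names.

```python
import urllib.request

# Assumption: Jina Reader's public endpoint works by prefixing the target URL.
READER_PREFIX = "https://r.jina.ai/"


def reader_url(target: str) -> str:
    """Build the reader URL for a page by prefixing the original link."""
    return READER_PREFIX + target


def fetch_structured(target: str, timeout: float = 30.0) -> str:
    """Fetch a page through the reader endpoint and return the text it sends back."""
    req = urllib.request.Request(reader_url(target), headers={"User-Agent": "workflow-sketch"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


if __name__ == "__main__":
    print(reader_url("https://finance.yahoo.com/quote/NVDA/news/"))
    # → https://r.jina.ai/https://finance.yahoo.com/quote/NVDA/news/
    # Needs network access:
    # text = fetch_structured("https://finance.yahoo.com/quote/NVDA/news/")
```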

I also sent the flowchart to GPT-o1 and its competitor Gemini-Thinking to see how each of them understands this workflow.

Output from GPT-o1:

The flowchart below illustrates a multi-step automated information processing and summarization process, which can be roughly divided into the following stages:

  1. Acquiring News Links: First, the program extracts trending news links related to Nvidia from a specified URL (e.g., https://finance.yahoo.com/quote/NVDA/news/). These links are collected and saved, for example: ["https://finance.yahoo.com/news/nvidia-q3-earnings-beat-164079676.html", "https://finance.yahoo.com/news/nvidia-q3-earnings-top-estimates-10000015.html", ...]

  2. Visiting and Summarizing Each News Item: After obtaining all links, the program "browses" or crawls the content of each webpage. It then extracts core points for each news item (such as financial figures, market performance, analyst evaluations, etc.) and generates a highly detailed summary. These summaries are integrated into a comprehensive content compilation covering:

    • Nvidia Q3 financial data (Revenue, Profit, EPS, etc.)
    • Stock price performance (Fluctuations after the earnings release)
    • Market and industry impact (Effects on data center, gaming, AI sectors, etc.)
    • Future outlook (Analyst/official outlook for the next quarter or subsequent products/strategies)
  3. Translation and Voice Output: After generating the English summary, the user instructs it to be translated into Chinese and output as an "exceedingly pleasant" voice file. Thus, the program first completes the English-to-Chinese translation (ensuring smooth language and accurate wording) and then calls the text-to-speech function to synthesize Chinese audio suitable for reading.

  4. Final Summary Report: The final "Nvidia Q3 Earnings Report" is displayed on the webpage (or interface), including:

    • Key Highlights: Main earnings highlights
    • Financial Performance: Detailed financial performance
    • Market Impact: Impact on the market
    • Detailed Analysis: In-depth interpretation and background information
    • Accompanied by visual charts (such as stock price changes, financial data comparisons) to allow readers to understand the results more intuitively.

Overall, the core logic of this workflow is: Crawl news links → Batch summarize news content → Multi-language translation (including text-to-speech) → Generate final report. The entire process utilizes web crawling, natural language processing (including summarization and translation), and speech synthesis to help users automate information acquisition and interpretation, ultimately producing high-quality, visualized earnings interpretation content.
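The first stage both models describe is pulling Nvidia news links out of a page such as the Yahoo Finance quote page. For already-fetched text, this stage can be approximated without an LLM. Below is a minimal sketch. It assumes the page text is already available, for example as Markdown returned by a reader service, and that news articles live under `https://finance.yahoo.com/news/`. `extract_news_links` is a hypothetical helper. In the workflow above, this step is handled by a tldraw computer instruction instead.

```python
import re

# Matches http(s) URLs, stopping at whitespace, quotes, angle brackets,
# parentheses, and square brackets, which typically delimit links in Markdown or JSON.
URL_PATTERN = re.compile(r"""https?://[^\s"'<>()\[\]]+""")


def extract_news_links(text: str, prefix: str = "https://finance.yahoo.com/news/") -> list[str]:
    """Return unique links that start with `prefix`, in the order they first appear."""
    seen: set[str] = set()
    links: list[str] = []
    for url in URL_PATTERN.findall(text):
        url = url.rstrip(".,;")  # drop trailing punctuation picked up from prose
        if url.startswith(prefix) and url not in seen:
            seen.add(url)
            links.append(url)
    return links


if __name__ == "__main__":
    sample = """
[Nvidia Q3 earnings beat](https://finance.yahoo.com/news/nvidia-q3-earnings-beat-164079676.html)
[Quote page](https://finance.yahoo.com/quote/NVDA/)
Repeat: https://finance.yahoo.com/news/nvidia-q3-earnings-beat-164079676.html.
"""
    print(extract_news_links(sample))
    # → ['https://finance.yahoo.com/news/nvidia-q3-earnings-beat-164079676.html']
```

Filtering by URL prefix is a deliberately simple heuristic. A model-driven instruction can also judge relevance ("trending", "related to Nvidia"), which a regex cannot.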


Output from Gemini-Thinking:

This workflow describes the process of handling and summarizing information regarding Nvidia's Q3 earnings report.

  • Step 1: Data Source - Text. The starting point is a text box containing a series of links. These links point to different news reports related to Nvidia's Q3 earnings.

  • Step 2: Instruction - "Extract the hot news links". An instruction box tells the system to extract "hot news links" from the text box above. This means the system needs to identify links related to Nvidia news within the text.

  • Step 3: Intermediate Result - Extracted Links (Output). A list of extracted Yahoo Finance links is produced.

  • Step 4: Instruction - "Browse and generate highly detailed summary". Another instruction box tells the system to browse all extracted links and generate a highly detailed summary. This requires the system to visit the pages, scrape content, and integrate information.

  • Step 5: Intermediate Result - English Detailed Summary (Text). The system generates a detailed summary in English containing key earnings metrics, business demand, analyst comments, and stock price reactions.

  • Step 6: Instruction - "Translate into Chinese and output in a friendly format". The next instruction asks to translate the summary into Chinese and present it in a user-friendly format.

  • Step 7: Final Result - Chinese Text Summary (Text). The system outputs a Chinese text box summarizing the Nvidia earnings report.

  • Step 8: Data Source - Chinese Text. Another independent process starts with a Chinese text box summarizing the earnings report.

  • Step 9: Instruction - "Summarize key points". An instruction box asks to summarize the "nice highlights" (extracting more attractive and easy-to-understand points).

  • Step 10: Final Result - Website-Style Summary (Website). The system outputs a simulated website-style summary titled "Nvidia Q3 Earnings Report," including Key Highlights, Financial Performance, Market Impact, and Detailed Analysis.

  • Step 11: Auxiliary Information - Speech & Image. The workflow includes a voice player and an image showing a stock price increase chart.

Conclusion: Both models overlooked the role of jina.ai, likely because of their knowledge cutoff dates (August 2024 for Gemini). Aside from that, both explained the workflow accurately: from data acquisition through analysis, extraction, summarization, and translation to generating multimodal output (interactive pages, voice, and images).

As 2024 draws to a close, the answers to many questions have become clear: AI is rapidly entering every aspect of productivity, and what matters now is how many ideas "people" bring. For nearly two years I have used Excalidraw for slides and roadshows, and every time I prepared materials I badly wanted to use today's AI features without switching pages. tldraw computer has at least taken the most important step toward that, and all of this was made possible by Gemini 2.0.
