I Came Across the Most Powerful Phone 3D Scanning App Ever

When I tested 3D modeling on phones earlier, I downloaded two apps: PolyCam and RealityScan. In practice, however, both are really only usable on an iPhone or iPad Pro equipped with LiDAR (Light Detection and Ranging).

I have also thoroughly tested PolyCam. For example, when modeling an outdoor pond, the operation, speed, and results were actually quite acceptable. It basically involves walking around the scene with an iPhone or iPad Pro, and then processing it in the app for about a minute.

2024-01-20-遇到一个史上最强手机三维扫描app-1u0hgp-1772019416248-5752.gif

However, for my use cases, it is still hard to treat this as a primary method. The imaging quality is decent, but it does not compare to what I get from careful shooting with a professional camera followed by cloud-based modeling. For small objects, such as the following scene, the results are very poor because of the minimum focusing distance of the LiDAR and the lens: the precision is far from sufficient, and the image quality is a mess, to the point of being completely unusable.

2024-01-20-遇到一个史上最强手机三维扫描app-1u0hgp-1772019416231-9628.gif

2024-01-20-遇到一个史上最强手机三维扫描app-1u0hgp-1772019416302-6485.gif

Of course, the GIFs above also demonstrate a standard capture process. At the beginning of the capture, all areas are covered in blue shadows, meaning insufficient data has been collected. Once the blue shadows in the required area completely disappear, the capture is complete, and you can proceed to the modeling process. The entire process is quite smooth.

As a side note, Apple's developer frameworks include Object Capture (part of RealityKit, which works alongside ARKit), which allows 3D modeling on an iPhone or iPad with just a few lines of code. Shipping the code libraries first has always been Apple's product development path. We can expect these libraries to fully unleash the potential of hardware-software integration on the Vision Pro, which sold out within half an hour last night.

Back to the topic. Objectively speaking, this Shargeek power bank is extremely difficult to scan because its surface is entirely glass, which is highly reflective. Moreover, the internal circuit boards and electronic components are very detailed. It’s hard enough just to take a photo of it, let alone 3D model it. I even made multiple attempts using my brand-new Revopoint Miraco 3D scanner, but most ended in failure.

Originally, I planned to try 3D Gaussian Splatting (3DGS). The plan was to use a macro lens to capture enough detail and then train the model in the cloud. According to the relevant research papers, 3DGS makes significant progress in scene completeness, surface texture representation, and the handling of transparent or reflective objects.

2024-01-20-遇到一个史上最强手机三维扫描app-1u0hgp-1772019416322-7126.png

This model has been out for half a year, and I've been following the progress of its variants. In the past month or two, I've noticed more and more "AI-translated and curated versions" of the papers on Chinese self-media (independent content channels), though many of them lack deep understanding. (I am preparing a comprehensive introduction to 3D modeling algorithms, models, and workflows for next week.)

Anyway, to put it simply, 3DGS models can approach the photo quality of professional cameras to some extent, but their range of use is still quite limited, and running them requires setting up a specific environment. As a result, there is a lot of discussion but very little actual usage. Many follow-up papers even reuse the same few datasets, and the code lacks flexibility.

However, this situation has changed. In the past month, two heavyweight apps have emerged: KIRI Engine and Luma.AI. Today's protagonist is KIRI Engine. About a month ago, KIRI announced support for 3DGS models without requiring LiDAR. I downloaded it then, but 3DGS wasn't available yet, so I tried the app once and left it there. A couple of days ago, I noticed the app had been updated, and a Beta version of 3DGS had appeared in the modeling options.

2024-01-20-遇到一个史上最强手机三维扫描app-1u0hgp-1772019416228-5838.png

Following the usual pattern, the app is free to download, but advanced features require payment. I was prepared to accept the monthly fee of $14.99 or the annual fee of $59.99 (about $5/month), but at checkout a new option appeared: an early-bird price for new users of $35.99 annually (about $3/month). I didn't hesitate and upgraded to the Pro version to unlock 3DGS.
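For anyone comparing the plans, the per-month figures above can be checked with a few lines of Python. This is only a sketch using the three prices quoted in this post; the constant and function names are my own and have nothing to do with KIRI's actual billing.

```python
# Prices quoted in this post (USD). Names are illustrative only.
MONTHLY_PLAN = 14.99
ANNUAL_PLAN = 59.99
EARLY_BIRD_ANNUAL = 35.99


def per_month(annual_price: float) -> float:
    """Monthly equivalent of an annual price, rounded to cents."""
    return round(annual_price / 12, 2)


def yearly_saving(annual_price: float, monthly_price: float = MONTHLY_PLAN) -> float:
    """How much an annual plan saves versus paying monthly for 12 months."""
    return round(monthly_price * 12 - annual_price, 2)


if __name__ == "__main__":
    print(per_month(ANNUAL_PLAN))            # regular annual plan, per month
    print(per_month(EARLY_BIRD_ANNUAL))      # early-bird plan, per month
    print(yearly_saving(EARLY_BIRD_ANNUAL))  # saving vs. 12 monthly payments
```

Compared with paying month by month for a full year, the early-bird plan works out to well under a quarter of the cost.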

The operation involves clicking [+] to add a model, selecting [3DGS], and either filming a video directly or uploading one from local storage. I started by filming directly. During the process, the app provides tips, such as moving the phone slowly, and shows capacity alerts (currently, the maximum video length is two minutes, corresponding to a processing limit of about 200 photos). After a few tries, I began filming videos separately and then uploading them. This has two benefits: better image quality and the ability to keep the flash on to further improve clarity.
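To get a feel for what the two-minute / roughly 200-photo limit means, here is a small sketch of how a video could be reduced to at most 200 evenly spaced frames. How KIRI actually picks frames is not documented in this post; even spacing, the 30 fps frame rate, and the function name are all my assumptions.

```python
def sample_frame_indices(duration_s: float, fps: float = 30.0,
                         max_frames: int = 200) -> list[int]:
    """Pick at most `max_frames` evenly spaced frame indices from a video.

    Assumption: frames are sampled at a constant stride. The real app may
    use a different strategy (for example, skipping blurry frames).
    """
    total = int(duration_s * fps)
    if total <= max_frames:
        # Short clip: every frame fits within the limit.
        return list(range(total))
    stride = total / max_frames
    return [int(i * stride) for i in range(max_frames)]


if __name__ == "__main__":
    indices = sample_frame_indices(120)        # a full two-minute clip
    print(len(indices))                        # number of frames kept
    print((indices[1] - indices[0]) / 30.0)    # seconds between kept frames
```

Under these assumptions a full two-minute clip keeps one frame roughly every 0.6 seconds, which is why moving the phone slowly matters: anything that happens between two kept frames is never seen by the model.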

2024-01-20-遇到一个史上最强手机三维扫描app-1u0hgp-1772019416245-5150.gif

Another major advantage is that it is device-agnostic: it works on both Apple and Android devices, because the models are actually processed in the cloud.

2024-01-20-遇到一个史上最强手机三维扫描app-1u0hgp-1772019416225-4297.png

After uploading, the model joins a queue and then runs using cloud resources. You can check the results after a while. A good method is to take a batch of videos before bed, upload them, and "harvest" them the next morning. KIRI also has a web version where you can directly upload photos taken by professional cameras. Web account permissions are synchronized with the app, so models captured via mobile can also be browsed and edited on the web.

Let’s look at the modeling results for the same Shargeek power bank.

2024-01-20-遇到一个史上最强手机三维扫描app-1u0hgp-1772019416325-280.gif

To be realistic, if you used this model for product promotion, you wouldn't sell a single unit. But this is the current "ceiling" for 3D modeling of such difficult objects, or rather, the ceiling for foolproof, one-click operation. Don't forget, though, that these models can be imported into software like Unreal Engine, Unity, AutoCAD, or Blender for fine-tuning. With adjustments from skilled designers, the final result can be absolutely stunning.

Even with these results, we can see that the glass reflection issue is basically overcome. Much of the text is legible, internal components are identifiable, and the size and texture are largely correct.

Some might ask: Is this AIGC? 3DGS can be viewed as a type of AIGC algorithm, but it's not the same as "text-to-image" or "text-to-3D" (by the way, 3D assets are currently in extremely short supply, so take the hype about text-to-3D with a grain of salt; there won't be massive progress in 2024).

This power bank is likely the hardest model. What about others?

For example, these flowers. I am very satisfied with the actual quality. The problem is that, because of the two-minute video limit, I couldn't capture a high-quality full circle around them. Could we hope for a higher limit, say ten-minute videos?

2024-01-20-遇到一个史上最强手机三维扫描app-1u0hgp-1772019416279-4793.gif

Another example is Shargeek's retro GaN charger. I didn't crop this model, so you can clearly see the environment and the blurring caused by moving too fast while recording. This is a common issue for NeRF and 3DGS models (3DGS is usually grouped with NeRF as a radiance-field method, although it represents the scene with explicit Gaussians rather than a neural network), but it can be resolved in post-processing. Also, a quick shout-out to Shargeek: it's designed and made in China but sold directly overseas, and every product is a masterpiece.

2024-01-20-遇到一个史上最强手机三维扫描app-1u0hgp-1772019416248-1253.gif

And some small figurines.

2024-01-20-遇到一个史上最强手机三维扫描app-1u0hgp-1772019416339-8065.gif

In Q4 2023, I said that 2024 would focus on AGI, AI hardware, and 3D reconstruction.

While 3D reconstruction still has a way to go before its results are convincing to the human eye, we have seen accelerated progress since the launch of ChatGPT. We must also realize that, even though the current results are purely computational, they already save high-level designers an enormous amount of work in the post-processing phase.

(For many, this is clearly not good news, as the trend of layoffs in Silicon Valley seems difficult to slow down through market forces alone.)

Finally, for those who want to try it, as mentioned, KIRI Engine is currently on sale for $35.99 annually (about 260 RMB). You can also use my referral code for more discounts or coupons for model exports: https://www.kiriengine.app/share/Invitation?code=YN4NI5.

KIRI Engine is developed by KIRI Innov. If you're interested, look them up; you might find some surprises.
