Scaling High-Fidelity 3D Mesh Generation with Meshtron

Meshes are one of the most important and widely used representations of 3D assets. They are the default standard in the film, design, and gaming industries, and they are natively supported by virtually all 3D software and graphics hardware.

A 3D mesh can be considered as a collection of polygon faces, most commonly consisting of triangles or quadrilaterals. An important property of a mesh is its topology, which refers to the organization of these polygon faces that discretize the 3D surface. Artist-created meshes usually feature highly informative and well-organized topologies that align closely with the underlying structure of the object.

Artist-like topology is essential for editing, texturing, animation, and efficient rendering. However, creating these meshes manually is a labor-intensive task that requires significant time and expertise in 3D modeling.

A mesh can be extracted algorithmically from other 3D representations. In a typical text-to-3D or image-to-3D generation system, a neural generator produces a neural field, which is then converted into a mesh using algorithms such as variants of Marching Cubes [FlexiCubes (NVIDIA), NMC, DiffMC] or Marching Tetrahedra [DMTet (NVIDIA)].

Unfortunately, these hand-designed algorithms produce dense meshes without artist-like topology, which limits the quality and usefulness of these methods.

Figure 1. Mesh comparisons

Meshtron provides a simple, scalable, data-driven solution for generating intricate, artist-like meshes of up to 64K faces at 1024-level coordinate resolution. That is more than an order of magnitude more faces and 8x higher coordinate resolution than existing methods.

The diagram shows that previous works generate meshes with limited face counts at a low, 128-level spatial resolution, whereas Meshtron generates meshes with controllable face counts of up to 64K at a higher, 1024-level spatial resolution. At similar face counts, Meshtron produces better-quality meshes, and it is also capable of generating much more sophisticated ones.

Figure 2. Comparison of previous low-poly meshes with low resolution and Meshtron-generated meshes with controllable face count and high resolution

A Mesh as a Sequence of Tokens

Meshtron is an autoregressive model that generates mesh tokens. It shares the same working principle as autoregressive language models such as GPTs.

A mesh can easily be converted to a sequence of tokens. The basic building block of a mesh is a triangle face, which can be represented with nine tokens:

  • Each triangle has three vertices.
  • Each vertex has three coordinates.
  • Each coordinate can be quantized to obtain a discrete token.

A mesh can thus be represented uniquely as a sequence of tokens by chaining these face tokens together according to a bottom-to-top sorted order.
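The steps above can be sketched in a few lines of Python. This is a minimal illustration, not Meshtron's actual tokenizer: the function name `mesh_to_tokens`, the normalization into a unit cube, the choice of z as the "up" axis, and the rotation of each face so that its lowest vertex comes first are all assumptions made for the example.

```python
import numpy as np

def mesh_to_tokens(vertices, faces, levels=1024):
    """Flatten a triangle mesh into a token sequence, nine tokens per face.

    Hypothetical sketch: assumes z is the vertical axis and that every face
    is a triangle given as three vertex indices.
    """
    v = np.asarray(vertices, dtype=np.float64)

    # Scale the mesh into [0, 1] with one shared factor, then quantize each
    # coordinate to one of `levels` integer bins.
    lo = v.min(axis=0)
    extent = float((v.max(axis=0) - lo).max()) or 1.0
    q = np.round((v - lo) / extent * (levels - 1)).astype(np.int64)

    ordered = []
    for face in faces:
        # Store each vertex as (z, y, x) so tuple comparison is bottom-to-top.
        corners = [tuple(int(c) for c in q[i][::-1]) for i in face]
        # Rotate the face so its lowest vertex comes first. A rotation, unlike
        # a full sort, keeps the winding order of the triangle.
        start = corners.index(min(corners))
        ordered.append(corners[start:] + corners[:start])

    # Sort the faces bottom-to-top, then chain all coordinates together.
    ordered.sort()
    return [c for face in ordered for corner in face for c in corner]

# Two triangles sharing an edge: 2 faces give 2 * 9 = 18 tokens.
example_vertices = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
example_faces = [[0, 1, 2], [0, 1, 3]]
print(len(mesh_to_tokens(example_vertices, example_faces)))
```

Because both the vertices within a face and the faces themselves are put into a canonical order, the same mesh yields the same sequence regardless of how its faces were originally listed.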

The animation shows that a mesh can be converted to a unique sequence of tokens by sorting its vertices and faces. The resulting sequence is nine times as long as the face count.

Figure 3. Mesh representation by token sequence

Meshtron is an Efficient Mesh Generator

Another efficiency-boosting technique used by Meshtron is sliding window attention. A conventional Transformer model has a context length that grows with the sequence length, leading to quadratic growth of compute and linear growth of memory consumption as the sequence becomes longer. This leads to significant slowdown with long sequences both during training and generation.
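To make the growth rates concrete, the short calculation below counts how many query-key pairs a causal Transformer scores over a full mesh sequence and how many positions its KV cache holds when nothing is evicted. It is a back-of-the-envelope sketch: the helper names are hypothetical, and it ignores constant factors such as the number of layers, heads, and head dimension.

```python
TOKENS_PER_FACE = 9  # three vertices x three coordinates per triangle

def causal_attention_pairs(num_faces: int) -> int:
    """Query-key pairs scored when every token attends to itself and all earlier tokens."""
    n = num_faces * TOKENS_PER_FACE
    return n * (n + 1) // 2

def kv_cache_entries(num_faces: int) -> int:
    """Cached key/value positions per layer when no token is ever evicted."""
    return num_faces * TOKENS_PER_FACE

# Doubling the face count doubles the cache but roughly quadruples the pairs.
for num_faces in (1_000, 2_000, 4_000):
    print(num_faces, kv_cache_entries(num_faces), causal_attention_pairs(num_faces))
```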

Instead, Meshtron maintains a fixed-length context window of 8192 faces. During training, mesh sequences are randomly cropped to at most 8192 faces. During inference, tokens generated more than 8192 faces ago are evicted from the KV cache. The sliding window keeps memory cost and token throughput constant, so generation never slows down as the mesh grows.
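The eviction behavior can be illustrated with a toy cache built on a bounded deque. This is only a sketch of the idea: the class `SlidingWindowCache` is hypothetical, it stores placeholder values instead of real key/value tensors, and it assumes eviction happens one token at a time with nine tokens per face.

```python
from collections import deque

TOKENS_PER_FACE = 9  # three vertices x three coordinates per triangle

class SlidingWindowCache:
    """Toy KV cache that keeps only the most recent `window_faces` faces."""

    def __init__(self, window_faces: int = 8192):
        self.max_tokens = window_faces * TOKENS_PER_FACE
        # A deque with maxlen silently drops its oldest item on append,
        # so the cache can never grow past max_tokens entries.
        self.entries = deque(maxlen=self.max_tokens)

    def append(self, key, value):
        self.entries.append((key, value))

    def __len__(self):
        return len(self.entries)

# Small window for demonstration: 2 faces, which is 18 tokens.
cache = SlidingWindowCache(window_faces=2)
for position in range(50):
    cache.append(position, position)  # placeholders for real key/value tensors

print(len(cache))           # stays at 18 no matter how many tokens were generated
print(cache.entries[0][0])  # oldest position still in the cache
```

With the 8192-face window described above, the same structure would hold at most 8192 x 9 = 73,728 token positions, independent of the total mesh size.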

With the help of these techniques, Meshtron achieves 2.5x faster token throughput and over 50% memory savings during both training and inference, while generating better-quality meshes.

Meshtron is Highly Controllable

The current version of Meshtron accepts the following control inputs:

  • Point cloud: Determines the shape of the output mesh.
  • Face count: Determines the density of the output mesh.
  • Quad ratio: Switches between quad and triangle tessellation.
  • Creativity: Can be adjusted to generate extra details not present in the point cloud.

As almost all of the 3D representations can be converted to point clouds, Meshtron can
