RoomTex: Texturing Compositional Indoor Scenes via Iterative Inpainting

¹HKUST ²Peking University ³Shanghai AI Laboratory
*Equal Contribution Corresponding Author

Given untextured scene meshes, RoomTex synthesizes high-quality, style-consistent textures in a variety of styles.
It also supports fine-grained texture control and flexible scene editing.

Interactive Fine-grained Texture Control


The advancement of diffusion models has pushed the boundary of text-to-3D object generation. While it is straightforward to composite objects into a scene with reasonable geometry, it is nontrivial to texture such a scene well because of style inconsistency and occlusions between objects. To tackle these problems, we propose a coarse-to-fine 3D scene texturing framework, referred to as RoomTex, to generate high-fidelity and style-consistent textures for untextured compositional scene meshes. In the coarse stage, RoomTex first unwraps the scene mesh to a panoramic depth map and leverages ControlNet to generate a room panorama, which serves as the coarse reference that ensures global texture consistency. In the fine stage, based on the panoramic image and perspective depth maps, RoomTex refines and textures every object in the room iteratively along a series of selected camera views, until the object is completely painted. Moreover, we propose to maintain superior alignment between RGB and depth spaces via subtle edge detection methods. Extensive experiments show that our method is capable of generating high-quality and diverse room textures and, more importantly, of supporting interactive fine-grained texture control and flexible scene editing, thanks to our inpainting-based framework and compositional mesh input.
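The fine-stage loop described above (paint an object view by view until no surface is left unpainted) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function name `paint_object`, the representation of visibility as sets of texel indices, and the greedy "most unpainted texels first" rule are hypothetical and are not taken from the paper, which only states that views are selected and processed iteratively.

```python
def paint_object(visible_texels, num_texels):
    """Schedule views for one object until every texel is painted.

    visible_texels: dict mapping a view name to the set of texel
        indices visible from that view (hypothetical input format).
    num_texels: total number of texels on the object.
    Returns (schedule, unpainted), where schedule is an ordered list of
    (view, newly_painted_texels) and unpainted is what no view could reach.
    """
    painted = set()
    remaining_views = dict(visible_texels)
    schedule = []
    while len(painted) < num_texels and remaining_views:
        # Assumed heuristic: take the view exposing the most unpainted texels.
        best = max(remaining_views,
                   key=lambda v: len(remaining_views[v] - painted))
        new = remaining_views.pop(best) - painted
        if not new:
            break  # no remaining view adds coverage
        # In a real pipeline an inpainting model would fill only `new`,
        # leaving already painted texels untouched for consistency.
        schedule.append((best, sorted(new)))
        painted |= new
    unpainted = set(range(num_texels)) - painted
    return schedule, unpainted


views = {"front": {0, 1, 2, 3}, "left": {2, 3, 4}, "top": {4, 5}}
schedule, unpainted = paint_object(views, num_texels=6)
```

In this toy run the "left" view is never used, because "front" and "top" already cover all six texels. Restricting each step to the newly visible texels mirrors the inpainting idea: regions painted from earlier views act as fixed context for later ones.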

Method Overview

We use off-the-shelf 3D shape generative models along with a given room layout to assemble the room mesh. In the coarse stage, the 3D room mesh is unwrapped to a panoramic depth map, from which we generate a panoramic image of the room as a coarse reference. In the fine stage, the empty room is first refined in perspective views. We then apply an iterative inpainting pipeline to refine and paint each individual 3D object in the room.
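To make the coarse-stage unwrapping concrete, the following is a minimal sketch of how a panoramic (equirectangular) depth map can be computed from a camera placed inside a room. It is not the authors' implementation: an empty axis-aligned box stands in for the full scene mesh, the y-up coordinate convention is an assumption, and the names `panorama_ray`, `box_depth`, and `panorama_depth` are hypothetical.

```python
import math


def panorama_ray(u, v, width, height):
    """Unit ray direction for pixel (u, v) of an equirectangular image."""
    lon = ((u + 0.5) / width) * 2.0 * math.pi - math.pi   # [-pi, pi)
    lat = math.pi / 2.0 - ((v + 0.5) / height) * math.pi  # (-pi/2, pi/2)
    return (math.cos(lat) * math.sin(lon),  # x
            math.sin(lat),                  # y (assumed up axis)
            math.cos(lat) * math.cos(lon))  # z


def box_depth(origin, direction, box_min, box_max):
    """Distance from a point inside an axis-aligned box to the wall hit."""
    t_exit = math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d > 1e-12:
            t_exit = min(t_exit, (hi - o) / d)
        elif d < -1e-12:
            t_exit = min(t_exit, (lo - o) / d)
    return t_exit


def panorama_depth(origin, box_min, box_max, width=64, height=32):
    """Depth panorama as a height x width nested list of ray distances."""
    return [[box_depth(origin, panorama_ray(u, v, width, height),
                       box_min, box_max)
             for u in range(width)]
            for v in range(height)]


# A 4 m x 3 m x 6 m room with the camera at its center.
depth = panorama_depth((0.0, 0.0, 0.0), (-2.0, -1.5, -3.0), (2.0, 1.5, 3.0))
```

With a real scene, `box_depth` would be replaced by ray casting against the assembled room mesh. A depth panorama of this kind is the sort of conditioning signal a depth-conditioned ControlNet can consume to produce the coarse reference image.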



Living Room: Emauromin Style, Pokémon Style, Impression Style, Anime Style, Real Estate Style, Space Style

Emauromin Style, Real Estate Style, Kawaii Style

Living-dining Room: Emauromin Style, Space Style, Minecraft Style



      title={RoomTex: Texturing Compositional Indoor Scenes via Iterative Inpainting}, 
      author={Qi Wang and Ruijie Lu and Xudong Xu and Jingbo Wang and Michael Yu Wang and Bo Dai and Gang Zeng and Dan Xu},