Chi-Wei Hsiao


Research



HorizonNet: Learning Room Layout with 1D Representation and Pano Stretch Data Augmentation

Cheng Sun, Chi-Wei Hsiao, Min Sun, Hwann-Tzong Chen
CVPR 2019, [paper] [code]

We present a new approach to the problem of estimating the 3D room layout from a single panoramic image. We represent the room layout as three 1D vectors that encode, at each image column, the boundary positions of the floor-wall and ceiling-wall intersections and the existence of a wall-wall boundary. The proposed network, HorizonNet, trained to predict this 1D layout, outperforms previous state-of-the-art approaches. The designed post-processing procedure for recovering 3D room layouts from the 1D predictions can automatically infer the room shape at low computational cost—it takes less than 20 ms per panorama, while prior works may need dozens of seconds. We also propose Pano Stretch Data Augmentation, which can diversify panorama data and be applied to other panorama-related learning tasks. Due to the limited data available for non-cuboid layouts, we relabel 65 general layouts from the current dataset for fine-tuning. Our approach shows good performance on general layouts, as demonstrated by qualitative results and cross-validation.
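The core of Pano Stretch Data Augmentation—scaling the scene along its x and z axes and re-rendering the equirectangular image—can be sketched as an inverse warp on view rays. The function below is an illustrative NumPy implementation, not the released code; the function name and nearest-neighbor sampling are my own simplifications.

```python
import numpy as np

def pano_stretch(img, kx=1.5, kz=1.0):
    """Stretch the scene behind an equirectangular panorama by (kx, kz).

    img: (H, W, C) equirectangular image.
    A ray (x, y, z) in the stretched scene corresponds to the ray
    (x/kx, y, z/kz) in the original scene (depth cancels out), so we
    inverse-warp each output pixel's direction and resample the source.
    """
    H, W = img.shape[:2]
    # longitude/latitude grid of the output panorama
    u = (np.arange(W) + 0.5) / W * 2 * np.pi - np.pi
    v = np.pi / 2 - (np.arange(H) + 0.5) / H * np.pi
    uu, vv = np.meshgrid(u, v)
    # unit view directions (x right, y up, z forward)
    x = np.cos(vv) * np.sin(uu)
    y = np.sin(vv)
    z = np.cos(vv) * np.cos(uu)
    # inverse warp into the original (unstretched) scene
    xs, ys, zs = x / kx, y, z / kz
    us = np.arctan2(xs, zs)
    vs = np.arctan2(ys, np.sqrt(xs ** 2 + zs ** 2))
    # nearest-neighbor sampling from the source panorama
    col = ((us + np.pi) / (2 * np.pi) * W).astype(int) % W
    row = ((np.pi / 2 - vs) / np.pi * H).astype(int).clip(0, H - 1)
    return img[row, col]
```

With kx = kz = 1 the warp is the identity, which makes the mapping easy to sanity-check; the corresponding 1D layout labels must be warped with the same column mapping.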


Specialize and Fuse: Pyramidal Representation for Semantic Segmentation

Chi-Wei Hsiao, Cheng Sun, Min Sun, Hwann-Tzong Chen
[paper]

We present a novel pyramidal `output' representation for the task of semantic segmentation, inspired by the observation that features at different scales are good at predicting different regions. The semantic output is in pyramidal form, and we train the model to activate only at the coarsest possible cells in the pyramid, so that exactly one pyramid level is responsible for each pixel and different pyramid levels can specialize in their assigned pixels. In addition, we design a coarse-to-fine contextual module that accords with the essence of our pyramidal output representation. We validate the effectiveness of each key module in our method through extensive ablation studies. Our approach achieves state-of-the-art performance on ADE20K, COCO-Stuff 10K and Pascal-Context.


YawP^3: Yaw-invariant Parametrization and Panoramic Planar reconstruction

Cheng Sun, Chi-Wei Hsiao, Ning-Hsu Wang, Min Sun, Hwann-Tzong Chen

This paper presents the first method for indoor planar reconstruction from a panoramic image. We leverage three unique characteristics of indoor scenes: 1) most planar surfaces are either horizontal or vertical, 2) the 2D orientations of vertical planes co-vary with the yaw angle of the camera, and 3) most vertical planes share common 2D orientations. To inherently ingrain these priors, our model first segments horizontal/vertical planes (HV-planes) for separate treatment. Most importantly, we propose a novel yaw-invariant parameterization for vertical planes that solves the yaw ambiguity problem of 360° images and effectively clusters vertical planar segments sharing a 2D orientation. Finally, we fuse the predicted geometry information and plane instance segmentation into a piece-wise planar reconstruction. We evaluate our method on our newly extracted panoramic piece-wise HV-planar dataset, derived from three large-scale RGB-D panorama datasets. As benchmark baselines, we train two state-of-the-art planar models on our dataset with modifications that help them adapt to our 360° HV-planar data. Our method significantly outperforms all baselines and achieves superior, visually pleasing reconstructions of indoor scenes.


Flat2Layout: Flat Representation for Estimating Layout of General Room Types

Chi-Wei Hsiao, Cheng Sun, Min Sun, Hwann-Tzong Chen
[paper]

This paper proposes a new approach, Flat2Layout, for estimating general indoor room layout from a single-view RGB image, whereas existing methods can only produce layout topologies for box-shaped rooms. The proposed flat representation encodes the layout information into row vectors, which are treated as the training target of the deep model. A dynamic-programming-based post-processing step decodes the estimated flat output from the deep model into the final room layout. Flat2Layout achieves state-of-the-art performance on the existing room layout benchmark. This paper also constructs a benchmark for validating performance on general layout topologies, where Flat2Layout achieves good results on general room types. Flat2Layout is applicable to more scenarios for layout estimation and can benefit applications in scene modeling, robotics, and augmented reality.
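To illustrate the kind of dynamic-programming decoding described above, the sketch below recovers a smooth per-column boundary from a confidence map. This is a generic smoothing decoder written for illustration—the function name, the score layout, and the jump constraint are my assumptions, not the paper's exact algorithm.

```python
import numpy as np

def decode_boundary(scores, max_jump=2):
    """Pick one boundary row per column maximizing total confidence,
    while limiting column-to-column jumps to at most `max_jump` rows.

    scores: (W, H) array, scores[j, y] = confidence that the boundary
    passes through row y in column j.
    Returns an integer array of length W with the chosen row per column.
    """
    W, H = scores.shape
    dp = scores[0].copy()                 # best cumulative score ending at each row
    back = np.zeros((W, H), dtype=int)    # backpointers for path recovery
    for j in range(1, W):
        best = np.full(H, -np.inf)
        arg = np.zeros(H, dtype=int)
        for d in range(-max_jump, max_jump + 1):
            shifted = np.roll(dp, d)      # shifted[y] = dp[y - d]
            if d > 0:
                shifted[:d] = -np.inf     # invalidate wrapped entries
            elif d < 0:
                shifted[d:] = -np.inf
            better = shifted > best
            best[better] = shifted[better]
            arg[better] = (np.arange(H) - d)[better]
        dp = best + scores[j]
        back[j] = arg
    # backtrack the optimal path from the last column
    path = np.zeros(W, dtype=int)
    path[-1] = int(dp.argmax())
    for j in range(W - 1, 0, -1):
        path[j - 1] = back[j, path[j]]
    return path
```

The same pattern—per-column scores plus a transition constraint—underlies many boundary decoders; a soft transition penalty could replace the hard jump limit.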

Awards


  • 1st place, MOST Formosa Speech Grand Challenge Warm-up Contest
  • WeTech Qualcomm Global Scholars Program

Experience



Appier (ML Scientist Intern)
Content-based recommendation system


Academia Sinica (Intern)