Overview

Introduction

Autonomous driving has become one of the most popular research fields in recent years. Its goal is to enable a car to sense its environment and drive without human intervention. However, the task is highly challenging because real-world conditions change rapidly. Training an agent directly in the real world is also infeasible because trial and error there is prohibitively costly. Fortunately, Sim-to-Real methods address this problem by training the model in a simulation environment and then adapting it to the real world. This approach also improves the robustness of the model, because a simulation environment can generate a practically unlimited amount of training data.

Semantic segmentation provides crucial information to self-driving agents. However, training a deep neural network requires a significant amount of labeled data, especially for dense prediction tasks such as semantic segmentation, and building a dataset for every single task would be highly time-consuming and expensive. We therefore want to leverage the labeled synthetic data generated by simulation environments. Unfortunately, a domain gap exists between synthetic data and real-world data. UDA-SST is a promising technique for narrowing this domain gap and producing high-quality segmentation maps. We train the UDA-SST model with the GTA5 dataset and raw images we collected on campus, generating segmentation maps without extra labeling effort.

Several well-known UDA-SST works already exist. We propose four effective techniques that can be easily integrated with them. First, we adopt consistency learning, which helps the model learn robust and general knowledge. Second, we replace the depth information used in CorDA with edge information, which is easier to acquire. Third and fourth, we use quantization and the gray-world technique to calibrate images. Combining these techniques, we achieve state-of-the-art performance among UDA-SST works.
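As a rough illustration of the two image-calibration steps named above, the sketch below shows a common textbook form of gray-world white balancing and uniform color quantization in NumPy. It is not taken from our implementation: the function names `gray_world` and `quantize`, the `levels` parameter, and the choice of uniform per-channel bins are all assumptions made for this example.

```python
import numpy as np


def gray_world(image: np.ndarray) -> np.ndarray:
    """Gray-world white balance for an H x W x 3 uint8 image (assumed layout).

    Each channel is scaled so that its mean matches the mean over all channels.
    """
    img = image.astype(np.float64)
    channel_means = img.reshape(-1, 3).mean(axis=0)
    gray = channel_means.mean()
    # Guard against division by zero when a channel is entirely black.
    gains = gray / np.maximum(channel_means, 1e-6)
    balanced = img * gains
    return np.clip(np.rint(balanced), 0, 255).astype(np.uint8)


def quantize(image: np.ndarray, levels: int = 8) -> np.ndarray:
    """Uniformly quantize a uint8 image to `levels` values per channel.

    `levels` is a hypothetical parameter; it must divide 256 evenly.
    """
    if levels < 1 or 256 % levels != 0:
        raise ValueError("levels must be a positive divisor of 256")
    step = 256 // levels
    img = image.astype(np.int32)
    # Map every pixel to the centre of the bin it falls into.
    return ((img // step) * step + step // 2).astype(np.uint8)
```

In this sketch both functions would be applied to synthetic and real images alike, for example `quantize(gray_world(frame))`, so that the two domains share a similar color distribution before segmentation. The order of the two steps and the number of levels are illustrative choices only.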

To build the simulation environment for our Sim-to-Real method, we create a virtual world with the 3D game engine Unity. The environment resembles an everyday street scene, including roads, sidewalks, pedestrians, trees, bicycles, cars, and terrain. We train an RL agent in the Unity environment, where it must stay on the road and avoid all obstacles. Finally, we combine RL and UDA-SST to achieve obstacle avoidance on the roads of the NTHU campus. The RL agent makes decisions based on the segmentation maps predicted by the UDA-SST model. With the Sim-to-Real approach, our agent demonstrates exceptional performance in both the simulation environment and the real world.

In a nutshell, our contributions are as follows:

We propose four easy-to-implement techniques that are compatible with existing UDA-SST works and bring considerable improvement.
By combining our proposed techniques with an existing self-training framework, we achieve state-of-the-art performance on the UDA-SST benchmark.
We develop a sim-to-real autonomous driving car by combining UDA-SST and RL methods, demonstrating that our UDA-SST techniques are efficient and practical.

Detail

Introduction