## Automated Toll Gate Passing

### A. Detection of Road Users and Infrastructure Elements

Convolutional neural networks (CNNs) have been highly successful in object recognition. Between 2014 and 2016, CNN-based detectors including R-CNN [3], Fast R-CNN [4], Faster R-CNN [5], and YOLO achieved steadily higher accuracy and speed, and many deep network architectures, such as GoogLeNet [6], have been adopted in these detectors. In the proposed approach, YOLOv2 [7] is used as a single convolutional neural network to detect vehicles, pedestrians, and ETC gates, because it is both accurate and fast enough for real-time operation. The original YOLO architecture, inspired by GoogLeNet, consists of 24 convolutional layers followed by 2 fully connected layers; instead of GoogLeNet's inception modules, it uses 1×1 reduction layers followed by 3×3 convolutional layers. YOLO predicts bounding-box positions and class probabilities directly from full images in a single pass.
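To see why a 1×1 reduction layer in front of a 3×3 convolution saves computation, the arithmetic below counts convolution weights (biases ignored) for a 3×3 layer applied directly versus after a 1×1 reduction. The channel counts (512 in, 256 reduced, 512 out) are illustrative values chosen for this example, and the function names are hypothetical.

```python
def conv_weights(kernel, c_in, c_out):
    """Weights in a kernel x kernel convolution layer (biases ignored)."""
    return kernel * kernel * c_in * c_out


def direct_3x3(c_in, c_out):
    """A single 3x3 convolution from c_in to c_out channels."""
    return conv_weights(3, c_in, c_out)


def reduced_3x3(c_in, c_mid, c_out):
    """A 1x1 reduction to c_mid channels followed by a 3x3 convolution to c_out."""
    return conv_weights(1, c_in, c_mid) + conv_weights(3, c_mid, c_out)


if __name__ == "__main__":
    print(direct_3x3(512, 512))        # 2359296
    print(reduced_3x3(512, 256, 512))  # 1310720
```

With these channel counts the reduced pair needs about 44% fewer weights than the direct 3×3 layer, while keeping the same input and output depth.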

The input image is first divided into an $S \times S$ grid of cells, each of which predicts $k$ bounding boxes with confidence scores, as well as $C$ conditional class probabilities. Each bounding box is represented by five values $(x, y, w, h, \text{confidence})$: $(x, y)$ is the center of the box relative to the bounds of its grid cell, and $w$ and $h$ are the width and height of the box, respectively. The confidence is defined as $\Pr(\text{object}) \times \mathrm{IOU}$, where $\Pr(\text{object})$ is 1 when the grid cell contains part of a ground-truth box and 0 otherwise, and $\mathrm{IOU}$ is the intersection over union (the area of their intersection divided by the area of their union) of the predicted bounding box and the ground-truth box. Multiplying this confidence by the conditional class probability $\Pr(\text{class}_i \mid \text{object})$ gives the class-specific confidence score of each bounding box, and the boxes with high scores are then selected across the whole image as the final detections. The YOLO model is pretrained on the ImageNet [8] dataset and is able to detect over 9000 object categories, including vehicles and pedestrians. The pretrained weights are then further trained on an ETC gate dataset and a manual gate dataset collected between Shanghai and Hangzhou, containing 500 and 300 samples, respectively. After training, the model can detect ETC signs, manual-lane signs, vehicles, and pedestrians.
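The decoding described above can be sketched as follows. This is a simplified illustration, not YOLOv2's actual implementation: it assumes the per-cell layout described in the text (each cell's output vector holds $k$ groups of $(x, y, w, h, \text{confidence})$ followed by $C$ class probabilities, i.e. a tensor of shape $S \times S \times (5k + C)$), whereas YOLOv2 itself predicts class probabilities per anchor box. The function names and the threshold are hypothetical, and the non-maximum suppression normally applied afterwards is omitted.

```python
import numpy as np


def iou(a, b):
    """Intersection over union of two boxes given as (cx, cy, w, h)."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def class_scores(pred, k, C):
    """Class-specific confidences, shape (S, S, k, C), from pred of shape (S, S, 5k + C)."""
    S = pred.shape[0]
    boxes = pred[..., : 5 * k].reshape(S, S, k, 5)
    box_conf = boxes[..., 4]              # Pr(object) * IOU, shape (S, S, k)
    class_prob = pred[..., 5 * k :]       # Pr(class_i | object), shape (S, S, C)
    return box_conf[..., :, None] * class_prob[..., None, :]


def detections(pred, k, C, threshold=0.5):
    """(row, col, box, class, score) for every box/class pair scoring above threshold."""
    scores = class_scores(pred, k, C)
    hits = np.argwhere(scores > threshold)
    return [(int(r), int(c), int(b), int(cls), float(scores[r, c, b, cls]))
            for r, c, b, cls in hits]


if __name__ == "__main__":
    pred = np.zeros((2, 2, 5 * 1 + 2))    # S = 2, k = 1, C = 2
    pred[0, 0, 4] = 0.9                   # box confidence in cell (0, 0)
    pred[0, 0, 5:] = [0.2, 0.8]           # conditional class probabilities
    print(detections(pred, k=1, C=2))     # one hit: class 1 with score about 0.72
```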

### B. Queue Modeling

This functional module estimates the length of the queue at each candidate gate by calculating the correlation between neighboring vehicles. The queue lengths form part of the basis for selecting the optimal candidate gate. In this module, a progressive correlation process is proposed as an efficient way to estimate the vehicle queue. It is robust in several situations, including heavy and light shadows, and results show that the method is advantageous in both accuracy and robustness.

The method consists of two steps; its flowchart is shown in Figure 3. The first step detects the area of interest (AoI), which restricts processing to the road surface and discards irrelevant regions such as sky and buildings. The Canny edge detector extracts edges from the image, and the Hough transform then identifies straight lines among these edges, which may correspond to road borders or potential landmarks. The second step determines the total queue length at each candidate gate. For each candidate gate, the algorithm first detects the neighboring vehicle 1 and calculates the correlation between vehicle 1 and the candidate gate; the correlation is determined from their relative positions and overlapping area. If they are correlated, the next neighboring vehicle 2 is detected and the correlation between vehicle 1 and vehicle 2 is calculated. This process is repeated vehicle by vehicle until no further correlated vehicle is found, which yields the total queue length. The method can calculate the total queue length even when some vehicles in the queue are occluded by closer vehicles.
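The progressive correlation step can be sketched as below. This is a minimal illustration under assumptions, not the authors' implementation: the correlation test (a horizontal-overlap ratio as the position correlation, plus a bound on the vertical gap that lets a nearer vehicle overlap and occlude the farther one), the thresholds, and the names `Box`, `correlated`, and `queue_length` are hypothetical. Boxes are in image pixels with $y$ growing downward, so the gate lies above the queue and each next vehicle is closer to the camera.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in image pixels; y grows downward (toward the camera)."""
    x1: float
    y1: float
    x2: float
    y2: float


def horizontal_overlap(a: Box, b: Box) -> float:
    """Shared width of a and b divided by the narrower box's width (0..1)."""
    inter = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    return inter / min(a.x2 - a.x1, b.x2 - b.x1)


def correlated(prev: Box, cand: Box, min_overlap: float = 0.5, max_gap: float = 40.0) -> bool:
    """Hypothetical correlation test between the last queue element and a candidate."""
    if cand.y2 <= prev.y2:                            # candidate must be nearer the camera
        return False
    if horizontal_overlap(prev, cand) < min_overlap:  # position correlation (same lane)
        return False
    gap = cand.y1 - prev.y2                           # negative when cand occludes prev
    return gap <= max_gap


def queue_length(gate: Box, vehicles: list[Box]) -> int:
    """Chain correlated vehicles outward from the gate: gate -> vehicle 1 -> vehicle 2 ..."""
    remaining = list(vehicles)
    prev, count = gate, 0
    while True:
        candidates = [v for v in remaining if correlated(prev, v)]
        if not candidates:
            return count
        nxt = min(candidates, key=lambda b: b.y2)     # nearest neighbour behind prev
        remaining.remove(nxt)
        prev, count = nxt, count + 1


if __name__ == "__main__":
    gate = Box(100, 100, 200, 150)
    vehicles = [Box(105, 160, 195, 220),   # vehicle 1, just behind the gate
                Box(100, 210, 200, 290),   # vehicle 2, overlapping (occluding) vehicle 1
                Box(300, 170, 380, 230)]   # vehicle in another lane
    print(queue_length(gate, vehicles))    # 2
```

Because the chain is grown from the gate outward, a partially hidden vehicle still links to its neighbours through the overlap test, which mirrors the occlusion tolerance described in the text.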

### C. 3D Environment Modeling

Simultaneous localization and mapping (SLAM) is used to construct or update a map of an unknown environment while simultaneously tracking the ego vehicle's location within that map. In the proposed approach, Large-Scale Direct Monocular SLAM (LSD-SLAM) [9] is applied. It operates directly on image intensities, rather than on extracted keypoint features, for both tracking and mapping. Its flowchart is shown in the upper part of Figure 2 and can be divided into three steps: tracking, depth map estimation, and map optimization. The 3D environment model is formed by combining the positions of detected objects with the depth information from the SLAM algorithm. To ensure safety, the predicted paths of detected vehicles are also calculated within this model.
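One simple way to combine detections with SLAM depth, and to predict vehicle paths, is sketched below. The text does not specify these computations, so the pinhole back-projection, the median depth over a detection box (LSD-SLAM produces semi-dense depth maps, so many pixels have no estimate; these are represented here as NaN), the constant-velocity motion model, and the camera intrinsics are all assumptions, and the function names are hypothetical. Depth from monocular SLAM is only defined up to an unknown scale unless that scale is recovered by other means.

```python
import numpy as np


def box_depth(depth_map, box):
    """Median of the valid (non-NaN) semi-dense depths inside box = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return float(np.nanmedian(depth_map[y1:y2, x1:x2]))


def back_project(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at the given depth into camera coordinates."""
    return np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth])


def predict_path(position, velocity, dt, steps):
    """Constant-velocity extrapolation of a detected vehicle's 3D position."""
    position = np.asarray(position, dtype=float)
    velocity = np.asarray(velocity, dtype=float)
    return [position + velocity * dt * (i + 1) for i in range(steps)]


if __name__ == "__main__":
    depth = np.full((480, 640), np.nan)
    depth[300:320, 310:330] = 12.0          # a few SLAM depth estimates on the vehicle
    box = (300, 280, 340, 330)              # hypothetical detection (x1, y1, x2, y2)
    z = box_depth(depth, box)
    u, v = (box[0] + box[2]) / 2, box[3]    # bottom-center pixel of the box
    p = back_project(u, v, z, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
    print(p, predict_path(p, velocity=[0.0, 0.0, -2.0], dt=0.5, steps=3))
```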

Motion control manages the vehicle's longitudinal and lateral control using an adaptive method. It receives collision-free trajectory data from path planning, such as curvature, yaw rate, and velocity. In the proposed approach, an LQR-PID [12] algorithm is used. The full vehicle model is simplified into a bicycle model, and a state-feedback controller is applied to this dynamic bicycle model, which yields a new closed-loop state transition matrix. Placing the poles of the closed-loop system by manually tuning the feedback matrix is complex; LQR instead computes the feedback gain, and hence the pole locations, that is optimal with respect to a quadratic cost. Using data from the CAN bus, the LQR stage computes an optimal expected (reference) value that is passed as the input to the PID controller. The PID controller then corrects the system response according to the deviation between this expected value and the actual value. PID is a classic controller with strong adaptability and robustness. Combining optimal control (LQR) with stable feedback correction (PID) makes the proposed motion controller safer and more robust than PID alone.
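The LQR-PID cascade described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [12]: it substitutes a simple Euler-discretized kinematic lateral-error model (lateral offset and heading error, small steering angles) for the dynamic bicycle model in the text, solves the discrete-time Riccati equation by fixed-point iteration, and treats the LQR output as the expected steering value that a PID loop tracks against the measured value. The model, speed, wheelbase, weights, gains, and function names are hypothetical.

```python
import numpy as np


def dlqr(A, B, Q, R, iters=10_000, tol=1e-12):
    """Discrete-time LQR gain K for u = -K x, found by iterating the Riccati equation."""
    Q = np.asarray(Q, dtype=float)
    R = np.asarray(R, dtype=float)
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        K = np.linalg.solve(R + BtP @ B, BtP @ A)
        P_next = Q + A.T @ P @ A - A.T @ P @ B @ K   # Riccati recursion
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    BtP = B.T @ P
    return np.linalg.solve(R + BtP @ B, BtP @ A)


class PID:
    """Discrete PID controller that tracks a reference (the LQR 'expected' value)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def step(self, expected, actual):
        error = expected - actual  # deviation between expectation and actual value
        self.integral += error * self.dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def lateral_model(v, wheelbase, dt):
    """Hypothetical kinematic lateral-error model; x = [offset, heading error], u = steering."""
    A = np.array([[1.0, v * dt], [0.0, 1.0]])
    B = np.array([[0.0], [v * dt / wheelbase]])
    return A, B


def lqr_pid_step(x, actual_steering, K, pid):
    """LQR predicts the expected steering; PID corrects toward it using the measured value."""
    expected = float(-(K @ x)[0])
    return pid.step(expected, actual_steering)


if __name__ == "__main__":
    A, B = lateral_model(v=10.0, wheelbase=2.7, dt=0.05)
    K = dlqr(A, B, Q=np.diag([1.0, 0.5]), R=np.array([[2.0]]))
    pid = PID(kp=0.8, ki=0.1, kd=0.05, dt=0.05)
    print(K, lqr_pid_step(np.array([0.3, 0.02]), actual_steering=0.0, K=K, pid=pid))
```

Iterating the Riccati equation keeps the sketch dependency-free apart from NumPy; a dedicated solver for the discrete algebraic Riccati equation would be the usual choice in practice.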