Extending Object Classification Convolutional Neural Networks to Custom Logo Detection
David Peña Moliner
Publisher
Universitat Politècnica de Catalunya. Escola d'Enginyeria de Telecomunicació i Aeroespacial de Castelldefels, 2020
URL
http://books.google.com.hk/books?id=uUYZzgEACAAJ&hl=&source=gbs_api
Description
The aim of this project is to automate the calculation of the total time that the logos of MotoGP sponsor brands appear on screen during races. This document explains the steps followed to train an automatic object detection model on a custom database using RetinaNet. It begins with a brief explanation of the main concepts of deep learning, including how convolutional neural networks and their two kinds of layers, convolution and pooling, operate. It then reviews the state of the art in classification and object detection systems; RetinaNet was chosen because it is currently one of the best-performing systems. The main difference between classification and detection is that a detection system returns both the position of the object (a rectangular region called a bounding box) and its class, whereas a classification system returns only its class. A database of images from 6 MotoGP videos was created and labeled using the LabelImg software. Labeling an image means drawing the bounding box around each logo and specifying which brand it belongs to. The selected brands are Alpinestar, DHL, Repsol, GoPro, Michelin, RedBull, Monster, Tissot, Motul and BMW; a total of 16 classes were created, since a single brand can have several logo forms. Because the database is not large enough to train a model from scratch, the weights of a pre-trained network were used, a technique known as transfer learning. In addition, to avoid overfitting, the layers of the part of the architecture called the backbone, which in this case is ResNet50, were frozen. A second model was then trained with data augmentation to improve on the results of the first. Data augmentation is a technique that generates new examples by applying transformations to the images in the database. With data augmentation, the model achieved an accuracy of 83.3% and a mean Average Precision (mAP) of 65.33%. Finally, an example application called Brand Logo Monitoring in Moto GP was developed; using the model trained with data augmentation, it automatically counts the time each brand appears during a MotoGP Grand Prix and reports the total time per brand.
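The counting step described at the end of the abstract could be sketched roughly as follows. This is a minimal illustration and not the thesis's implementation: it assumes the detector's output is already available as one list of `(class_name, score)` pairs per video frame, and the class names, the `CLASS_TO_BRAND` mapping, the `screen_time_per_brand` function, and the 0.5 score threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from detector classes to brands. The abstract says
# 16 classes cover 10 brands but does not list the class names.
CLASS_TO_BRAND = {
    "repsol_text": "Repsol",
    "repsol_symbol": "Repsol",
    "dhl": "DHL",
}


def screen_time_per_brand(frame_detections, fps, score_threshold=0.5):
    """Return the seconds on screen for each brand.

    frame_detections: one list per video frame of (class_name, score) pairs.
    A brand is counted once per frame, however many of its logos are visible.
    """
    frames_seen = defaultdict(int)
    for detections in frame_detections:
        # Collapse the classes detected in this frame into a set of brands.
        brands_in_frame = {
            CLASS_TO_BRAND[name]
            for name, score in detections
            if score >= score_threshold and name in CLASS_TO_BRAND
        }
        for brand in brands_in_frame:
            frames_seen[brand] += 1
    # Convert frame counts to seconds using the video frame rate.
    return {brand: count / fps for brand, count in frames_seen.items()}
```

Counting a brand at most once per frame means that two Repsol logos visible at the same moment add one frame of screen time, not two; whether the original application counts this way is not stated in the abstract.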