Deep learning for real-time fruit detection and orchard fruit load estimation: benchmarking of ‘MangoYOLO’

Anand Koirala, Kerry B. Walsh, Zhenglin Wang, Chris McCarthy

2019

The performance of six existing deep learning architectures was compared for the task of detecting mango fruit in images of tree canopies. Images of trees (n = 1 515) from across five orchards were acquired at night using a 5 megapixel RGB digital camera and 720 W of LED flood lighting in a rig mounted on a farm utility vehicle operating at 6 km/h. The two-stage deep learning architectures Faster R-CNN(VGG) and Faster R-CNN(ZF), and the single-stage techniques YOLOv3, YOLOv2, YOLOv2(tiny) and SSD, were trained with both original-resolution and 512 × 512 pixel versions of 1 300 training tiles, except YOLOv3, which was run only with 512 × 512 pixel images, giving a total of eleven models. A new architecture was also developed, based on features of YOLOv3 and YOLOv2(tiny), with accuracy and speed in the current application as the design criteria. This architecture, termed ‘MangoYOLO’, was trained using: (i) the 1 300 tile training set, (ii) the COCO dataset before training on the mango training set, and (iii) a daytime image training set from a previous publication, to create the MangoYOLO models ‘s’, ‘pt’ and ‘bu’, respectively. Average Precision plateaued with use of around 400 training tiles. MangoYOLO(pt) achieved an F1 score of 0.968 and Average Precision of 0.983 on a test set independent of the training set, outperforming other algorithms, with a detection speed of 8 ms per 512 × 512 pixel image tile while using just 833 Mb of GPU memory per image (on the NVIDIA GeForce GTX 1070 Ti GPU used for in-field application). The MangoYOLO model also outperformed other models in processing of full images, requiring just 70 ms per image (2 048 × 2 048 pixels) (i.e., capable of processing ~14 fps) with use of 4 417 Mb of GPU memory. The model was robust in use with images of other orchards, cultivars and lighting conditions. MangoYOLO(bu) achieved an F1 score of 0.89 on a day-time mango image dataset.
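The two headline quantities above, F1 score and frame rate, follow from simple arithmetic. The sketch below is illustrative only: the function names and the detection counts are hypothetical and are not taken from the study, while the 8 ms and 70 ms latencies are the figures quoted in the abstract.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall, computed from detection counts."""
    if true_positives == 0:
        return 0.0  # no correct detections: precision and recall are both zero
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


def frames_per_second(latency_ms: float) -> float:
    """Convert a per-image processing time in milliseconds to images per second."""
    return 1000.0 / latency_ms


# Hypothetical counts, chosen only to show the calculation.
print(f"F1 = {f1_score(90, 10, 10):.3f}")

# Latencies quoted in the abstract: 70 ms per full image, 8 ms per tile.
print(f"Full image: {frames_per_second(70):.1f} fps")
print(f"Image tile: {frames_per_second(8):.1f} tiles/s")
```

A latency of 70 ms per image corresponds to 1000 / 70 ≈ 14.3 images per second, which matches the ~14 fps stated above.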
With use of a correction factor estimated from the ratio of human count of fruit in images of the two sides of sample trees per orchard and a hand harvest count of all fruit on those trees, MangoYOLO(pt) achieved orchard fruit load estimates within 4.6 to 15.2% of packhouse fruit counts for the five orchards considered. The labelled images (1 300 training, 130 validation and 300 test) of this study are available for comparative studies.
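The correction-factor step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the factor is the harvest count divided by the image count, pooled over the sample trees of one orchard, and that it multiplies the orchard-wide detection count. All function names and numbers are hypothetical.

```python
from typing import Sequence


def correction_factor(image_counts: Sequence[int], harvest_counts: Sequence[int]) -> float:
    """Ratio of hand-harvest count to image-based count, pooled over sample trees.

    Assumption: the factor scales image counts up to account for fruit hidden
    from both camera views.
    """
    return sum(harvest_counts) / sum(image_counts)


def estimate_fruit_load(detected_count: int, factor: float) -> float:
    """Scale the orchard-wide detection count by the per-orchard correction factor."""
    return detected_count * factor


def percent_difference(estimate: float, reference: float) -> float:
    """Signed difference of the estimate from a reference count, in percent."""
    return 100.0 * (estimate - reference) / reference


# Hypothetical sample-tree counts for one orchard (not data from the study).
image_counts = [80, 100, 120]      # fruit counted by a person in two-sided images
harvest_counts = [100, 130, 160]   # fruit counted at hand harvest of the same trees

factor = correction_factor(image_counts, harvest_counts)
estimate = estimate_fruit_load(10_000, factor)   # hypothetical detection total
print(f"factor = {factor:.2f}, estimate = {estimate:.0f} fruit")
print(f"difference from packhouse count: {percent_difference(estimate, 12_500):.1f}%")
```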
