Globus Aids University in Efforts to Increase Crop Yields through HPC and Machine Learning for Digital Agriculture
Dilbarjot and Michael Beck, members of the Physics and Applied Computer Science departments at the University of Winnipeg, are part of a team generating labeled datasets for training machine learning models. The models learn to recognize features in crop images that indicate a need for specific mitigation techniques, such as targeted pest control or changes to irrigation. To build these datasets, they carefully capture up to 40,000 images a day of plants, both cash crops and weeds. They tag each image with metadata (e.g., species and age of the plant, recognizable issues, presence of pests) and feed the results into the training system. The images will also be made publicly available, giving researchers and industry the training data they need to spur new innovations in agriculture, much as ImageNet did for machine learning and AI in general. The ultimate goal is to create models that allow autonomous vehicles to roam crop fields, capture images, and make highly localized decisions about deploying specific mitigation techniques.
Their daily capture of 40,000 images plus related metadata amounts to about 30 GB of data. That data must be transferred from the lab systems and the research field to campus storage, and from there to the HPC systems of Compute Canada, where the machine learning model is trained. Because there are several capture systems and the process is continuous, the team has automated these transfers with the Globus command-line interface (Globus CLI). Each capture system runs a Globus Connect Personal endpoint installed under a service account, and a Bash script orchestrates the transfers from each capture system to campus storage. A second script orchestrates the transfers from campus storage to Compute Canada.
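The two-stage flow (capture PCs to campus storage, then campus storage to the HPC system) can be sketched in Python as a thin wrapper around the Globus CLI. This is not the team's Bash script; it is a minimal illustration that assumes the CLI is installed and already authenticated with `globus login`. The endpoint UUIDs, directory layout, labels, and function names (`transfer_cmd`, `stage1_cmds`, `stage2_cmd`, `sync_day`) are hypothetical. The CLI options used are standard `globus transfer` and `globus task wait` options; choosing checksum-based sync is an assumption, made so that rerunning after an interruption skips files that already arrived intact.

```python
import subprocess

# Hypothetical collection UUIDs: substitute the IDs of your own
# Globus Connect Personal endpoints and campus/HPC collections.
CAPTURE_ENDPOINTS = {
    "lab-pc-1": "11111111-1111-1111-1111-111111111111",
    "lab-pc-2": "22222222-2222-2222-2222-222222222222",
}
CAMPUS = "33333333-3333-3333-3333-333333333333"
HPC = "44444444-4444-4444-4444-444444444444"


def transfer_cmd(src_ep, src_path, dst_ep, dst_path, label):
    """Return argv for a recursive `globus transfer` that prints only the task ID.

    --sync-level checksum copies a file only if its checksum differs from the
    destination copy, so a rerun does not recopy files that already arrived.
    """
    return [
        "globus", "transfer",
        f"{src_ep}:{src_path}", f"{dst_ep}:{dst_path}",
        "--recursive",
        "--sync-level", "checksum",
        "--label", label,
        "--jmespath", "task_id",
        "--format", "unix",
    ]


def wait_cmd(task_id, timeout_s=6 * 3600):
    """Return argv for `globus task wait`, which blocks until the task ends
    or the timeout (in seconds) elapses."""
    return ["globus", "task", "wait", task_id, "--timeout", str(timeout_s)]


def stage1_cmds(day):
    """Stage 1: one transfer per capture PC into a per-day campus directory.
    The paths are hypothetical placeholders."""
    return [
        transfer_cmd(ep, f"/captures/{day}/", CAMPUS,
                     f"/images/{day}/{name}/", f"capture {name} {day}")
        for name, ep in CAPTURE_ENDPOINTS.items()
    ]


def stage2_cmd(day):
    """Stage 2: the whole day's campus directory to the HPC collection."""
    return transfer_cmd(CAMPUS, f"/images/{day}/", HPC,
                        f"/project/images/{day}/", f"campus to HPC {day}")


def run(cmd):
    """Execute a CLI command and return its stdout.
    Raises subprocess.CalledProcessError if the command exits non-zero."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def sync_day(day):
    """Submit stage 1 for every PC, wait for all tasks, then run stage 2."""
    task_ids = [run(cmd) for cmd in stage1_cmds(day)]
    for task_id in task_ids:
        run(wait_cmd(task_id))
    final_id = run(stage2_cmd(day))
    run(wait_cmd(final_id))
    return final_id
```

Keeping the command builders separate from `run` means you can print `stage1_cmds("2024-05-01")` to review the exact commands before `sync_day("2024-05-01")` submits anything. Waiting on every stage-one task before starting stage two keeps a partially copied day from being pushed on to the HPC system.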
So far, the team has generated more than 10 TB of data for its model training system, and the work is ongoing.
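As a rough check on these figures, assuming decimal units (1 GB = 10^9 bytes, 1 TB = 1,000 GB), the daily volume works out to an average of about 750 kB per image including metadata, and 10 TB alone corresponds to roughly 333 days of capture at the stated rate:

```latex
\frac{30\ \text{GB/day}}{40\,000\ \text{images/day}} = 0.75\ \text{MB/image} = 750\ \text{kB/image}
\qquad
\frac{10\,000\ \text{GB}}{30\ \text{GB/day}} \approx 333\ \text{days}
```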
- “We can rely on Globus to ensure that each of our 40,000 files per day is safely copied to campus storage and then to Compute Canada. We can’t always count on the file manager on a single machine to copy 40,000 files between directories without problems, so having it work flawlessly between systems is a huge help.”
- “The Globus CLI allows us to automate our data flows across three tiers of storage: the Windows PCs in our lab, the campus-wide storage at the University of Winnipeg, and the national-scale systems at Compute Canada. In the CLI, these three, very different, systems look the same to us.”