The technology is all in place now thanks to machine learning. You train the software by feeding it pictures along with labels saying what they show, and it learns to identify new pictures from those examples. This is already being used for things such as plants, which have far more variability in leaf shape, angle, size, etc.
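To make the "feed it labeled examples and it learns" idea concrete, here is a toy sketch in Python. It is not how a real identification app works internally: real apps typically run a neural network on the image pixels, while this uses a simple nearest-centroid classifier on two hand-measured numbers per leaf. The feature choice (leaf length and width in cm), the sample values, and the function names `train` and `identify` are all made up for illustration.

```python
from collections import defaultdict


def train(examples):
    """Average the feature vectors of each label into one centroid per label.

    examples: list of (features, label) pairs, where features is a tuple of numbers.
    Returns a dict mapping label -> centroid (list of floats).
    """
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [total / counts[label] for total in totals]
        for label, totals in sums.items()
    }


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def identify(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: squared_distance(centroids[label], features))


# Hypothetical training data: (leaf length cm, leaf width cm) plus a human-supplied label.
labeled = [
    ((10.0, 4.0), "oak"),
    ((12.0, 5.0), "oak"),
    ((3.0, 3.0), "clover"),
    ((2.0, 2.0), "clover"),
]

model = train(labeled)
print(identify(model, (11.0, 4.5)))  # an unseen leaf close to the oak examples
```

The point is just the shape of the process: labeled examples go in, a model comes out, and the model then labels things it has not seen before. More examples per label generally make the averages (or, in a real system, the learned network weights) more reliable.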
I’m a tech guy, not a coder, but anyone with basic coding skills could likely build the app framework in a week. After that it’s just a matter of feeding it pictures, which the users do themselves (someone who knows what they’re seeing just needs to review them on the back end).
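The user-submits, expert-reviews loop described above could be sketched roughly like this. Everything here is hypothetical (the `Submission` and `ReviewQueue` names, the status strings, the file paths); a real app would sit on a database and an upload service, but the flow is the same: only pictures a reviewer has approved end up in the training set.

```python
from dataclasses import dataclass, field


@dataclass
class Submission:
    image_path: str           # where the uploaded picture is stored (hypothetical path)
    label: str                # what the user says the picture shows
    status: str = "pending"   # "pending", "approved", or "rejected"


@dataclass
class ReviewQueue:
    submissions: list = field(default_factory=list)

    def submit(self, image_path, label):
        """A user uploads a picture with their own identification."""
        item = Submission(image_path, label)
        self.submissions.append(item)
        return item

    def review(self, item, approved, corrected_label=None):
        """A knowledgeable reviewer accepts, corrects, or rejects a submission."""
        if corrected_label is not None:
            item.label = corrected_label
        item.status = "approved" if approved else "rejected"

    def training_set(self):
        """Only reviewed-and-approved pictures are used for training."""
        return [(s.image_path, s.label) for s in self.submissions if s.status == "approved"]


queue = ReviewQueue()
first = queue.submit("uploads/001.jpg", "oak")
second = queue.submit("uploads/002.jpg", "oak")
third = queue.submit("uploads/003.jpg", "clover")

queue.review(first, approved=True)
queue.review(second, approved=True, corrected_label="maple")  # reviewer fixes a wrong label
queue.review(third, approved=False)                            # too blurry to use

print(queue.training_set())
```

Keeping the review step separate from the upload step is what lets the users do the bulk of the work while one person who knows the subject keeps bad labels out of the training data.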
https://www.makeuseof.com/tag/use-smartphone-identify-anything-camfind/