Yeah, I would say so. There's a sleight of hand which I've glossed over here, but at some point for this dataset... this was actually a conference competition dataset that people put out. And somebody in the team who made the competition has sat down and manually annotated it - probably several people - manually gone through it to provide the training data that allows us to build the network that performs the analysis, and then to compare against. Of course, I could show you this picture and say it worked very well, but without checking and validating that this is good, you just have to take my word for it. And that's not how we do science, right? So these supervised methods - I guess we're often implicitly talking about supervised methods - at the moment you still need fairly significant quantities of manually annotated ground truth data to train the systems.
But the hope is that as you bootstrap different methods you can use things like transfer learning, where a smaller amount of data is enough to adapt an already existing neural network model. I think that's really the next step in the field - trying to move away from needing huge amounts of manual annotation.
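To make the transfer-learning idea concrete, here is a minimal toy sketch - not any specific pipeline from the projects discussed here. A "pretrained" feature extractor is frozen (here stood in for by fixed random weights, where in practice it would be a network trained on a large annotated dataset), and only a small new task-specific head is trained on a handful of labelled examples. All names and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained backbone: these weights are FROZEN and never
# updated. In reality this would be e.g. a convolutional network trained
# on a large, fully annotated dataset.
W_backbone = rng.normal(size=(64, 16))

def features(x):
    """Frozen feature extractor: raw input -> learned representation."""
    return np.tanh(x @ W_backbone / 8.0)

# A *small* labelled set - the point of transfer learning is that far
# fewer annotations are needed than training from scratch would require.
X = rng.normal(size=(40, 64))
y = (X[:, 0] > 0).astype(float)  # toy binary labels

# New task-specific head: the only parameters we actually train.
w_head = np.zeros(16)
b_head = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Plain gradient descent on the logistic loss of the head alone.
lr = 0.5
for _ in range(200):
    h = features(X)                    # backbone output (no gradient)
    p = sigmoid(h @ w_head + b_head)
    grad = p - y                       # dLoss/dlogits for logistic loss
    w_head -= lr * h.T @ grad / len(y)
    b_head -= lr * grad.mean()

acc = ((sigmoid(features(X) @ w_head + b_head) > 0.5) == y).mean()
print(f"training accuracy after fitting only the head: {acc:.2f}")
```

In a real setting the same pattern applies: load a published model, freeze most of its layers, and fine-tune a small number of parameters on the modest amount of ground truth you can afford to annotate.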
One thing we've tried, to alleviate this bottleneck of somebody having to sit down and manually annotate all of these objects, is inspired by a really fascinating project called Galaxy Zoo
, which is a citizen science project. Astronomers have a similar problem to microscopists - they can now generate millions of images really easily, just tons and tons of data, and the throughput of the analysis is too low to deal with it. So the Galaxy Zoo team, led by Chris Lintott, came up with this method: cropping images and sharing them out on the internet. Then you explain to people: "this is a scientific project; if you want to help us, come along and help us annotate these images".
It worked amazingly well for Galaxy Zoo. And a few years ago - maybe five years ago - we set up our own citizen science project, using this power of the crowd to help us generate ground truth. Not for this data in particular, but for the Etch-a-Cell
set of projects on the Zooniverse platform. We've got several projects now and we're collaborating with lots of people. While we still need large amounts of manual annotation to do deep learning, we've found a really good way of getting those annotations to train these networks. It's worked very well so far, and we're looking forward to lots more exciting things from the project.