Auto Focus Agent and Dataset

Hey all,
I’m working on an RL model to control the Z stage for automatic focus. The current method used in the OpenFlexure app is pretty good; I’m just investigating this approach to see if it can converge more quickly on an optimal focus point. Anyway, I’m curious whether anyone else is interested in this and, if so, whether you’d like to help me build a dataset.
I currently have a few Z stacks, 20 steps apart, 100 images per stack, of a blood and kidney slide. The dataset is here: Microscope Focus Dataset | Kaggle, but I’m interested in getting more diverse training data to make the model more general. This is where I’m curious whether anyone has an interest in making the dataset larger.
The technical approach I’m taking is based on this paper: https://arxiv.org/pdf/1809.03314, except I may make a few superficial changes to the network.

Update 27-APR-2024: Since starting this problem, I’ve found that RL-based training is probably not appropriate, even though it’s been used before for this type of problem in research. This is not really an RL problem: during training, the optimal action for bringing the slide into focus is easily determined from the dataset itself. After considering this, I have found that the following paper is a better basis for a learning-based auto-focusing agent: Rapid Whole Slide Imaging via Dual-Shot Deep Autofocusing | IEEE Journals & Magazine | IEEE Xplore

Update 28-APR-2024: With some additional slides coming in the mail that will provide a more diverse set of training images, I’ve decided to make a simple labeling tool to help speed up the labeling process. I haven’t found a labeling tool that streamlines this specific workflow very well, compared to labeling workflows for object detection, segmentation, etc. I’ll be releasing a new version of the dataset I linked earlier that contains sample types beyond blood and kidney cells, a larger Z range, and an easier dataset format to work with.

How are you going to label how focused your images are?

The labeling process is pretty simple: just choose the image in the Z stack with the best focus. In the training loop, you derive a training signal from that label to train the network.

For RL training loops, you may craft a reward function that rewards reaching the best image, and rewards getting closer to it. If you’re doing supervised learning instead of RL, which is definitely possible, you can predict a signed distance to the optimal focal point and use MSE loss based on that. There are probably a couple of other techniques. I may actually use supervised learning instead of RL, since determining the optimal action is generally pretty easy (as opposed to most other types of RL problems).
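Both training signals described above can be sketched in a few lines. This is just an illustrative sketch, not code from the actual project: the 20-step spacing comes from the stacks described earlier, while the function names and the exact reward shaping values are my own assumptions.

```python
# Two ways to turn a "best frame" label into a training signal, assuming
# a Z stack where `best_idx` is the hand-labelled sharpest frame and the
# stage moves STEP units between consecutive frames.
STEP = 20  # step size between frames, per the stacks described in the post

def signed_distance_label(frame_idx, best_idx, step=STEP):
    """Supervised target: signed stage distance from this frame to best focus.
    A network predicting this can be trained with plain MSE loss."""
    return (best_idx - frame_idx) * step

def reward(prev_idx, new_idx, best_idx):
    """Toy RL reward: +1 for landing on the labelled best frame, a small
    shaping reward for moving closer to it, a small penalty otherwise.
    The 0.1 magnitudes are arbitrary illustrative choices."""
    if new_idx == best_idx:
        return 1.0
    moved_closer = abs(best_idx - new_idx) < abs(best_idx - prev_idx)
    return 0.1 if moved_closer else -0.1

# Frame 3 in a stack whose best focus is frame 7: target is +80 steps up.
print(signed_distance_label(3, 7))  # 80
print(reward(3, 4, 7))              # 0.1 (moved closer)
```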

Maybe you already know this, but a quick approximation of best focus is just the JPEG filesize. Blurry images compress better than images full of detail.
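The trick above is easy to check by encoding frames to JPEG in memory and comparing byte counts. A minimal sketch using Pillow and NumPy (both third-party); the Gaussian blur here is just a stand-in for a defocused frame:

```python
import io
import numpy as np
from PIL import Image, ImageFilter

def jpeg_size(img, quality=85):
    """Return the encoded JPEG size in bytes for a PIL image."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    return buf.getbuffer().nbytes

# Synthetic "sharp" frame (noise is hard to compress) vs a blurred copy.
rng = np.random.default_rng(0)
sharp = Image.fromarray(rng.integers(0, 256, (128, 128), dtype=np.uint8), mode="L")
blurry = sharp.filter(ImageFilter.GaussianBlur(radius=3))

# The sharper frame should produce the larger JPEG.
print(jpeg_size(sharp) > jpeg_size(blurry))  # True
```

To rank a whole Z stack, you would encode each frame the same way and pick the index with the largest size.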

Hey! That’s a neat trick! Before this I used variance of Laplacian to measure focus, but using the JPEG size as a heuristic is a technique I hadn’t heard of before. Makes sense though.
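For reference, the variance-of-Laplacian metric mentioned above can be written with plain NumPy (no OpenCV needed). Sharper images have stronger local intensity changes, so the Laplacian response has higher variance; the box blur below is just a crude stand-in for defocus:

```python
import numpy as np

def laplacian_variance(img):
    """Focus score: variance of the discrete 4-neighbour Laplacian."""
    img = img.astype(np.float64)
    lap = (img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:]
           - 4.0 * img[1:-1, 1:-1])
    return lap.var()

def box_blur(img):
    """Crude 3x3 box blur (wrap-around edges), simulating a defocused frame."""
    img = img.astype(np.float64)
    return sum(np.roll(np.roll(img, dy, axis=0), dx, axis=1)
               for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0

rng = np.random.default_rng(0)
sharp = rng.integers(0, 256, (64, 64)).astype(np.float64)

# Blurring should noticeably reduce the Laplacian variance.
print(laplacian_variance(sharp) > laplacian_variance(box_blur(sharp)))  # True
```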