Hey all,
I’m working on an RL model to control the Z stage for automatic focus. The current method used in the OpenFlexure app is pretty good; I’m just investigating this approach to see if it can converge more quickly on an optimal focus point. Anyway, just curious if anyone else is interested in this and, if so, whether you’d be interested in helping me build a dataset.
I currently have a few Z stacks of blood and kidney slides, 20 steps apart, 100 images per stack. The dataset is here: https://www.kaggle.com/datasets/tay10r/microscope-focus-dataset-v1 but I’m interested in getting more diverse training data to make the model more general. This is where I’m curious whether anyone has an interest in making the dataset larger.
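For anyone curious about how the stacks can be used for training, here’s a rough sketch of how I pair each image with a label. The directory layout and file naming here are placeholders (the actual Kaggle layout may differ); the key idea is that, since images are 20 steps apart, every frame’s signed defocus offset follows directly from its position in the stack relative to the sharpest frame.

```python
# Hypothetical indexing of one Z stack: pair each image with its signed
# defocus offset in motor steps. Paths and naming are assumptions, not the
# exact layout of the Kaggle dataset.
from pathlib import Path

STEP_SIZE = 20  # motor steps between consecutive images in a stack

def index_stack(stack_dir: Path, focus_index: int):
    """Return (image_path, signed_defocus_in_steps) pairs for one stack.

    focus_index is the position of the sharpest image in the stack,
    labeled by hand or with a focus metric.
    """
    images = sorted(stack_dir.glob("*.jpg"))
    return [(img, (i - focus_index) * STEP_SIZE) for i, img in enumerate(images)]

# Example: a 100-image stack where frame 47 is in focus
# samples = index_stack(Path("stacks/blood_01"), focus_index=47)
```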
The technical approach I’m taking is based on this paper: https://arxiv.org/pdf/1809.03314, except I may make a few superficial changes to the network.
Update 27-APR-2025: Since starting this project, I’ve found that RL-based training is probably not appropriate, even though it’s been used before for this type of problem in research. This isn’t really an RL problem, because during training the optimal action the agent should take to bring the slide into focus can be determined directly from the dataset. After considering this, I’ve found that the following paper is a better basis for a learning-based autofocusing agent: Rapid Whole Slide Imaging via Dual-Shot Deep Autofocusing (IEEE Xplore). A rough sketch of the supervised formulation I mean is below.
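Since the correct defocus is known for every frame in a stack, the problem reduces to plain supervised regression. The network below is just a placeholder to illustrate the setup, not the architecture from either paper.

```python
# Minimal sketch: regress the signed defocus distance directly from an image,
# instead of training an RL agent. Architecture is a placeholder.
import torch
import torch.nn as nn

class DefocusRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)  # signed defocus, in motor steps

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

# Training would be an ordinary regression loop, e.g. L1 loss against the
# offsets computed from the stack index:
# loss = nn.functional.l1_loss(model(images), offsets.unsqueeze(1))
```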
Update 28-APR-2025: With some additional slides coming in the mail that will provide a more diverse set of training images, I’ve decided to make a simple labeling tool to help speed up the labeling process. I haven’t found a labeling tool that streamlines this specific workflow very well, compared to labeling workflows for object detection, segmentation, etc. I’ll be releasing a new version of the dataset I linked earlier that contains more sample types beyond blood and kidney cells, a larger Z range, and a dataset format that’s easier to work with.
Update 09-MAY-2025: On a bit of a hiatus since my Z stage broke after falling off the shelf. However, I have pretty much settled on a model architecture. The model will be primarily aimed at reconstructing its input at a higher visual fidelity. For example, if the illumination is low (to reduce phototoxicity) or parts of the frame are blurry, it will reconstruct the input with reduced blurring and better lighting. I’ve put in a little bit of extra work so that the model can take the raw 10-bit Bayer data as input, so that it has more information to work with compared to the lossy JPEG data.
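For reference, this is roughly how the raw Bayer frame could be fed to the model: split the mosaic into four half-resolution planes and normalise the 10-bit range. I’m assuming here that the values are already unpacked to uint16 and that the pattern is RGGB; both depend on the camera, so treat this as a sketch rather than the exact preprocessing I’ll end up using.

```python
# Sketch: convert an HxW 10-bit RGGB mosaic into a 4-channel half-resolution
# input. Assumes values are already unpacked to uint16 and the pattern is RGGB.
import numpy as np

def bayer_to_planes(raw: np.ndarray) -> np.ndarray:
    """Return a (4, H/2, W/2) float32 array normalised to [0, 1]."""
    r  = raw[0::2, 0::2]
    g1 = raw[0::2, 1::2]
    g2 = raw[1::2, 0::2]
    b  = raw[1::2, 1::2]
    planes = np.stack([r, g1, g2, b]).astype(np.float32)
    return planes / 1023.0  # 10-bit full scale
```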
Here are some of the results so far, with the caveat that the change in illumination is currently synthetic and that the input is lossy JPEG data (I wrote the code to capture the Bayer data and change the illumination after my Z stage broke, so I haven’t been able to capture training data for that yet). On the top is the input to the model, the middle is the output of the model, and the bottom is what the image looks like at the ideal focal point with no changes to the illumination.
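For the synthetic illumination change, I mean something along these lines: darken the image and add a bit of noise so the model has to restore both exposure and detail. The exact gain and noise model I end up using once I can capture real low-light frames will likely differ, so this is just illustrative.

```python
# Sketch of a synthetic low-illumination augmentation: scale brightness down
# and add Gaussian noise. Gain and noise levels are illustrative only.
import numpy as np

def darken(image: np.ndarray, gain: float = 0.3, noise_std: float = 0.01,
           rng=None) -> np.ndarray:
    """Simulate low illumination on a float image in [0, 1]."""
    rng = rng or np.random.default_rng()
    dark = image * gain + rng.normal(0.0, noise_std, size=image.shape)
    return np.clip(dark, 0.0, 1.0).astype(np.float32)
```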
There’s still a lot of room for improvement, because the dataset I’ve collected so far is relatively small.