You have a number of difficult problems to solve there!
In the OpenFlexure software we find that the most robust focus metric is the size of the JPEG images in the camera stream: a sharply focused frame contains more fine detail, so it compresses less and the file is larger. This also has the advantage of being calculated for ‘free’ by the GPU as it encodes the camera preview stream on the Raspberry Pi. There is a paper, "Fast, high-precision autofocus on a motorised microscope: Automating blood sample imaging on the OpenFlexure Microscope" (Knapper et al., 2022, Journal of Microscopy), which sets out the approach to autofocus in the OpenFlexure software. (I am an author, but a minor one. The clever stuff on focussing came from the other authors.) I don’t know whether this is also possible on the ESP32 architecture. How big is the Laplacian kernel that you are using, and are you using full 8MP images for your Laplacian calculations?
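To illustrate the JPEG-size metric, here is a minimal sketch in Python. It is not the OpenFlexure code, just the bare idea: take a sweep of frames at known z positions and pick the position whose frame is largest in bytes. The function names `sharpness_from_jpeg` and `best_focus` are made up for this example, and the "frames" are stand-in byte strings of varying length rather than real JPEG data from a camera stream.

```python
from typing import Iterable, Tuple


def sharpness_from_jpeg(frame: bytes) -> int:
    """Focus metric: the byte length of one JPEG-encoded frame."""
    return len(frame)


def best_focus(sweep: Iterable[Tuple[int, bytes]]) -> int:
    """Return the z position whose frame had the largest JPEG size."""
    best_z, _ = max(sweep, key=lambda item: sharpness_from_jpeg(item[1]))
    return best_z


# Stand-in data (assumption): fake "frames" whose lengths mimic a focus
# peak at z = 0. On real hardware each item would be a JPEG frame taken
# from the camera stream at that z position.
sweep = [(z, bytes(20_000 - 30 * abs(z))) for z in range(-200, 201, 50)]

print(best_focus(sweep))  # prints 0
```

The metric only makes sense when comparing frames of the same field of view with the same resolution and JPEG quality settings, since all of those change the file size as well.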
Mis-focusing on blank areas is also a big issue for tiling. @JohemianKnapsody has been working on an automated way of recognising blank regions. He has a thread about it on this forum: "Automated Slide Scanning and Tiling".
I had a look at your repository, particularly the images of the holders in use. The cross-registration between the two systems is very nicely done.