Developing v3 of the OpenFlexure software

It is still some way off.

Richard has made great progress over recent months; however, to really pick up the pace we need more dedicated software development. We have put in a grant application to try to get some funding directly for the software, and we are also in the process of trying to establish some sort of entity outside academia that can focus on the aspects of the microscope development that fall outside the definition of “research”.

V3 images are available for testing, but they are not yet feature complete, and all of them are the Lite version of the OS. Once we get up a bit more steam, I hope we can scale up the user testing and feedback on the forum.

For anyone interested in progress on the new server: @JohemianKnapsody and I have made quite a lot of progress adding features and tidying up the server and stitching codebases, with code review help from @bprobert. We have also added a fair bit of functionality from Joe’s testing branches into the v3 branch.

Highlights:

Hopefully we are not far away from a semi-complete alpha release. Test versions of the OS are available here:

I have updated the architecture diagram a bit and added in repository URLs:

We plan to go through and label issues on some of these repositories that we think are “low hanging fruit”, in the hope that this helps potential contributors find issues that are a nice way to ease into the codebase.

Will version 3 support capturing lossless data? The current method of storing Bayer data in the EXIF is weird, and it’s not obvious to everyone that the Bayer data in the EXIF metadata is not the same as the data you look at when you open the JPEG. Providing either the 10-bit Bayer data or 10-bit RGB data in a lossless format would be much appreciated. Formats such as 16-bit PNG, EXR, or (although it’s my least favorite) TIFF could do this.

Edit: I had a look at the source code of the v3 branch and, as of this writing, it looks like you guys are using OpenCV now. I’d like to suggest that the Raspberry Pi picamera2 camera interface be used instead, so that you can allow users to capture the full 10-bit resolution of the camera. You can bit-shift the data into a 16-bit data type and encode it in a PNG for portability. PNG supports EXIF, so you can add metadata to it as you do with JPEG.

Edit 2: I was mistaken about OpenCV being the primary interface, see comments below.
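
As an illustration of the bit-shift-and-PNG suggestion above, here is a minimal sketch that shifts 10-bit samples into 16 bits and writes them as a 16-bit greyscale PNG using only the Python standard library. It is not the server's code: the function names are made up for this example, the input is assumed to be a small list of rows of 10-bit integers, and the EXIF metadata chunk is left out.

```python
import struct
import zlib


def scale_10bit_to_16bit(value: int) -> int:
    """Left-shift a 10-bit sample so it occupies the top bits of 16."""
    if not 0 <= value <= 0x3FF:
        raise ValueError("sample is not a 10-bit value")
    return value << 6


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def encode_png16_gray(rows: list[list[int]]) -> bytes:
    """Encode rows of 10-bit samples as a 16-bit greyscale PNG."""
    height, width = len(rows), len(rows[0])
    # IHDR: width, height, bit depth 16, colour type 0 (greyscale),
    # compression 0, filter method 0, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 16, 0, 0, 0, 0)
    raw = bytearray()
    for row in rows:
        raw.append(0)  # filter type 0 (None) for this scanline
        for sample in row:
            # PNG stores 16-bit samples big-endian.
            raw += struct.pack(">H", scale_10bit_to_16bit(sample))
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(bytes(raw)))
        + _chunk(b"IEND", b"")
    )


png_bytes = encode_png16_gray([[0, 1023], [512, 1]])
```

Shifting by 6 bits keeps every 10-bit value distinct, so a reader can shift right by 6 to get the original samples back.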

Sorry for the confusion. The OpenCV camera is in there, as is a simulation camera, but the picamera camera that is used by default is in another repo. In the above diagram it is GitHub#3.

This is something I really want to change, because labthings-picamera2 depends on following the protocol in the OpenFlexure Microscope Server, and the Server depends on labthings-picamera2. The plan is to pull the camera interface out entirely into a consistent repo.

As for raw images, this is something we haven’t discussed in huge detail. Much of the v3 development has been about getting a stable picamera2 interface. We have had issues with weird colour effects for red images (a quirk of green equalisation), camera timeouts that hard-crashed the Pi (which turned out to be timeout mismatches between the software and Python levels by default), and a number of other memory-related issues relating to the live stitching while scanning.

We have made great strides with all of these, and now have a branch, soon to be merged, in which scanning reliably collects 8MP captures and downsamples them to 2MP. This is done because the optical resolution is a bit below 2MP, but by collecting at 8MP each of the 4 Bayer channels is 2MP. We then demosaic at 8MP and downsample to 2MP. This improves quality compared to a 2MP capture.
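
The pixel-count argument above can be sketched in a few lines. This assumes a 3280 × 2464 sensor frame (the full resolution of the IMX219, about 8MP) and represents the mosaic as a plain list of rows; `split_bayer_planes` is a hypothetical helper for illustration, not a function from the server, and it makes no assumption about the colour order of the mosaic.

```python
def split_bayer_planes(mosaic):
    """Split a Bayer mosaic into four planes, one per position in the 2x2 tile."""
    return {
        (dy, dx): [row[dx::2] for row in mosaic[dy::2]]
        for dy in (0, 1)
        for dx in (0, 1)
    }


# Assumed full-resolution frame size (IMX219); not taken from the server code.
WIDTH, HEIGHT = 3280, 2464
total_pixels = WIDTH * HEIGHT                    # 8,081,920 (about 8MP)
pixels_per_plane = (WIDTH // 2) * (HEIGHT // 2)  # 2,020,480 (about 2MP)

# Tiny 4x4 example: each plane holds a quarter of the samples.
tiny = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
planes = split_bayer_planes(tiny)
```

Each of the four planes is a quarter of the frame, which is why an 8MP capture carries roughly 2MP of real samples per Bayer channel.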

I assume RAW will be an optional setting in both scan and capture. In the case of scanning it may slow things down due to the speed of saving to disk. If you are on GitLab, can you add an issue saying that you would like RAW capture, and how you would prefer the data to be saved? This way it will be tracked and won’t get forgotten.

I see, thanks for the clarification. I don’t understand what a lab thing is, but it sounds like there’s been work utilizing picamera2 and that’s great to hear! I’ll make a GitLab issue regarding the preservation of access to the 10-bit Bayer data in a lossless format.

LabThings is a framework that we made to allow us to use the W3C standard for the Web of Things. Basically we have a web server, and that web server is connected to multiple bits of hardware, each of which needs to be controlled. LabThings allows us to control them together, but also to expose their functionality and/or state as a web API.

Some more detail is here:
https://royalsocietypublishing.org/doi/full/10.1098/rsos.211158

Got it. Reminds me a little bit of ROS, except this looks way better and it doesn’t use DDS.

Hi,
I just registered on the forum because I got my first build of the OFM finished (and it is looking and working awesome!). However, on a different topic: first and foremost, I am a Linux guy, not a researcher, and I noticed a few details that I think should be addressed in the OS build. As the “stable” build is based on a very old Raspbian release, I am currently using a current build named “2025-06-09-raspios-openflexure-bookworm-armhf-lite-999-final-export”.

  1. Legacy apt sources: This affects all current Ubuntu and Debian builds, but this should be migrated to the current way of handling keys for apt sources. I’d assume the “old way” of configuring apt sources will be deprecated sooner rather than later…
  2. By default, NTP servers given by DHCP are ignored. There are several ways to deal with this, but the problem is caused by the combination of systemd-timesyncd and “something different than systemd-network(d)”. However, I think it is very important to use NTP servers handed out by the DHCP server, and not fall back to servers on the Internet if possible. Especially with the “stable” build, I consider purpose-built distros like OFM to be IoT devices, which should be shielded from the Internet (because there is no need for them to have Internet access in most cases). That makes it all the more important to have NTP working out of the box, especially when automatic configuration is possible via DHCP. Sure, you can configure this yourself, but I assume most OFM users are not primarily Linux users. I think many hurdles are already covered in OFM regarding the OS base config, but I found this one clearly missing. With no NTP, your time is likely out of sync by weeks or more.

However, not a rant, just an attempt to improve. Keep up the great work.

Hi yet again.. I’m bouncing around the forum looking at messages I missed.

I 100% agree with you, stable is embarrassingly out of date. A combination of the PiCamera stack entirely changing at a time when we had a very reduced team to push forward with a huge software overhaul has left us in this state.

If someone with experience in distro maintenance could help us build a more secure and updated V2 build, we would love to have one. But as Raspbian deprecated the camera stack that is fundamental to the server, we are focussing efforts on v3.

The first alpha is a huge milestone in the project, as much of it has been a bottom-up rewrite. I hope with our accelerated progress that we will soon be at a point where v3 is the server/OS.

I am certainly not up to the task of maintaining a distribution, but I’ve done a lot of Linux client and server management for work, and when filing GitLab issues in the future, I promise to give helpful advice on how to do something.

What I can already say is that your current approach, taking the Raspbian base image and then consecutively applying recipes/scripts that each change things related to a certain topic, is definitely the way to go. Building may sometimes take a little bit longer that way, but you always have fresh, reproducible builds. This is how it should be done anyway, even though I might have chosen Ansible for that task, to be able to easily apply changes to an already deployed system. I don’t think there is anything much more sophisticated to do, except building a checklist of things that have to work and how they have to work, regularly building images with updated packages, and checking whether any of the updates conflict or interact with the changes you apply to the base OS afterwards. There is not very much to it.

Maybe pay attention not to hardcode things. In the GUI version of the current stable build the desktop shortcuts don’t work, because the username in the .desktop files is sometimes hardcoded to “pi” (and therefore so is the path of the home directory where some apps are located). But other than that, many things are already done very much the right way…

However, back to the v3 release: I just experienced an unhandled crash of the stitching process that caused the scanning to halt. The running scan could not be aborted, and restarting the openflexure-server unit did eventually complete, but unfortunately the UI did not become available again. I thought you would be interested in the dmesg trail:

[24449.032958] vc_sm_cma_vchi_rx_ack: received response 0, throw away...
[24971.284060] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[24971.284102] Mem abort info:
[24971.284107]   ESR = 0x0000000096000047
[24971.284113]   EC = 0x25: DABT (current EL), IL = 32 bits
[24971.284120]   SET = 0, FnV = 0
[24971.284125]   EA = 0, S1PTW = 0
[24971.284129]   FSC = 0x07: level 3 translation fault
[24971.284136] Data abort info:
[24971.284140]   ISV = 0, ISS = 0x00000047, ISS2 = 0x00000000
[24971.284146]   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[24971.284152]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[24971.284158] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000080d0c000
[24971.284166] [0000000000000008] pgd=0800000041d06003, p4d=0800000041d06003, pud=0800000041d06003, pmd=0800000084b99003, pte=0000000000000000
[24971.284192] Internal error: Oops: 0000000096000047 [#1] PREEMPT SMP
[24971.284201] Modules linked in: cmac algif_hash aes_arm64 aes_generic algif_skcipher af_alg bnep brcmfmac_wcc imx219 v4l2_cci regmap_i2c vc4 brcmfmac snd_soc_hdmi_codec drm_display_helper binfmt_misc brcmutil cfg80211 bcm2835_isp(C) cec bcm2835_v4l2(C) bcm2835_codec(C) rpi_hevc_dec v3d hci_uart bcm2835_unicam_legacy bcm2835_mmal_vchiq(C) videobuf2_vmalloc drm_dma_helper btbcm snd_soc_core bluetooth v4l2_mem2mem vc_sm_cma(C) v4l2_dv_timings gpu_sched v4l2_fwnode videobuf2_dma_contig v4l2_async videobuf2_memops drm_shmem_helper videobuf2_v4l2 ecdh_generic ecc videodev snd_compress drm_kms_helper rfkill libaes snd_pcm_dmaengine snd_bcm2835(C) snd_pcm raspberrypi_hwmon videobuf2_common i2c_mux_pinctrl snd_timer i2c_bcm2835 i2c_mux i2c_brcmstb snd mc raspberrypi_gpiomem nvmem_rmem drm fuse drm_panel_orientation_quirks backlight dm_mod ip_tables x_tables ipv6 uio_pdrv_genirq uio
[24971.284415] CPU: 1 UID: 0 PID: 361 Comm: SMIO Tainted: G         C         6.12.25+rpt-rpi-v8 #1  Debian 1:6.12.25-1+rpt1
[24971.284430] Tainted: [C]=CRAP
[24971.284435] Hardware name: Raspberry Pi 4 Model B Rev 1.5 (DT)
[24971.284443] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[24971.284452] pc : vc_sm_release_resource+0x54/0xf8 [vc_sm_cma]
[24971.284478] lr : vc_sm_release_resource+0x50/0xf8 [vc_sm_cma]
[24971.284490] sp : ffffffc081313cd0
[24971.284495] x29: ffffffc081313cd0 x28: ffffffc0818d8a10 x27: ffffffc0818d8a18
[24971.284509] x26: ffffff8082696088 x25: ffffff8082696068 x24: dead000000000100
[24971.284522] x23: dead000000000122 x22: ffffff80810bf1a0 x21: ffffffe82843a000
[24971.284534] x20: ffffffe82843a000 x19: ffffff80810bf180 x18: 0000000000000002
[24971.284547] x17: ca8de9bfe6e30a08 x16: ffffffe84638b2e0 x15: ffffffc081313aa0
[24971.284560] x14: 0000000000000004 x13: ffffff8040eeb0e8 x12: 0000000000000000
[24971.284578] x11: ffffff8042780110 x10: ffffff8042780000 x9 : ffffffe828435288
[24971.284594] x8 : ffffff8042780028 x7 : 0000000000000034 x6 : 000000000000000c
[24971.284610] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[24971.284626] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffff8040eeb100
[24971.284644] Call trace:
[24971.284652]  vc_sm_release_resource+0x54/0xf8 [vc_sm_cma]
[24971.284665]  vc_sm_vpu_event+0x41c/0x510 [vc_sm_cma]
[24971.284676]  vc_sm_cma_vchi_videocore_io+0x1e4/0x3a0 [vc_sm_cma]
[24971.284686]  kthread+0x11c/0x128
[24971.284704]  ret_from_fork+0x10/0x20
[24971.284726] Code: f9438280 91020000 94000c5f a9400662 (f9000441) 
[24971.284742] ---[ end trace 0000000000000000 ]---

…even though the service recovered by restarting, it did not do so very gracefully and unfortunately remained dysfunctional:

Jun 22 22:06:45 cyberscope openflexure-microscope-server[601]: INFO:     Waiting for connections to close. (CTRL+C to force quit)
[repeated ~60 times]
Jun 22 22:08:15 cyberscope systemd[1]: openflexure-microscope-server.service: State 'stop-sigterm' timed out. Killing.
Jun 22 22:08:15 cyberscope systemd[1]: openflexure-microscope-server.service: Killing process 601 (openflexure-mic) with signal SIGKILL.
Jun 22 22:08:15 cyberscope systemd[1]: openflexure-microscope-server.service: Killing process 773 (openflexure-ust) with signal SIGKILL.
Jun 22 22:08:15 cyberscope systemd[1]: openflexure-microscope-server.service: Killing process 782 (n/a) with signal SIGKILL.
Jun 22 22:08:15 cyberscope systemd[1]: openflexure-microscope-server.service: Killing process 786 (openflexure-mic) with signal SIGKILL.
Jun 22 22:08:15 cyberscope kernel: vc_sm_cma_import_dmabuf: imported vc_sm_cma_get_buffer failed -512
Jun 22 22:08:15 cyberscope kernel: bcm2835_mmal_vchiq: vchiq_mmal_submit_buffer: vc_sm_import_dmabuf_fd failed, ret -512
Jun 22 22:08:15 cyberscope kernel: bcm2835-codec bcm2835-codec: device_run: Failed submitting ip buffer
Jun 22 22:08:15 cyberscope systemd[1]: openflexure-microscope-server.service: Killing process 787 (openflexure-mic) with signal SIGKILL.
Jun 22 22:08:15 cyberscope systemd[1]: openflexure-microscope-server.service: Killing process 789 (openflexure-mic) with signal SIGKILL.
Jun 22 22:08:15 cyberscope systemd[1]: openflexure-microscope-server.service: Killing process 790 (n/a) with signal SIGKILL.
Jun 22 22:08:15 cyberscope kernel: ------------[ cut here ]------------
Jun 22 22:08:15 cyberscope kernel: WARNING: CPU: 3 PID: 787 at drivers/media/common/videobuf2/videobuf2-core.c:2215 __vb2_queue_cancel+0x238/0x2d8 [videobuf2_common]
Jun 22 22:08:15 cyberscope kernel: Modules linked in: cmac algif_hash aes_arm64 aes_generic algif_skcipher af_alg bnep brcmfmac_wcc imx219 v4l2_cci regmap_i2c vc4 brcmfmac snd_soc_hdmi_codec drm_display_helper binfmt_misc brcmutil cfg80211 bcm283>
Jun 22 22:08:15 cyberscope kernel: CPU: 3 UID: 105 PID: 787 Comm: openflexure-mic Tainted: G      D  C         6.12.25+rpt-rpi-v8 #1  Debian 1:6.12.25-1+rpt1
Jun 22 22:08:15 cyberscope kernel: Tainted: [D]=DIE, [C]=CRAP
Jun 22 22:08:15 cyberscope kernel: Hardware name: Raspberry Pi 4 Model B Rev 1.5 (DT)
Jun 22 22:08:15 cyberscope kernel: pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
Jun 22 22:08:15 cyberscope kernel: pc : __vb2_queue_cancel+0x238/0x2d8 [videobuf2_common]
Jun 22 22:08:15 cyberscope kernel: lr : __vb2_queue_cancel+0x34/0x2d8 [videobuf2_common]
Jun 22 22:08:15 cyberscope kernel: sp : ffffffc082a23a30
Jun 22 22:08:15 cyberscope kernel: x29: ffffffc082a23a30 x28: ffffff804580a100 x27: 0000000000000009
Jun 22 22:08:15 cyberscope kernel: x26: 0000000000000001 x25: 0000000000000000 x24: ffffff804580a700
Jun 22 22:08:15 cyberscope kernel: x23: ffffff804056cf20 x22: ffffff80439e0da8 x21: ffffff8043acad48
Jun 22 22:08:15 cyberscope kernel: x20: ffffff80439e0e50 x19: ffffff80439e0da8 x18: 0000000000000002
Jun 22 22:08:15 cyberscope kernel: x17: 0000000100000000 x16: ffffffe84595c8f0 x15: ffffffc081923b80
Jun 22 22:08:15 cyberscope kernel: x14: 0000000000000004 x13: ffffff8044e40028 x12: 0000000000000000
Jun 22 22:08:15 cyberscope kernel: x11: ffffff808e80ee60 x10: ffffff808e80eda0 x9 : ffffffe84595cab8
Jun 22 22:08:15 cyberscope kernel: x8 : ffffffc082a23940 x7 : 0000000000000000 x6 : ffffff8042780470
Jun 22 22:08:15 cyberscope kernel: x5 : 0000000000150010 x4 : fffffffec205f6a0 x3 : 0000000000150010
Jun 22 22:08:15 cyberscope kernel: x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000001
Jun 22 22:08:15 cyberscope kernel: Call trace:
Jun 22 22:08:15 cyberscope kernel:  __vb2_queue_cancel+0x238/0x2d8 [videobuf2_common]
Jun 22 22:08:15 cyberscope kernel:  vb2_core_queue_release+0x2c/0x88 [videobuf2_common]
Jun 22 22:08:15 cyberscope kernel:  vb2_queue_release+0x18/0x30 [videobuf2_v4l2]
Jun 22 22:08:15 cyberscope kernel:  v4l2_m2m_ctx_release+0x30/0x50 [v4l2_mem2mem]
Jun 22 22:08:15 cyberscope kernel:  bcm2835_codec_release+0x64/0x110 [bcm2835_codec]
Jun 22 22:08:15 cyberscope kernel:  v4l2_release+0xec/0x100 [videodev]
Jun 22 22:08:15 cyberscope kernel:  __fput+0xd0/0x2e0
Jun 22 22:08:15 cyberscope kernel:  ____fput+0x1c/0x30
Jun 22 22:08:15 cyberscope kernel:  task_work_run+0x80/0xe8
Jun 22 22:08:15 cyberscope kernel:  do_exit+0x2e8/0x9c0
Jun 22 22:08:15 cyberscope kernel:  do_group_exit+0x3c/0xa0
Jun 22 22:08:15 cyberscope kernel:  get_signal+0x9ac/0x9c8
Jun 22 22:08:15 cyberscope kernel:  do_signal+0xf8/0x1120
Jun 22 22:08:15 cyberscope kernel:  do_notify_resume+0xd0/0x150
Jun 22 22:08:15 cyberscope kernel:  el0_svc_compat+0x6c/0x80
Jun 22 22:08:15 cyberscope kernel:  el0t_32_sync_handler+0x98/0x140
Jun 22 22:08:15 cyberscope kernel:  el0t_32_sync+0x194/0x198
Jun 22 22:08:15 cyberscope kernel: ---[ end trace 0000000000000000 ]---
Jun 22 22:08:15 cyberscope kernel: videobuf2_common: driver bug: stop_streaming operation is leaving buffer 5 in active state
Jun 22 22:08:15 cyberscope systemd[1]: openflexure-microscope-server.service: Main process exited, code=killed, status=9/KILL
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░ 
░░ An ExecStart= process belonging to unit openflexure-microscope-server.service has exited.
░░ 
░░ The process' exit code is 'killed' and its exit status is 9.
Jun 22 22:08:15 cyberscope systemd[1]: openflexure-microscope-server.service: Failed with result 'timeout'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░ 
░░ The unit openflexure-microscope-server.service has entered the 'failed' state with result 'timeout'.
Jun 22 22:08:15 cyberscope systemd[1]: Stopped openflexure-microscope-server.service - Run the OpenFlexure Microscope software using LabThings.
░░ Subject: A stop job for unit openflexure-microscope-server.service has finished
░░ Defined-By: systemd

…not sure what could have caused this, because there was not much more helpful information in the log, but I thought you might want to know… (next time I’ll file GitLab issues, I promise :slight_smile: )

P.S.: I ended up restarting the Pi (gracefully), and what went well is that the microscope remembered its position even though the scan crashed, so no recalibration of the X/Y/Z origins was needed, which is good!

The position is stored on the Sangaboard. Any move that was sent will have been done and recorded, whether or not the Pi died mid-move, as long as there is power.

Yeah, this is pretty neat. As the Sangaboard powers the Pi and not the other way around, it will always be “up to date”… which is good, because we have no physical endstops…

Just had another freeze. Specifically, the stitching process becomes a zombie, and the GUI just does not update the progress any further. Otherwise there is nothing very helpful in the log, dmesg or journal, unfortunately… no indication of temperature-related problems, swapping, the OOM killer or anything else that would be equally obvious…

Very large stitches take a very long time. This increased significantly when we moved to PyVIPS, which allows us to stream stitches larger than the RAM into a JPEG. We have some ideas for improving it. I have also noticed that correlating and aligning the images (which used to take up most of the time) both log, but the final stitch does not. Now that the final stitch is starting to dominate the time after the scan, we do need a way to know how it is progressing.

I think it probably was running, not a zombie. I think the reason it appears as a zombie is that the cancel button won’t work. This is a UI bug: the cancel button is there to cancel scanning, and at the end of the scan it stitches the images, so it ignores the cancel button if it is clicked again. This behaviour made sense for very small scans when stitching was near instant, but it is a pain now.

No, it was definitely dead at that moment. The stitch had around 500 frames at that point and did not update further for an hour. The scanning (i.e. the physical moving of the scope) had also been stopped for an hour by then, which did not happen in the past. When previous scans finished, the movement at least ended by returning to the origin as the last move. This would have been clearly visible in the log (I am talking system log, journal and dmesg, not that textbox in the UI…). Cancellation (at that point) also had no effect, of course.

:sad_but_relieved_face:

Right. We do need a bit more info to be logged so it is clear what causes things.

The textbox in the UI is a subset of the logs that go into the systemd log. It is basically the Info-and-above systemd logs that originate from the scan action.

I’d also suggest the following changes anyways regarding the logging to make life easier:

  • By default, do not log every HTTP request from the webapp/API. To be honest, that is usually rather useless. This should only be logged if log-severity==DEBUG.
  • Make logging severity configurable from a config file or the UI. Not only for display, but in general for what really gets logged.
  • Make systemd logging persistent (at least for non-stable builds). If every HTTP request were logged, you would most likely wear out your SD card prematurely. Therefore I think “persistent logging”, but not at debug level for the webserver/API, would be more reasonable. That way you would find relevant messages sooner. (You can still additionally log your webserver log to a logfile in a temp dir mapped to RAM.)

The main advantage of persistent (instead of volatile) systemd logging would be that you would have logs from previous boots available (they can be examined with “journalctl -b -1”, where “-1” is the last boot before the current one, or similar). For development versions this would be helpful.

Yeah, the logging needs some improvement. We had an issue where we were losing important error logs; this was due to how Uvicorn overwrites the logging config when started. We managed to find a way to output everything, which was a huge improvement. At some point it will be reduced, because the log file is a bit crazy right now.

Making severity configurable is certainly something we need to do. But we will first need to adjust logs from upstream, as all of the HTTP request logs are sent at info level.
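
One way the two points above could fit together, sketched with the standard `logging` module: keep a single configurable severity, and raise the threshold only on the logger that carries the per-request lines. `uvicorn.access` is the logger name Uvicorn uses for request logs; `configure_logging` and its `severity` argument are hypothetical names for this example, not part of the server. Since Uvicorn applies its own logging config at startup, something like this would presumably need to run after that, or be expressed in the config passed to Uvicorn.

```python
import logging

ACCESS_LOGGER = "uvicorn.access"  # logger Uvicorn uses for per-request lines


def configure_logging(severity: str = "INFO") -> None:
    """Set the overall severity and hide request logs unless debugging."""
    level = logging.getLevelName(severity.upper())
    if not isinstance(level, int):
        # getLevelName returns a string such as "Level FOO" for unknown names.
        raise ValueError(f"unknown log severity: {severity!r}")
    logging.getLogger().setLevel(level)
    # Request lines arrive at INFO from upstream, so raise the threshold on
    # that one logger to WARNING whenever we are not explicitly debugging.
    access_level = logging.DEBUG if level <= logging.DEBUG else logging.WARNING
    logging.getLogger(ACCESS_LOGGER).setLevel(access_level)


configure_logging("INFO")
```

With this in place, request lines only appear when the severity is set to DEBUG, while warnings and errors from the same logger still get through at any setting.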