Diatom detection and identification

Project done in collaboration with Dr. Martin Laviale and Dr. Philippe Usseglio-Polatera from LIEC (CNRS UMR 7360).


Diatoms are a type of unicellular microalgae commonly used as bio-indicators for monitoring the ecological status of watercourses, particularly in the context of the implementation of the European Water Framework Directive. They are widely used in water quality monitoring because they are highly sensitive to water quality (temperature, pH, micropollutants) and habitat condition. Diatom-based biological indices rely on the occurrence of indicator species in a natural assemblage. Diatom species are identified by human experts using light microscopy, based on morphological features such as size and ornamentation. This task can be time-consuming and subject to multiple biases, depending on the quality of the instrument and the level of expertise in diatom taxonomy.
The objective of the project is to develop a deep learning framework that automatically identifies diatoms in virtual slide images, so that diatom-based indices can be calculated to assess the quality of freshwater sources.


The overall pipeline of the project can be divided into four main stages:
  1. Dataset Acquisition – This step involves acquiring microscope images for further analysis.
  2. Diatom Detection – The acquired images contain several diatoms as well as debris (dust and impurities). In this step, the diatoms present in the images are detected and each individual diatom is extracted.
  3. Diatom Classification – The taxon of each diatom detected in the previous step is identified.
  4. Calculate diatom indices – After classification, diatom indices are calculated from the taxa present and the abundance of each taxon. These indices are then used to assess water quality.
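Stage 4 can be illustrated with a minimal sketch. Many diatom indices, such as the IPS (Specific Pollution-sensitivity Index), are computed as an abundance-weighted average of per-taxon scores following Zelinka and Marvan: index = Σ(a·s·v) / Σ(a·v), where a is the abundance, s the sensitivity score and v the indicator weight of each taxon. This page does not state which indices the project computes, so the function name, taxon names and score tables below are hypothetical placeholders, not real index values; the abundances are simply counts of the classifier's predicted labels.

```python
from collections import Counter


def weighted_average_index(abundances, sensitivity, indicator):
    """Zelinka-Marvan style index: sum(a * s * v) / sum(a * v).

    abundances: taxon -> count (or relative abundance)
    sensitivity: taxon -> sensitivity score s
    indicator: taxon -> indicator weight v (defaults to 1.0 if missing)
    Taxa without a sensitivity score do not contribute to the index.
    """
    num = 0.0
    den = 0.0
    for taxon, a in abundances.items():
        if taxon not in sensitivity:
            continue  # unscored taxa are skipped
        v = indicator.get(taxon, 1.0)
        num += a * sensitivity[taxon] * v
        den += a * v
    if den == 0:
        raise ValueError("no scored taxa in the sample")
    return num / den


# Hypothetical classifier output: one predicted taxon label per detected diatom.
predictions = ["taxon_A"] * 30 + ["taxon_B"] * 10 + ["taxon_C"] * 5
abundances = Counter(predictions)

# Hypothetical score tables (placeholders, not real index values).
sensitivity = {"taxon_A": 5.0, "taxon_B": 2.0}
indicator = {"taxon_A": 1.0, "taxon_B": 3.0}

index = weighted_average_index(abundances, sensitivity, indicator)
print(index)  # 3.5
```

Here taxon_C has no score and is ignored; the abundant, sensitive taxon_A pulls the index up, while taxon_B, with its larger indicator weight, pulls it down. Misclassifications in stage 3 therefore propagate directly into the index value.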

Dataset Acquisition

One of the primary challenges in this project is acquiring a labelled dataset for training the deep neural networks. The networks require thousands of accurately labelled images to learn a task. Labelling is difficult because it must mostly be done manually, which is time-consuming and subject to human errors and biases that depend on the expertise of the labellers. To deal with the scarcity of labelled real data, we rely on synthetically generated datasets for training our networks. With synthetic data, thousands of labelled images can be produced quickly, and these images closely resemble real microscope images.

Diatom Detection

The two most commonly used approaches to detection in deep learning are object detection and image segmentation.

Object Detection

In this approach, a bounding box is drawn around each detected diatom. Using YOLOv5, we evaluate two types of bounding boxes: axis-aligned bounding boxes and rotated bounding boxes.
We evaluate three training strategies for diatom detection:
  1. Zero-shot learning: the network is trained on synthetic data and tested on real microscope images.
  2. Training from scratch: the network is trained only on real microscope images.
  3. Fine-tuning: the network trained on synthetic data is fine-tuned on real data and tested on real data.
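The difference between the two box types can be illustrated geometrically. Many diatoms are elongated, and when such a diatom is tilted, the smallest axis-aligned box that encloses it also contains a lot of background, which a rotated box avoids. The sketch below is not taken from the project's code: it parameterises a rotated box as centre, width, height and angle (one common convention; rotated YOLOv5 variants define their own label formats), computes its enclosing axis-aligned box, and converts that to the normalised `class x_center y_center width height` layout used by YOLOv5 label files. The function names are hypothetical.

```python
import math


def rotated_box_corners(cx, cy, w, h, angle_deg):
    """Corners of a w x h box centred at (cx, cy) and rotated by angle_deg."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]


def enclosing_axis_aligned(corners):
    """Smallest axis-aligned box (x0, y0, x1, y1) containing all corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


def to_yolo(class_id, box, img_w, img_h):
    """Pixel box (x0, y0, x1, y1) -> (class, x_center, y_center, width, height) in [0, 1]."""
    x0, y0, x1, y1 = box
    return (class_id, (x0 + x1) / 2 / img_w, (y0 + y1) / 2 / img_h,
            (x1 - x0) / img_w, (y1 - y0) / img_h)


# An elongated 100 x 20 px diatom tilted by 45 degrees in a 512 x 512 image.
corners = rotated_box_corners(256, 256, 100, 20, 45)
aabb = enclosing_axis_aligned(corners)
aabb_area = (aabb[2] - aabb[0]) * (aabb[3] - aabb[1])
print(aabb_area / (100 * 20))  # about 3.6
print(to_yolo(0, aabb, 512, 512))
```

In this example the rotated box fits the diatom exactly, while the axis-aligned box is about 3.6 times larger, so neighbouring diatoms or debris are more likely to fall inside it.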

Axis-aligned bounding box detections:


Rotated bounding box detections:

(Green: correct predictions; blue: false negatives; red: false positives)
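The colour coding corresponds to the usual way detections are scored. The sketch below assumes a common protocol, since the page does not state the one used: predictions are matched greedily to ground-truth boxes in order of decreasing confidence, a match requires an intersection-over-union (IoU) of at least 0.5, unmatched predictions are false positives, and unmatched ground-truth boxes are false negatives. It handles axis-aligned boxes only; rotated boxes would need a polygon IoU. Function names and the threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_outcomes(predictions, ground_truth, iou_threshold=0.5):
    """Return (true positives, false positives, false negatives).

    predictions: list of (confidence, box); ground_truth: list of boxes.
    Each ground-truth box can be matched by at most one prediction.
    """
    matched = set()
    tp = fp = 0
    for _, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(ground_truth):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j is not None and best_iou >= iou_threshold:
            matched.add(best_j)  # green: correct prediction
            tp += 1
        else:
            fp += 1  # red: false positive
    fn = len(ground_truth) - len(matched)  # blue: missed ground-truth diatom
    return tp, fp, fn


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0.9, (1, 1, 11, 11)), (0.8, (50, 50, 60, 60))]
print(count_outcomes(preds, gts))  # (1, 1, 1)
```

From these counts, precision is tp / (tp + fp) and recall is tp / (tp + fn), which makes the three detection strategies directly comparable on the same real test images.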

Diatom Classification

For classification, we considered 166 diatom taxa from the Rhin-Meuse region. A main challenge in classification was the inter-class similarity and intra-class variance of the diatom classes. Inter-class similarity occurs when diatoms belonging to different classes look very similar and are therefore not easily distinguishable. Intra-class variance occurs when diatoms belonging to the same taxon look different because of differences in the viewpoints from which the images are acquired. This also causes confusion during classification, since the network fails to learn that these multiple appearances belong to the same class. In this work (https://arxiv.org/abs/2109.11891), we address these issues using an automatic clustering mechanism and triplet loss.

(a) Examples of diatoms with high inter-class similarity. (b) Examples of a diatom class with high intra-class variance.
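The two mechanisms named above can be sketched generically. This is not the method of the cited paper, whose details may differ; it only illustrates the ideas under stated assumptions. `split_class` (a hypothetical name) runs plain k-means on the feature embeddings of each taxon and relabels its samples as sub-classes, so that visually distinct forms of the same taxon (intra-class variance) get separate training targets. In this sketch the number of clusters per taxon is fixed by hand, whereas an automatic mechanism would choose it. `triplet_loss` is the standard margin-based triplet loss, max(d(a, p) - d(a, n) + margin, 0), which pulls an anchor embedding towards a positive of the same class and pushes it away from a negative of another class, counteracting inter-class similarity. Euclidean distance and a margin of 0.2 are assumptions; some formulations use squared distances.

```python
import math


def _sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation.
    Returns one cluster index per point. Assumes 1 <= k <= len(points), iters >= 1."""
    centers = [tuple(points[0])]
    while len(centers) < k:
        # next centre: the point farthest from all centres chosen so far
        centers.append(tuple(max(points, key=lambda p: min(_sq_dist(p, c) for c in centers))))
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: _sq_dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's centre
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return assign


def split_class(embeddings, labels, clusters_per_class):
    """Relabel each sample as (taxon, sub_cluster); unlisted taxa keep one sub-class."""
    new_labels = [None] * len(labels)
    for taxon in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == taxon]
        k = min(clusters_per_class.get(taxon, 1), len(idx))
        assign = kmeans([embeddings[i] for i in idx], k)
        for i, a in zip(idx, assign):
            new_labels[i] = (taxon, a)
    return new_labels


def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(d(a, p) - d(a, n) + margin, 0) with Euclidean distance d."""
    d_ap = math.sqrt(_sq_dist(anchor, positive))
    d_an = math.sqrt(_sq_dist(anchor, negative))
    return max(d_ap - d_an + margin, 0.0)


emb = [(0, 0), (0, 1), (10, 10), (10, 11), (5, 0), (5, 1)]
labs = ["A", "A", "A", "A", "B", "B"]
print(split_class(emb, labs, {"A": 2}))
# [('A', 0), ('A', 0), ('A', 1), ('A', 1), ('B', 0), ('B', 0)]
print(triplet_loss((0, 0), (0.1, 0), (1, 0)))  # 0.0: the negative is already far enough
```

Taxon A is split into two sub-classes because its embeddings form two separate groups, while taxon B stays whole; a triplet only contributes to the loss when the negative is not at least `margin` farther from the anchor than the positive.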


© 2019 – 2023 DREAM Lab – Georgia Tech