Spindle Detection

How do you detect spindles?

SomnoBot uses a deep neural network called SUMOv2 [1] to detect spindles. The model was developed in our lab and is publicly available on GitHub. SUMOv2 was trained on artifact-free N2 sleep data that had been annotated by a large group of experts, whose annotations were combined into high-quality consensus annotations [2]. Evaluations have shown that SUMOv2 achieves state-of-the-art detection accuracy when compared with held-out consensus annotations, and that it generalizes well to new data [1]. Using SUMOv2, we were also able to qualitatively replicate an expert-led study of differences in spindle density between patients with bipolar disorder and healthy controls [1].

By default, SomnoBot first detects sleep stages in your data (see here for more details) and then analyzes the detected N2 sleep for spindles. Although spindles can typically also be found in N1 and N3 sleep [3], we recommend restricting SomnoBot's spindle detection to N2 sleep. This stays as close as possible to the training and validation schemes we used to verify SUMOv2 [1]. If required, the default sleep staging settings can be changed (see here). Be aware, however, that if sleep staging is disabled completely, SUMOv2 will also detect spindles in implausible sleep stages such as Wake or REM sleep.
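If you disable sleep staging in SomnoBot but have your own hypnogram, one way to discard spindles outside N2 sleep afterwards is to convert the hypnogram into N2 time intervals and keep only spindles that lie inside them. The following is a minimal sketch, not SomnoBot's internal code. It assumes a hypnogram with one stage label (e.g., "N2") per 30-s epoch, spindles given as (start, end) tuples in seconds, and hypothetical helper names.

```python
# Sketch with hypothetical helper names; not part of SomnoBot.

def stage_intervals(hypnogram, stage="N2", epoch_len=30.0):
    """Merge consecutive epochs labeled `stage` into (start, end) intervals in seconds."""
    intervals = []
    start = None
    for i, label in enumerate(hypnogram):
        if label == stage and start is None:
            start = i * epoch_len  # a run of `stage` epochs begins
        elif label != stage and start is not None:
            intervals.append((start, i * epoch_len))  # the run ends at this epoch
            start = None
    if start is not None:  # the run extends to the end of the recording
        intervals.append((start, len(hypnogram) * epoch_len))
    return intervals


def spindles_in_intervals(spindles, intervals):
    """Keep spindles (start, end) that lie entirely inside one of the intervals."""
    return [
        (s, e) for s, e in spindles
        if any(i_start <= s and e <= i_end for i_start, i_end in intervals)
    ]


# Example: a hypothetical hypnogram with one label per 30-s epoch
hypnogram = ["W", "N1", "N2", "N2", "N3", "N2", "R"]
n2 = stage_intervals(hypnogram)
print(n2)  # [(60.0, 120.0), (150.0, 180.0)]
print(spindles_in_intervals([(70.0, 71.2), (130.0, 131.0), (175.0, 176.5)], n2))
```

In this sketch, spindles that cross an interval boundary are dropped. Whether to drop, keep, or clip such spindles is a design choice you may want to make differently.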

We also note that SUMOv2 was trained on artifact-free data. While we have designed SUMOv2 to be robust to sudden fluctuations in the EEG signal, we cannot guarantee that severe artifacts will not degrade the annotation quality. As a precaution, we recommend removing artifact-contaminated data manually (we plan to develop a tool for automatic artifact detection in the future).
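If you mark artifact-contaminated periods manually, one way to exclude them from your analysis is to subtract them from the periods you analyze (for example, N2 periods). The following is a minimal sketch. It assumes that analysis periods and artifact marks are both lists of (start, end) tuples in seconds. `subtract_intervals` is a hypothetical helper, not part of SomnoBot.

```python
# Sketch with a hypothetical helper name; not part of SomnoBot.

def subtract_intervals(keep, remove):
    """Return the parts of the `keep` intervals that do not overlap any `remove` interval.

    All intervals are (start, end) tuples in seconds with start < end.
    """
    result = []
    for start, end in keep:
        pieces = [(start, end)]
        for r_start, r_end in remove:
            next_pieces = []
            for p_start, p_end in pieces:
                if r_start > p_start:
                    # Part of the piece before the artifact (the whole piece if they do not overlap)
                    next_pieces.append((p_start, min(p_end, r_start)))
                if r_end < p_end:
                    # Part of the piece after the artifact (the whole piece if they do not overlap)
                    next_pieces.append((max(p_start, r_end), p_end))
            pieces = next_pieces
        result.extend(pieces)
    return result


# Example: two N2 periods and two manually marked artifacts (hypothetical values)
n2_periods = [(60.0, 120.0), (150.0, 180.0)]
artifacts = [(90.0, 100.0), (170.0, 200.0)]
print(subtract_intervals(n2_periods, artifacts))
# [(60.0, 90.0), (100.0, 120.0), (150.0, 170.0)]
```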

SomnoBot annotates your polysomnographic recordings (PSGs) locally, in the browser on your own computer. No PSGs are transmitted to us or to any third party. When you visit our website, our implementations of RSN (the network used for sleep staging) and SUMOv2 are downloaded to your web browser, which then runs the neural networks locally.

Which channels should I select to detect spindles?

We recommend selecting a central EEG derivation (e.g., "C3-A2" or "C4-A1") for spindle detection. The neural network that SomnoBot uses to detect spindles (SUMOv2 [1]) was trained on data from central EEG derivations, and we expect it to perform best in a similar setting.

Are the spindles correctly detected?

We have taken every possible measure to ensure that SomnoBot's spindles are accurate. However, as a good scientist, you should not blindly trust the model's annotations. It is always a good idea to check the annotations on a sample data segment from your recordings.

Below, we explain

what it means to be "accurate" in spindle detection,
what we suggest as a simple protocol that you can use to evaluate whether SomnoBot is indeed accurately detecting spindles in your data, and
how we ensured that the implementation of the underlying neural network (SUMOv2) is correct.

What does it mean to be "accurate" in spindle detection?

Even expert scorers make errors. When different human experts manually annotate the same PSG recording, they usually disagree on some spindles despite their best efforts and training (inter-rater variability) [4]. When asked to score a recording twice, even a single expert will often not identify exactly the same spindles (intra-rater variability) [4]. One way to increase the accuracy of spindle detection is to have several experts annotate the same recordings and to derive expert consensus spindles from their annotations. However, creating large datasets with expert consensus annotations is time-consuming and expensive, and we know of only a few such datasets that are publicly available [2].

In light of these results, we suggest comparing the performance of automated spindle detection systems with the agreement between pairs of human experts. We calculated such expert-pair agreement levels for a dataset annotated by 47 human experts [2] and found that agreement varies greatly between expert pairs (see figure 1). Agreement levels between SUMOv2 and human experts showed similar variability depending on the dataset and the expert, yet SUMOv2 performed well within the distribution of expert-pair agreement levels [1]. We take this as an indication that SUMOv2 is as accurate as human experts in detecting spindles.
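To compute such an expert-pair distribution for your own data, you can compare every pair of scorers using the IoU F1 score. The following is a minimal sketch. It assumes that each scorer's annotations are a list of (start, end) tuples for the same recording, and it uses the same greedy one-to-one matching and 0.2 IoU threshold as the scripts further down in this document. The function names and example annotations are hypothetical. This is not the code that was used to produce figure 1.

```python
# Sketch with hypothetical function names and example data; not the code behind figure 1.
from itertools import combinations


def iou(a, b):
    """Intersection over union of two (start, end) intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def iou_f1(ref, other, threshold=0.2):
    """IoU F1 score with greedy one-to-one matching of spindles."""
    used = set()
    tp = 0
    for r in ref:
        for j, o in enumerate(other):
            if j not in used and iou(r, o) >= threshold:
                used.add(j)  # each spindle in `other` may be matched only once
                tp += 1
                break
    total = len(ref) + len(other)
    # F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (len(ref) + len(other))
    return 2 * tp / total if total > 0 else 0.0


def expert_pair_scores(annotations, threshold=0.2):
    """IoU F1 score for every unordered pair of scorers."""
    return {
        (a, b): iou_f1(annotations[a], annotations[b], threshold)
        for a, b in combinations(sorted(annotations), 2)
    }


# Hypothetical annotations of one recording by three scorers
annotations = {
    "expert_A": [(0, 5), (10, 15), (20, 25)],
    "expert_B": [(1, 6), (10, 16), (30, 35)],
    "expert_C": [(0, 5), (20, 24)],
}
for pair, score in expert_pair_scores(annotations).items():
    print(pair, round(score, 2))
```

With greedy matching, the result may depend on which scorer is treated as the reference when one spindle overlaps several others. Sorting the scorer names makes the result reproducible.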

A simple protocol to evaluate detection accuracy in your data

If you want to assess whether SomnoBot detects spindles as you do, we recommend the following.

Select one of your PSG recordings and annotate spindles manually.
Let SomnoBot detect spindles for the selected recording.
Compare your annotations with SomnoBot's annotations by calculating the Intersection-over-Union (IoU) F1 score. This score counts a SomnoBot spindle as correct when it overlaps sufficiently with one of your spindles (measured by their intersection over union) and summarizes the matches in an F1 score.
Compare your IoU F1 score with those obtained between pairs of expert scorers of the MODA dataset [2].

The higher the IoU F1 score, the better the agreement between your annotations and SomnoBot's. Keep in mind that you cannot expect perfect agreement: even human experts do not agree perfectly with each other (see previous section). Below, we provide a figure showing the distribution of expert-pair IoU F1 scores, followed by scripts for calculating the IoU F1 score.

If your IoU F1 score lies within the distribution of expert-pair IoU F1 scores, this is a good indication that SomnoBot detects spindles similarly to you, with an agreement comparable to that between two expert scorers. If your IoU F1 score falls below this distribution, SomnoBot's annotations agree less with yours than two expert scorers typically agree with each other. In that case, you may want to reconsider whether SomnoBot is suitable for your data.
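To check numerically where your score lies in the expert-pair distribution, one simple option is its percentile rank: the percentage of expert-pair scores that are less than or equal to yours. The following is a minimal sketch. The function name is hypothetical, and the expert-pair scores in the example are made-up values for illustration, not MODA results.

```python
# Sketch with a hypothetical function name; example scores are made up, not MODA results.
from bisect import bisect_right


def percentile_rank(score, expert_pair_scores):
    """Percentage of expert-pair IoU F1 scores that are <= `score`."""
    if not expert_pair_scores:
        raise ValueError("expert_pair_scores must not be empty")
    ranked = sorted(expert_pair_scores)
    # bisect_right returns how many sorted values are <= score
    return 100.0 * bisect_right(ranked, score) / len(ranked)


pair_scores = [0.55, 0.62, 0.70, 0.74, 0.81]  # made-up values for illustration only
my_score = 0.68
print(f"{percentile_rank(my_score, pair_scores):.0f}% of expert pairs scored at or below your IoU F1 score")
```

A rank close to 0% means your score lies below nearly all expert pairs, which is the case described above. The cutoff that counts as "within the distribution" remains your own judgment call.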

Figure 1: Comparison of IoU F1 scores. Probability distribution of the agreement (measured by the IoU F1 score) between pairs of expert scorers for detecting spindles in artifact-free N2 sleep. The broad distribution shows how strongly the agreement between pairs of expert scorers varies. The MODA annotations are publicly available [2]; this figure was created by the authors of SomnoBot.

In the following, we provide scripts in several programming languages for calculating the IoU F1 score between two sets of spindles. The code has not been optimized for speed; it is written to make clear how the IoU F1 score is calculated.
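For reference, the quantity computed by the scripts can be written out as follows. The notation is introduced only for this summary. A spindle a spans [s_a, e_a], and a true and a predicted spindle are matched (each at most once) if their IoU is at least the overlap threshold (overlap_threshold = 0.2 in the scripts). TP counts matched pairs, FP counts unmatched predicted spindles, and FN counts unmatched true spindles. The scripts return 0 whenever a denominator would be zero.

```latex
\begin{align*}
|a \cap b| &= \max\bigl(0,\ \min(e_a, e_b) - \max(s_a, s_b)\bigr) \\
\mathrm{IoU}(a, b) &= \frac{|a \cap b|}{(e_a - s_a) + (e_b - s_b) - |a \cap b|} \\
\text{precision} &= \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN} \\
F_1 &= \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
     = \frac{2\,TP}{2\,TP + FP + FN}
\end{align*}
```

Consider the example data used in the scripts. The spindles (0, 5) and (1, 6) overlap by 4 within a union of 6, giving IoU ≈ 0.67. The spindles (10, 15) and (10, 16) give IoU = 5/6 ≈ 0.83. Both exceed 0.2, so TP = 2, FP = 1 (the spindle (30, 35)) and FN = 1 (the spindle (20, 25)), giving F1 = 4/6 ≈ 0.67.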

Python function

def iou_f1_score(true_spindles, pred_spindles, overlap_threshold=0.2):
    """
    Calculate the IoU F1 score for two sets of spindles.

    Parameters:
    - true_spindles: List of tuples (start, end) for the true spindle annotations.
    - pred_spindles: List of tuples (start, end) for the predicted spindle annotations.
    - overlap_threshold: Minimum IoU required to consider a spindle a true positive.

    Returns:
    - F1 score, precision, and recall.
    """
    def iou(true_sp, pred_sp):
        """Compute the IoU between two spindles."""
        start1, end1 = true_sp
        start2, end2 = pred_sp

        intersection = max(0, min(end1, end2) - max(start1, start2))
        union = (end1 - start1) + (end2 - start2) - intersection
        return intersection / union if union > 0 else 0

    matched_true_spindles = set()
    matched_pred_spindles = set()

    # Find true positives
    for i, tsp in enumerate(true_spindles):
        for j, psp in enumerate(pred_spindles):
            if i in matched_true_spindles or j in matched_pred_spindles:
                continue  # Already matched

            if iou(tsp, psp) >= overlap_threshold:
                matched_true_spindles.add(i)
                matched_pred_spindles.add(j)

    tp = len(matched_true_spindles)
    fn = len(true_spindles) - tp
    fp = len(pred_spindles) - tp

    precision = tp / (tp + fp) if (tp + fp) > 0 else 0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0
    f1_score = (2 * precision * recall) / (precision + recall) if (precision + recall) > 0 else 0

    return f1_score, precision, recall

# Example usage:
true_spindles = [(0, 5), (10, 15), (20, 25)]
pred_spindles = [(1, 6), (10, 16), (30, 35)]
overlap_threshold = 0.2

f1, precision, recall = iou_f1_score(true_spindles, pred_spindles, overlap_threshold)
print(f"F1 Score: {f1:.2f}, Precision: {precision:.2f}, Recall: {recall:.2f}")

R function

iou_f1_score <- function(true_spindles, pred_spindles, overlap_threshold = 0.2) {
  # Function to compute IoU between two spindles
  iou <- function(true_sp, pred_sp) {
    start1 <- true_sp[1]
    end1 <- true_sp[2]
    start2 <- pred_sp[1]
    end2 <- pred_sp[2]

    intersection <- max(0, min(end1, end2) - max(start1, start2))
    union <- (end1 - start1) + (end2 - start2) - intersection

    if (union > 0) {
      return(intersection / union)
    } else {
      return(0)
    }
  }

  matched_true_spindles <- c()
  matched_pred_spindles <- c()

  # Check all possible pairs (O(N × M))
  for (i in seq_along(true_spindles[,1])) {
    for (j in seq_along(pred_spindles[,1])) {
      if (i %in% matched_true_spindles || j %in% matched_pred_spindles) {
        next # Skip already matched spindles
      }

      if (iou(true_spindles[i,], pred_spindles[j,]) >= overlap_threshold) {
        matched_true_spindles <- c(matched_true_spindles, i)
        matched_pred_spindles <- c(matched_pred_spindles, j)
      }
    }
  }

  tp <- length(matched_true_spindles)
  fn <- nrow(true_spindles) - tp
  fp <- nrow(pred_spindles) - tp

  precision <- ifelse((tp + fp) > 0, tp / (tp + fp), 0)
  recall <- ifelse((tp + fn) > 0, tp / (tp + fn), 0)
  f1_score <- ifelse((precision + recall) > 0, 2 * precision * recall / (precision + recall), 0)

  return(list(F1 = f1_score, Precision = precision, Recall = recall))
}

# Example usage:
true_spindles <- matrix(c(0, 5, 10, 15, 20, 25), ncol = 2, byrow = TRUE)
pred_spindles <- matrix(c(1, 6, 10, 16, 30, 35), ncol = 2, byrow = TRUE)
overlap_threshold <- 0.2

result <- iou_f1_score(true_spindles, pred_spindles, overlap_threshold)
print(result)

MATLAB function

function [f1_score, precision, recall] = iou_f1_score(true_spindles, pred_spindles, overlap_threshold)
    % Function to calculate IoU F1 score between two sets of spindles.
    % Inputs:
    %   true_spindles      - Nx2 matrix of (start, end) times for the true spindle annotations.
    %   pred_spindles      - Mx2 matrix of (start, end) times for the predicted spindle annotations.
    %   overlap_threshold - Minimum IoU threshold for considering a spindle a true positive.
    % Outputs:
    %   f1_score     - The IoU F1 score.
    %   precision    - The precision.
    %   recall       - The recall.

    if nargin < 3
        overlap_threshold = 0.2;  % Default overlap threshold
    end

    % Function to compute IoU between two spindles
    function iou_value = iou(true_sp, pred_sp)
        start1 = true_sp(1);
        end1 = true_sp(2);
        start2 = pred_sp(1);
        end2 = pred_sp(2);

        intersection = max(0, min(end1, end2) - max(start1, start2));
        union_len = (end1 - start1) + (end2 - start2) - intersection;  % avoid shadowing the built-in union()

        if union_len > 0
            iou_value = intersection / union_len;
        else
            iou_value = 0;
        end
    end

    matched_true_spindles = [];
    matched_pred_spindles = [];

    % Check all possible pairs (O(N × M))
    for i = 1:size(true_spindles, 1)
        for j = 1:size(pred_spindles, 1)
            if ismember(i, matched_true_spindles) || ismember(j, matched_pred_spindles)
                continue; % Skip already matched events
            end

            if iou(true_spindles(i, :), pred_spindles(j, :)) >= overlap_threshold
                matched_true_spindles = [matched_true_spindles, i];
                matched_pred_spindles = [matched_pred_spindles, j];
            end
        end
    end

    tp = length(matched_true_spindles);  % True positives
    fn = size(true_spindles, 1) - tp;    % False negatives (unmatched true spindles)
    fp = size(pred_spindles, 1) - tp;    % False positives (unmatched predicted spindles)

    % Calculate precision and recall; return 0 instead of NaN when a denominator is zero
    if (tp + fp) > 0
        precision = tp / (tp + fp);
    else
        precision = 0;
    end
    if (tp + fn) > 0
        recall = tp / (tp + fn);
    else
        recall = 0;
    end

    if (precision + recall) > 0
        f1_score = 2 * precision * recall / (precision + recall);
    else
        f1_score = 0;
    end
end

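% Example usage (save the function above as iou_f1_score.m and run the following lines from a separate script or the Command Window):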
true_spindles = [0, 5; 10, 15; 20, 25];  % Example set of true spindles
pred_spindles = [1, 6; 10, 16; 30, 35];  % Example set of predicted spindles
overlap_threshold = 0.2;

% Call the function
[f1_score, precision, recall] = iou_f1_score(true_spindles, pred_spindles, overlap_threshold);

% Display results
fprintf('F1 Score: %.2f\n', f1_score);
fprintf('Precision: %.2f\n', precision);
fprintf('Recall: %.2f\n', recall);

How we ensured that the neural network was correctly implemented

SomnoBot uses a well-validated neural network called SUMOv2 [1], which achieves state-of-the-art detection accuracy. SUMOv2 was developed, trained and validated using data from various clinics and sleep studies and is publicly available on GitHub. To run SUMOv2 in your browser, we ported the neural network to SomnoBot using the same model weights as in the original publication [1]. However, computers operate with finite numerical precision, which varies between programming languages, libraries and computing hardware. As a result, SomnoBot will inevitably, in rare cases, predict slightly different spindles than the original implementation. We quantified these discrepancies on eight 30-min EEG excerpts from the DREAMS dataset [5]. In total, SomnoBot and the original implementation predicted the same 505 spindles, except for 6 spindles whose start or end time was shifted by 0.01 seconds. We consider these differences negligible, indicating that our implementation closely follows the original one.
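A comparison like this can be sketched in two passes. First, count spindles that are identical in both outputs. Then, count the remaining spindles whose start and end each differ by at most 0.01 s. This is a minimal sketch under our own assumptions (spindles as (start, end) tuples in seconds, a hypothetical function name and made-up example data), and it is not necessarily how the comparison above was carried out.

```python
# Sketch with a hypothetical function name and made-up data; not necessarily the procedure used above.

def compare_predictions(spindles_a, spindles_b, tol=0.01, eps=1e-9):
    """Compare two lists of (start, end) spindles, e.g. from two implementations.

    Returns counts of identical spindles, spindles whose start and end each
    differ by at most `tol` seconds, and spindles found by only one side.
    """
    remaining = list(spindles_b)
    identical = 0
    unmatched = []
    # First pass: exact matches
    for sp in spindles_a:
        if sp in remaining:
            remaining.remove(sp)
            identical += 1
        else:
            unmatched.append(sp)
    # Second pass: near matches; `eps` absorbs floating-point rounding in the differences
    shifted = 0
    only_in_a = 0
    for start, end in unmatched:
        for k, (start_b, end_b) in enumerate(remaining):
            if abs(start - start_b) <= tol + eps and abs(end - end_b) <= tol + eps:
                del remaining[k]  # safe because we leave the loop right away
                shifted += 1
                break
        else:
            only_in_a += 1  # no near match found in spindles_b
    return {"identical": identical, "shifted": shifted,
            "only_in_a": only_in_a, "only_in_b": len(remaining)}


# Made-up outputs of two implementations (seconds)
original = [(1.0, 1.8), (5.0, 6.2), (9.3, 10.1)]
ported = [(1.0, 1.8), (5.01, 6.2), (12.0, 13.0)]
print(compare_predictions(original, ported))
```

Here `tol` is set to the 0.01 s shift reported above; adjust it if your outputs use a different time resolution.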

References

[1]

Grieger, N., Mehrkanoon, S., Schwabedal, J. T. C., Ritter, P. & Bialonski, S. From Sleep Staging to Spindle Detection: Evaluating End-to-End Automated Sleep Analysis. (2025).

[2]

Lacourse, K., Yetton, B., Mednick, S. & Warby, S. C. Massive Online Data Annotation, Crowdsourcing to Generate High Quality Sleep Spindle Annotations from EEG Data. Sci. Data 7, 190 (2020). doi:10.1038/s41597-020-0533-4

[3]

Berry, R. B., Brooks, R., Gamaldo, C. E., Harding, S. M., Lloyd, R. M., Marcus, C. L. & Vaughn, B. V. The AASM manual for the scoring of sleep and associated events: Rules, terminology and technical specifications, version 2.6. (American Academy of Sleep Medicine, 2020).

[4]

Wendt, S. L., Welinder, P., Sorensen, H. B. D., Peppard, P. E., Jennum, P., Perona, P., Mignot, E. & Warby, S. C. Inter-Expert and Intra-Expert Reliability in Sleep Spindle Scoring. Clin. Neurophysiol. 126, 1548–1556 (2015). doi:10.1016/j.clinph.2014.10.158

[5]

Devuyst, S., Kerkhofs, M. & Dutoit, T. The DREAMS Databases and Assessment Algorithm. (2005). doi:10.5281/ZENODO.2650141