Dutch summary: Introduction to the research project

The aim of this study was to improve the measurement of deep grey matter (dGM) atrophy in multiple sclerosis (MS) using magnetic resonance imaging (MRI). To this end, we began by discussing the challenges of measuring grey matter (GM) in MS in Chapter 1. One of the conclusions was that more accurate automated segmentation methods are needed. Therefore, in Chapter 2, we evaluated how the performance of well-established automated segmentation software relates to MS pathology. In Chapter 3 we addressed white matter (WM) lesions, as these play an important role in the diagnosis of MS and also affect brain image analyses in MS. To further stimulate methodological improvements in the measurement of dGM atrophy in MS, we discussed in Chapter 4 how to improve open science in this field. Lastly, in Chapter 5 we developed MS-specific automated segmentation software, MS-SMART, and an open reference dataset.
In Chapter 1 we discussed the urgent challenges of measuring grey matter (GM) atrophy in MS, distinguishing two main fields: (i) pathology, physiology, and treatment effects, and (ii) measurement challenges. For the first field, we discussed in more detail the pathological substrate, the evolution of GM atrophy, the influence of physiological variability, and the evaluation of treatment response. Regarding technical measurement challenges, we discussed the influence of WM lesions on the measurement of GM atrophy, as well as the influence of atrophy itself, of other MS pathology, and of technical variability. For every discussion point, we provided specific recommendations (summarized in Box 1 of Chapter 1) to improve the measurement and interpretation of GM atrophy in individual MS patients. Two of these recommendations form the basis for the rest of this thesis: the need for a publicly available reference dataset and the improvement of segmentation methods for MS.

In Chapter 2, we investigated the performance of existing automated dGM segmentation methods compared to a manual reference. Moreover, we evaluated whether the performance of those methods was related to WM lesions and GM volume. We evaluated four automated segmentation methods (FSL-FIRST, FreeSurfer, GIF and volBrain) on a multi-center dataset (21 MS subjects and 11 healthy controls). The performance of the methods was evaluated against the manual reference in terms of both volumetric agreement (intraclass correlation coefficient (ICC)) and spatial agreement (Dice similarity coefficient (DSC)). We then assessed the relation between the segmentation accuracy of the methods, expressed as their DSC with the manual outlines, and the global and local lesion volumes, the volume of the region of interest, and the normalized brain volume. We concluded that existing automated methods have impaired performance on data of MS subjects; specifically, the accuracy of the segmentations is reduced. Moreover, performance generally deteriorated with higher lesion volume, with lower normalized brain volume, and with lower volume of the structure of interest. This suggests that MS pathology may contribute to the impaired performance.
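To make the spatial agreement measure concrete, the following is a minimal sketch of the Dice similarity coefficient between an automated and a manual segmentation. It is an illustration only and not the implementation used in the thesis: the function name `dice_similarity`, the representation of a mask as a set of voxel coordinates, and the convention that two empty masks agree perfectly are all assumptions made here.

```python
def dice_similarity(auto_mask, manual_mask):
    """Dice similarity coefficient (DSC) between two binary masks.

    Each mask is given as a collection of voxel coordinates (x, y, z)
    that belong to the structure (an illustrative representation).
    """
    auto_mask, manual_mask = set(auto_mask), set(manual_mask)
    total = len(auto_mask) + len(manual_mask)
    if total == 0:
        # Assumption: two empty masks are treated as perfect agreement.
        return 1.0
    overlap = len(auto_mask & manual_mask)
    return 2.0 * overlap / total


# Toy example: two four-voxel "structures" shifted by one voxel.
manual = {(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)}
auto = {(0, 1, 0), (1, 1, 0), (0, 2, 0), (1, 2, 0)}
print(dice_similarity(auto, manual))  # 0.5 (2 shared voxels, 4 + 4 in total)
```

A DSC of 1 means complete overlap and 0 means no overlap. For a fixed displacement, small structures lose a larger share of their overlap than large ones, which is one reason DSC can be expected to vary with structure volume.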

In Chapter 3 we investigated several aspects of the WM lesions in MS, divided over two sub-chapters. In Chapter 3.1 we discussed the performance of five automated WM lesion segmentation methods on a multi-center MS dataset (70 MS subjects). On the 2D fluid attenuated inversion recovery (FLAIR) images, manual lesion segmentation was performed and the segmentations of five automated methods (Cascade, LST-LGA, LST-LPA, Lesion TOADS and kNN-TTP) were compared to the manual outlines. Both volumetric (ICC) and spatial agreements (DSC and false positive and false negative volumes) were analysed. Furthermore, analyses were repeated using a leave-one-center-out design to exclude the center of interest from the training phase, in order to evaluate the performance of the method on ‘unseen’ centers. We concluded that the performance of the methods in this multi-center MS dataset was moderate, but appeared to be robust even with new datasets from centers not included in training the automated methods.
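The leave-one-center-out design can be sketched as follows. This is a hypothetical illustration of the splitting logic only, assuming each scan is labelled with its acquisition center; the function name, scan identifiers, and center labels are invented for the example and do not come from the thesis.

```python
from collections import defaultdict


def leave_one_center_out(scans):
    """Yield (held_out_center, train_ids, test_ids) once per center.

    `scans` is a list of (scan_id, center) pairs. For each center, all of
    its scans form the test set and all remaining scans the training set.
    """
    by_center = defaultdict(list)
    for scan_id, center in scans:
        by_center[center].append(scan_id)

    centers = sorted(by_center)
    for held_out in centers:
        test_ids = list(by_center[held_out])
        train_ids = [s for c in centers if c != held_out for s in by_center[c]]
        yield held_out, train_ids, test_ids


# Hypothetical scans from three centers.
scans = [("s01", "A"), ("s02", "A"), ("s03", "B"), ("s04", "C"), ("s05", "C")]
for center, train_ids, test_ids in leave_one_center_out(scans):
    print(center, train_ids, test_ids)
```

Because no scan from the held-out center is seen during training, the test performance of each fold approximates how a method would behave on data from a new center.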

In Chapter 3.2 we developed a lesion simulation method (LESIM) to enable objective investigation of the effects of WM lesions on image analysis methods and to facilitate the development of segmentation methods that are robust to the presence of WM lesions. The LESIM software simulates lesions from an MS patient in a 3D T1-weighted (3DT1) image of a healthy control (HC), resulting in a modified HC 3DT1 image with realistic lesions. We evaluated LESIM by visual inspection and by a quantitative analysis of the effect of simulated lesions on FSL-SIENAX GM segmentations. We concluded that LESIM is a new, robust, and flexible tool for reliable simulation of WM MS lesions that produces realistic lesions in healthy control images. Moreover, we showed that the simulated WM lesions have the expected effect on GM segmentation using FSL-SIENAX.

In Chapter 4 we addressed the issue of open science in the field of neuroradiology, firstly by assessing the impact of facial feature removal on clinically relevant outcome measures (Chapter 4.1), and secondly by developing and evaluating a standardized protocol for manual delineation of dGM structures (Chapter 4.2).

In Chapter 4.1 we investigated whether removing facial features would affect subsequent automated image analyses. To do so, we tested the effect of three facial feature removal methods (QuickShear, FaceMasking, and Defacing) on automated image analysis methods that yield clinically relevant outcome measures. We used three datasets of different diseases: Alzheimer’s disease, MS, and glioblastoma. Accordingly, we used three different clinically relevant outcome measures: normalized brain volume, white matter lesion volume, and tumor volume, respectively. Differences between outcomes obtained from images from which facial features were removed and those obtained from full images were assessed by quantifying the intraclass correlation coefficient (ICC) for absolute agreement, and by testing for systematic differences using paired t-tests. We concluded that all three outcome measures were affected by the facial feature removal methods, although each in a different way. The effects included both failures of the analysis methods and altered values of the outcome measures, the latter comprising both “random” variation and systematic differences.
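As an illustration of why ICC for absolute agreement is sensitive to systematic differences, the sketch below computes a single-measure, two-way ICC for absolute agreement, often written ICC(A,1), from a table of paired measurements. The source does not state which ICC model was used, so this particular form is an assumption, as are the function name and the toy values.

```python
def icc_absolute_agreement(ratings):
    """Single-measure, two-way ICC for absolute agreement, ICC(A,1).

    `ratings` holds one row per subject; each row contains the k
    measurements of that subject (e.g. [full_image, defaced_image]).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Mean squares of the two-way ANOVA decomposition.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Toy data: the second measurement is always exactly 1 unit higher.
paired = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
print(icc_absolute_agreement(paired))  # about 0.667
```

In the toy data the two measurements rank the subjects identically, yet the constant offset lowers the absolute-agreement ICC to about 0.67. A paired t-test addresses the complementary question of whether such an offset is statistically distinguishable from zero.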

In Chapter 4.2 we discussed the development and evaluation of a manual segmentation protocol for dGM structures in MS. Next, we evaluated the accuracy of FASTSURF, a semi-automated segmentation method in which sparse delineations serve as input. The standardized protocol was developed specifically for manually tracing dGM structures on 3D T1-weighted MRI scans of MS patients, by neurologists and neuroradiologists with broad experience in the field of MS and MRI. Anatomical definitions were specified for each structure and, alongside these landmarks, strict guidelines were described on how to recognize the outermost edges of the structures on orthogonal planes. To evaluate the protocol, three raters delineated dGM structures bilaterally on 3D T1-weighted multi-center MRI scans of 23 MS patients and 12 controls. Intra- and inter-rater agreement was assessed through volumetric (ICC) and spatial (JI and CIgen) measures. Segmentations made with FASTSURF were also evaluated in terms of both volumetric and spatial agreement. We showed that raters achieved good to excellent intra- and inter-rater agreement and that agreement was similar when FASTSURF was used. We concluded that the dGM manual segmentation protocol showed good reproducibility within and among raters. Moreover, this protocol could be combined with FASTSURF to produce a reference set of dGM structures with a lower workload.
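Spatial inter-rater agreement of the kind described above can be illustrated with a pairwise Jaccard index (taking JI to denote the Jaccard index, which is an assumption). The sketch below is illustrative only and is not the thesis's evaluation code: masks are represented as sets of flat voxel indices, and the rater names and values are made up. The generalized measure CIgen is not reproduced here.

```python
from itertools import combinations


def jaccard_index(mask_a, mask_b):
    """Jaccard index: size of the intersection divided by size of the union."""
    mask_a, mask_b = set(mask_a), set(mask_b)
    union = mask_a | mask_b
    if not union:
        # Assumption: two empty masks are treated as perfect agreement.
        return 1.0
    return len(mask_a & mask_b) / len(union)


def pairwise_jaccard(masks_by_rater):
    """Jaccard index for every pair of raters delineating the same structure."""
    return {
        (a, b): jaccard_index(masks_by_rater[a], masks_by_rater[b])
        for a, b in combinations(sorted(masks_by_rater), 2)
    }


# Hypothetical delineations of one structure by three raters (flat voxel indices).
masks = {
    "rater1": {1, 2, 3, 4},
    "rater2": {2, 3, 4, 5},
    "rater3": {1, 2, 3, 4},
}
for pair, value in pairwise_jaccard(masks).items():
    print(pair, value)
```

Intra-rater agreement can be computed with the same function by comparing two delineations of the same scan made by one rater at different times.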

In Chapter 5 we discussed the development of an MS-specific automated dGM segmentation method. MS-SMART is an open-source, atlas-based automated segmentation method. The atlases for MS-SMART were manually outlined on 120 T1 MR images (100 MS subjects and 20 healthy controls) using the protocol developed in Chapter 4.2. The use of MS-specific atlases (images and labels) could help reduce the influence of MS pathology during alignment of the atlases to the target (input) image. In total, 60 images were used as an atlas and training set for MS-SMART, and the other 60 were used to evaluate MS-SMART and two well-established automated segmentation methods (FSL-FIRST and FreeSurfer). Evaluation was performed in terms of both volumetric (ICC) and spatial (DSC) agreement with the manual outlines. We concluded that MS-SMART outperformed the two well-established methods on this MS dataset. Nevertheless, we expect that with the shared atlas set and software code of MS-SMART, further methodological improvements in the segmentation of dGM structures in MS can be made.