Statistical analysis is a critical element in the research process: it allows one to draw appropriate research conclusions from sets of collected data. Using the correct statistical approach, i.e., one that fits the nature and structure of the data, is of utmost importance in this process. The ever-increasing complexity of data, prompted by advances in the experimental techniques available to the field of neuroscience, calls for statistical approaches that go beyond the standard statistical tests. To optimally exploit the information present in experimental data, the statistical methods of choice should not only ensure the reliability and validity of the research conclusions, but also describe and/or accommodate the complexity of the data. In this PhD project, we aimed to elucidate statistical methods that optimally fit the complex data currently obtained within the field of neuroscience, and to develop a novel statistical model that fully exploits the information contained within intensive longitudinal behavioral mouse data. In addition, we described a novel method for testing anxiety in a home-cage environment (PhenoTyper).
In Chapter 2 and Chapter 3 of this thesis, we demonstrated that it is crucial for statistical analyses to accommodate the clustered nature of data, which arises when multiple observations are collected from each research object. Doing so not only prevents an increased false positive rate but also optimizes statistical power. For a study design in which all observations within a research object pertain to the same experimental condition (design A), it has been pointed out before, both within the neuroscience literature and beyond, that the false positive rate increases when the clustered nature of the data is not accommodated in the analysis [72, 73, 75, 91, 93, 94]. However, the prevalence of nested data and the amount of dependency due to nesting that can be expected in the field of neuroscience had not been assessed previously. For a study design in which the observations obtained within a research object can pertain to different experimental conditions (design B), the discussion in the neuroscience literature was limited to the gain in statistical power when accommodating variation in the average baseline outcome [72, 91]. However, in design B not only the average baseline outcome, but also the effect of the experimental manipulation may vary over research objects. Not accommodating variation in the experimental effect may result in an increased false positive rate. By means of a simulation study, we demonstrated the degree of inflation given systematic variation in either only the experimental effect, or in both the experimental effect and the baseline condition. These results are a valuable addition to the few previous (theoretical) studies [75, 151, 152] in which researchers showed, with one or more example cases or by considering the equation for the standard error of the experimental effect, that not accommodating this variation may result in an increased false positive rate.
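
The inflation of the false positive rate in design A can be illustrated with a small simulation. The sketch below is not the simulation study reported in this thesis; it is a minimal illustration under assumed settings: two groups of 5 research objects with 20 observations each, no true group difference, and a hypothetical intracluster correlation (ICC) of 0.3. It compares a t-test that treats all observations as independent with a t-test on the per-object means. The function names and the hard-coded critical values (which only apply to these default sample sizes) are assumptions made for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Unbiased sample variance."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def t_statistic(x, y):
    """Pooled-variance two-sample t statistic."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / (pooled * (1 / nx + 1 / ny)) ** 0.5


def simulate_group(rng, n_clusters, n_obs, icc):
    """One group as a list of clusters; total variance is 1, mean is 0."""
    clusters = []
    for _ in range(n_clusters):
        cluster_effect = rng.gauss(0, icc ** 0.5)  # shared within a research object
        clusters.append(
            [cluster_effect + rng.gauss(0, (1 - icc) ** 0.5) for _ in range(n_obs)]
        )
    return clusters


def false_positive_rates(icc, n_clusters=5, n_obs=20, n_sim=2000, seed=1,
                         crit_naive=1.972, crit_means=2.306):
    """Return (naive rate, cluster-means rate) at a nominal alpha of 0.05.

    The default critical values are two-sided 5% t quantiles for
    df = 198 (all 200 observations) and df = 8 (10 cluster means);
    they must be changed if n_clusters or n_obs is changed.
    """
    rng = random.Random(seed)
    naive_hits = means_hits = 0
    for _ in range(n_sim):
        group_a = simulate_group(rng, n_clusters, n_obs, icc)
        group_b = simulate_group(rng, n_clusters, n_obs, icc)
        # Naive analysis: ignore clustering, pool all observations.
        flat_a = [x for cluster in group_a for x in cluster]
        flat_b = [x for cluster in group_b for x in cluster]
        if abs(t_statistic(flat_a, flat_b)) > crit_naive:
            naive_hits += 1
        # Analysis on one summary value (the mean) per research object.
        means_a = [mean(cluster) for cluster in group_a]
        means_b = [mean(cluster) for cluster in group_b]
        if abs(t_statistic(means_a, means_b)) > crit_means:
            means_hits += 1
    return naive_hits / n_sim, means_hits / n_sim


if __name__ == "__main__":
    for icc in (0.0, 0.3):
        naive, by_means = false_positive_rates(icc)
        print(f"ICC={icc}: naive={naive:.3f}, cluster means={by_means:.3f}")
```

Under these assumptions, the naive rate should stay near the nominal 5% when the ICC is 0 and rise well above it when the ICC is 0.3, whereas the analysis on per-object means should stay near 5% in both cases. Analyzing per-object means is used here only because it is easy to implement without external libraries; a multilevel model is the more general way to accommodate clustering, in particular for design B.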

In Chapter 4, we described and pharmacologically validated a new anxiety test that allows for unsupervised, automated, high-throughput testing of mice in a home-cage system. The development of this test was motivated by a pressing need for reliable, high-throughput methods for comprehensive behavioral phenotyping, so as to benefit optimally from the increasing availability of experimentally engineered mouse lines (a need expressed in, e.g., [24, 34, 35]). The test complements the automated home-cage task developed by Kas et al. [52] to assess anxiety-related behaviors.

In Chapter 5, a statistical tool based on Markov modeling, a hierarchical hidden semi-Markov model (HSMM), was developed and implemented in a Bayesian context to describe the temporal organization of behavior that can be observed when mice are studied in home-cage systems over a prolonged period of time. While simulation studies showed that the developed model still requires some adjustment if it is to be applied to data that resemble the observed mouse data, a real data example comparing the behavioral patterns of young adult and aged C57BL/6J mice already clearly illustrated the advantage of the hierarchical HSMM over standard tests on summary statistics. A Markov model including hidden behavioral states has been used once before to analyze longitudinal mouse data [59]. These researchers did not use a hierarchical model. Instead, they used a two-step procedure in which they first assumed that the underlying model that generates the observed behavior is similar over all mice in all groups, and then investigated possible differences between groups based on the parameters obtained in the first step. The hierarchical model that we developed, however, allows for heterogeneity in model parameters both within and between groups. As a consequence, more information on individual differences between mice is retained, and group differences are easier to discern and can be tested formally. In addition, the model we developed does not rely on the generally untenable assumption that the probability of spending more time in the current behavioral state does not depend on the time already spent in that state. Moreover, although HMMs with a hierarchical structure have received some attention in the literature [133–136], a hierarchical HSMM that allows for random effects in all model parameters while utilizing the favorable properties of the Gibbs sampler has not been presented before [132, 137].
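
The assumption that an HSMM relaxes can be made concrete by comparing dwell-time distributions. In a standard HMM with self-transition probability p, the time spent in a state is geometrically distributed, so the probability of leaving the state at the next time step (the hazard) is constant at 1 - p regardless of the time already spent there. An HSMM specifies the dwell-time distribution explicitly, so the hazard can change over time. The sketch below uses a shifted Poisson dwell time purely as a hypothetical example; the dwell-time distribution, the parameter values, and the function names are illustrative assumptions and are not taken from the model in Chapter 5.

```python
import math


def geometric_pmf(d, stay_prob):
    """P(D = d) for d = 1, 2, ...: dwell time implied by an HMM self-transition."""
    return stay_prob ** (d - 1) * (1 - stay_prob)


def shifted_poisson_pmf(d, rate):
    """P(D = d) for d = 1, 2, ...: a Poisson count shifted by one time step."""
    k = d - 1
    return math.exp(-rate) * rate ** k / math.factorial(k)


def hazards(pmf, max_d):
    """Return h(d) = P(D = d | D >= d) for d = 1..max_d."""
    result = []
    survival = 1.0  # P(D >= 1)
    for d in range(1, max_d + 1):
        p = pmf(d)
        result.append(p / survival)
        survival -= p  # now P(D >= d + 1)
    return result


if __name__ == "__main__":
    hmm = hazards(lambda d: geometric_pmf(d, stay_prob=0.8), max_d=8)
    hsmm = hazards(lambda d: shifted_poisson_pmf(d, rate=5.0), max_d=8)
    print("d  HMM hazard  HSMM hazard")
    for d, (h1, h2) in enumerate(zip(hmm, hsmm), start=1):
        print(f"{d}  {h1:.3f}       {h2:.3f}")
```

For the geometric case the hazard is the same at every d, which is the memoryless property; for the shifted Poisson example it increases with d, so the longer a state has lasted, the more likely it is to end.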

All in all, the studies reported in these four chapters demonstrate the importance of applying statistical and methodological approaches that fully exploit the complex structure of the data generated by the novel experimental techniques that are rapidly gaining ground in the field of neuroscience.