Recently, there has been a surge in the development of non-invasive diagnostic devices that apply Machine Learning algorithms to signals, images, laboratory parameters, etc. Often these algorithms are trained on small data sets from a handful of local sites (hospitals, clinics, etc.), i.e. from a relatively homogeneous population. However, the validation required for regulatory and market authorization of these algorithms, or of the devices that embed them, may need to be carried out on a much larger scale, in terms of both sample size and heterogeneity. This increases the risk of failed validation trials for diagnostic devices. Other risk factors include a sub-optimal probability threshold for binary classifiers and inadequate testing before validation begins.
In this talk, I will describe adaptive designs that could be used to mitigate such risks while enabling unbiased estimation of the accuracy measures of interest. These designs allow further augmented training when required and follow a learning curve across multiple interim looks at the data to decide whether the algorithms have been adequately trained for their purpose. Background on diagnostic device validation from a regulatory statistics point of view will also be covered.
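As a rough illustration of the interim-look idea, the Python sketch below checks at each look whether a learning curve has both reached a target accuracy and flattened out. It is not the design presented in the talk: the `Look` record, the `adequately_trained` function, the target accuracy, and the plateau rule are all hypothetical choices made for illustration, and the sketch does not address unbiased estimation in the final validation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Look:
    """One interim look (hypothetical record, not from the talk)."""
    n_train: int      # cumulative number of training samples at this look
    accuracy: float   # accuracy estimated on interim data at this look


def adequately_trained(looks: List[Look],
                       target: float = 0.85,
                       min_gain: float = 0.01) -> bool:
    """Return True if training looks adequate at the latest interim look.

    Assumed rule (illustrative only): the latest accuracy must meet the
    target AND the gain since the previous look must fall below min_gain,
    i.e. the learning curve has plateaued. Otherwise, further augmented
    training would be requested.
    """
    if len(looks) < 2:
        return False  # a plateau cannot be assessed from a single look
    latest, previous = looks[-1], looks[-2]
    gain = latest.accuracy - previous.accuracy
    return latest.accuracy >= target and gain < min_gain


# Example: a learning curve observed over four interim looks.
looks = [Look(100, 0.70), Look(200, 0.80), Look(300, 0.86), Look(400, 0.865)]
for k in range(1, len(looks) + 1):
    decision = "stop training" if adequately_trained(looks[:k]) else "augment training"
    print(f"look {k}: n_train={looks[k - 1].n_train}, decision={decision}")
```

In this toy run, only the fourth look meets both conditions; the earlier looks either miss the target or still show a gain above the assumed `min_gain`.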
Common diseases such as heart disease or diabetes result from a complex interplay between genes and environment. Risk prediction algorithms are used to identify high-risk individuals for early intervention to reduce premature mortality from non-communicable diseases, but we have yet to discover models that can integrate the tens of thousands of data points that omics technologies can now measure in a single blood sample. In this talk, I will present ongoing work in the context of two research projects funded by the Research Council of Norway and the European Innovation Council to create intelligent systems for risk prediction and diagnosis of non-communicable diseases from blood-based multi-omics data. I will present novel algorithms for distinguishing causation from correlation in multi-omics data and for structure learning and inference in large-scale, multi-organ causal Bayesian gene networks. Initial results of a proof-of-concept study in cardiovascular disease will also be presented.
The kinetic theory of active particles offers a rich approach to describing biological systems, particularly when dealing with large multicellular populations and utilizing different scales of description. Mathematically, this approach involves a system of integro-differential equations at the kinetic level, and an autonomous dynamical system in the hydrodynamic limit. In this seminar, I will combine these mathematical approaches to present a model describing the anomalous response of the immune system, including recurrent dynamics, pattern formation, and brain lesions characteristic of certain human pathologies.