Publications
Working Papers
SWAG: A Wrapper Method for Sparse Learning
arXiv.
The majority of machine learning methods and algorithms give high priority to prediction performance, which may not always correspond to the priority of the users. In many cases, practitioners and researchers in fields ranging from engineering to genetics require interpretability and replicability of the results, especially in settings where, for example, not all attributes may be available to them. As a consequence, there is a need to make the outputs of machine learning algorithms more interpretable and to deliver a library of "equivalent" learners (in terms of prediction performance) that users can select based on attribute availability in order to test and/or make use of these learners for predictive/diagnostic purposes. To address these needs, we propose to study a procedure that combines screening and wrapper approaches which, based on a user-specified learning method, greedily explores the attribute space to find a library of sparse learners with consequently low data collection and storage costs. This new method (i) delivers a low-dimensional network of attributes that can be easily interpreted and (ii) increases the potential replicability of results based on the diversity of attribute combinations defining strong learners with equivalent predictive power. We call this algorithm the "Sparse Wrapper AlGorithm" (SWAG).
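As a rough illustration of the screening-and-wrapper idea, the sketch below greedily grows small attribute subsets around a user-specified learner, keeping at each dimension only the subsets whose cross-validated performance falls in a top quantile. It is a simplified sketch, not the reference implementation: the helper name swag_sketch, the use of scikit-learn's cross_val_score, and the tuning constants (maximum dimension, screening size, quantile) are all illustrative assumptions.

import numpy as np
from sklearn.model_selection import cross_val_score

def swag_sketch(X, y, learner, p_max=5, keep_quantile=0.1, n_screen=30, cv=5):
    # Step 1 (screening): score each attribute on its own and keep the strongest ones.
    scores = {(j,): cross_val_score(learner, X[:, [j]], y, cv=cv).mean()
              for j in range(X.shape[1])}
    cutoff = np.quantile(list(scores.values()), 1 - keep_quantile)
    library = {1: {s: v for s, v in scores.items() if v >= cutoff}}
    screened = sorted({j for subset in library[1] for j in subset})[:n_screen]
    # Step 2 (wrapper): greedily grow the retained subsets one screened attribute at a time.
    for p in range(2, p_max + 1):
        candidates = {}
        for subset in library[p - 1]:
            for j in screened:
                if j in subset:
                    continue
                new = tuple(sorted(subset + (j,)))
                if new not in candidates:
                    candidates[new] = cross_val_score(
                        learner, X[:, list(new)], y, cv=cv).mean()
        if not candidates:
            break
        cutoff = np.quantile(list(candidates.values()), 1 - keep_quantile)
        library[p] = {s: v for s, v in candidates.items() if v >= cutoff}
    # Each entry of the library is a set of near-equivalent sparse learners among
    # which a user can choose based on attribute availability.
    return library

For instance, swag_sketch(X, y, LogisticRegression(max_iter=1000)) would return, for each dimension up to five, the attribute combinations whose cross-validated performance is close to the best one found.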
2024
Non-standard errors
Journal of Finance 79 (3), 2339-2390.
In statistics, samples are drawn from a population in a data-generating process (DGP). Standard errors measure the uncertainty in estimates of population parameters. In science, evidence is generated to test hypotheses in an evidence-generating process (EGP). We claim that EGP variation across researchers adds uncertainty: non-standard errors (NSEs). We study NSEs by letting 164 teams test the same hypotheses on the same data. NSEs turn out to be sizable, but smaller for more reproducible or higher-rated research. Adding peer-review stages reduces NSEs. We further find that participants underestimate this type of uncertainty.
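To make the notion concrete: just as a standard error measures the dispersion of an estimate across hypothetical samples, a non-standard error can be quantified as the dispersion of the point estimates that different teams report for the same hypothesis on the same data. The tiny sketch below uses the interquartile range as the dispersion measure; that choice, like the function name, is illustrative rather than the paper's exact definition.

import numpy as np

def non_standard_error(team_estimates):
    # team_estimates: one point estimate per research team, same hypothesis, same data.
    q75, q25 = np.percentile(team_estimates, [75, 25])
    return q75 - q25

print(non_standard_error([0.8, 1.1, 0.4, 1.9, 0.7, 1.2]))  # ~0.45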
2023
Platform Combining Statistical Modeling and Patient-Derived Organoids to Facilitate Personalized Treatment of Colorectal Carcinoma
Journal of Experimental & Clinical Cancer Research 42 (1).
This study presents a novel approach for designing personalized treatment for colorectal cancer (CRC) patients. The approach combines ex vivo organoid efficacy testing with mathematical modeling of the results. The study utilized a validated phenotypic approach called Therapeutically Guided Multidrug Optimization (TGMO) to identify optimized drug combinations (ODC) that showed low-dose synergistic effects in 3D human CRC models. The ODCs were validated using patient-derived organoids (PDO) from both primary and metastatic CRC cases. Molecular characterization of the CRC material was performed using whole-exome sequencing and RNAseq. In PDO from patients with liver metastases, the identified ODCs demonstrated significant inhibition of cell viability, outperforming the standard CRC chemotherapy (FOLFOXIRI) administered at clinical doses. Additionally, patient-specific ODCs based on TGMO showed superior efficacy compared to the current chemotherapy standard of care. This approach enables the optimization of synergistic multi-drug combinations tailored to individual patients within a clinically relevant timeframe.
A Penalized Two-pass Regression to Predict Stock Returns with Time-Varying Risk Premia
Journal of Econometrics, Volume 237, Issue 2, Part C.
We develop a penalized two-pass regression with time-varying factor loadings. The penalization in the first pass enforces sparsity for the time-variation drivers while also maintaining compatibility with the no-arbitrage restrictions by regularizing appropriate groups of coefficients. The second pass delivers risk premia estimates to predict equity excess returns. Our Monte Carlo results and our empirical results on a large cross-sectional data set of US individual stocks show that penalization without grouping can lead to nearly all estimated time-varying models violating the no-arbitrage restrictions. Moreover, our results demonstrate that the proposed method reduces the prediction errors compared to a penalized approach without appropriate grouping or a time-invariant factor model.
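The sketch below illustrates the two-pass structure in a heavily simplified form: in the first pass, each asset's excess returns are regressed on the factors and on factor-state interactions (the time-variation drivers), with a plain group-lasso penalty on each factor's block of interaction coefficients; in the second pass, average returns are regressed on the estimated constant loadings to obtain risk premia. The data shapes, the proximal-gradient group-lasso solver and the penalty level are illustrative assumptions and do not reproduce the paper's no-arbitrage-aware grouping.

import numpy as np

def group_lasso(X, y, groups, lam, n_iter=500):
    # Proximal gradient for 0.5/n * ||y - Xb||^2 + lam * sum_g ||b_g||_2.
    n, p = X.shape
    b = np.zeros(p)
    step = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        z = b - step * grad
        for g in groups:                       # block soft-thresholding
            norm = np.linalg.norm(z[g])
            z[g] = 0.0 if norm == 0 else max(0.0, 1 - step * lam / norm) * z[g]
        b = z
    return b

def two_pass(returns, factors, states, lam=0.05):
    # returns: T x N excess returns, factors: T x K, states: T x S drivers of
    # time variation in the loadings (all hypothetical shapes and names).
    T, N = returns.shape
    K, S = factors.shape[1], states.shape[1]
    # First pass: per asset, regress on factors and factor-state interactions,
    # penalizing each factor's block of interaction terms as a group.
    interactions = np.hstack([factors * states[:, [s]] for s in range(S)])  # T x (K*S)
    X = np.hstack([factors, interactions])
    groups = [np.arange(K + k, K + K * S, K) for k in range(K)]
    betas = np.column_stack([group_lasso(X, returns[:, i], groups, lam)
                             for i in range(N)])           # (K + K*S) x N
    avg_loadings = betas[:K, :].T                           # N x K (constant part)
    # Second pass: cross-sectional regression of average returns on the loadings
    # delivers the risk-premia estimates.
    premia, *_ = np.linalg.lstsq(avg_loadings, returns.mean(axis=0), rcond=None)
    return premia, betas

Penalizing only the interaction blocks, and not the constant loadings, mirrors the idea that sparsity should act on the time-variation drivers while leaving the baseline factor structure untouched.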
2022
Evidence of Antagonistic Predictive Effects of miRNAs in Breast Cancer Cohorts Through Data-Driven Networks
Scientific Reports, 12(5166), p.1-16.
Non-coding micro RNAs (miRNAs) dysregulation seems to play an important role in the pathways involved in breast cancer occurrence and progression. In different studies, opposite functions may be assigned to the same miRNA, either promoting the disease or protecting from it. Our research tackles the following issues: (i) why aren't there concordant findings across research studies regarding the role of miRNAs in the progression of breast cancer? (ii) could a miRNA have either an activating or an inhibiting effect on cancer progression according to the other miRNAs with which it interacts? For this purpose, we analyse the AHUS dataset made available on the ArrayExpress platform by Haakensen et al. The breast tissue specimens were collected over 7 years between 2003 and 2009. miRNA-expression profiling was obtained for 55 invasive carcinomas and 70 normal breast tissue samples. Our statistical analysis is based on a recently developed model and feature selection technique which, instead of selecting a single model (i.e. a unique combination of miRNAs), delivers a set of models with equivalent predictive capabilities that allows us to interpret and visualize the interaction of these features. As a result, we discover a set of 112 indistinguishable models (in a predictive sense), each with 4 or 5 miRNAs. Within this set, by comparing the model coefficients, we are able to identify three classes of miRNA: (i) oncogenic miRNAs; (ii) protective miRNAs; (iii) undefined miRNAs, which can play both an oncogenic and a protective role according to the network with which they interact. These results shed new light on the biological action of miRNAs in breast cancer and may help to explain why, in some cases, different studies attribute opposite functions to the same miRNA.
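The classification into oncogenic, protective and undefined miRNAs comes down to inspecting the signs of each miRNA's coefficients across the library of near-equivalent models. The sketch below shows this bookkeeping step; the input format and the sign convention (a positive coefficient pushes predictions towards the cancer class) are assumptions made for illustration.

def classify_features(model_library):
    # model_library: list of dicts mapping feature name -> fitted coefficient,
    # one dict per near-equivalent sparse model.
    signs = {}
    for model in model_library:
        for name, coef in model.items():
            if coef != 0:
                signs.setdefault(name, set()).add(1 if coef > 0 else -1)
    labels = {}
    for name, s in signs.items():
        if s == {1}:
            labels[name] = "oncogenic"        # always increases predicted risk
        elif s == {-1}:
            labels[name] = "protective"       # always decreases predicted risk
        else:
            labels[name] = "undefined"        # sign depends on the accompanying network
    return labels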
2021
Non Applicability of Validated Predictive Models for Intensive Care Admission and Death of COVID-19 Patients in a Secondary Care Hospital in Belgium
Journal of Emergency and Critical Care Medicine, 5(22), p.1-13.
Background: Simple and reliable predictive scores for intensive care admissions and death based on clinical data in coronavirus disease 2019 (COVID-19) patients are numerous but may be misleading. Predictive scores for admission to the intensive care unit (ICU) or death based on clinical and easily affordable laboratory data are still needed in secondary hospitals and hospitals in developing countries that do not have high-performance laboratories. Methods: The goal of this study is to verify whether a recently published predictive score developed in a large-scale study in China (the Liang score) can be used on patients coming from a Belgian population catchment area. We conducted a monocentric retrospective cohort study of 66 patients with known COVID-19 disease, run from early March to the end of May in Clinique Saint-Pierre Ottignies, a secondary care hospital in Belgium. The outcomes of the study are (I) admission to the ICU and (II) death. All patients admitted to the Emergency Department with a positive RT-PCR SARS-CoV-2 test were included in the study. Routine clinical and laboratory data were collected at their admission and during their stay, as well as chest X-rays and CT-scans. The Liang score was used as a benchmark, and logistic regression models were used to develop predictive models. Results: The Liang score performs poorly, both in terms of admission to intensive care and in terms of death. In our cohort, it appears that lactate dehydrogenase (LDH) above 579 UI/L and venous lactate above 3.02 mmol/L may be considered good predictive biological factors for ICU admission. With regard to death risk, a neutrophil-lymphocyte ratio (NLR) above 22.1, tobacco abuse status and respiratory impairment appear to be relevant predictive factors. Conclusions: Firstly, a promising score from a large-scale study in China appears to perform poorly when applied to a European cohort, whether to predict admission to the ICU or death. Secondly, biological features that are quite significant for admission to the ICU, such as LDH or venous lactate, cannot predict death. Thirdly, simple and affordable variables such as LDH, LDH + sex, or LDH + sex + venous lactate have a very good sensitivity and an acceptable specificity for ICU admission.
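As a concrete reading of the last conclusion, the sketch below fits simple logistic regressions on nested sets of affordable variables and reports sensitivity and specificity for ICU admission at a fixed probability threshold. The column names, the 0.5 threshold and the in-sample evaluation are assumptions made for illustration; they are not taken from the study.

import numpy as np
from sklearn.linear_model import LogisticRegression

def sens_spec(model, X, y, threshold=0.5):
    pred = (model.predict_proba(X)[:, 1] >= threshold).astype(int)
    tp = np.sum((pred == 1) & (y == 1))
    fn = np.sum((pred == 0) & (y == 1))
    tn = np.sum((pred == 0) & (y == 0))
    fp = np.sum((pred == 1) & (y == 0))
    return tp / (tp + fn), tn / (tn + fp)

def compare_models(df, outcome="icu_admission",
                   variable_sets=(["ldh"], ["ldh", "sex"], ["ldh", "sex", "lactate"])):
    # df: pandas DataFrame with one row per patient; column names are hypothetical.
    results = {}
    for variables in variable_sets:
        X, y = df[list(variables)].to_numpy(), df[outcome].to_numpy()
        model = LogisticRegression(max_iter=1000).fit(X, y)
        results[tuple(variables)] = sens_spec(model, X, y)
    return results  # {variables: (sensitivity, specificity)}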
2020
Wavelet-Based Moment-Matching Techniques for Inertial Sensor Calibration
IEEE Transactions on Instrumentation & Measurement, 69(10), p.7542-7551.
The task of inertial sensor calibration has required the development of various techniques to take into account the sources of measurement error coming from such devices. The calibration of the stochastic errors of these sensors has been the focus of an increasing amount of research in which the method of reference has been the so-called “Allan variance (AV) slope method” which, in addition to not having appropriate statistical properties, requires a subjective input which makes it prone to mistakes. To overcome this, recent research has started proposing “automatic” approaches where the parameters of the probabilistic models underlying the error signals are estimated by matching functions of the AV or wavelet variance with their model-implied counterparts. However, given the increased use of such techniques, there has been no study or clear direction for practitioners on which approach is optimal for the purpose of sensor calibration. This article, for the first time, formally defines the class of estimators based on this technique and puts forward theoretical and applied results that, comparing estimators in this class, suggest the use of the Generalized Method of Wavelet Moments (GMWM) as an optimal choice. In addition to analytical proofs, experiment-driven Monte Carlo simulations demonstrate the superior performance of this estimator. An analysis of the error signal from a gyroscope is also provided to further motivate such analyses, as real-world observed error signals may show significant deviation from manufacturer-provided error models.
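To give a feel for this moment-matching idea, the sketch below assumes a pure white-noise error model, for which the Haar (MODWT-style) wavelet variance at level j equals sigma^2 / 2^j, and estimates sigma^2 by least-squares matching of the empirical and model-implied wavelet variances. The helper names, the number of levels and the white-noise-only model are simplifying assumptions; the GMWM handles far richer composite models.

import numpy as np

def haar_wavelet_variance(x, n_levels):
    x = np.asarray(x, dtype=float)
    nu = []
    for j in range(1, n_levels + 1):
        half = 2 ** (j - 1)
        filt = np.concatenate([np.full(half, 1.0), np.full(half, -1.0)]) / 2 ** j
        w = np.convolve(x, filt, mode="valid")   # MODWT-style Haar coefficients
        nu.append(np.mean(w ** 2))
    return np.array(nu)                          # empirical wavelet variance per level

def fit_white_noise(x, n_levels=8):
    nu_hat = haar_wavelet_variance(x, n_levels)
    h = 1.0 / 2 ** np.arange(1, n_levels + 1)    # model-implied WV is sigma^2 * h
    sigma2 = (h @ nu_hat) / (h @ h)              # least-squares moment matching
    return sigma2

# Quick check on simulated data: the estimate should be close to the true variance.
rng = np.random.default_rng(0)
print(fit_white_noise(rng.normal(scale=0.1, size=100_000)))  # ~0.01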
2019
A Multisignal Wavelet Variance-based Framework for Inertial Sensor Stochastic Error Modeling
IEEE Transactions on Instrumentation and Measurement, 68(12), p.4924-4936.
The calibration of low-cost inertial sensors has become increasingly important over the last couple of decades, especially when dealing with sensor stochastic errors. This procedure is commonly performed on a single error measurement from an inertial sensor taken over a certain amount of time, although it is extremely frequent for different replicates to be taken for the same sensor, thereby delivering important information which is often left unused. In order to address the latter problem, this paper presents a general wavelet variance-based framework for multisignal inertial sensor calibration, which can improve the modeling and model selection procedures for sensor stochastic errors using all replicates from a calibration procedure and makes it possible to understand the properties, such as stationarity, of these stochastic errors. The applications using microelectromechanical system inertial measurement units confirm the importance of this new framework, and a new graphical user interface makes these tools available to the general user. The latter is developed based on an R package called mgmwm and allows the user to select a type of sensor for which different replicates are available and to easily make use of the approaches presented in this paper in order to carry out the appropriate calibration procedure.
2018
A two-step computationally efficient procedure for IMU classification and calibration
2018 IEEE/ION Position, Location and Navigation Symposium (PLANS), 534-540.
The task of inertial sensor calibration has always been challenging, especially when dealing with stochastic errors that remain after the deterministic errors have been filtered out. Among other challenges, the number of observations is becoming increasingly large since sensor measurements are taken at high frequencies over longer periods of time, thereby placing considerable limitations on the estimation of the complex models that characterize stochastic errors (without considering testing and selection procedures). Moreover, before estimating these models, there is a need for tests that determine whether the error signals are characterized by a model that remains constant over time and, if so, which model best predicts these errors. Considering these needs, this paper presents an open-source software platform that allows practitioners to carry out these procedures by making use of two recent proposals which stem from the Generalized Method of Wavelet Moments framework. These proposals make use of the growing number of signal replicates collected during sensor calibration procedures, and the proposed platform allows users to easily employ various functions that implement these methods in a user-friendly and computationally efficient manner.
2017
A computational multivariate-based technique for inertial sensor calibration
Proceedings of the ION GNSS 2017, Portland, OR, USA, 3053-3060.
The task of inertial sensor calibration has become increasingly important due to the growing use of low-cost inertial measurement units which are, however, characterized by measurement errors. Since these units are widely employed in a variety of mass-market applications, there is considerable focus on compensating for these errors by taking into account the deterministic and stochastic factors that characterize them. In this paper we focus on the stochastic part of the error signal, where it is customary to register the latter and use the observed error signal to identify and estimate the stochastic models, often complex in nature, that underlie this process. However, it is often the case that these error signals are observed through a series of replicates for the same inertial sensor, and equally often it can be noticed that these replicates share the same model structure while their parameters appear to differ between replicates. This phenomenon has not been taken into account by current stochastic calibration procedures, which can therefore be affected by flawed parameter estimation. For this reason, this paper aims at delivering an approach that takes into account the parameter variation between replicates by providing an estimator that minimizes a loss function considering each replicate, thereby improving measurement precision in the long run, and allows us to build a statistical test to determine the presence of parameter variation between replicates.
An automatic calibration approach for the stochastic parameters of inertial sensors
Proceedings of the ION GNSS 2017, Portland, OR, USA, 3028-3038.
The use of Inertial Measurement Units (IMU) for navigation purposes is constantly growing and they are increasingly being considered as the core dynamic sensing device for Inertial Navigation Systems (INS). However, these systems are characterized by sensor errors that can affect the navigation precision of these devices, and consequently a proper calibration of the sensors is required. The first step in this direction is usually taken by evaluating the deterministic types of errors, such as bias and scale factor, which can be taken into account through known physical models. The second step consists in finding an appropriate model to describe the stochastic nature of the sensor errors. The focus of this paper is on the second of these calibration procedures. Indeed, we propose an automatic model selection approach which is particularly appropriate when we observe/collect several independent replicates of the error signal of interest. In short, the proposed approach relies on the Generalized Method of Wavelet Moments (GMWM) and the Wavelet Variance Information Criterion (WVIC), for which we propose a procedure to compute a cross-validation (CV)-like estimator of the goodness-of-fit of a candidate model. This estimator provides by construction a trade-off between model fit and model complexity, therefore allowing us to rank all candidate models and select the one (or ones) that appears to be the most appropriate for the task of stochastic sensor calibration.
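The sketch below conveys the flavour of such a CV-like ranking across replicates: each candidate stochastic model is fitted to the averaged wavelet variance of all but one replicate and scored by how well its implied wavelet variance matches the held-out replicate. The two candidate models (white noise; white noise plus random walk), the Haar wavelet-variance helper and the leave-one-out scheme are illustrative assumptions and not the WVIC as defined in the paper.

import numpy as np
from scipy.optimize import nnls

def haar_wv(x, n_levels):
    out = []
    for j in range(1, n_levels + 1):
        filt = np.r_[np.ones(2 ** (j - 1)), -np.ones(2 ** (j - 1))] / 2 ** j
        out.append(np.mean(np.convolve(np.asarray(x, float), filt, "valid") ** 2))
    return np.array(out)

def cv_rank_models(replicates, n_levels=8):
    # replicates: list of (at least two) 1-D error signals from the same sensor.
    tau = 2.0 ** np.arange(1, n_levels + 1)
    # Model-implied Haar WV: white noise sigma^2/tau; random walk gamma^2*(tau^2+2)/(12*tau).
    bases = {"WN": np.column_stack([1 / tau]),
             "WN+RW": np.column_stack([1 / tau, (tau ** 2 + 2) / (12 * tau)])}
    wv = [haar_wv(r, n_levels) for r in replicates]
    scores = {}
    for name, H in bases.items():
        errs = []
        for k in range(len(wv)):                       # leave one replicate out
            train = np.mean([wv[i] for i in range(len(wv)) if i != k], axis=0)
            params, _ = nnls(H, train)                 # nonnegative least squares fit
            errs.append(np.mean((wv[k] - H @ params) ** 2))
        scores[name] = np.mean(errs)                   # out-of-sample WV mismatch
    return sorted(scores.items(), key=lambda kv: kv[1])

By scoring each candidate on held-out replicates, the more complex model is retained only when its extra parameters genuinely improve the fit, which is the fit-versus-complexity trade-off the abstract refers to.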