Publications


Working Papers

  1. A penalized two-pass regression to predict stock returns with time-varying risk premia
    Gaetan Bakalli, Stéphane Guerrier and Olivier Scaillet.

    Journal of Econometrics, major revision invited.

    We develop a penalized two-pass regression with time-varying factor loadings. The penalization in the first pass enforces sparsity for the time-variation drivers while also maintaining compatibility with the no-arbitrage restrictions by regularizing appropriate groups of coefficients. The second pass delivers risk premia estimates to predict equity excess returns. Our Monte Carlo results and our empirical results on a large cross-sectional data set of US individual stocks show that penalization without grouping can lead to nearly all estimated time-varying models violating the no-arbitrage restrictions. Moreover, our results demonstrate that the proposed method reduces the prediction errors compared to a penalized approach without appropriate grouping or a time-invariant factor model.
  2. Multi-Signal Approaches for Repeated Sampling Schemes in Inertial Sensor Calibration

    IEEE Transactions on Signal Processing, major revision submitted.

    The task of inertial sensor calibration has become increasingly important due to the growing use of low-cost inertial measurement units which are however characterized by measurement errors. Being widely employed in a variety of mass-market applications, there is considerable focus on compensating for these errors by taking into account the deterministic and stochastic factors that characterize them. In this paper, we focus on the stochastic part of the error signal where it is customary to record the latter and use the observed error signal to identify and estimate the stochastic models, often complex in nature, that underlie this process. However, it is often the case that these error signals are observed through a series of replicates for the same inertial sensor and equally often that these replicates have the same model structure but their parameters appear different between replicates. This phenomenon has not been taken into account by current stochastic calibration procedures which therefore can be conditioned by flawed parameter estimation. For this reason, this paper aims at studying different approaches for this problem and studying their properties to take into account parameter variation between replicates thereby improving measurement precision and navigation uncertainty quantification in the long run.
  3. Chameleon microRNAs in breast cancer: their elusive role as regulatory factors in cancer progression
    Cesare Miglioli, Gaetan Bakalli, Samuel Orso, Mucyo Karemera, Roberto Molinari, Stéphane Guerrier and Nabil Mili.

    Scientific Reports, major revision submitted.

    Breast cancer is one of the most frequent cancers affecting women. Non-coding micro RNAs (miRNAs) seem to play an important role in the regulation of pathways involved in tumor occurrence and progression. Extending the research in Haakensen et al., where significant miRNAs were selected as being associated with the progression from normal breast tissue to breast cancer, in this work we put forward 112 sets of miRNA combinations, each including at most 5 expressions with high accuracy in discriminating healthy breast tissue from breast carcinoma. Our results are based on a recently developed machine learning technique which, instead of selecting a single model (or combination of features), delivers a set of models with equivalent predictive capabilities that allow one to interpret and visualize the interaction of these features. These results shed new light on the biological action of the selected miRNAs, which can behave in different ways according to the miRNA network with which they interact. Indeed, these revealed connections may help explain why, in some cases, different studies attribute opposite functions to the same miRNA. It is therefore possible to understand how the role of a genomic variable may change when considered in interaction with other sets of variables, as opposed to only considering its effect when it is evaluated within a unique combination of features. The approach proposed in this work provides a statistical basis for the notion of chameleon miRNAs and is inspired by the emerging field of systems biology.
  4. SWAG: A Wrapper Method for Sparse Learning
    Roberto Molinari, Gaetan Bakalli, Stéphane Guerrier, Cesare Miglioli, Samuel Orso and Olivier Scaillet.

    arXiv.

    The majority of machine learning methods and algorithms give high priority to prediction performance which may not always correspond to the priority of the users. In many cases, practitioners and researchers in different fields, going from engineering to genetics, require interpretability and replicability of the results especially in settings where, for example, not all attributes may be available to them. As a consequence, there is the need to make the outputs of machine learning algorithms more interpretable and to deliver a library of “equivalent” learners (in terms of prediction performance) that users can select based on attribute availability in order to test and/or make use of these learners for predictive/diagnostic purposes. To address these needs, we propose to study a procedure that combines screening and wrapper approaches which, based on a user-specified learning method, greedily explores the attribute space to find a library of sparse learners with consequent low data collection and storage costs. This new method (i) delivers a low-dimensional network of attributes that can be easily interpreted and (ii) increases the potential replicability of results based on the diversity of attribute combinations defining strong learners with equivalent predictive power. We call this algorithm “Sparse Wrapper AlGorithm” (SWAG).
  5. Non-Standard Errors
    Albert J. Menkveld, Anna Dreber, Felix Holzmeister, Juergen Huber, Magnus Johannesson, Michael Kirchler, Michael Razen, Utz Weitzel, Gaetan Bakalli, et al.

    SSRN.

    In statistics, samples are drawn from a population in a data-generating process (DGP). Standard errors measure the uncertainty in sample estimates of population parameters. In science, evidence is generated to test hypotheses in an evidence-generating process (EGP). We claim that EGP variation across researchers adds uncertainty: non-standard errors. To study them, we let 164 teams test six hypotheses on the same sample. We find that non-standard errors are sizeable, on par with standard errors. Their size (i) co-varies only weakly with team merits, reproducibility, or peer rating, (ii) declines significantly after peer-feedback, and (iii) is underestimated by participants.

2021

  1. Non applicability of validated predictive models for intensive care admission and death of COVID-19 patients in a secondary care hospital in Belgium.
    Nicolas Parisi, Aurore Janier-Dubry, Ester Ponzetto, Charalambos Pavlopoulos, Gaetan Bakalli, Roberto Molinari, Stéphane Guerrier and Nabil Mili.

    Journal of Emergency and Critical Care Medicine, 5: 22, 2021.

    Simple and reliable predictive scores for intensive care admissions and death based on clinical data are still lacking. The goal of this study is to implement such scores based on patients coming from our population catchment area and to compare them to available ones. These scores adhere to the TRIPOD (transparent reporting of a multivariable prediction model for individual prognosis or diagnosis) reporting guidelines.

2020

  1. Wavelet-Based Moment-Matching Techniques for Inertial Sensor Calibration
    Stéphane Guerrier, Juan Jurado, Mehran Khaghani, Gaetan Bakalli, Mucyo Karemera, Roberto Molinari, Samuel Orso, John Raquet, Christine Schubert and Jan Skaloud.

    IEEE Transactions on Instrumentation and Measurement, 69 (10), 7542-7551.

    The task of inertial sensor calibration has required the development of various techniques to take into account the sources of measurement error coming from such devices. The calibration of the stochastic errors of these sensors has been the focus of an increasing amount of research in which the method of reference has been the so-called “Allan variance (AV) slope method” which, in addition to not having appropriate statistical properties, requires a subjective input which makes it prone to mistakes. To overcome this, recent research has started proposing “automatic” approaches where the parameters of the probabilistic models underlying the error signals are estimated by matching functions of the AV or wavelet variance with their model-implied counterparts. However, despite the increased use of such techniques, there has been no study or clear direction for practitioners on which approach is optimal for the purpose of sensor calibration. This article, for the first time, formally defines the class of estimators based on this technique and puts forward theoretical and applied results that, comparing with estimators in this class, suggest the use of the Generalized Method of Wavelet Moments (GMWM) as an optimal choice. In addition to analytical proofs, experiment-driven Monte Carlo simulations demonstrated the superior performance of this estimator. Further analysis of the error signal from a gyroscope was also provided to further motivate performing such analyses, as real-world observed error signals may show significant deviation from manufacturer-provided error models.

2019

  1. A multisignal wavelet variance-based framework for inertial sensor stochastic error modeling
    Ahmed Radi, Gaetan Bakalli, Stéphane Guerrier, Naser El-Sheimy, Abu B Sesay and Roberto Molinari.

    IEEE Transactions on Instrumentation and Measurement, 68 (12), 4924-4936.

    The calibration of low-cost inertial sensors has become increasingly important over the last couple of decades, especially when dealing with sensor stochastic errors. This procedure is commonly performed on a single error measurement from an inertial sensor taken over a certain amount of time, although it is extremely frequent for different replicates to be taken for the same sensor, thereby delivering important information which is often left unused. In order to address the latter problem, this paper presents a general wavelet variance-based framework for multisignal inertial sensor calibration, which can improve the modeling and model selection procedures of sensor stochastic errors using all replicates from a calibration procedure and allows one to understand the properties, such as stationarity, of these stochastic errors. The applications using microelectromechanical system inertial measurement units confirm the importance of this new framework, and a new graphical user interface makes these tools available to the general user. The latter is developed based on an R package called mgmwm and allows the user to select a type of sensor for which different replicates are available and to easily make use of the approaches presented in this paper in order to carry out the appropriate calibration procedure.

2018

  1. A two-step computationally efficient procedure for IMU classification and calibration
    Gaetan Bakalli, Ahmed Radi, Sameh Nassar, Stéphane Guerrier, Yuming Zhang and Roberto Molinari.

    2018 IEEE/ION Position, Location and Navigation Symposium (PLANS), 534-540.

    The task of inertial sensor calibration has always been challenging, especially when dealing with stochastic errors that remain after the deterministic errors have been filtered out. Among others, the number of observations is becoming increasingly high since sensor measurements are taken at high frequencies over longer periods of time, thereby placing considerable limitations on the estimation of the complex models that characterize stochastic errors (without considering testing and selection procedures). Moreover, before estimating these models, there is a need for tests that determine whether the error signals are characterized by a model that remains constant over time and, if so, which model best predicts these errors. Considering these needs, this paper presents an open-source software platform that allows practitioners to carry out these procedures by making use of two recent proposals which stem from the Generalized Method of Wavelet Moments framework. These proposals make use of the growing amount of signal replicates issued during sensor calibration procedures and the proposed platform allows users to easily employ various functions that implement these methods in a user-friendly and computationally efficient manner.

2017

  1. A computational multivariate-based technique for inertial sensor calibration
    Gaetan Bakalli, Ahmed Radi, Naser El-Sheimy, Roberto Molinari and Stéphane Guerrier.

    Proceedings of the 30th International Technical Meeting of The Satellite Division of the Institute of Navigation (ION GNSS+ 2017), 3053-3060.

    The use of Inertial Measurement Units (IMU) for navigation purposes is constantly growing and they are increasingly being considered as the core dynamic sensing device for Inertial Navigation Systems (INS). However, these systems are characterized by sensor errors that can affect the navigation precision of these devices and consequently a proper calibration of the sensors is required. The first step in this direction is usually taken by evaluating the deterministic type of errors, such as bias and scale factor, which can be taken into account through known physical models. The second step consists in finding an appropriate model to describe the stochastic nature of the sensor errors. The focus of this paper is related to the second of such calibration procedures. Indeed, we propose an automatic model selection approach which is particularly appropriate when we observe/collect several independent replicates of the error signal of interest. In short, the proposed approach relies on the Generalized Method of Wavelet Moments (GMWM) and the Wavelet Variance Information Criterion (WVIC), where we propose a procedure to compute a Cross-Validation (CV) like estimator of the goodness-of-fit of a candidate model. This estimator provides by construction a tradeoff between model fit and model complexity, therefore allowing us to rank all candidate models and select the one (or the ones) that appears to be the most appropriate for the task of stochastic sensor calibration.
  2. An automatic calibration approach for the stochastic parameters of inertial sensors
    Ahmed Radi, Gaetan Bakalli, Naser El-Sheimy, Stéphane Guerrier and Roberto Molinari.

    Proceedings of the 30th International Technical Meeting of The Satellite Division of the Institute of Navigation (ION GNSS+ 2017), 3028-3038.

    The task of inertial sensor calibration has become increasingly important due to the growing use of low-cost inertial measurement units which are however characterized by measurement errors. Being widely employed in a variety of mass-market applications, there is considerable focus on compensating for these errors by taking into account the deterministic and stochastic factors that characterize them. In this paper we focus on the stochastic part of the error signal where it is customary to record the latter and use the observed error signal to identify and estimate the stochastic models, often complex in nature, that underlie this process. However, it is often the case that these error signals are observed through a series of replicates for the same inertial sensor and equally often it can be noticed that these replicates have the same model structure but their parameters appear to be different between replicates. This phenomenon has not been taken into account by current stochastic calibration procedures which therefore can be conditioned by flawed parameter estimation. For this reason, this paper aims at delivering an approach that takes into account the parameter variation between replicates by delivering an estimator that minimizes a loss function that considers each replicate, thereby improving measurement precision in the long run, and allows one to build a statistical test to determine the presence of parameter variation between replicates.

© Copyright 2021 Gaetan Bakalli.