Modeling a Gaussian Process together with calibration noise =========================================================== We consider a time series showing quasiperiodic oscillations and affected by white noise as well as calibration noise. The calibration noise is introduced by the fact that the instrument is calibrated periodically (e.g. once a day) with some error. Measurements using the same calibration share the same calibration error. The aim is to use a gaussian process to fit the quasiperiodic oscillations and be able to predict missing observations. We first generate the simulated time series: .. plot:: :context: close-figs import numpy as np import matplotlib.pyplot as plt np.random.seed(0) # Generate calendar nt = 100 tmax = 20 t = np.sort(np.concatenate(( np.random.uniform(0, tmax/3, nt//2), np.random.uniform(2*tmax/3, tmax, (nt+1)//2)))) # Quasiperiodic signal amp = 3.0 P0 = 5.2 dP = 0.75 P = P0 + dP*(t/tmax-1/2) y = amp*np.sin(2*np.pi*t/P) # Truth tsmooth = np.linspace(0, tmax, 2000) Psmooth = P0 + dP*(tsmooth/tmax-1/2) ysignal = amp*np.sin(2*np.pi*tsmooth/Psmooth) dysignal = amp*2*np.pi/Psmooth*(1-tsmooth*dP/(tmax*Psmooth))*np.cos(2*np.pi*tsmooth/Psmooth) # Measurement errors (white noise) yerr_meas = np.random.uniform(0.5, 1.5, nt) y = y + np.random.normal(0, yerr_meas) # Calibration errors (correlated noise) calib_id = (t//1).astype(int) # One calibration per day caliberr = np.random.uniform(0.5, 1.5, calib_id[-1]+1) yerr_calib = caliberr[calib_id] y += np.random.normal(0, caliberr)[calib_id] # Total errorbar yerr = np.sqrt(yerr_meas**2 + yerr_calib**2) # Plot plt.figure() plt.plot(tsmooth, ysignal, 'r', label='truth') plt.errorbar(t, y, yerr, fmt='.', color='k', label='meas.') plt.xlabel('t') plt.ylabel('y') plt.legend() We now fit the data using S+LEAF. As a first approach, we ignore the correlation induced by the calibration noise, and treat it as if it was white noise: .. plot:: :context: close-figs from spleaf.cov import Cov from spleaf.term import * from scipy.optimize import fmin_l_bfgs_b # Initialize the S+LEAF model cov = Cov(t, err = Error(yerr), sho = SHOKernel(0.5, 5.0, 1.0)) # We now fit the hyperparameters using the fmin_l_bfgs_b function from scipy.optimize. # Define the function to minimize def negloglike(x, y, cov): cov.set_param(x) nll = -cov.loglike(y) # gradient nll_grad = -cov.loglike_grad()[1] return(nll, nll_grad) # Fit xbest,_,_ = fmin_l_bfgs_b(negloglike, cov.get_param(), args=(y, cov)) # We now use S+LEAF to predict the missing data cov.set_param(xbest) mu, var = cov.conditional(y, tsmooth, calc_cov='diag') # Plot plt.figure() plt.plot(tsmooth, ysignal, 'r', label='truth') plt.errorbar(t, y, yerr, fmt='.', color='k', label='meas.') plt.fill_between(tsmooth, mu-np.sqrt(var), mu+np.sqrt(var), color='g', alpha=0.5) plt.plot(tsmooth, mu, 'g', label='predict.') plt.xlabel('t') plt.ylabel('y') plt.legend() We see that the gaussian process is not completely wrong but tend to absorb the correlated noise due to the calibration. The prediction in the gap is not very satisfying. Let us now correctly model the calibration noise with S+LEAF: .. plot:: :context: close-figs # We define a new covariance matrix including calibration error cov = Cov(t, err = Error(yerr_meas), calerr = CalibrationError(calib_id, yerr_calib), sho = SHOKernel(0.5, 5.0, 1.0)) # Fit xbest,_,_ = fmin_l_bfgs_b(negloglike, cov.get_param(), args=(y, cov)) # Predict cov.set_param(xbest) mu, var = cov.conditional(y, tsmooth, calc_cov='diag') # Plot plt.figure() plt.plot(tsmooth, ysignal, 'r', label='truth') plt.errorbar(t, y, yerr, fmt='.', color='k', label='meas.') plt.fill_between(tsmooth, mu-np.sqrt(var), mu+np.sqrt(var), color='g', alpha=0.5) plt.plot(tsmooth, mu, 'g', label='predict.') plt.xlabel('t') plt.ylabel('y') plt.legend() The results are indeed much better! S+LEAF also allows to predict the derivative of the gaussian process: .. plot:: :context: close-figs # Predict derivative dmu, dvar = cov.conditional_derivative(y, tsmooth, calc_cov='diag') # Plot plt.figure() plt.plot(tsmooth, dysignal, 'r', label='truth') plt.fill_between(tsmooth, dmu-np.sqrt(dvar), dmu+np.sqrt(dvar), color='g', alpha=0.5) plt.plot(tsmooth, dmu, 'g', label='predict.') plt.xlabel('t') plt.ylabel('dy/dt') plt.legend()