LCLIB Example
In this notebook we look at how we could take light curves from a LCLIB file and simulate them under different observing cadences and noise conditions.
[1]:
import numpy as np
from lightcurvelynx import _LIGHTCURVELYNX_BASE_DATA_DIR
from lightcurvelynx.astro_utils.passbands import PassbandGroup
from lightcurvelynx.math_nodes.np_random import NumpyRandomFunc
from lightcurvelynx.math_nodes.ra_dec_sampler import ObsTableRADECSampler
from lightcurvelynx.obstable.opsim import OpSim
from lightcurvelynx.simulate import simulate_lightcurves
from lightcurvelynx.survey_info import SurveyInfo
from lightcurvelynx.models.lightcurve_template_model import MultiLightcurveTemplateModel
from lightcurvelynx.utils.plotting import plot_lightcurves
Load Data Files
We start by loading the files we will need for running the simulation: the OpSim database and the passband information. Both of these live in the data/ directory in the root directory. Note that nothing in this directory is saved to github, so the files might have to be downloaded initially.
For Rubin, a large number of OpSims can be found at https://s3df.slac.stanford.edu/data/rubin/sim-data/. You can download an OpSim manually or using the from_url() helper function:
opsim_data = OpSim.from_url(opsim_url)
We only care about the observations in the OpSim in the filters we wish to simulate. So we use OpSim.filter_rows() to remove those rows that do not match.
[2]:
# Choose which filters to simulate.
filters = ["g", "r", "i", "z"]
# Load the OpSim data.
opsim_db = OpSim.from_db(_LIGHTCURVELYNX_BASE_DATA_DIR / "opsim" / "baseline_v5.3.0_10yrs.db")
print(f"Loaded OpSim with {len(opsim_db)} rows.")
# Filter to only the rows that match the filters we want to simulate.
filter_mask = np.isin(opsim_db["filter"], filters)
opsim_db = opsim_db.filter_rows(filter_mask)
# Print the number of rows and time bounds after filtering.
t_min, t_max = opsim_db.time_bounds()
print(f"Filtered OpSim to {len(opsim_db)} rows and times [{t_min}, {t_max}]")
Loaded OpSim with 2146797 rows.
Filtered OpSim to 1630613 rows and times [60796.00143922635, 64448.429406568604]
Create the model
We want to create models based on an existing LCLIB file. Again we will need to download the file of interest to the data directory. In this example we use the LCLIB_RRL-LSST.TEXT.gz data from https://zenodo.org/records/6672739. We load this into a MultiLightcurveTemplateModel object, which represents a set of light curves from which we can sample.
The MultiLightcurveTemplateModel stores a series of multi-band light curves, each corresponding to the observer frame bandfluxes for a single real or simulated objects. New observations are created by randomly choosing one of the light curves and interpolating it at new times. The starting time of the activity is controlled by the t0 parameter in the model. So we can generate a simulation where the object’s activity starts halfway through our observations.
Note that currently only periodic and non-reoccurring non-periodic light curves are supported. We treat reoccurring non-periodic light curves as non-reoccurring non-periodic (they will only occur once in the simulated output).
Since MultiLightcurveTemplateModel is a PhysicalModel, we can specify other parameters such as the RA and dec. In this examples, we generate this position information by sampling from the OpSim fields (using an ObsTableRADECSampler node). We sample the starting time of the light curve uniformly from the time covered by the OpSim.
[3]:
lc_file = _LIGHTCURVELYNX_BASE_DATA_DIR / "model_files" / "LCLIB_RRL-LSST.TEXT.gz"
# Use an OpSim based sampler for position.
ra_dec_sampler = ObsTableRADECSampler(
opsim_db,
radius=3.0, # degrees
node_label="ra_dec_sampler",
)
# Use a uniform sampler for the starting time (t0) of activity.
time_sampler = NumpyRandomFunc("uniform", low=t_min, high=t_max, node_label="time_sampler")
# Load the light curves from the LCLIB file. Only load the filters that are present in the OpSim data.
source = MultiLightcurveTemplateModel.from_lclib_file(
lc_file,
None, # We don't need the filter info since we are simulating at the passband level.
ra=ra_dec_sampler.ra,
dec=ra_dec_sampler.dec,
t0=time_sampler,
filters=filters,
node_label="source",
)
print(f"Loaded {len(source)} light curves from {lc_file}")
Loading: 49130lc [00:22, 2165.75lc/s]
Loaded 49130 light curves from /Users/jkubica/h/lightcurvelynx/data/model_files/LCLIB_RRL-LSST.TEXT.gz
Generate the simulations
We can now generate random simulations with all the information defined above. The simulate_lightcurves function takes three parameters: the source from which we want to sample (here the collection of lightcurves), the number of results to simulate (1,000), and the survey information.
Note: The survey information will load the default passbands for the LSST survey, but they will not be used since the model is creating simulations at the bandflux level.
[4]:
survey_info = SurveyInfo(obstable=opsim_db, survey_name="LSST")
lightcurves = simulate_lightcurves(source, 1_000, survey_info)
Simulating: 100%|██████████| 1000/1000 [00:01<00:00, 826.75obj/s]
The results are written in the nested-pandas format for easy analysis. Each row corresponds to a single simulated object, with a unique id, ra, dec, etc. The column params include all internal state, including hyperparameter settings, that was used to generate this object.
We can print the first row:
[5]:
print(lightcurves.loc[0])
id 0
ra 215.201942
dec -4.506222
nobs 674
t0 62834.956177
z None
params {'NumpyRandomFunc:integers_2.low': 0, 'NumpyRa...
lightcurve mjd filter flux fl...
Name: 0, dtype: object
The nested lightcurve column contains the times, filters, and fluxes for each observation of that object. We can treat it as a table:
[6]:
print(lightcurves.loc[0]["lightcurve"])
mjd filter flux fluxerr flux_perfect \
0 60800.147008 g 253150.426506 388.086661 253474.657302
1 60800.148806 g 254061.733267 398.370258 253810.451278
.. ... ... ... ... ...
672 64447.221323 i 346112.478726 504.587759 347002.104763
673 64448.153701 r 339159.015356 472.100270 339423.189543
survey_idx obs_idx is_saturated
0 0 3122 False
1 0 3126 False
.. ... ... ...
672 0 1629266 False
673 0 1630088 False
[674 rows x 8 columns]
Now let’s plot some random light curves. Note that all of the light curves in the “LCLIB_RRL-LSST.TEXT.gz” file are periodic, so we expect to see observations throughout the time range of the survey.
[7]:
random_ids = np.random.choice(len(lightcurves), 5)
for random_id in random_ids:
# Extract the row for this object.
lc = lightcurves.loc[random_id]
if lc["nobs"] > 0:
# Unpack the nested columns (filters, mjd, flux, and flux error).
lc_filters = np.asarray(lc["lightcurve"]["filter"], dtype=str)
lc_mjd = np.asarray(lc["lightcurve"]["mjd"], dtype=float)
lc_flux = np.asarray(lc["lightcurve"]["flux"], dtype=float)
lc_fluxerr = np.asarray(lc["lightcurve"]["fluxerr"], dtype=float)
# Look up which lightcurve was used.
graph_state = lc["params"]
lc_id = graph_state["source.selected_lightcurve"]
ra = graph_state["source.ra"]
dec = graph_state["source.dec"]
plot_lightcurves(
fluxes=lc_flux,
times=lc_mjd,
fluxerrs=lc_fluxerr,
filters=lc_filters,
title=f"Sample {random_id} from Lightcurve {lc_id} at ({ra:.2f}, {dec:.2f})",
)