Customizing the flow
# !pip install pzflow matplotlib corner
Customizing the bijector and latent space
This example notebook demonstrates how to chain multiple bijectors into a custom bijector, and how to choose the latent space. We will build the default bijector that PZFlow uses. You don't need to do this yourself, since PZFlow builds this bijector for you automatically, but it's good to see how it is done, and it gives me a chance to explain the design decisions a little bit.
See the following notebooks for genuine examples of when you might want to build a custom bijector.
import jax.numpy as jnp
import matplotlib.pyplot as plt
import corner
import pzflow
from pzflow import Flow
from pzflow.bijectors import Chain, ShiftBounds, RollingSplineCoupling
from pzflow.distributions import Uniform
from pzflow.examples import get_galaxy_data
plt.rcParams["figure.facecolor"] = "white"
For this example, we will use the example galaxy data included with PZFlow.
It's okay if you don't know what this data is - it's just a 7-dimensional space of galaxy data.
If you are familiar with galaxy data:
- the first column is the galaxy's true redshift
- the following 6 columns are the galaxy's true apparent magnitudes in the LSST photometric bands
Again, don't worry if you are unfamiliar with this kind of data - the astronomy details are unimportant for understanding this notebook.
data = get_galaxy_data()
data
| | redshift | u | g | r | i | z | y |
|---|---|---|---|---|---|---|---|
| 0 | 0.287087 | 26.759261 | 25.901778 | 25.187710 | 24.932318 | 24.736903 | 24.671623 |
| 1 | 0.293313 | 27.428358 | 26.679299 | 25.977161 | 25.700094 | 25.522763 | 25.417632 |
| 2 | 1.497276 | 27.294001 | 26.068798 | 25.450055 | 24.460507 | 23.887221 | 23.206112 |
| 3 | 0.283310 | 28.154075 | 26.283166 | 24.599570 | 23.723491 | 23.214108 | 22.860012 |
| 4 | 1.545183 | 29.276065 | 27.878301 | 27.333528 | 26.543374 | 26.061941 | 25.383056 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 99995 | 0.661883 | 25.665363 | 25.703183 | 25.362422 | 24.936967 | 24.771529 | 24.691300 |
| 99996 | 0.864290 | 28.342505 | 27.654294 | 26.638448 | 25.615190 | 25.108037 | 24.938998 |
| 99997 | 0.973176 | 27.781264 | 27.386962 | 26.760144 | 26.107564 | 25.584028 | 25.403601 |
| 99998 | 2.281685 | 29.324894 | 28.148004 | 27.312318 | 26.883849 | 26.416446 | 25.762161 |
| 99999 | 0.254427 | 22.077757 | 20.985672 | 20.143253 | 19.805022 | 19.594315 | 19.471030 |
100000 rows × 7 columns
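As a quick sanity check, you can look at the minimum and maximum of each column; we will need these ranges below when we rescale the data:
# quick look at the range of each column -- note the values lie well outside [-5, 5]
data.describe().loc[["min", "max"]]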
We will build a Flow to model this data.
To follow along, it helps to understand a little about how normalizing flows work. A normalizing flow starts with input data drawn from some complicated data distribution. It then applies a series of invertible transformations (bijections) that transform the data distribution into a simple latent distribution.
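For reference, the relationship between the two distributions is the standard change-of-variables formula (this is general normalizing-flow math, nothing PZFlow-specific): if the bijection $f$ maps a data point $x$ to the latent point $z = f(x)$, then

$$p_X(x) = p_Z\big(f(x)\big)\,\left|\det \frac{\partial f}{\partial x}\right|,$$

and training the flow amounts to maximizing this likelihood over the training data.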
All of the heavy lifting of transforming the data distribution into the latent distribution is handled by a `RollingSplineCoupling`. This bijection alternates Neural Spline Couplings and "Rolling" layers. `RollingSplineCoupling` takes several parameters, which you can see in the documentation/API, but the only required parameter is `nlayers`.

`nlayers` sets the number of (Neural Spline, Roll) couplets that are chained together inside of `RollingSplineCoupling`. We will set `nlayers = 7` to match the dimension of the input data.
The other really important parameter is the range of the splines, `B`. You don't have to provide `B` -- it's set to `B=5` by default. However, it is important to understand that the splines are only defined on the range [-B, B]. Any data outside of this range will not be transformed (it just has the identity applied).

If you look at the DataFrame above, you will see that lots of our data lies outside the range [-5, 5]! It is our job to make sure that our data lies within that range before it is passed into the `RollingSplineCoupling`.
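To see how much of the data this affects, here is a quick, purely illustrative check of how many values in the DataFrame fall outside the default spline range:
# fraction of all values in the DataFrame that fall outside the default range [-5, 5]
frac_outside = ((data < -5) | (data > 5)).to_numpy().mean()
print(f"Fraction of values outside [-5, 5]: {frac_outside:.1%}")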
We can do this using the `ShiftBounds` bijector. `ShiftBounds` has 3 parameters:
- `mins` - the minima of the columns in your data
- `maxs` - the maxima of the columns in your data
- `B` - the range your data will be mapped onto: [-B, B]

In other words, if we had some data `x` that ran from 3 to 7, then `ShiftBounds(3, 7, 2)` would transform `x` so that its range ran from [-2, 2].
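To make that mapping concrete, here is a minimal sketch of the rescaling described above (assuming `ShiftBounds` is a simple linear map from [mins, maxs] onto [-B, B], as the example implies; this toy function is not the actual PZFlow implementation):
# toy version of the rescaling: map [xmin, xmax] linearly onto [-B, B]
def rescale_sketch(x, xmin, xmax, B):
    return 2 * B * (x - xmin) / (xmax - xmin) - B

x = jnp.array([3.0, 5.0, 7.0])
print(rescale_sketch(x, xmin=3.0, xmax=7.0, B=2.0))  # [-2.  0.  2.]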
In our case, since we left `B=5` in the `RollingSplineCoupling`, we need to make sure that all of our data columns lie in the range [-5, 5]. However, `RollingSplineCoupling` performs better when the data isn't too close to the edge of its support (it will still work - you will just see some weird edge artifacts when sampling from the flow). To avoid edge effects, we will map all of our data columns into the range [-4, 4].

So our final bijection will be: `ShiftBounds(mins, maxs, B=4) --> RollingSplineCoupling(7, B=5)`.
In PZFlow, you chain these together using the `Chain` bijector:
# get minima and maxima for each column
mins = jnp.array(data.min(axis=0))
maxs = jnp.array(data.max(axis=0))
# get the number of dimensions
ndim = data.shape[1]
# build the bijector
bijector = Chain(
ShiftBounds(mins, maxs, B=4),
RollingSplineCoupling(ndim, B=5),
)
Now we have our bijector!
We also need a latent space. By default, PZFlow uses a uniform distribution over the range [-5, 5]. This is because the `RollingSplineCoupling` has a default range of [-5, 5], and the latent distribution needs to cover the range of the bijector's output. You could choose any compact distribution that covers this range, but uniform is the simplest choice.

PZFlow would build this distribution for us by default, but just like the bijector, we will construct it manually for demonstration:
latent = Uniform(input_dim=ndim, B=5)
We are finally ready to construct the normalizing flow! Just like before, we must tell it the names of our data columns, but this time we will explicitly pass a bijector and a latent distribution.

Note: the order of the column names you give to PZFlow MUST match the order of the columns implicit in the `mins` and `maxs` you gave to `ShiftBounds` above. This is the ONLY time you have to keep track of the column ordering. After creating the flow, PZFlow will always use the column names to access the correct data.
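Since `mins` and `maxs` were computed directly from `data`, their order is simply the DataFrame's own column order, so passing `data.columns` keeps everything consistent. A quick way to check:
# the column order that mins/maxs were computed in
print(list(data.columns))  # ['redshift', 'u', 'g', 'r', 'i', 'z', 'y']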
flow = Flow(data.columns, bijector=bijector, latent=latent)
Now let's train the flow!
%%time
losses = flow.train(data, epochs=200, verbose=True)
Training 200 epochs Loss: (0) 20.4908 (1) 2.1882 (11) -3.0561 (21) -3.7004 (31) -3.5159 (41) -4.3430 (51) -4.7489 (61) -4.0210 (71) -5.0300 (81) -3.9104 (91) 0.3195 (101) -5.2164 (111) -5.3040 (121) -4.9196 (131) -6.1689 (141) -5.9213 (151) -6.3125 (161) -6.4114 (171) -6.6141 (181) -6.0520 (191) -6.2788 (200) -5.7158 CPU times: user 1h 7min 33s, sys: 44min 50s, total: 1h 52min 24s Wall time: 21min 29s
Now let's plot the training losses to make sure everything looks like we expect it to...
plt.plot(losses)
plt.xlabel("Epoch")
plt.ylabel("Training loss")
plt.show()
Great! It looks like we could get better performance if we train for longer or use some kind of regularization, but this is sufficient for this demo.
Now we can draw samples from the flow, and compare these samples to the original data set.
samples = flow.sample(10000, seed=0)
fig = plt.figure(figsize=(12,12))
ranges = [(-0.1,2.4), (19.5,33), (19,32), (19,29), (19,29), (19,28), (19,28)]
corner.corner(samples, fig=fig, color='C3', bins=20, range=ranges, hist_bin_factor=2, data_kwargs={'ms':3}, contour_kwargs={'linewidths':2})
corner.corner(data[:10000], fig=fig, bins=20, range=ranges, hist_bin_factor=2, color='C0', data_kwargs={'ms':3}, show_titles=True, labels=data.columns);
The original data is in blue, while the distribution learned by the normalizing flow is in red.
This looks great!
We can also use the flow to calculate redshift posteriors for galaxies:
grid = jnp.linspace(0, 2, 100)
pdfs = flow.posterior(samples, column="redshift", grid=grid)
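`flow.posterior` returns one posterior per row of `samples`, evaluated at each point of the grid. As a quick sanity check (assuming the posteriors come back normalized over the grid, which is what we expect here), each row should integrate to roughly 1:
# pdfs has one row per galaxy in `samples` and one column per grid point
print(pdfs.shape)  # (10000, 100)

# trapezoid-rule integral of the first posterior over the grid --
# this should be close to 1 if the posteriors are normalized over the grid
area = jnp.sum(0.5 * (pdfs[0, 1:] + pdfs[0, :-1]) * jnp.diff(grid))
print(area)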
Let's plot a few different posteriors (in blue), along with the true redshifts of the galaxies (in red):
fig, axes = plt.subplots(1,3,figsize=(7,2.2), dpi=120)
idx = [0, 3, 12]
for i,ax in zip(idx, axes):
true_z = samples['redshift'][i]
ax.axvline(true_z, 0, 1, c="C3",
label='True z')
ax.plot(grid, pdfs[i])
ax.set(xlabel="redshift",
xticks=[0,0.5,1,1.5,2],
yticks=[])
axes[0].legend()
axes[0].set(ylabel='$p(z)$')
plt.show()
We can also draw posteriors for magnitudes. For example, let's draw posteriors for the $u$ band:
u_grid = jnp.arange(24, 32, 0.1)
u_pdfs = flow.posterior(samples, column='u', grid=u_grid)
Let's plot the $u$ band posteriors for the same galaxies above:
fig, axes = plt.subplots(1,3,figsize=(7,2.2), dpi=120)
idx = [0, 3, 12]
for i,ax in zip(idx, axes):
true_u = samples['u'][i]
ax.axvline(true_u, 0, 1, c="C3",
label='True $u$')
ax.plot(u_grid, u_pdfs[i])
ax.set(xlabel="$u$ mag",
yticks=[])
axes[2].legend()
axes[0].set(ylabel='$p(u)$')
plt.show()
Now let's store some information with the flow about the data it was trained on.
flow.info = f"""
This is an example flow, trained on 100,000 simulated galaxies with
redshifts in the range (0, 2.3) and photometry in the LSST ugrizy bands.
The data set used to train this flow is available in the `examples` module:
>>> from pzflow.examples import get_galaxy_data
>>> data = get_galaxy_data()
This flow was created with pzflow version {pzflow.__version__}
"""
print(flow.info)
This is an example flow, trained on 100,000 simulated galaxies with redshifts in the range (0, 2.3) and photometry in the LSST ugrizy bands. The data set used to train this flow is available in the `examples` module: >>> from pzflow.examples import get_galaxy_data >>> data = get_galaxy_data() This flow was created with pzflow version 3.0.0
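If you want to keep a trained flow around (like the packaged example below), you can write it to disk with PZFlow's save method and reload it later via the `file` argument to `Flow`. A minimal sketch (the filename here is just an example):
# save the trained flow (including its info string) to disk...
flow.save("example-flow.pzflow.pkl")

# ...and load it back later
flow = Flow(file="example-flow.pzflow.pkl")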
This example flow is also packaged with PZFlow, so you can load it and play with it whenever you want:
from pzflow.examples import get_example_flow
flow = get_example_flow()
See! It's the same one:
print(flow.info)
This is an example flow, trained on 100,000 simulated galaxies with redshifts in the range (0, 2.3) and photometry in the LSST ugrizy bands. The data set used to train this flow is available in the `examples` module: >>> from pzflow.examples import get_galaxy_data >>> data = get_galaxy_data() This flow was created with pzflow version 3.0.0