Hiking the FOCEI Maximum Likelihood Surface

Growing up nestled in the heart of the Rocky Mountains, I had the opportunity to hike some of the most rugged mountains in the western United States. As I help code the nlmixr FOCEI likelihood surfaces, I can’t help but draw some parallels between two of my hikes in these mountains and what we are implementing in the current nlmixr FOCEI prototype.

When I was a young man, our scoutmaster announced that we would be taking an emergency preparedness hike that could happen anywhere, on any night. We were to be ready to go with a single phone call, as if an earthquake or natural disaster forced us to hike in the middle of the night to a far-off destination. Of course, our scoutmaster was wise and took us on a hike we had done before on a clear, full-moon night (to the three sister lakes or Lake Blanche above the S-curve in Big Cottonwood Canyon). While it was a difficult hike up the mountain (11.1 km or 6.9 miles up 836 meters or 2,746 feet), we could see where we were going and already had some experience with the trail.

I will contrast this with a hike I did when I was a little older, and supposedly wiser. On this particular night I hiked up Mt. Timpanogos with a few of my friends to see the sunrise, and possibly impress a few of the girls on the outing. I had never hiked Mt. Timpanogos before, and we chose a night when rain was falling from a moonless sky. Although it was only a little longer and steeper (12.1 km or 7.5 miles up 979 meters or 3,210 feet), it felt 10x longer and harder than the Lake Blanche trail because I didn’t know where I was going and couldn’t see very well.

The Mt. Timpanogos trail is much like the current NONMEM 7.3 FOCEI implementation: you cannot see where you are going on the overall likelihood surface. You have to timidly put one foot in front of the other to make sure you are still getting a higher and higher likelihood, and you hope you get to the maximum likelihood summit before the sun rises. The current methods in NONMEM feel around for the next step for the population estimates because they do not currently use the gradient of the surface. This makes the likelihood climb much harder than it could be. (However, I believe NONMEM does use the gradient when figuring out each individual’s deviation from the population.) This gradient-free approach is preserved in the nlmixr FOCEI prototype as well (though it is not the default).

In contrast, the current nlmixr FOCEI prototype (and possibly NONMEM 7.4) is like having moonlight shining on the likelihood surface, because you can calculate the gradient at any point. This way, you are not looking at your feet, but at the mountain you are climbing, and you are much better positioned to make an educated guess at the direction you should be going. This method is described by Almquist 2015, but has been slightly modified to allow you to use the same Hessian approximation that NONMEM uses and also to use a ridge penalty (if you wish). In a single test problem, this translated to about a 1.5-fold improvement in minimization speed. This speed gain comes with up-front computation costs (which are included in the time difference), where symbolic math (CAS) is done to calculate the gradient of the specific model you are running. Additionally, there are more differential equations to solve, because the so-called “sensitivity” equations need to be solved at the same time as the ODEs that are typically specified. (Don’t worry about calculating these sensitivity equations by hand; the prototype nlmixr does this for you as well.)
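To make the "looking at your feet" versus "looking at the mountain" contrast concrete, here is a minimal Python sketch. It is not nlmixr or NONMEM code: the `objective` function is a made-up two-parameter surface, and "feeling around" is assumed, for illustration only, to mean probing the objective on either side of each parameter. The point is simply that probing costs two extra objective evaluations per parameter to recover the slope that an analytic gradient hands you in one call.

```python
def objective(params):
    """Hypothetical toy surface to minimize, standing in for an objective function."""
    x, y = params
    return (x - 1.0) ** 2 + 10.0 * (y + 2.0) ** 2


def analytic_gradient(params):
    """Exact slope of the toy surface, available because we know its formula."""
    x, y = params
    return [2.0 * (x - 1.0), 20.0 * (y + 2.0)]


def probed_gradient(func, params, h=1e-5):
    """Estimate the slope by stepping a little to each side of every parameter."""
    grad = []
    evaluations = 0
    for i in range(len(params)):
        up = list(params)
        up[i] += h
        down = list(params)
        down[i] -= h
        # Central difference: two objective evaluations per parameter.
        grad.append((func(up) - func(down)) / (2.0 * h))
        evaluations += 2
    return grad, evaluations


start = [-3.0, 5.0]
exact = analytic_gradient(start)
probed, n_evals = probed_gradient(objective, start)

print("analytic gradient:", exact)
print("probed gradient:  ", probed)
print("objective evaluations spent probing:", n_evals)
```

In a real FOCEI problem each objective evaluation involves solving ODEs for every subject, so the number of probes matters far more than it does on this toy surface.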


The last piece that I would like to briefly comment on is the “prior” information that I had on the Lake Blanche hike. This allowed me to be guided to my ultimate destination by relying on what I knew about the hike. This prior information is also implemented in nlmixr as a “precision” criterion. This specifies how much confidence you have in your initial estimates by adding a ridge penalty equivalent to the parameter’s precision if the parameters were all scaled to 0.1. (The objective function has an additional $\sum_i 0.01 \times \theta_{s,i}^2 \times \text{precision}$ added, where $\theta_{s,i}$ is the $i$th parameter that was initially scaled to 1.) This adds some stability to the minimization algorithm at the cost of a little bias (some of the collinearity is reduced). This penalty can always be removed by setting precision=0.
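The penalty term described above is small enough to write out directly. The sketch below is only an illustration of that formula in Python, not the nlmixr implementation; the function names `ridge_penalty` and `penalized_objective` are hypothetical, and the scaled parameter values are made up.

```python
def ridge_penalty(scaled_thetas, precision):
    """Sum of 0.01 * theta_s_i^2 * precision over the scaled parameters."""
    return sum(0.01 * theta ** 2 * precision for theta in scaled_thetas)


def penalized_objective(objective_value, scaled_thetas, precision):
    """Objective function value with the ridge penalty added on top."""
    return objective_value + ridge_penalty(scaled_thetas, precision)


# Two parameters, both sitting at their initial scaled value of 1 (made-up example).
scaled_thetas = [1.0, 1.0]

print(ridge_penalty(scaled_thetas, precision=5.0))               # penalty alone
print(penalized_objective(100.0, scaled_thetas, precision=5.0))  # objective plus penalty
print(ridge_penalty(scaled_thetas, precision=0.0))               # precision=0 removes it
```

Because the penalty grows with the square of each scaled parameter, large excursions are discouraged much more strongly than small ones.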

To allow the stability without the bias, nlmixr allows re-estimating without the ridge penalty, or even allowing the ridge penalty to decay as the objective function decreases. You can even keep the ridge penalty in the estimate, if you wish. This allows your initial parameter specification to affect the final parameter estimates; if your faith in the initial parameter estimates is justified, this should give better predictive performance for other trials. In fact, in the linear regression case, a ridge penalty can be directly associated with specifying a Bayesian prior distribution to inform the regression.
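The linear regression connection can be checked numerically. The following Python sketch assumes the simplest possible setting, a one-slope regression through the origin with a zero-mean Gaussian prior on the slope; the data and variance values are invented for illustration. Under those assumptions, the ridge estimate with penalty `lam = noise_var / prior_var` matches the Bayesian posterior mean of the slope.

```python
def ridge_slope(xs, ys, lam):
    """Slope minimizing sum((y - b*x)^2) + lam * b^2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def posterior_mean_slope(xs, ys, noise_var, prior_var):
    """Posterior mean of b for y = b*x + noise, with prior b ~ Normal(0, prior_var)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return (sxy / noise_var) / (sxx / noise_var + 1.0 / prior_var)


# Invented data and variances, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.9]
noise_var = 0.25
prior_var = 2.0

lam = noise_var / prior_var  # ridge penalty implied by the prior
b_ridge = ridge_slope(xs, ys, lam)
b_bayes = posterior_mean_slope(xs, ys, noise_var, prior_var)
b_plain = ridge_slope(xs, ys, 0.0)  # ordinary least squares, no penalty

print("ridge:", b_ridge, "posterior mean:", b_bayes, "unpenalized:", b_plain)
```

A tighter prior (smaller `prior_var`) corresponds to a larger penalty, which pulls the slope further from the unpenalized fit. That is the same stability-for-bias trade described above.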

In the single problem tested, using a ridge penalty helped find a lower overall objective function than starting from the same initial parameter estimates without the penalty, but this is likely problem dependent.

Perhaps these new modeling features in nlmixr can enable me to run up a likelihood mountain with my own models. I may even find a more likely model with the same initial parameter estimates.  I’m excited for the levels of fun I can have with new parameters to tweak to give me an even better model.