Monday, November 4, 2024

Improve Model Efficiency - Part 4

Start with a Good Conceptual Model

A lot of my consulting work involves reviews of client models. Usually, the client is experiencing slow performance, or some aspect of the model is making it infeasible, and they want me to show them how to fix it. In many cases, the problem isn't so much in the Woodstock syntax as it is in the logic that was used to construct the model (i.e., the conceptual model). Woodstock training programs largely focus on syntax and how to get a model working. But other than the problem statement provided in the training handbook, there is little guidance on how to develop an efficient conceptual model. So, let's talk about that.

Years ago, we used to spend a lot more time training analysts on how to model specific kinds of silviculture. Often, we would have students draw flow-charts on the whiteboard that represent the actions, outputs and transitions of their forest models. It was a good exercise because it decomposes a series of activities into discrete events and outcomes. However, new analysts usually struggle with the process, and flow-charts can end up looking like this:


Poorly-conceived conceptual model

Consider Time

Every Woodstock model has a planning horizon divided into planning periods. The length of a planning period x the number of planning periods = the planning horizon. But how do you determine the length of the planning period? Usually, it is based on the desired resolution for the harvest level, or annual allowable cut (AAC). For most of my clients building strategic planning models, a planning period is 1 year in length. The model results then report harvests, expenses and revenues on an annual basis.

Another consideration, however, is which yield tables are available. In more northern locations, tree growth is sufficiently slow that annual estimates are unreliable, and so growth is incremented in 5- or 10-year time steps. While you can still use annual planning periods with these yields, you need to rely on some form of linear interpolation or spline-fitting function, which introduces biases to the estimates. In my opinion, it is best to match planning period lengths to the natural increments of your growth model.
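To make the interpolation issue concrete, here is a minimal Python sketch that fills in annual values from a 5-year yield table using straight-line interpolation. The yield table values and the function name are hypothetical, invented purely for illustration; this is not how Woodstock itself handles yields.

```python
from bisect import bisect_right

# Hypothetical yield table: (stand age in years, volume in m3/ha) in 5-year steps.
YIELD_TABLE = [(0, 0.0), (5, 10.0), (10, 40.0), (15, 90.0), (20, 130.0)]

def interpolate_yield(age, table=YIELD_TABLE):
    """Linearly interpolate volume at any age between the table's points."""
    ages = [a for a, _ in table]
    if age <= ages[0]:
        return table[0][1]
    if age >= ages[-1]:
        return table[-1][1]
    i = bisect_right(ages, age)          # index of first table age greater than `age`
    (a0, v0), (a1, v1) = table[i - 1], table[i]
    return v0 + (v1 - v0) * (age - a0) / (a1 - a0)

# Annual volumes implied by the 5-year table.
annual = [interpolate_yield(a) for a in range(0, 21)]
```

Within each 5-year interval the implied annual increment is constant, and it jumps at every table age. Those annual values come from the straight-line assumption, not from the growth model, which is why matching the planning period to the yield increment avoids the problem altogether.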

Consider Your Actions

Once you have settled on your planning period length, next you need to consider the actions in your model. The first consideration is whether the action is a choice. Clearly, a final harvest, such as a clearcut, is a choice. Do I need a different action if I'm harvesting in conifer stands versus deciduous stands? That depends on your reporting needs. Differentiating harvest costs by species group is easily handled with different outputs and doesn't require two harvest actions. However, suppose a product is differentiated by how it is harvested (e.g., tree length operation versus cut-to-length operation where there is a price differential). In this case, you WILL need to have different final harvest actions.

Reforestation is almost always a policy requirement, but whether planting is a choice depends on the existence of alternatives. If you are using 5-year or decadal planning periods, many of the reforestation actions that occur in the forest can be collapsed into a single decision variable. Defining a planting action to occur at _AGE = 0 is unnecessary. You could just as easily consider it part of the clearcut action and assume the transition to the planted regen condition.

If you always plant a stand following harvest, regardless of different site preparation steps or planting densities, you may not need a decision variable for planting. Instead, you could rely on REGIMES. Many of my clients have different treatment and cost structures for different site conditions, but these are all handled through prescriptions in the REGIMES section. The important thing to remember is that there is a single decision variable for each alternative. 

Why is this important? Every action results in two outcomes: either the action is performed, or it isn't. If you use actions to model something that is required, you need to add constraints to force the outcome you want. This is very inefficient: the extra actions add decision variables and non-zero elements, and the constraints add rows.
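As a back-of-the-envelope illustration, the Python sketch below tallies columns and rows for a toy formulation. It assumes one harvest variable per DevType per period, and, when planting is a separate action, one planting variable per DevType per period plus one forcing constraint per period. The function name and these counting rules are simplifying assumptions for illustration; a real Woodstock matrix is built differently and includes many other columns and rows.

```python
def matrix_size(n_devtypes, n_periods, separate_planting):
    """Toy tally of decision variables (columns) and forcing constraints (rows)."""
    harvest_cols = n_devtypes * n_periods
    if not separate_planting:
        # Planting is folded into the harvest action: no extra columns or rows.
        return {"columns": harvest_cols, "extra_rows": 0}
    plant_cols = n_devtypes * n_periods   # one planting choice per harvest choice
    forcing_rows = n_periods              # planted area must equal harvested area
    return {"columns": harvest_cols + plant_cols, "extra_rows": forcing_rows}

combined = matrix_size(100, 20, separate_planting=False)
separate = matrix_size(100, 20, separate_planting=True)
```

Under these assumptions, modeling a mandatory planting step as its own action doubles the column count and adds a constraint row for every period, without giving the solver any real choice to make.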

Consider Your Transitions

The trickiest part of any conceptual model is predicting how a future stand will behave following a final harvest. If everything gets planted to a single species, it is straightforward with a single 100% transition to a regenerated development type (DevType). But what about plantation failures? Shouldn't we model those? You may have a good handle on how often plantation failures occur, but I'm betting you can't predict which harvest areas will fail. Why? Because if you could predict them, you'd change your site preparation methods to avoid the failure.

Instead, transitions should focus on outcomes that reflect your management intent and that you can predict with certainty. If 2% of plantations fail on average, you can account for the area and cost to correct them with in-fill planting, without a transition to a failed state that would then require an action to correct it. Similarly, some stands regenerate in an over-stocked condition, and require precommercial thinning (PCT). Again, you can account for the area and cost of PCT without explicitly recognizing the transition to an overstocked state and the subsequent PCT to correct it. Your management intent is not to produce defective stands. You shouldn't bother modeling things that you do not have adequate data for.
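The accounting approach amounts to simple arithmetic on the planted area. The Python sketch below uses the 2% failure rate from the example above; the planted area, the cost per hectare and the function name are hypothetical.

```python
def infill_accounting(planted_area_ha, failure_rate, infill_cost_per_ha):
    """Return (in-fill area, in-fill cost) as a fixed share of planted area."""
    infill_area = planted_area_ha * failure_rate
    infill_cost = infill_area * infill_cost_per_ha
    return infill_area, infill_cost

# Hypothetical inputs: 1,000 ha planted, 2% failure, $500/ha to in-fill plant.
area, cost = infill_accounting(1000.0, 0.02, 500.0)
```

The area and cost are reported as outputs tied to the planting that already happens in the model, so no failed-state DevType, and no corrective action, ever enters the matrix. The same pattern applies to PCT on stands that regenerate overstocked.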

A large number of my clients build "stand-based models", which feature the stand-ID from inventory as a theme in the model. But stand-based is only relevant for existing stands: once they are final harvested, they transition to a stratum based on a handful of stand characteristics like site quality, species, etc. Yet time and time again, I encounter models that do not void the stand-ID after the final harvest in the TRANSITIONS section. This results in a combinatorial explosion of future decision variables that are completely unnecessary. The example below is from a model with a lot of commercial thinning options.


Column generation without collapsed stand-ID

For future yields, the stand-ID provides nothing to the objective function or constraints - it just increases the number of DevTypes that Woodstock has to keep track of. If you collapse the stand-ID after final harvest, you'll get the same answer in far less time. The example below shows that about 20% of the decision variables in later periods can be eliminated by collapsing the stand-ID.


Column generation with collapsed stand-ID
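The effect of collapsing the stand-ID can be sketched in a few lines of Python. The inventory records, the "NA" placeholder value and the function name are hypothetical; the sketch only counts distinct regenerated DevTypes and does not reproduce Woodstock's TRANSITIONS syntax or its column generation.

```python
# Hypothetical inventory records: (stand_id, site_class, species).
stands = [
    ("S001", "good", "spruce"),
    ("S002", "good", "spruce"),
    ("S003", "poor", "pine"),
    ("S004", "good", "pine"),
    ("S005", "poor", "pine"),
]

def regenerated_devtypes(stands, collapse_stand_id):
    """Distinct DevTypes that exist after every stand has been final harvested."""
    result = set()
    for stand_id, site, species in stands:
        # Voiding the stand-ID replaces it with one shared placeholder value.
        future_id = "NA" if collapse_stand_id else stand_id
        result.add((future_id, site, species))
    return result

kept = regenerated_devtypes(stands, collapse_stand_id=False)
collapsed = regenerated_devtypes(stands, collapse_stand_id=True)
```

With the stand-ID retained, the five stands produce five future DevTypes; with it collapsed, they fall into three strata. Every later action on those regenerated stands is multiplied by the DevType count, so the savings compound in models with many thinning options.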

Yes, I've heard the arguments many times that you NEED the stand-ID for future stands, so you know the exact location of harvests 40 years into the future. Forgive my cynicism, but I doubt most of you follow a plan exactly for the first year, never mind the first decade or more.

Discussion

If you are running a model that you developed years ago, or worse, one you inherited from someone else years ago, and that model is slow and cumbersome, maybe it is time to toss the baby out with the bath water and start over. Starting fresh does require time, but how much time are you wasting waiting for a model that generates unnecessary decision variables, has transitions that are impossible to trace, and so on?

Afraid of tossing the baby out with the bath water? Contact me!

If you need help revamping your model, or are looking to start over from scratch, I'm more than happy to help. Give me a shout! Nobody knows more about Woodstock modeling!

