Monday, November 25, 2024

Is Your Conceptual Model Logically Sound?

Introduction

In the last few posts, I've been talking about flaws in your conceptual model that can make your model run slowly (i.e., waste your time). Making poor choices about landscape themes can blow up the size of your model dramatically and make it nigh on impossible to do proper QA (or even to figure out what is happening).

However, it is also possible to have a reasonable implementation of a flawed conceptual model. This is a model that, while semantically and syntactically correct, doesn't pass the smell test for how the real world works. For this exercise, I'm going to present snippets from real client models that illustrate both kinds of problems, without disclosing where they came from.

Exhibit 1 - 9 Theme Model


Exponential Growth in Columns


Almost 12 GB of RAM required

The exponential growth in the number of columns by planning period tells us that something undesirable is going on. For a model with only 9 themes and 20 planning periods, that is a very large number of columns, and it requires a lot of computer memory (almost 12 GB). Where to start?

Landscape Woes

The model identifies harvest settings, which makes this essentially a stand-based model. The modelers used _INDEXing in the LANDSCAPE section to associate net area and harvest-method proportions with each unit, rather than using themes and the AREAS section.

1431 _INDEX(yOpArea=56.0,yGroundProp=0.00,yLineProp=1.00,yHeliProp=0.00)
1433 _INDEX(yOpArea=96.9,yGroundProp=0.00,yLineProp=0.28,yHeliProp=0.72)
1434 _INDEX(yOpArea=62.3,yGroundProp=0.05,yLineProp=0.33,yHeliProp=0.62)

I dislike this approach because it crams too much information into a single attribute (stand ID), so when you want to identify sub-groups, you need a bunch of *AGGREGATE definitions. This just makes things unwieldy.

*AGGREGATE ag1 ; Primary management base
*AGGREGATE ag2 ; Secondary management base
*AGGREGATE agNoSBI
*AGGREGATE agSBI

Worse, as we see above, there are clearly two distinct divisions in the land base: primary versus secondary, and the presence or absence of SBI data. This could have been much more efficiently handled as two themes with 2 attributes each:

*THEME Landbase
  P Primary
  S Secondary
*THEME SBI data
  Y SBI available
  N SBI not available
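
The net area packed into yOpArea, meanwhile, belongs in the AREAS section, where every record carries its own area. A minimal sketch of what that could look like, showing only the stand ID and the two themes above (the ages and SBI flags are invented for illustration):

AREAS
*A 1431 P Y 45 56.0 ; stand 1431: primary, SBI available, 45 years old, 56.0 ha
*A 1433 P N 60 96.9 ; stand 1433: primary, no SBI, 60 years old, 96.9 ha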

Another theme is apparently used to assign a scenario name to every development type (DevType) in the model using action aTH7. Woodstock already has a scenario management system, so the utility of this "feature" eludes me. Worse, it adds unnecessary decision variables to an already large LP matrix, because everything needs to be relabeled from NA to something else. If it isn't a real decision, it shouldn't be modeled as an action.

Name That Theme!

When I train new analysts, I like to emphasize that landscape themes should be pure, encapsulating only a single type of data: stand status vs rotation vs stand origin vs stocking vs planting stock. Unfortunately, the modeler did the complete opposite here, concatenating 5 different kinds of data into a single code:

NA Unknown or as initialized for all existing forest
CC Clearcut      
OR Overstory Removal
CT Commercial Thin
ST Seed tree cut
SW Shelterwood
SL Selection Cut                 
NA Natural
NP Natural with PCT    
PL Planted
PP Planted with PCT      
TP Tube Plant (nonstocked)                              
RP Regular Plant (nonstocked)  

Those of you who've had me as an instructor probably remember me asking how many possible DevTypes can be represented by x themes with y attributes each. The answer is the product of the attribute counts: 5 themes with 2-6 attributes each can represent anywhere from 2^5 = 32 to 6^5 = 7,776 combinations. Here, the model listed 172 valid combinations. Here's a sample:

SW20
CC_PL_NA
CC_PL_ST
OR_NA_SW
OR_NP_NA 
OR_NP_ST 

And, of course, you need to define a bunch of *AGGREGATEs to identify the areas represented by the basic data types, like planted versus natural stands. Again, a better approach would be to use themes to explicitly identify silviculture state (cutover, stocked, shelterwood), stand origin (NA, PL), precommercial thin (Y, N), commercial thin (Y, N), and planting stock (NA, TP, RP). It takes a single DevType mask to identify any of these things, versus hundreds of lines of code devoted to attribute listings.
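
For illustration, here is roughly what those five pure themes might look like in the LANDSCAPE section (the theme names and attribute codes are mine, not the client's):

*THEME Silviculture state
  CO Cutover
  SK Stocked
  SW Shelterwood
*THEME Stand origin
  NA Natural
  PL Planted
*THEME Precommercial thin
  Y PCT completed
  N No PCT
*THEME Commercial thin
  Y CT completed
  N No CT
*THEME Planting stock
  NA Not applicable
  TP Tube plant
  RP Regular plant

Any of the 172 concatenated codes maps onto exactly one combination of these attributes, and a single mask on the stand-origin theme replaces an entire *AGGREGATE attribute listing.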

Overall Impression


LP Matrix Statistics for the Model

Let's review a few things. First, the number of active DevTypes in this model is 316,572. For those who forgot their introductory training, Woodstock only allows 500,000 active DevTypes. The reason there are so many is the combination of retaining the original harvest unit ID after final harvest with the concatenated attributes we just discussed. There's an explosion of new development types after each action. It almost appears that the motivation in formulating this model was to minimize the number of themes and to make the model as opaque as possible.

How to fix it? Jettison the whole thing and start over.

Exhibit 2 - 11 Theme Model


No Combinatorial Explosion of Decision Variables



Fast Generation Time with Low Memory Requirements

By all accounts, this model seems to be well formulated: it generates the matrix quickly, and the number of decision variables for existing and regenerated DevTypes is quite reasonable. Unlike Exhibit 1, the landscape section in this model is well thought out, with clearly defined themes and a reasonable number of attributes per theme. Clearcutting is the only harvest action, with 3 regeneration options: natural regen, planting or direct seeding.

Transitions Section

Where I see potential problems arising is in the TRANSITIONS section. These are basically the responses of the different DevTypes to final harvest with natural regeneration, with the extraneous global attributes stripped out:

*SOURCE BFDOM      
*TARGET BFDOM 35
*TARGET HRDMX 15            
*TARGET SBMX1 15
*TARGET PJMX1 10                
*TARGET CONMX 25                 

*SOURCE HRDMX 
*TARGET HRDMX 20
*TARGET HRDOM 20
*TARGET PODOM 45
*TARGET CONMX 15

*SOURCE SBMX1 
*TARGET SBMX1 35
*TARGET PJDOM 10
*TARGET SBDOM 25
*TARGET PJMX1 15
*TARGET CONMX 15          

*SOURCE PJMX1 
*TARGET PJMX1 25
*TARGET PJDOM 30
*TARGET CONMX 15
*TARGET HRDMX 15
*TARGET SBMX1 15                 

*SOURCE CONMX 
*TARGET CONMX 34
*TARGET HRDMX 33
*TARGET SBMX1 33                      

*SOURCE HRDOM 
*TARGET HRDOM 20
*TARGET HRDMX 20
*TARGET PODOM 45
*TARGET CONMX 15

*SOURCE PODOM  
*TARGET PODOM 55
*TARGET HRDMX 15
*TARGET HRDOM 15
*TARGET CONMX 15                                              

*SOURCE PJDOM 
*TARGET PJDOM 45
*TARGET PJMX1 20
*TARGET CONMX 25
*TARGET HRDMX 10

*SOURCE SBDOM  
*TARGET SBDOM 35
*TARGET HRDMX 15
*TARGET SBMX1 15
*TARGET PJMX1 10
*TARGET CONMX 25

I suspect most of you are not used to seeing transition matrices that are not 1:1 (100%), but allowing multiple outcomes after an action was one of the earliest features of Woodstock. The idea is that if you harvest a mixedwood stand, it is quite likely that, after harvest, parts of the stand will be dominated by hardwoods and others by softwood species, even though the majority of the area contains both.

The potential problem I see here is that, over time, we could see an overall shift from one forest type to another. If this is a landscape-level model, that may not be overly problematic. That said, even at the landscape level, I have trouble believing that all of these possible shifts make sense. Let's redraw the transition matrix as a graphic to see what I mean. Within 3 rotations, a single DevType creates 9 types:


BFDOM Type Traced Through 3 Transitions

I see two issues with these transitions. First, the BFDOM type will continually decrease over time, because you'll notice there is no transition back to it from any other type. Second, do the silvics of these species groups really allow such conversions, say, from black spruce muskeg to upland hardwood dominant?
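
To put a number on the first issue: since nothing transitions back to BFDOM, the only BFDOM area remaining after a harvest is the 35% that maps to itself. After two rotations, at most 0.35 × 0.35 ≈ 12% of the original BFDOM area is still BFDOM; after three, 0.35 × 0.35 × 0.35 ≈ 4%. The type effectively disappears from the landscape.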

What is even more alarming is when I see these types of transitions not in a landscape-level analysis, but in a spatially explicit harvest schedule. Remember, LP is a deterministic model. These percentages may represent probabilities in a landscape-level analysis, but in a harvest scheduling model, they represent exact proportions. It doesn't make much sense to think a single BFDOM stand would split into five different types after harvest. What would be the source of the regen for the other species? Poplar regenerates largely from root suckers, but if poplars were not present in the harvested stand, how would they appear later? For other species, what would be the seed source? Unless there's an adjacent stand with the right species present, it is very unlikely that a new species will just appear.

Harvest scheduling models should represent management intent, not necessarily what might happen on a particular harvest block. We don't plan for plantation failures; when they occur, we mitigate them with additional planting or other silvicultural amendments. Clearly, we can't predict them, because if we could, we would do something different and avoid the outcome. When developing a conceptual model, it is important to keep this in mind.

How to fix it? Be careful using multiple outcomes.

There are issues with multiple transitions that affect the solver's ability to produce an optimal solution. Fractional quantities that get progressively smaller over time can be particularly bad with longer planning horizons. But even if that isn't a problem, you need to recognize that your model is a simplification of the real world. If the silvics or logic of your model is more complex than reality, chances are it isn't going to make sense to your colleagues or management team. What you'll end up with is an efficient but very bad model.

Contact Me!

If your model is behaving badly and you have no idea why, maybe it is time to consider an overhaul. I can work with you to determine requirements, suggest formulation efficiencies, and help you with ways to automate the process going forward. Give me a shout!

Monday, November 11, 2024

Improve Model Efficiency - Part 5

Introduction

Previously, one of my blog posts was about how the outputs you create can cause performance issues in a Woodstock model. Specifically, the use of inventory-based outputs can seriously degrade matrix generation time because of the overhead they incur. Often, these outputs can be avoided through the use of REGIMES. 

I also wrote a blog post recently on conceptual models. This time, I want to go even more basic and talk about the questions a model is supposed to answer. Poorly conceived models are doomed to be slow from the outset.

What Kind of Model is it?

In the hierarchical planning framework, forest planning models can be strategic, tactical or operational. Strategic planning models are focused on the concepts of sustainability, capital planning, silviculture investment, volume allocation and return on investment. The questions they answer are of the how, what and when variety: "when should we harvest?", "what stands should we harvest?" and "how should we manage these stands (thin or not)?". Because the model is strategic, many details of management are omitted or unavailable. And yet, clients often make choices that pretend to provide detail or certainty where none is warranted. By focusing on the questions at hand, you can pare back the conceptual model and, in turn, improve model performance.

A Bad Conceptual Model

Years ago, a client approached me and said that he was told by a university professor (who shall remain nameless) that his stand-based Woodstock model was too large to solve because it relied on a Model II formulation. The professor said that a Model I formulation would be a better choice because it would not suffer the excessive size problem. Of course, this was nonsense! I explained that a properly formulated model with the same yield tables, actions and timing choices would yield the same solution regardless of the formulation. Depending on the solver, a Model II formulation could be slower because it has more transfer rows than a Model I formulation, but the problem had to be in the way the client defined his actions.

Timing Choices

It didn't take very long to find the culprits. First, I checked the ACTIONS section:

*ACTION aCC Y Clearcut
*OPERABLE aCC
.MASK() _AGE >= 40

Next, I checked the CONTROL and LIFESPAN sections:

CONTROL
*LENGTH 40 ; 5-year periods

LIFESPAN
.MASK() 500

Nothing ever died in this model, because the oldest permitted age class would be 2,500 years (500 periods × 5 years)! Every existing development type (DevType) older than 40 years generated a decision variable in every planning period, even though the existing yield tables stopped at age 150. The result was a pile of decision variables representing very old stands, all with the same volume coefficients. Even if you could solve the model, these variables represent poor choices and are never chosen.

Transitions

Because this was a stand-based model, I checked the TRANSITIONS section. Sure enough, they retained the StandID after the clearcut. 

TRANSITIONS
*CASE aCC
*SOURCE .MASK(_TH10(EX))
*TARGET .MASK(_TH10(RG)) 100

This prevented Woodstock from pooling acres into one of the regen DevTypes represented by a total of 12 (yes, twelve!) yield tables. Instead of ending up with 12 DevTypes with multiple age classes in each, they ended up with thousands and thousands of DevTypes that overwhelmed the matrix generator. The client asked about *COMPRESSTIME. I said it could eliminate some decision variables for each DevType, but the real problem was the excessive number of development types. By replacing the StandID theme with a generic ID (000000) associated with the regen yield tables, the combinatorial explosion of DevTypes was averted.

ACTIONS 
*ACTION aCC Y Clearcut
*OPERABLE aCC
.MASK() _AGE >= 40 AND _AGE <= 150

TRANSITIONS
*CASE aCC
*SOURCE .MASK(_TH10(EX))
*TARGET .MASK(_TH10(RG),_TH12(000000)) 100
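
The LIFESPAN fix isn't shown above, but it matters just as much: cap the lifespan near the age where the yield tables end, so nothing lingers for 2,500 years. A sketch of what that might look like, assuming 5-year periods (the exact cap is a judgment call):

LIFESPAN
.MASK() 40 ; 40 five-year periods = 200 years, just past the 150-year yield tables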

The revised model ran in minutes and was trivial to solve. There was no need for *COMPRESSTIME.

What About the Model I Issue?

The reason the professor's Model I formulation ran and the Woodstock model didn't had nothing to do with the formulation. The professor's model excluded a lot of choices because they were enumerated by hand. So even though the yield tables and DevTypes were the same in both models, the choices represented in the two models were different. After the changes were implemented and the Woodstock model ran, it yielded a slightly better solution: not because Model II is a better formulation, but because it contained choices that the Model I model lacked. That is yet another reason to avoid (when possible) software switches like *COMPRESSTIME that blindly omit every 2nd, 3rd, 4th, ... choice.

Model Results Got You Stumped? Contact Me!

If you have a slow, cumbersome planning model, give me a call. I can review your model and make some suggestions for improvement. If the changes are extensive, we can design a training session around them. No one has more years of Woodstock experience.

Monday, November 4, 2024

Improve Model Efficiency - Part 4

Start with a Good Conceptual Model

A lot of my consulting work involves reviews of client models. Usually, the client is experiencing slow performance, or some aspect of the model is making it infeasible, and they want me to show them how to fix it. In many cases, the problem isn't so much the Woodstock syntax as the logic used to construct it (i.e., the conceptual model). Woodstock training programs largely focus on syntax and how to get a model working. But other than the problem statement provided in the training handbook, there is little guidance on how to develop an efficient conceptual model. So, let's talk about that.

Years ago, we used to spend a lot more time training analysts on how to model specific kinds of silviculture. Often, we would have students draw flow charts on the whiteboard representing the actions, outputs and transitions of their forest models. It was a good exercise because it decomposed a series of activities into discrete events and outcomes. However, new analysts usually struggle with the process, and the flow charts can end up looking like this:


Poorly-conceived conceptual model

Consider Time

Every Woodstock model has a planning horizon divided into planning periods: the length of a planning period × the number of planning periods = the planning horizon. But how do you determine the length of a planning period? Usually, it is based on the desired resolution for the harvest level, or annual allowable cut (AAC). For most of my clients building strategic planning models, a planning period is 1 year long, and the model then reports harvests, expenses and revenues on an annual basis.
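
Note that the CONTROL section declares only the number of periods; the period length is a convention carried by your yield tables and comments. A 40-year horizon with annual periods (the horizon length here is just an example) would look like this:

CONTROL
*LENGTH 40 ; 40 one-year periods = a 40-year planning horizon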

Another consideration, however, is the yield tables available. In more northern locations, tree growth is slow enough that annual estimates are unreliable, so growth is incremented in 5- or 10-year time steps. While you can still use annual planning periods with these yields, you need to rely on some form of linear interpolation or spline-fitting, which introduces biases into the estimates. In my opinion, it is best to match planning period length to the natural increment of your growth model.

Consider Your Actions

Once you have settled on your planning period length, you need to consider the actions in your model. The first consideration is whether the action is a choice. Clearly, a final harvest, such as a clearcut, is a choice. Do I need a different action if I'm harvesting conifer stands versus deciduous stands? That depends on your reporting needs. Differentiating harvest costs by species group is easily handled with different outputs and doesn't require two harvest actions. However, suppose a product is differentiated by how it is harvested (e.g., a tree-length operation versus a cut-to-length operation where there is a price differential). In that case, you WILL need different final harvest actions, as sketched below.
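
A minimal sketch of that case, with hypothetical action names and a generic operability limit:

ACTIONS
*ACTION aCCTL Y Clearcut, tree-length system
*OPERABLE aCCTL
.MASK() _AGE >= 40
*ACTION aCCCTL Y Clearcut, cut-to-length system
*OPERABLE aCCCTL
.MASK() _AGE >= 40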

Reforestation is almost always a policy requirement, but whether planting is a choice depends on the existence of alternatives. If you are using 5-year or decadal planning periods, many of the reforestation activities that occur in the forest can be collapsed into a single decision variable. Defining a planting action to occur at _AGE = 0 is unnecessary: you could just as easily consider planting part of the clearcut action and assume the transition to the planted regen condition, as sketched below.
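
Here is what folding planting into the clearcut might look like in the TRANSITIONS section, borrowing conventions from the snippets above (the theme number and attribute codes are hypothetical):

TRANSITIONS
*CASE aCC
*SOURCE .MASK(_TH5(NA))
*TARGET .MASK(_TH5(PL)) 100 ; clearcut implies planting; no separate planting action or decision variable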

If you always plant a stand following harvest, regardless of different site preparation steps or planting densities, you may not need a decision variable for planting. Instead, you could rely on REGIMES. Many of my clients have different treatment and cost structures for different site conditions, but these are all handled through prescriptions in the REGIMES section. The important thing to remember is that there is a single decision variable for each alternative. 

Why is this important? Every action results in two outcomes: either the action is performed, or it isn't. If you model something that is required using actions, you need to add constraints to force the outcome you want, as the sketch below illustrates. This is very inefficient: the extra actions add decision variables and non-zero elements, and the forcing constraints add rows.
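
For example, if planting were modeled as a separate required action, you would be stuck writing a forcing constraint along these lines (the output names are hypothetical):

*CONSTRAINTS
oPlantArea - oCCArea = 0 1.._LENGTH

Every period gets an extra row, plus the extra planting columns, all to force an outcome that was never really a decision.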

Consider Your Transitions

The trickiest part of any conceptual model is predicting how a future stand will behave following final harvest. If everything gets planted to a single species, it is straightforward: a single 100% transition to a regenerated development type (DevType). But what about plantation failures? Shouldn't we model those? You may have a good handle on how often plantation failures occur, but I'm betting you can't predict which harvest areas will fail. Why? Because if you could predict them, you'd change your site preparation methods to avoid the failure.

Instead, transitions should focus on outcomes that reflect your management intent and that you can predict with certainty. If 2% of plantations fail on average, you can account for the area and cost of correcting them with in-fill planting, without a transition to a failed state that would then require an action to correct it (see the sketch below). Similarly, some stands regenerate in an overstocked condition and require precommercial thinning (PCT). Again, you can account for the area and cost of PCT without explicitly recognizing the transition to an overstocked state and the subsequent PCT to correct it. Your management intent is not to produce defective stands, and you shouldn't bother modeling things you do not have adequate data for.
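
A sketch of the in-fill planting accounting, assuming Woodstock output expressions and using output names I've made up:

OUTPUTS
*OUTPUT oCCArea ; total area clearcut
*SOURCE .MASK() aCC _AREA
*OUTPUT oInfillArea ; assumed 2% average plantation failure rate
oCCArea * 0.02
*OUTPUT oInfillCost ; assumed $500/ha cost of in-fill planting
oInfillArea * 500

The failure rate and cost ride along with the clearcut decision: no failed-state DevType, no corrective action, no extra columns.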

A large number of my clients build "stand-based models", which carry the stand-ID from the inventory as a theme in the model. But stand-based is only relevant for existing stands: once they are final harvested, they transition to a stratum based on a handful of stand characteristics like site quality, species, etc. Yet time and time again, I encounter models that do not void the stand-ID after final harvest in the TRANSITIONS section. This results in a combinatorial explosion of future decision variables that are completely unnecessary. The example below is from a model with a lot of commercial thinning options.


Column generation without collapsed stand-ID

For future yields, the stand-ID contributes nothing to the objective function or constraints; it just increases the number of DevTypes that Woodstock has to keep track of. If you collapse the stand-ID after final harvest, you'll get the same answer in far less time. The example below shows that about 20% of the decision variables in later periods can be eliminated by collapsing the stand-ID.


Column generation with collapsed stand-ID

Yes, I've heard the arguments many times that you NEED the stand-ID for future stands, so you know the exact location of harvests 40 years into the future. Forgive my cynicism, but I doubt most of you follow a plan exactly for the first year, never mind the first decade or more.

Discussion

If you are running a model that you developed years ago, or worse, inherited from someone else years ago, and that model is slow and cumbersome, maybe it is time to toss the baby out with the bath water and start over. Starting fresh does require time, but how much time are you wasting waiting on a model that generates unnecessary decision variables, has transitions that are impossible to trace, and so on?

Afraid of tossing the baby out with the bath water? Contact me!

If you need help revamping your model, or looking to start over from scratch, I'm more than happy to help. Give me a shout! Nobody knows more about Woodstock modeling!

