Monday, November 25, 2024

Is Your Conceptual Model Logically Sound?

Introduction

In the last few posts, I've been talking about flaws in your conceptual model that can make your model run slowly (i.e., it is an inefficient use of your time). Making poor choices about landscape themes can blow up the size of your model dramatically and make it nigh on impossible to properly do QA (or even figure out what is happening).

However, it is also possible to have a reasonable implementation of a flawed conceptual model. This is a model that, while semantically and syntactically correct, doesn't pass the smell test for how the real world works. For this exercise, I'm going to present snippets from real client models that illustrate both kinds of problems, without disclosing where they came from.

Exhibit 1 - 9 Theme Model


Exponential Growth in Columns


Almost 12 GB of RAM required

The exponential growth in the number of columns by planning period tells us that something undesirable is going on. For a model with only 9 themes and 20 planning periods, that is a very large number of columns, and it requires a lot of computer memory (almost 12 GB). Where to start?

Landscape Woes

The model identifies harvest settings, which makes this essentially a stand-based model. The modeler used _INDEXing in the LANDSCAPE section to associate net area and harvest method proportions with each unit, rather than using themes and the AREAS section.

1431 _INDEX(yOpArea=56.0,yGroundProp=0.00,yLineProp=1.00,yHeliProp=0.00)
1433 _INDEX(yOpArea=96.9,yGroundProp=0.00,yLineProp=0.28,yHeliProp=0.72)
1434 _INDEX(yOpArea=62.3,yGroundProp=0.05,yLineProp=0.33,yHeliProp=0.62)

I dislike this approach because you cram too much information into a single attribute (stand ID), and so when you want to identify sub-groups, you need a bunch of *AGGREGATE definitions. This just makes things unwieldy.

*AGGREGATE ag1 ; Primary management base
*AGGREGATE ag2 ; Secondary management base
*AGGREGATE agNoSBI
*AGGREGATE agSBI

Worse, as we see above, there are clearly two distinct divisions in the land base: primary versus secondary, and the presence or absence of SBI data. This could have been much more efficiently handled as two themes with 2 attributes each:

*THEME Landbase
  P Primary
  S Secondary
*THEME SBI data
  Y SBI available
  N SBI not available

There is another theme that apparently is used to assign a scenario name to every development type (DevType) in the model using action aTH7. Woodstock already has a scenario management system, so the utility of this "feature" eludes me. However, it adds unnecessary decision variables to an already large LP matrix because everything needs to be relabeled from NA to something else. If it isn't a real decision, it shouldn't be modeled as an action.

Name That Theme!

When I train new analysts, I like to emphasize the idea that landscape themes should be pure, encapsulating only a single type of data, such as stand status vs rotation vs stand origin vs stocking vs planting stock. Unfortunately, the modeler did the complete opposite here, concatenating 5 different kinds of data into a single code:

NA Unknown or as initialized for all existing forest
CC Clearcut      
OR Overstory Removal
CT Commercial Thin
ST Seed tree cut
SW Shelterwood
SL Selection Cut                 
NA Natural
NP Natural with PCT    
PL Planted
PP Planted with PCT      
TP Tube Plant (nonstocked)                              
RP Regular Plant (nonstocked)  

Those of you who've had me as an instructor probably remember me asking: how many possible DevTypes can be represented by x themes with y attributes each? Here we only have 5 potential themes with 2-6 attributes each, but the model listed 172 valid combinations. Here's a sample:

SW20
CC_PL_NA
CC_PL_ST
OR_NA_SW
OR_NP_NA 
OR_NP_ST 
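For anyone who wants to revisit that classroom question, here is a minimal Python sketch of the arithmetic. The function name is made up for illustration, and the two cases simply bracket the "5 themes with 2-6 attributes each" described above; the result is an upper bound on combinations, not a count of what any real model would actually use.

```python
from math import prod

def max_devtypes(attribute_counts):
    """Upper bound on distinct DevTypes: the product of attributes per theme."""
    return prod(attribute_counts)

# Hypothetical bounds for 5 themes with 2 to 6 attributes each.
fewest = max_devtypes([2] * 5)  # every theme has only 2 attributes
most = max_devtypes([6] * 5)    # every theme has 6 attributes
print(fewest, most)
```

With x themes of y attributes each, the bound is simply y raised to the power x, which is why adding attributes to a few clean themes is cheap compared with inventing concatenated codes.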

And, of course, you need to define a bunch of *AGGREGATE definitions to identify the areas represented by the basic data types, like planted versus natural stands. Again, a better approach would be to use separate themes to explicitly identify silviculture state (cutover, stocked, shelterwood), stand origin (NA, PL), precommercial thin (Y, N), commercial thin (Y, N), and planting stock (NA, TP, RP). With themes like these, it takes a single DevType mask to identify any of these things, versus hundreds of lines of code devoted to attribute listings.
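To illustrate why one mask can replace a long attribute listing, here is a small Python sketch. It is not how Woodstock evaluates masks internally; the theme order, the sample DevTypes, and the `matches` helper are all assumptions for illustration, with `?` treated as "any attribute in this theme".

```python
# Hypothetical theme order: state, origin, PCT, CT, planting stock.
def matches(mask, devtype):
    """True if every mask position is '?' or equals the DevType's attribute."""
    mask_parts = mask.split()
    dev_parts = devtype.split()
    return len(mask_parts) == len(dev_parts) and all(
        m == "?" or m == d for m, d in zip(mask_parts, dev_parts)
    )

# Made-up DevTypes, one attribute per theme.
devtypes = [
    "stocked PL Y N RP",
    "stocked NA N N NA",
    "cutover PL N N TP",
]

# One mask picks out every planted stand, whatever the other themes say.
planted = [d for d in devtypes if matches("? PL ? ? ?", d)]
print(planted)
```

The same idea with concatenated codes would need an explicit list of every code that happens to contain "PL", and that list has to be maintained by hand.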

Overall Impression


LP Matrix Statistics for the Model

Let's review a few things. First, the number of active DevTypes in this model is 316,572. For those who forgot their introductory training, Woodstock only allows 500,000 active DevTypes. The reason there are so many is a combination of retaining the original harvest unit ID after final harvest with the concatenated attributes we just discussed. There's just an explosion of new development types after each action. It almost appears that the motivation for formulating this model was to minimize the number of themes and to make the model as opaque as possible.

How to fix it? Jettison the whole thing and start over.

Exhibit 2 - 11 Theme Model


No Combinatorial Explosion of Decision Variables



Fast Generation Time with Low Memory Requirements

By all appearances, this model is well formulated because it generates the matrix quickly, and the number of decision variables for existing and regenerated DevTypes is quite reasonable. Unlike Exhibit 1, the landscape section in this model is well thought out, with clearly defined themes and a reasonable number of attributes per theme. Clearcutting is the only harvest action with 3 regeneration options: natural regen, planting or direct seeding.

Transitions Section

Where I see potential problems arising is in the TRANSITIONS section. These are basically the responses of different DevTypes to final harvest using natural regeneration, with the extraneous global attributes stripped out:

*SOURCE BFDOM      
*TARGET BFDOM 35
*TARGET HRDMX 15            
*TARGET SBMX1 15
*TARGET PJMX1 10                
*TARGET CONMX 25                 

*SOURCE HRDMX 
*TARGET HRDMX 20
*TARGET HRDOM 20
*TARGET PODOM 45
*TARGET CONMX 15

*SOURCE SBMX1 
*TARGET SBMX1 35
*TARGET PJDOM 10
*TARGET SBDOM 25
*TARGET PJMX1 15
*TARGET CONMX 15          

*SOURCE PJMX1 
*TARGET PJMX1 25
*TARGET PJDOM 30
*TARGET CONMX 15
*TARGET HRDMX 15
*TARGET SBMX1 15                 

*SOURCE CONMX 
*TARGET CONMX 34
*TARGET HRDMX 33
*TARGET SBMX1 33                      

*SOURCE HRDOM 
*TARGET HRDOM 20
*TARGET HRDOM 20
*TARGET PODOM 45
*TARGET CONMX 15

*SOURCE PODOM  
*TARGET PODOM 55
*TARGET HRDMX 15
*TARGET HRDOM 15
*TARGET CONMX 15                                              

*SOURCE PJDOM 
*TARGET PJDOM 45
*TARGET PJMX1 20
*TARGET CONMX 25
*TARGET HRDMX 10

*SOURCE SBDOM  
*TARGET SBDOM 35
*TARGET HRDMX 15
*TARGET SBMX1 15
*TARGET PJMX1 10
*TARGET CONMX 25

I suspect most of you are not used to seeing transition matrices that are not 1:1 (100%), but allowing multiple outcomes after an action was one of the earliest features of Woodstock. The idea is that if you harvest a mixedwood stand, it is quite likely that, after harvest, parts of the stand may be dominated by hardwoods and others by softwood species, even though a majority of the area contains both.
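Before reading anything into a table like the one above, it is worth a quick sanity check. The Python sketch below re-keys the listed responses by hand (so any typing slip is mine, not the client's) and uses two made-up helper functions: one reports sources whose percentages do not total 100, the other reports sources that list the same target more than once.

```python
from collections import Counter

# Natural-regeneration responses as listed above: (target type, percent).
TRANSITIONS = {
    "BFDOM": [("BFDOM", 35), ("HRDMX", 15), ("SBMX1", 15), ("PJMX1", 10), ("CONMX", 25)],
    "HRDMX": [("HRDMX", 20), ("HRDOM", 20), ("PODOM", 45), ("CONMX", 15)],
    "SBMX1": [("SBMX1", 35), ("PJDOM", 10), ("SBDOM", 25), ("PJMX1", 15), ("CONMX", 15)],
    "PJMX1": [("PJMX1", 25), ("PJDOM", 30), ("CONMX", 15), ("HRDMX", 15), ("SBMX1", 15)],
    "CONMX": [("CONMX", 34), ("HRDMX", 33), ("SBMX1", 33)],
    "HRDOM": [("HRDOM", 20), ("HRDOM", 20), ("PODOM", 45), ("CONMX", 15)],
    "PODOM": [("PODOM", 55), ("HRDMX", 15), ("HRDOM", 15), ("CONMX", 15)],
    "PJDOM": [("PJDOM", 45), ("PJMX1", 20), ("CONMX", 25), ("HRDMX", 10)],
    "SBDOM": [("SBDOM", 35), ("HRDMX", 15), ("SBMX1", 15), ("PJMX1", 10), ("CONMX", 25)],
}

def unbalanced_sources(transitions):
    """Sources whose target percentages do not add up to 100."""
    totals = {src: sum(pct for _, pct in targets) for src, targets in transitions.items()}
    return {src: total for src, total in totals.items() if total != 100}

def repeated_targets(transitions):
    """Sources that list the same target type more than once."""
    repeats = {}
    for src, targets in transitions.items():
        counts = Counter(name for name, _ in targets)
        dupes = [name for name, n in counts.items() if n > 1]
        if dupes:
            repeats[src] = dupes
    return repeats

print(unbalanced_sources(TRANSITIONS))
print(repeated_targets(TRANSITIONS))
```

Every source in the listing totals 100, so the table is balanced. The HRDOM source does list HRDOM twice, which may be intentional but is the kind of thing worth a second look during QA.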

The potential problem I see here is that over time, we could see an overall shift from one forest type to another. Now if this is a landscape level model, that may not be overly problematic. That said, even at the landscape level I have trouble believing that all of these possible shifts make sense. Let's change the transitions matrix to a graphic to see what I mean. Within 3 rotations, a single DevType creates 9:


BFDOM Type Traced Through 3 Transitions

I see two issues with these transitions. First, the area in the BFDOM type will continually decrease over time because, as you'll notice, there is no transition back to it from any other type. Second, do the silvics of these species groups really allow such conversions, say, from black spruce muskeg to upland hardwood dominant?
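Both observations can be checked mechanically. The Python sketch below keeps only the target names from the listing above (percentages dropped, re-keyed by hand) and uses two hypothetical helpers: one finds types that no other type feeds into, the other counts how many types a starting type can reach after a given number of harvests.

```python
# Target types for each source, taken from the transitions listed above.
SUCCESSORS = {
    "BFDOM": ["BFDOM", "HRDMX", "SBMX1", "PJMX1", "CONMX"],
    "HRDMX": ["HRDMX", "HRDOM", "PODOM", "CONMX"],
    "SBMX1": ["SBMX1", "PJDOM", "SBDOM", "PJMX1", "CONMX"],
    "PJMX1": ["PJMX1", "PJDOM", "CONMX", "HRDMX", "SBMX1"],
    "CONMX": ["CONMX", "HRDMX", "SBMX1"],
    "HRDOM": ["HRDOM", "PODOM", "CONMX"],
    "PODOM": ["PODOM", "HRDMX", "HRDOM", "CONMX"],
    "PJDOM": ["PJDOM", "PJMX1", "CONMX", "HRDMX"],
    "SBDOM": ["SBDOM", "HRDMX", "SBMX1", "PJMX1", "CONMX"],
}

def types_without_inflow(successors):
    """Types that receive area only from themselves, never from another type."""
    fed = {t for src, targets in successors.items() for t in targets if t != src}
    return sorted(set(successors) - fed)

def reachable_after(start, harvests, successors):
    """Set of types that area starting in `start` can occupy after n harvests."""
    current = {start}
    for _ in range(harvests):
        current = {t for src in current for t in successors[src]}
    return current

print(types_without_inflow(SUCCESSORS))
print([len(reachable_after("BFDOM", n, SUCCESSORS)) for n in (1, 2, 3)])
```

A type that shows up in the first result can only lose area over time, and the second line shows how quickly a single starting type fans out across the whole list.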

What is even more alarming is when I see these types of transitions not in a landscape level analysis, but in a spatially explicit harvest schedule. Remember, LP is a deterministic model. These percentages may represent probabilities in a landscape level analysis, but in a harvest scheduling model, they represent exact proportions. It really doesn't make much sense to think a BFDOM stand would transition to four other types after harvesting. What would be the source of the regen for the other species? Poplar regenerates largely from root suckers, but if poplar was not present in the harvested stand, how would it appear later? For other species, what would be the seed source? Unless there's an adjacent stand with the right species present, it is very unlikely that a new species will just appear.
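To make "exact proportions" concrete, here is a small Python sketch that applies the BFDOM response from the listing above to a single harvest block. The 40 ha block size and the `split_area` helper are made up for illustration.

```python
# BFDOM response after final harvest, in percent, from the listing above.
BFDOM_RESPONSE = {"BFDOM": 35, "HRDMX": 15, "SBMX1": 15, "PJMX1": 10, "CONMX": 25}

def split_area(area_ha, response):
    """Area assigned to each target type when percentages are exact shares."""
    return {target: area_ha * pct / 100 for target, pct in response.items()}

block = split_area(40, BFDOM_RESPONSE)  # hypothetical 40 ha harvest block
for target, hectares in block.items():
    print(f"{target}: {hectares:.1f} ha")
```

In the LP, those are not odds. The schedule treats that one block as five separate pieces of forest from that point on, each following its own yield curves.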

Harvest scheduling models should represent management intent, not necessarily what might happen on a particular harvest block. We don't plan for plantation failures, but when they occur, we mitigate them with additional planting or other silvicultural amendments. Clearly, we can't predict them: if we could, we would avoid the outcome by doing something different. When developing a conceptual model, it is important to keep this in mind.

How To Fix It? Be careful using multiple outcomes.

There are issues with multiple-outcome transitions that affect the solver's ability to produce an optimal solution. Fractional quantities that get progressively smaller over time can be particularly bad with longer planning horizons. But even if that isn't a problem, you need to remember that your model is supposed to be a simplification of the real world. If the silvics or logic of your model is more complex than reality, chances are it isn't going to make sense to your colleagues or management team. What you'll end up with is an efficient but very bad model.
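Here is a rough Python sketch of how quickly those fractions shrink, using the Exhibit 2 numbers. It assumes every hectare is harvested at each rotation and relies on the fact that, in the transitions shown, BFDOM keeps 35% of its own area and receives nothing from any other type. The function name is hypothetical.

```python
from fractions import Fraction

# Share of a harvested BFDOM area that regenerates as BFDOM (35%).
BFDOM_RETENTION = Fraction(35, 100)

def share_still_bfdom(harvests):
    """Fraction of the original BFDOM area still in BFDOM after n harvests."""
    return BFDOM_RETENTION ** harvests

for n in (1, 3, 5):
    print(n, float(share_still_bfdom(n)))
```

Each of those slivers is still a column in the matrix with a coefficient attached, and the coefficients only get smaller as the planning horizon gets longer.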

Contact Me!

If your model is behaving badly and you have no idea why, maybe it is time to consider an overhaul. I can work with you to determine requirements, suggest formulation efficiencies, and help you with ways to automate the process going forward. Give me a shout!
