Thursday, December 5, 2024

Why are MIP models difficult to solve (or not)?

Introduction

I recently joined a conversation about why a mixed-integer programming (MIP) problem is so much harder to solve than a regular linear programming (LP) problem. Clearly, it is more problematic than just rounding the fractional quantities up or down to the nearest integer. However, it isn't just the variability in solution time that many find confounding - the solutions themselves are often counter-intuitive. To be honest, the most valuable solutions from modeling are the counter-intuitive ones because it means you've learned something new and unexpected from your model. And to me, that's the whole point of modeling.

MIP models do not have to be large to be difficult to solve. In fact, some of the most difficult MIP problems known are trivially small. For this discussion, however, let's stick to forestry and concepts we are familiar with.

Districting

8 Districts

Suppose I have 8 districts in my forest, and I want to know if activity is occurring in each district. I want my model to report that using a binary variable, where 0 = no activity, and 1 = activity. The problem is fairly easy to formulate as a MIP: set up binary variables that each represent harvest in a district. 
*VARIABLE
  vbCut(D1) _BINARYARRAY
  vbCut(D2) _BINARYARRAY
           .
           .
           .
  vbCut(D8) _BINARYARRAY

To make the binary variable useful, I need to relate it to an output in the model (harvest area). I need a constraint that will force the binary variable to equal 1 if I harvest any amount of area in the district. In the general case, I write the constraint as:

oaHARV(xx) - bigM * vbCut(xx) <= 0 1.._LENGTH

In this case, I use a theme-based output where district is one of the themes. The bigM coefficient has to be larger than any possible amount of harvest area in a planning period. I don't know how much will be harvested in any period, but I do know that even if I harvested every acre in a district, the harvest could never exceed the district's area, so bigM only needs to be slightly bigger than that. In reality, since the districts are different sizes, I can set bigM slightly bigger than the largest district:

FOREACH xx IN (_TH11)
oaHARV(xx) - 15000 * vbCut(xx) <= 0 1.._LENGTH
ENDFOR

By itself, this doesn't do very much. The binary variables could take on values of 0 or 1 and it wouldn't matter. I need an additional constraint to ensure that binary variables only take on values of 1 when harvesting occurs in the district.

vbCut(*) <= 8 1.._LENGTH

Now, the model will correctly report the districts being operated in each period. There is no effect on the solution because the same choices exist as in the LP relaxation (binary requirements relaxed). Interestingly, the solver took longer to prove optimality for the LP than the MIP.

; LP relaxation
; Elapsed time for solver 0.36s
; GNU reported :
;
; Status:     OPTIMAL
; Objective:  OBJ1MIN = -309600006.1 (MINimum)
; Elapsed time for solver 0.23 s
; GNU reported :
;
; Status:     INTEGER OPTIMAL
; Objective:  OBJ1MIN = -309600006.1 (MINimum)
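
If it helps to see the same structure outside of Woodstock syntax, here is a minimal sketch in Python with PuLP. The district areas, period count and objective are placeholders of my own; only the big-M linking row and the vbCut(*) <= RHS analogue mirror the formulation above.

# Hypothetical districting sketch in PuLP (not the Woodstock model itself)
from pulp import LpProblem, LpVariable, LpMaximize, LpBinary, lpSum

districts = [f"D{i}" for i in range(1, 9)]
periods = list(range(1, 11))
area = {d: 5000 + 1000 * i for i, d in enumerate(districts)}   # assumed acres per district
big_m = 15000        # slightly larger than the largest district
max_open = 8         # the RHS we tighten later to 7, 6, 4, 2 and 1

prob = LpProblem("districting", LpMaximize)
harv = LpVariable.dicts("harv", (districts, periods), lowBound=0)   # acres harvested
cut = LpVariable.dicts("cut", (districts, periods), cat=LpBinary)   # is the district open?

prob += lpSum(harv[d][t] for d in districts for t in periods)       # placeholder objective

for d in districts:
    for t in periods:
        prob += harv[d][t] <= area[d]                   # can't harvest more than exists
        prob += harv[d][t] - big_m * cut[d][t] <= 0     # any harvest forces cut = 1

for t in periods:
    prob += lpSum(cut[d][t] for d in districts) <= max_open   # vbCut(*) <= RHS analogue

prob.solve()

Re-solving with max_open set to 7, 6, 4, 2 or 1 mimics the RHS experiment in the next section without touching anything else in the model.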

No Additional Constraints, Just Change RHS

When you change the RHS values in a regular LP model, there usually isn't a big difference in solution time. For example, if you limit the total acres of commercial thinning to 500 acres and then solve it again with a limit of 450, you don't see a big difference in solution time. In a MIP formulation, the same kind of change can cause a huge swing in solution times. In the following examples, I reduce the number of open districts allowed in each period from 8 (vbCut(*) <= 8) to 7, 6, 4, 2 and finally 1 district per period, just by changing the RHS value. All were solved with the HiGHS solver to a 0.1% gap:



Notice that the constraint on open districts really only starts to bite when 3 or more districts become unavailable. Let's look at which districts are open in the first 10 planning periods for the 6, 4 and 2 districts scenarios:


Not every district was accessed in the 4-district scenario. Presumably, enough harvest volume was available in period 3 that only 3 districts were opened. So, what do you think happened in the 1-district scenario?


Were you anticipating any districts to never be opened? Clearly, the even-flow constraint became limiting when only 1 district could be opened at a time. By opening the magenta or red districts, the objective function value would be worse than leaving them unharvested. Even though the harvest choices within districts are continuous variables, it isn't as simple as choosing the largest or smallest districts to balance the harvest flows. (Neither of the districts left unharvested was the smallest or the largest district.)

Additional Constraints

Suppose that in addition to limiting the number of open districts, there was also a policy requirement that you cannot enter the same district in the next period. This type of requirement is common in places like Canada (caribou ranges) but I also used it in a project in California. Basically, I need to restrict temporally adjacent binary variables from summing to more than 1:

FOREACH xx IN (_TH11)
vbCut(xx) + vbCut(xx)[-1] <= 1 2.._LENGTH
ENDFOR

If vbCut(xx) = 1 in period 2, then the period-2 constraint forces vbCut(xx) to be 0 in period 1, and the period-3 constraint forces vbCut(xx) to be 0 in period 3. In period 4, vbCut(xx) can be 1 again, but it could also be 0, which would allow vbCut(xx) to be 1 in period 5. The same logic carries through to the last planning period. Thus, we have our policy requirement.
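
To see the rule in isolation, here is a tiny, self-contained PuLP sketch with toy numbers of my own for a single district over ten periods:

# Minimal "no entry in consecutive periods" demo (hypothetical example)
from pulp import LpProblem, LpVariable, LpMaximize, LpBinary, lpSum

periods = list(range(1, 11))
prob = LpProblem("no_sequential_entry", LpMaximize)
cut = LpVariable.dicts("cut", periods, cat=LpBinary)

prob += lpSum(cut[t] for t in periods)      # open the district as often as possible

for t in periods[1:]:
    prob += cut[t] + cut[t - 1] <= 1        # vbCut(xx) + vbCut(xx)[-1] <= 1 analogue

prob.solve()
print([t for t in periods if cut[t].value() == 1])   # an alternating pattern such as [1, 3, 5, 7, 9]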

If we apply this constraint to the 8, 4 and 2 districts scenarios, we get the following results:

Adding the sequential access constraints to the scenarios dramatically increased solution times, but on a percentage basis, the biggest impact was on the scenario that was easiest to solve, and the smallest impact was on the most difficult one. In terms of the sequence of entries into districts, we observe these results:


8-District Scenario


4-District Scenario


2-District Scenario

You'll note that the constraints would have no impact on the single district scenario, since it never exhibited any sequential access to districts originally. Also, note that in the original 2-district scenario, there was only one instance where a district was entered sequentially. However, when you compare district entries from that scenario with the no-sequential entry scenario, there are multiple changes to assigned planning periods:


> Sequential OK    < Sequential Not OK

Discussion

I hope this post has been successful in dispelling any myths about MIP vs LP. With MIP, we see that:
  • It isn't just "LP with a few integer/binary variables". It is a different beast.
  • A problem can be made much more difficult to solve by just changing a RHS value.
  • Adding constraints can increase solution time by a great deal without greatly affecting the objective function value.
  • Even small problems can be hard.
  • Integer solutions can be quite counter-intuitive.

Contact Me!

If you'd like to explore all-or-nothing decisions in your planning models, give me a shout so we can discuss it further. It may require that you invest in a better solver, but it will certainly test your understanding of outputs and constraints.

Monday, November 25, 2024

Is Your Conceptual Model Logically Sound?

Introduction

In the last few posts, I've been talking about flaws in your conceptual model that can make your model run slowly (i.e., it is an inefficient use of your time). Making poor choices about landscape themes can blow up the size of your model dramatically and make it nigh on impossible to properly do QA (or even figure out what is happening).

However, it is also possible to have a reasonable implementation of a flawed conceptual model. This is a model that, while semantically and syntactically correct, doesn't pass the smell test for how the real world works. For this exercise, I'm going to present snippets from real client models that illustrate both kinds of problems, without disclosing where they came from.

Exhibit 1 - 9 Theme Model


Exponential Growth in Columns


Almost 12 GB of RAM required

We can see from the exponential growth in the number of columns by planning period that something undesirable is going on. For a model with only 9 themes and 20 planning periods, that is a very large number of columns, and it requires a lot of computer memory (almost 12 GB). Where to start?

Landscape Woes

The model identifies harvest settings, which makes this essentially a stand-based model. They used _INDEXing in the LANDSCAPE section to associate net area and harvest method proportions with each unit, rather than using themes and the AREAS section.

1431 _INDEX(yOpArea=56.0,yGroundProp=0.00,yLineProp=1.00,yHeliProp=0.00)
1433 _INDEX(yOpArea=96.9,yGroundProp=0.00,yLineProp=0.28,yHeliProp=0.72)
1434 _INDEX(yOpArea=62.3,yGroundProp=0.05,yLineProp=0.33,yHeliProp=0.62)

I dislike this approach because you cram too much information into a single attribute (stand ID), and so when you want to identify sub-groups, you need a bunch of *AGGREGATE definitions. This just makes things unwieldy.

*AGGREGATE ag1 ; Primary management base
*AGGREGATE ag2 ; Secondary management base
*AGGREGATE agNoSBI
*AGGREGATE agSBI

Worse, as we see above, there are clearly two distinct divisions in the land base: primary versus secondary, and the presence or absence of SBI data. This could have been much more efficiently handled as two themes with 2 attributes each:

*THEME Landbase
  P Primary
  S Secondary
*THEME SBI data
  Y SBI available
  N SBI not available

There is another theme that apparently is used to assign a scenario name to every development type (DevType) in the model using action aTH7. Woodstock already has a scenario management system, so the utility of this "feature" eludes me. However, it adds unnecessary decision variables to an already large LP matrix because everything needs to be relabeled from NA to something else. If it isn't a real decision, it shouldn't be modeled as an action.

Name That Theme!

When I train new analysts, I like to emphasize the idea that landscape themes should be pure, encapsulating only a single type of data, such as stand status vs rotation vs stand origin vs stocking vs planting stock. Unfortunately, the modeler did the complete opposite here, concatenating 5 different kinds of data into a single code:

NA Unknown or as initialized for all existing forest
CC Clearcut      
OR Overstory Removal
CT Commercial Thin
ST Seed tree cut
SW Shelterwood
SL Selection Cut                 
NA Natural
NP Natural with PCT    
PL Planted
PP Planted with PCT      
TP Tube Plant (nonstocked)                              
RP Regular Plant (nonstocked)  

Those of you who've had me as an instructor probably remember me asking you how many possible DevTypes can be represented by x themes with y attributes each? Here we only have 5 potential themes with 2-6 attributes each, but the model listed 172 valid combinations. Here's a sample:

SW20
CC_PL_NA
CC_PL_ST
OR_NA_SW
OR_NP_NA 
OR_NP_ST 

And, of course, you need to define a bunch of *AGGREGATES to identify the areas represented by the basic data types, like planted versus natural stands. Again, a better approach would be to use themes to explicitly identify silviculture state (cutover, stocked, shelterwood), stand origin (NA, PL), precommercial thin (Y, N), commercial thin (Y, N) and planting stock (NA, TP, RP). It takes a single DevType mask to identify any of these things, versus hundreds of lines of code devoted to attribute listings.
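
As an aside, the earlier question about x themes with y attributes has a simple answer: the number of possible DevTypes is the product of the attribute counts. Using the five replacement themes just suggested (the counts come from the parenthetical lists above):

# DevTypes addressable by a handful of pure themes
from math import prod

attribute_counts = {
    "silviculture state": 3,    # cutover, stocked, shelterwood
    "stand origin": 2,          # NA, PL
    "precommercial thin": 2,    # Y, N
    "commercial thin": 2,       # Y, N
    "planting stock": 3,        # NA, TP, RP
}
print(prod(attribute_counts.values()))    # 72 combinations

Any one of those 72 combinations can be addressed with a single DevType mask, with no hand-enumerated codes or aggregates required.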

Overall Impression


LP Matrix Statistics for the Model

Let's review a few things. First, the number of active DevTypes in this model is 316,572. For those who forgot their introductory training, Woodstock only allows 500,000 active DevTypes. The reason there are so many is a combination of retaining the original harvest unit ID after final harvest and the concatenated attributes we just discussed. There's just an explosion of new development types after each action. It almost appears that the motivation in formulating this model was to minimize the number of themes and to make the model as opaque as possible.

How to fix it? Jettison the whole thing and start over.

Exhibit 2 - 11 Theme Model


No Combinatorial Explosion of Decision Variables



Fast Generation Time with Low Memory Requirements

By all accounts, this model seems to be well formulated because it generates the matrix quickly, and the number of decision variables for existing and regenerated DevTypes is quite reasonable. Unlike Exhibit 1, the landscape section in this model is well thought out, with clearly defined themes and a reasonable number of attributes per theme. Clearcutting is the only harvest action with 3 regeneration options: natural regen, planting or direct seeding.

Transitions Section

Where I see potential problems arising is in the TRANSITIONS section. These are basically the responses of the different DevTypes to final harvest with natural regeneration, with the extraneous global attributes stripped out:

*SOURCE BFDOM      
*TARGET BFDOM 35
*TARGET HRDMX 15            
*TARGET SBMX1 15
*TARGET PJMX1 10                
*TARGET CONMX 25                 

*SOURCE HRDMX 
*TARGET HRDMX 20
*TARGET HRDOM 20
*TARGET PODOM 45
*TARGET CONMX 15

*SOURCE SBMX1 
*TARGET SBMX1 35
*TARGET PJDOM 10
*TARGET SBDOM 25
*TARGET PJMX1 15
*TARGET CONMX 15          

*SOURCE PJMX1 
*TARGET PJMX1 25
*TARGET PJDOM 30
*TARGET CONMX 15
*TARGET HRDMX 15
*TARGET SBMX1 15                 

*SOURCE CONMX 
*TARGET CONMX 34
*TARGET HRDMX 33
*TARGET SBMX1 33                      

*SOURCE HRDOM 
*TARGET HRDOM 20
*TARGET HRDOM 20
*TARGET PODOM 45
*TARGET CONMX 15

*SOURCE PODOM  
*TARGET PODOM 55
*TARGET HRDMX 15
*TARGET HRDOM 15
*TARGET CONMX 15                                              

*SOURCE PJDOM 
*TARGET PJDOM 45
*TARGET PJMX1 20
*TARGET CONMX 25
*TARGET HRDMX 10

*SOURCE SBDOM  
*TARGET SBDOM 35
*TARGET HRDMX 15
*TARGET SBMX1 15
*TARGET PJMX1 10
*TARGET CONMX 25

I suspect most of you are not used to seeing transition matrices that are not 1:1 (100%), but allowing multiple outcomes after an action was one of the earliest features of Woodstock. The idea is that if you harvest a mixedwood stand, it is quite likely that, after harvest, parts of the stand will be dominated by hardwoods and others by softwood species, even though the majority of the area contains both.

The potential problem I see here is that over time, we could see an overall shift from one forest type to another. Now if this is a landscape-level model, that may not be overly problematic. That said, even at the landscape level I have trouble believing that all of these possible shifts make sense. Let's turn the transition matrix into a graphic to see what I mean. Within 3 rotations, a single DevType gives rise to 9 different DevTypes:


BFDOM Type Traced Through 3 Transitions

I see two issues with these transitions. First, BFDOM type will continually decrease with time because you'll notice there is no transition back to it from other types. Second, do the silvics of these species groups really allow such conversions, say, from black spruce muskeg to upland hardwood dominant?
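
To make the figure concrete, here is a quick trace of the transition listing above in Python. The only liberty taken is summing the two HRDOM 20% targets into a single 40% entry.

# Trace the BFDOM area through successive natural-regen harvests
from collections import defaultdict

transitions = {
    "BFDOM": {"BFDOM": 35, "HRDMX": 15, "SBMX1": 15, "PJMX1": 10, "CONMX": 25},
    "HRDMX": {"HRDMX": 20, "HRDOM": 20, "PODOM": 45, "CONMX": 15},
    "SBMX1": {"SBMX1": 35, "PJDOM": 10, "SBDOM": 25, "PJMX1": 15, "CONMX": 15},
    "PJMX1": {"PJMX1": 25, "PJDOM": 30, "CONMX": 15, "HRDMX": 15, "SBMX1": 15},
    "CONMX": {"CONMX": 34, "HRDMX": 33, "SBMX1": 33},
    "HRDOM": {"HRDOM": 40, "PODOM": 45, "CONMX": 15},   # two 20% HRDOM targets combined
    "PODOM": {"PODOM": 55, "HRDMX": 15, "HRDOM": 15, "CONMX": 15},
    "PJDOM": {"PJDOM": 45, "PJMX1": 20, "CONMX": 25, "HRDMX": 10},
    "SBDOM": {"SBDOM": 35, "HRDMX": 15, "SBMX1": 15, "PJMX1": 10, "CONMX": 25},
}

area = {"BFDOM": 1.0}                    # start with one unit of BFDOM area
for rotation in range(1, 4):
    nxt = defaultdict(float)
    for src, frac in area.items():
        for tgt, pct in transitions[src].items():
            nxt[tgt] += frac * pct / 100.0
    area = dict(nxt)
    print(rotation, len(area), round(area.get("BFDOM", 0.0), 4))
# rotation 1: 5 types, BFDOM share 0.35
# rotation 2: 9 types, BFDOM share 0.1225
# rotation 3: 9 types, BFDOM share 0.0429

Because nothing transitions back to BFDOM, its share is simply 0.35 raised to the number of rotations, so it can only decline.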

What is even more alarming is when I see these types of transitions not in a landscape-level analysis, but in a spatially explicit harvest schedule. Remember, LP is a deterministic model. These percentages may represent probabilities in a landscape-level analysis, but in a harvest scheduling model, they represent exact proportions. It really doesn't make much sense to think a BFDOM stand would split among five different types after harvesting. What would be the source of the regen for the other species? Poplar regenerates largely from root suckers, but if they were not present in the harvested stand, how would they appear later? For other species, what would be the seed source? Unless there's an adjacent stand with the right species present, it is very unlikely that a new species will just appear.

Harvest scheduling models should represent management intent, not necessarily what might happen on a particular harvest block. We don't plan for plantation failures but when they occur, we mitigate them with additional planting or other silvicultural amendments. Clearly, we can't predict them because we would avoid the outcome by doing something different. When developing a conceptual model, it is important to keep this in mind.

How To Fix It? Be careful using multiple outcomes.

There are issues with multiple transitions that affect the solver's ability to produce an optimal solution. Fractional quantities that get progressively smaller over time can be particularly bad with longer planning horizons. But even if that isn't a problem, you do need to recognize that your model is a simplification of the real world. If the silvics or logic of your model is more complex than reality, chances are it isn't going to make sense to your colleagues or management team. What you'll end up with is an efficient but very bad model.

Contact Me!

If your model is behaving badly and you have no idea why, maybe it is time to consider an overhaul. I can work with you to determine requirements, suggest formulation efficiencies, and help you with ways to automate the process going forward. Give me a shout!

Monday, November 11, 2024

Improve Model Efficiency - Part 5

Introduction

Previously, one of my blog posts was about how the outputs you create can cause performance issues in a Woodstock model. Specifically, the use of inventory-based outputs can seriously degrade matrix generation time because of the overhead they incur. Often, these outputs can be avoided through the use of REGIMES. 

I also wrote a recent post on conceptual models. This time, I want to go even more basic and ask what questions the model is supposed to answer. Poorly conceived models are doomed to be slow from the outset.

What Kind of Model is it?

In the hierarchical planning framework, forest planning models can be strategic, tactical or operational. Strategic planning models are focused on the concepts of sustainability, capital planning, silviculture investment, volume allocation and return on investment. The questions they answer are of the how, what and when variety, such as "when should we harvest?", "what stand should we harvest?" and "how should we manage these stands (thin or not)?", etc. Because it is strategic, many details of management are omitted or unavailable. And yet, clients often make choices that pretend to provide detail or certainty where none is warranted. By focusing on the questions at hand, you can pare back the conceptual model and in turn, improve model performance.

A Bad Conceptual Model

Years ago, a client approached me and said that he was told by a university professor (who shall remain nameless) that his stand-based Woodstock model was too large to solve because it relied on a Model II formulation. The professor said that a Model I formulation would be a better choice because it would not suffer the excessive size problem. Of course, this was nonsense! I explained that a properly formulated model with the same yield tables, actions and timing choices would yield the same solution, regardless of the formulation. Depending on the solver, a Model II formulation could be slower due to the higher number of transfer rows relative to a Model I formulation. Regardless, the problem had to be due to the way the client defined his actions. 

Timing Choices

It didn't take very long to start finding the culprits. First, I checked the ACTIONS section:

*ACTION aCC Y Clearcut
*OPERABLE aCC
.MASK() _AGE >= 40

Next, I checked the CONTROL and LIFESPAN sections:

CONTROL
*LENGTH 40 ; 5-year periods

LIFESPAN
.MASK() 500

Nothing ever died in this model because the oldest permitted age class would be 2500 years! Every existing development type (DevType) older than 40 years generated a decision variable in every planning period, even though the existing yield tables stopped at age 150. This resulted in a bunch of decision variables representing very old stands, all with the same volume coefficients. Even if you could solve it, these decision variables represent poor choices and are never chosen.

Transitions

Because this was a stand-based model, I checked the TRANSITIONS section. Sure enough, they retained the StandID after the clearcut. 

TRANSITIONS
*CASE aCC
*SOURCE .MASK(_TH10(EX))
*TARGET .MASK(_TH10(RG)) 100

This prevented Woodstock from pooling acres into one of the regen DevTypes represented by a total of 12 (yes, twelve!) yield tables. Instead of ending up with 12 DevTypes with multiple ages in each, they ended up with thousands and thousands of DevTypes that overwhelmed the matrix generator. The client asked about *COMPRESSTIME. I said it could eliminate some decision variables for each DevType, but the real problem was the excessive number of development types. By replacing the StandID theme with a generic ID (000000) associated with the regen yield tables, the combinatorial explosion of DevTypes was averted.

ACTIONS 
*ACTION aCC Y Clearcut
*OPERABLE aCC
.MASK() _AGE >= 40 AND _AGE <= 150

TRANSITIONS
*CASE aCC
*SOURCE .MASK(_TH10(EX))
*TARGET .MASK(_TH10(RG),_TH12(000000)) 100

The revised model ran in minutes and was trivial to solve. There was no need for *COMPRESSTIME.

What About the Model I Issue?

The reason the professor's Model I ran while the Woodstock model didn't had nothing to do with model formulation. The professor's model excluded a lot of choices because they were enumerated by hand. So even though the yield tables and DevTypes were the same in both models, the choices represented in the two models were different. After the changes were implemented and the Woodstock model ran, it yielded a slightly better solution. Not because Model II is a better formulation, but because it contained choices that the Model I model lacked. That is another reason why you should avoid (when possible) software switches like *COMPRESSTIME that blindly omit every 2nd, 3rd, 4th,... choice.

Model Results Got You Stumped? Contact Me!

If you have a slow, cumbersome planning model, give me a call. I can review your model and make some suggestions for improvement. If the changes are extensive, we can design a training session around them. No one has more years of Woodstock experience.

Monday, November 4, 2024

Improve Model Efficiency - Part 4

Start with a Good Conceptual Model

A lot of my consulting work involves reviews of client models. Usually, the client is experiencing either slow performance, or there is some aspect of the model that is making it infeasible, and they want me to show them how to fix it. In many cases, the problem isn't so much in the Woodstock syntax as it is in the logic that was used to construct it (i.e., the conceptual model). Woodstock training programs largely focus on syntax and how to get a model working. But other than the problem statement provided in the training handbook, there is little guidance in how to develop an efficient conceptual model. So, let's talk about that.

Years ago, we used to spend a lot more time training analysts on how to model specific kinds of silviculture. Often, we would have students draw flow-charts on the whiteboard that represent the actions, outputs and transitions of their forest models. It was a good exercise because it decomposes a series of activities into discrete events and outcomes. However, new analysts usually struggle with the process, and flow-charts can end up looking like this:


Poorly-conceived conceptual model

Consider Time

Every Woodstock model has a planning horizon divided into planning periods. The length of a planning period x the number of planning periods = the planning horizon. But how do you determine the length of the planning period? Usually, it is based on the desired resolution for the harvest level, or annual allowable cut (AAC). For most of my clients building strategic planning models, a planning period is 1 year in length. The model results then report harvests, expenses and revenues on an annual basis.

Another consideration, however, is the set of yield tables available. In more northern locations, tree growth is sufficiently slow that annual estimates are unreliable, and so growth is incremented in 5- or 10-year time steps. While you can still use annual planning periods with these yields, you need to rely on some form of linear interpolation or spline-fitting function, which introduces bias into the estimates. In my opinion, it is best to match planning period lengths to the natural increments of your growth model.
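
As a simple illustration of what that interpolation does (the yield values here are made up), stretching a 5-year-step table to annual periods just places every intermediate age on a straight line between the two nearest table entries:

# Interpolating a 5-year-step yield table to annual planning periods
import numpy as np

ages_5yr = np.array([0, 5, 10, 15, 20, 25, 30])
vol_5yr = np.array([0.0, 1.0, 8.0, 22.0, 41.0, 58.0, 70.0])   # hypothetical volumes

annual_ages = np.arange(0, 31)
annual_vol = np.interp(annual_ages, ages_5yr, vol_5yr)
print(annual_vol[12])   # 13.6, on the straight line between ages 10 and 15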

Consider Your Actions

Once you have settled on your planning period length, next you need to consider the actions in your model. The first consideration is whether the action is a choice. Clearly, a final harvest, such as a clearcut, is a choice. Do I need a different action if I'm harvesting in conifer stands versus deciduous stands? That depends on your reporting needs. Differentiating harvest costs by species group is easily handled with different outputs and doesn't require two harvest actions. However, suppose a product is differentiated by how it is harvested (e.g., tree length operation versus cut-to-length operation where there is a price differential). In this case, you WILL need to have different final harvest actions.

Reforestation is almost always a policy requirement, but whether planting is a choice depends on the existence of alternatives. If you are using 5-year or decadal planning periods, many of the reforestation actions that occur in the forest can be collapsed into a single decision variable. Defining a planting action to occur at _AGE = 0 is unnecessary. You could just as easily consider it part of the clearcut action and assume the transition to the planted regen condition.

If you always plant a stand following harvest, regardless of different site preparation steps or planting densities, you may not need a decision variable for planting. Instead, you could rely on REGIMES. Many of my clients have different treatment and cost structures for different site conditions, but these are all handled through prescriptions in the REGIMES section. The important thing to remember is that there is a single decision variable for each alternative. 

Why is this important? Every action results in two outcomes: either the action is performed, or it isn't. If you model something that is required through actions, you need to add constraints to force the outcome you want. This is very inefficient because you add to the number of decision variables and non-zero elements, and constraints add rows.

Consider Your Transitions

The trickiest part of any conceptual model is predicting how a future stand will behave following a final harvest. If everything gets planted to a single species, it is straightforward with a single 100% transition to a regenerated development type (DevType). But what about plantation failures? Shouldn't we model those? You may have a good handle on how often plantation failures occur, but I'm betting you can't predict which harvest areas will fail. Why? Because if you could predict them, you'd change your site preparation methods to avoid the failure.

Instead, transitions should focus on outcomes that reflect your management intent and that you can predict with certainty. If 2% of plantations fail on average, you can account for the area and cost to correct them with in-fill planting, without a transition to a failed state that would then require an action to correct it. Similarly, some stands regenerate in an over-stocked condition, and require precommercial thinning (PCT). Again, you can account for the area and cost of PCT without explicitly recognizing the transition to an overstocked state and the subsequent PCT to correct it. Your management intent is not to produce defective stands. You shouldn't bother modeling things that you do not have adequate data for.

A large number of my clients build "stand-based models", which feature the stand-ID from inventory as a theme in the model. But stand-based is only relevant for existing stands - once they are final harvested, they transition to a stratum based on a handful of stand characteristics like site quality, species, etc. But time and time again, I encounter models where they do not void the stand-ID after the final harvest in the TRANSITIONS section. This results in a combinatorial explosion of future decision variables that are completely unnecessary. The example below is from a model with a lot of commercial thinning options.


Column generation without collapsed stand-ID

For future yields, the stand-ID provides nothing to the objective function or constraints - it just increases the number of DevTypes that Woodstock has to keep track of. If you collapse the stand-ID after final harvest, you'll get the same answer in far less time. The example below shows that about 20% of the decision variables in later periods can be eliminated by collapsing the stand-ID.


Column generation with collapsed stand-ID

Yes, I've heard the arguments many times that you NEED the stand-ID for future stands, so you know the exact location of harvests 40 years into the future. Forgive my cynicism, but I doubt most of you follow a plan exactly for the first year, never mind the first decade or more.

Discussion

If you are running a model that you developed years ago, or worse, one that you inherited from someone else years ago, and that model is slow and cumbersome, maybe it is time to toss the baby with the bath water and start over. Starting fresh does require time, but how much time are you wasting waiting on a model that generates unnecessary decision variables, has transitions that are impossible to trace, and so on?

Afraid of tossing the baby out with the bath water? Contact me!

If you need help revamping your model, or looking to start over from scratch, I'm more than happy to help. Give me a shout! Nobody knows more about Woodstock modeling!


Thursday, October 24, 2024

Spatial Planning - Operational Feasibility in Map Data

Introduction

In this article, we look at the issue of map data with an eye toward operational feasibility. If GIS map preparation is sloppy, you can end up with duplicates, overlapping polygons and/or gaps in the coverage. Moreover, we can make the situation even worse when we intersect these layers in GIS or try to group stands together using analysis area control.

Intersection Considerations

Best practice is for each GIS layer to represent a single type of information, such as stand boundaries, soil series or harvest units. A simple intersection of these layers can result in a very large number of new polygons due to near-coincident lines. This can be a problem if available yield tables only represent a portion of those new combinations of attributes. So how do you avoid this?


A harvest unit may incorporate multiple stands.



Stands that can be harvested together have similar ages. Outliers are not operable.

Coverages are hierarchical in nature. Property boundaries, surveyed features like rights-of-way, etc. should be considered inviolate, and they trump any other coverage. A clipping function should be applied to ensure that only stands within the boundary extents are counted.

Stand boundaries have an inherent fuzziness to them such that sliver removal procedures should not cause too much change overall. For soil series, I recommend that instead of a topological polygon intersection, clients perform a spatial overlay, assigning the stand polygons the majority soil type within each boundary. That way, there is only one soil type per stand, with consistency in site quality and productivity. In other words, only one yield table per stand.
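
A majority-area overlay is straightforward to script. The sketch below uses GeoPandas with hypothetical file and column names, so treat it as a pattern rather than a recipe:

# Assign each stand the soil type covering the largest share of its area
import geopandas as gpd

stands = gpd.read_file("stands.shp")    # assumed to carry a StandID column
soils = gpd.read_file("soils.shp")      # assumed to carry a SoilType column

pieces = gpd.overlay(stands, soils, how="intersection")
pieces["piece_area"] = pieces.geometry.area

# keep, for every stand, the soil type with the largest intersected area
idx = pieces.groupby("StandID")["piece_area"].idxmax()
majority = pieces.loc[idx, ["StandID", "SoilType"]]
stands = stands.merge(majority, on="StandID", how="left")

The result is one soil type, and therefore one yield table, per stand polygon.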

Finally, consider what I call my "1%/1000 acre rule": if a development type is less than 1% of the total land base or less than 1000 acres (whichever is smaller), it is too small for consideration in the model. For example, if you are modeling a property of 500,000 acres, does it make sense to spend resources on yield table generation for 150 acres (0.03%)? Even if it contains the most valuable timber, its contribution to the objective function is going to be negligible. For small projects, the thresholds may not work as-is, but the concept still holds. You can, of course, choose different thresholds, but you shouldn't just accept every possible permutation of thematic attributes that is presented to you.

Harvest Blocking Considerations

For harvest unit boundaries, focus on retaining stand boundaries as much as possible when you do layout. An arbitrary straight line between two similar stands often will not cause problems. However, if the stands are very different in age, the solver may find it difficult to schedule the resulting harvest unit due to operability criteria. Why would this happen?

If you are using Woodstock's analysis area control feature (AAControl), you need to be aware that a harvest unit or block can only be assigned a single action (or prescription, if using Regimes). While the scheduling is at the block level, operability remains at the DevType level. That means if you combine stands where the operability conditions for each member polygon do not overlap (due to age or activity differences), the entire unit is never feasible to harvest. Some examples, you ask? How about when you include an area of reforestation in a unit that should be harvested? Or when you combine two stands for commercial thinning where one portion must be fertilized and the other needs no fertilization?

A Software Fix for Sloppy Work

When you set up analysis area units (AAUnits) in Woodstock, you want the model to choose a single activity for the entire unit, preferably all in one planning period. Early on when clients tried AAControl, a flurry of tech-support calls came in complaining that units were never being cut. 

The culprit was sloppy layout, with incompatible stands grouped together. The solution was a software fix where units can be cut without 100% of the area treated. With this new approach, the units were scheduled but some of the areas within those units were not harvested. Ultimately, Remsoft set the default threshold for operability of AAUnits at 20% of the total area. Users can change that with higher or lower limits up to 100%. 

Frankly, I think this default is way too low. Sure, the sloppy units don’t create tech support calls anymore, but if you only harvest a minor fraction of a unit, what is the point of using AAControl? Worse, I don’t recall reading anything about what happens to the stands that were not scheduled. So I decided to investigate this operational feasibility issue for myself.

How to Control Operability of AAUnits

You activate AAControl and specify the threshold for operability of harvest units in Woodstock's CONTROL section.

*AACONTROL ON ; minimum operable area is 20% of the unit area
*AACONTROL ON aCC ; schedule only the aCC action using AAControl
*AACONTROL ON 99% ; minimum operable area is 99% of the unit area

Remember, the default operability threshold is 20%. If you specify a threshold of 100%, all member polygons must be operable at the same time for Woodstock to create a decision variable for the unit. However, if some part of the unit is always inoperable, the unit will never be harvested. Conversely, if you accept the default, at least 20% of the area of a unit must be operable in order for Woodstock to generate a harvest choice for it. But that choice only includes the operable stands! The inoperable ones are omitted if that choice is part of the optimal solution. Look at the schedule section for omitted DevTypes:

  0.017831 aCC 4 _Existing _RX(rxV) |Aaunit:U343| ;B982.2 100.0%  
 31.863753 aCC 4 _Existing _RX(rxV) |Aaunit:U343| ;B982.6 100.0%    
  0.034987 aCC 4 _Existing _RX(rxV) |Aaunit:U343| ;B982.3 100.0%     
 68.775675 aCC 4 _Existing _RX(rxV) |Aaunit:U343| ;B982.4 100.0%     
  1.688875 aCC 4 _Existing _RX(rxV) |Aaunit:U343| ;B982.7 100.0%     
  5.554339 aCC 4 _Existing _RX(rxV) |Aaunit:U343| ;B982.5 100.0%      
  0        aCC 4 _Existing _RX(rxV) |Aaunit:U343| ;B982.1 100.0% Area = 0.77953 not operable in 4

The last record shows that 0.77953 acres of unit U343 is not operable in period 4, so it was not harvested, even though 100% of it was scheduled as part of unit U343!

What Happens to the Inoperable Parts?

The example above comes from a solution where I set the AAControl threshold to 99%. Since the unit is larger than 100 acres and the inoperable portion is less than 1 acre, clearly the unit meets the threshold for operability.

The inoperable areas are part of a larger development type (DevType) that is split between two harvest units (U343 and U338). If we scan for the inoperable DevType acres later in the harvest schedule, we only find the acres in unit U338, not the residual from U343. The residual part of U343 never appears in the schedule section again. Note that this is not caused by a constraint in which only harvest units count. Also, I cannot say for sure that this happens in every model, but I have seen this behavior many times over the years.

If the number of acres that remain unscheduled is small, in terms of operational feasibility, it may not be a source of concern. For example, I’ve seen lots of spatial data with small fragments of older stands scattered among large areas of recent clearcuts, so it isn’t uncommon. However, at some point you do need to harvest those areas. A model that continues to produce unharvested fragments may be problematic from an operational feasibility standpoint.

How to Avoid “Forgotten Areas”

We just saw that including stands of very different ages can be problematic for scheduling final harvests. Clearly, it is best to group similar stands together in harvest units. If that is not possible, at least grow them all long enough that the youngest and oldest cohorts are both within the operability window. It may make perfect sense to hold an old stand fragment well beyond its normal rotation for operational feasibility. To make that happen, the old stand must be a significant part of the harvest unit, and you must also set a high operability threshold.

However, similar issues can arise with commercial thinning even if the ages are similar. For example, suppose that you generated thinned yield tables with defined age (14-17) and volume removal (> 50 tons/ac) requirements. The first stand is currently 15 years old but doesn’t reach the minimum volume bar until age 17. The second stand is currently 16 years old and is OK to thin immediately. But you created a unit that includes some of both stands. Now, when scheduling a thinning, there is no operability window to treat both stands simultaneously: by the time the first stand is operable in 2 years (age 17), the other stand will be too old (age 18).

17 109.37 rCT 2 _Existing _RX(rxTH) |Aaunit:AA099| ;B82.3 100.0%
16   0    rCT 2 _Existing _RX(rxTH) |Aaunit:AA099| ;B82.2 100.0% Area = 36.8500 not operable in 2
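
The arithmetic behind the mismatch is easy to check. The sketch below uses the ages and volume timing from the example above, treating one planning period as one year:

# Do the two thinning stands in unit AA099 ever share an operability window?
def operable_periods(current_age, first_volume_age, horizon=5, window=(14, 17)):
    # future periods (1 = next year) in which the stand meets both criteria
    lo, hi = window
    return {p for p in range(1, horizon + 1)
            if lo <= current_age + p <= hi and current_age + p >= first_volume_age}

stand_1 = operable_periods(current_age=15, first_volume_age=17)   # volume bar met at 17
stand_2 = operable_periods(current_age=16, first_volume_age=16)   # thinnable right away
print(stand_1, stand_2, stand_1 & stand_2)    # {2} {1} set() -> no common period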

What's The Solution? Don’t Be Sloppy!

More care and attention to operability windows can avoid many problems. Don’t hardwire minimum volume limits when generating yield tables. What do I mean by this? Suppose you are generating yield tables for stands thinned from below to some basal area limit and you only generate a yield table if at least 50 tons/ac is produced. That works fine for harvesting stands, but when you use AAControl, that might cause some of your units to no longer be viable.

Instead, generate the tables accepting whatever volume results from the thinning. You can always add a minimum volume requirement to the *OPERABLE statement in Woodstock to avoid low-volume thinning. But, if there is no yield table for the low-volume stands, you can’t create one after the fact. Also, don’t set the threshold too high! Although the DevTypes in the unit determine operability for an AAUnit, the higher-volume portions of the unit can raise the unit's average volume above the threshold for operational feasibility.


Magenta-colored stands do not meet operability threshold

For example, in the figure above, the magenta-colored polygons in the unit shown do not meet the desired 20 MBF/ac threshold. However, the blue areas exceed the bar by a significant amount. In fact, the overall average for the unit easily exceeds 20 MBF/ac. So, rather than set the minimum volume needed to 20 MBF/ac, maybe set it at 15 so the low-volume portions of the block can still be harvested. You can still apply a constraint requiring at least 20 MBF/ac harvested on average.

Suggestions for Better Operational Feasibility

Be careful with GIS overlays!

  • Avoid the creation of near-coincident lines and slivers.
  • Use spatial overlays to assign soil attributes to stand polygons.
  • Follow stand boundaries as much as possible when delineating harvest units.
  • Split stands only when necessary and into significant portions, not slivers.

Review proposed units to make sure they make sense!

  • Don’t combine incompatible stands within the same unit (age 5 + age 200 is not a good combo)
  • Provide yield tables and timing choices for member DevTypes that overlap in operability.
  • Remember that DevType operability determines unit operability, not the weighted average of the unit. Choose appropriate thresholds.
  • Don’t accept the Woodstock default of 20% for AAControl! If you can't harvest the majority of a harvest block, then the block is poorly laid out. Fix it!

Having troubles with spatial planning? Contact Me!

If you are concerned about the operational feasibility of your harvest planning models, give me a shout. We can schedule a review and work on some improvements that will provide more confidence in your results. 

Tuesday, October 1, 2024

Hide & Seek In Woodstock

Introduction

Most times, Woodstock modelers are working to report on outcomes and activities. But there are also times when you don't want to see something. Luckily, the syntax provides you with a few different keywords that can do this for you. First, we'll cover a few that have been around a long time, and a new one that was introduced in the latest release. As always, you should understand what's going on when you deploy a Woodstock keyword before using it in an analysis.

*EXCLUDE Keyword

The *EXCLUDE keyword can be applied in both the OPTIMIZE and AREAS sections, but the way it works is quite different in each section. Let's go over the syntax and rules for each.

Optimize Section

The keyword tells the interpreter not to generate decision variables for the specified action(s) in the specified period(s). You might want to do this for a few reasons.

  • Resources needed to perform the action are unavailable.
  • A policy directive expires after a few years.
  • A pre-planned schedule (*LPSchedule) already includes enough acres of the specific treatment and you do not want to do more.

After *EXCLUDE, on the next line you type the action code and the period(s) in which you do not want the action to appear.

*EXCLUDE
aCC 1..5        ; considered in periods 6 onward
aTH 1.._LENGTH  ; never considered

The interpreter will not generate any decision variables for the clearcut action aCC in the first 5 periods. However, it will generate decision variables for actions in the LPSCHEDULE, regardless of any directives to exclude actions. The requirement for LPSCHEDULE actions to be implemented always trumps the *EXCLUDE statement.

Although you can write constraints to force the area treated to zero, most of the time using constraints like this is undesirable. The biggest problem is that the interpreter generates decision variables that are not needed, and this inflates the size of the LP matrix (i.e., makes it harder to solve). Using the *EXCLUDE keyword makes it obvious that there will be no outputs related to the action.

On the other hand, if you're trying to show the potential value of an action to a skeptical client, it may be better to use the constraints because there will be shadow prices. If the shadow price is zero, that indicates that the action provides no additional value. Otherwise, non-zero shadow prices would indicate the potential for a better solution.

Areas Section

The keyword tells the interpreter not to consider portions of the forest for a specified period of time. Again, you might want to do this for a number of reasons.

  • There are leases that expire in the near term.
  • You are considering a land acquisition and want to compare the estate NPV with and without the acquisition.
  • Your agency is evaluating policy options that affect public land management.

Again, the syntax is easy. The *EXCLUDE keyword must appear before any area records. Below the keyword, you write full development type masks to represent the areas you want excluded. Clearly, the easiest way to achieve this is through a particular theme and set of thematic attributes (e.g., an ownership theme with attributes separating fee simple from leases). These masks must appear before any of the area records, which are prefaced by *A.

*EXCLUDE
 ? ? ? ? TL2022 ? ? ? ? 3.._LENGTH ; timber lease expires in 2022
 ? ? ? ? TL2023 ? ? ? ? 4.._LENGTH ; timber lease expires in 2023
 ? ? ? ? TL2024 ? ? ? ? 5.._LENGTH ; timber lease expires in 2024
*A ABC 1023 PL PE FEE SI80 TM N E3000 37 124.32 ; 2 polygons
... 

Don't be misled, however. The excluded areas do not really disappear – if they did, the model would be infeasible because initial area constraints would be violated. Instead, the excluded areas are transferred to an internal pool of acres that cannot be accessed by the user. They are not eligible for treatments, nor are they reported in inventory measures.

*COMPRESSTIME Keyword

The *COMPRESSTIME keyword is a switch used in the CONTROL section. Its purpose is to limit the number of decision variables generated in the LP matrix.

*COMPRESSTIME 41,2

I've found that a lot of people don't really understand how it works. Here are some rules:

  • Timing choice pruning commences in the first period (x) following the *COMPRESSTIME keyword, and continues until the end of the planning horizon.
  • The number of timing choices skipped (y) is specified after the starting period of *COMPRESSTIME.
  • It only applies to future development types that have undergone at least one transition. If your model has too many timing choices in existing DevTypes, *COMPRESSTIME will have no effect.
  • If the action has only one or two timing-choices per period, *COMPRESSTIME has no effect. The first and last timing choices are always generated.

I'm not a big fan of this feature because it potentially wastes a lot of time for biometricians. Imagine generating a ton of regen yield tables for commercial thinning for ages 10-24 in one-year increments. That's a LOT of yield tables, and if you're using *COMPRESSTIME, up to half of them are not being used. Consider the number of timing choices BEFORE commencing yield generation!

The New Not (!) Operator in the LANDSCAPE section

Basic landscape attributes and aggregate attributes provide a very flexible system for identifying development types, from very specific to very general. However, it has always been somewhat cumbersome to report on the "everyone but X" category if there are lots of attributes in the theme.

*THEME 5-Lease
 TL2024
 TL2025
 TL2026
 ...
 TL2055
 FEE
*AGGREGATE LEASES
 TL2024 TL2025 TL2026 ... TL2055
*AGGREGATE NO2024 ; everyone but 2024
 TL2025 TL2026 ... TL2055

The 2024 release of Woodstock now allows you to use a ! operator in a DevType mask in place of an aggregate. For example, suppose we want to report on inventory in unexpired leases in period 1. Previously we would need to use an aggregate like NO2024 shown above. In the new release, we don't need an aggregate and instead we write this:

*OUTPUT oiNo2024
*SOURCE ? ? ? ? !TL2024 ? ? ? ? _INVENT yiMBF

Discussion

Your action definitions are not affected by *EXCLUDE; the interpreter still processes the code. It is NOT the same as if you commented out the action specifications entirely. If you remove the action specifications, the interpreter will still see the action code referenced in the TRANSITIONS and OUTPUTS sections, and it will generate errors.

On the other hand, when you exclude development types from the AREAS section for the entire planning horizon, the effect is the same as if you commented out area records entirely. If you exclude area as a way to preclude harvesting, be careful! Any output derived from the excluded area will be zero, including outputs like habitat or hydrological maturity. Constraints that depend on the ratio of these types of habitats may no longer be feasible once the area is excluded. Let's consider an example.

Suppose you want to evaluate reallocations of land from one management emphasis to another. It is better to recognize those options within a landscape theme. Define actions that are restricted to one management emphasis. Evaluate the impact of reallocation by characterizing your forest with and without the new designation. If needed, create separate AREAS sections for each and run the analysis with scenarios.

Is Woodstock Syntax Eluding You? Contact Me!

If you’d like a new set of eyes to look over your model(s), give me a shout. I'm happy to review your model, and if needed, we can conduct an on-site training review with your team.

Tuesday, August 13, 2024

Benchmarking - A Method for Determining Appropriate Constraint Bounds

Introduction

Whenever I train new analysts in harvest scheduling and optimization, I make a point to include the basic assumptions of linear programming. The beauty of LP is that if a feasible solution exists, the solver will provide an optimal solution. One of the key points I make is that constraints will always cost you in terms of objective function value, except in rare situations where the cost is zero. In other words, you can't constrain your way to an improved objective function value. So, you would think that people confronted with difficult policy constraints would spend the time to determine reasonable bounds before plowing ahead with analysis, and, dog forbid, employing goal formulations to avoid infeasibility. A recent project has taught me that I have more to teach.

What Is Possible?

Generally, allocations to different management emphases are more typical of public land management than private, but I have seen allocations to carbon projects or conservation easements handled similarly. My point is, however, that you should know what your forest is capable of. For example, if you have identified 3000 acres of land suitable for old growth designation, you should expect an infeasible solution if you constrain the model to allocate 4000 acres. That is patently infeasible (aka obvious). Yet, I've had that situation arise more than once in my career. But what if you're not able to explicitly identify qualifying stands, and you have more than one metric to consider (e.g., critical species, hydrological maturity, and so forth)? That is when you need to consider benchmarking.

Know Your Bounds

As a graduate student, I was very fortunate to attend a seminar given by Larry Davis from UC Berkeley where he introduced us to the concept of benchmarking. His opinion was that models can only go so far and that the final determination for a management plan was political rather than analytical. Models can only provide the boundaries within which a viable solution must exist. For example, suppose we have to consider timber production, late seral birds and bird hunting opportunities. Co-production of these metrics is possible to a degree, but how much of each is possible? Davis suggested that 3 model runs, each maximizing one of the metrics, provides critical information. Clearly, production of each metric can't be increased beyond the value of the respective objective function value. However, the lowest values of each metric in the non-optimized runs provide useful floor values as well. By taking the highest and lowest values of the three metrics, you now have a bounded space that can guide constraints in your analysis. For example, metric DIV never exceeds 4.5, and at worst, it never falls below 1.75. 


DIV1 - MAXMIN Objective


DEN2 - MAXMIN Objective


WFH3 - MAXMIN Objective
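
Once the benchmark runs are in hand, the bookkeeping is trivial. In the sketch below the numbers are placeholders, except for the DIV ceiling and floor quoted above (4.5 and 1.75):

# Bound each metric using the run that maximizes it (ceiling) and the worst
# value it takes in the other runs (floor)
runs = {
    "max_DIV": {"DIV": 4.50, "DEN": 2.10, "WFH": 3.00},
    "max_DEN": {"DIV": 1.75, "DEN": 5.20, "WFH": 2.40},
    "max_WFH": {"DIV": 2.30, "DEN": 1.90, "WFH": 6.10},
}

for m in ["DIV", "DEN", "WFH"]:
    ceiling = runs[f"max_{m}"][m]
    floor = min(run[m] for run in runs.values())
    print(f"{m}: feasible targets lie roughly between {floor} and {ceiling}")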

Depending on the number of metrics you need to consider, a benchmark exercise may require a lot of matrix generations and solution time. Luckily, there is a way to speed things up by reducing the number of matrix generations.

Smart Benchmarking

In a benchmarking exercise, the idea is to find the biological capacity of the forest to produce the output of interest. If constraints are necessary (e.g., budgetary or strict policy constraints), they are imposed regardless of the objective function. And if the constraints are always the same, you can declare multiple objective functions and generate the matrix exactly once.

In the following example, I have 9 biodiversity indices to consider. I used a MAXMIN formulation because I thought it important to avoid solutions where the worst outcomes for each metric are crammed into one or two periods so that the rest of the periods appear better. (A more technical discussion of this phenomenon is presented here.) The only constraints are the need to cover any management costs from timber revenues (cash-flow positivity) and a requirement to reforest any backlog. 

*OBJECTIVE
 _MAXMIN oidxDIV 1.._LENGTH
 _MAXMIN oidxDEN 1.._LENGTH
 _MAXMIN oidxWFH 1.._LENGTH
 _MAXMIN oidxBEE 1.._LENGTH
 _MAXMIN oidxESB 1.._LENGTH
 _MAXMIN oidxLSB 1.._LENGTH
 _MAXMIN oidxRTV 1.._LENGTH
 _MAXMIN oidxAMP 1.._LENGTH
 _MAXMIN oidxUNG 1.._LENGTH
*CONSTRAINTS
 orHARV - ocHARV - ocSILV - ocFIXD >= 0 1.._LENGTH ;cash-flow +ve
 oaNULL <= 0 1 ;plant backlog

If you use the MOSEK solver, Woodstock will present a dialog after matrix generation where you can choose which objective to optimize. Once the solution has been written, simply create a scenario from the base model to store the results. Go back to your base model, select from the menu Run | Run Model Bat file, and when the dialog appears, choose a different objective. Continue creating scenarios and rerunning the batch file until all the benchmarks have been run. Usually, I wait until all the solutions are obtained before executing the schedules because I can do that automatically using the Run | Run Scenarios menu option. Once your scenarios have all been executed, use the Scenario Compare feature to determine highest and lowest values for each metric.

Other solvers may not support multiple objective functions directly. In this case, you can edit the MPS file that Woodstock generated so that the objective function you want (OBJ2MIN, say) appears as the first objective row in place of OBJ1MIN:

NAME    RSPS 2023.2.1 (64 Bit)                              
ROWS
  N OBJ2MIN
  N OBJ1MIN
  N OBJ3MIN
  N OBJ4MIN
  N OBJ5MIN
  N OBJ6MIN
  N OBJ7MIN
  N OBJ8MIN
  N OBJ9MIN

Save the change but don't close the file. Use Run | Run Model Bat file as before, and when it completes, save the scenario. You will need to change the objective function between solutions, but the edit is fast and certainly quicker than generating a new matrix each time. Also, note that Woodstock limits you to 9 objective functions due to the naming convention of the MPS file.
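
If you find yourself doing this repeatedly, the edit is easy to script. The sketch below assumes, consistent with the listing above, that every objective appears as an N row in the ROWS section and that the solver optimizes whichever N row it encounters first:

# Move the chosen objective row to the front of the N rows in an MPS file
def promote_objective(mps_path, objective="OBJ2MIN"):
    with open(mps_path) as f:
        lines = f.readlines()
    n_rows = [i for i, ln in enumerate(lines) if ln.split()[:1] == ["N"]]
    target = next(i for i in n_rows if lines[i].split()[1] == objective)
    lines[n_rows[0]], lines[target] = lines[target], lines[n_rows[0]]
    with open(mps_path, "w") as f:
        f.writelines(lines)

# promote_objective("model.mps", "OBJ3MIN")   # then rerun the model bat file as usual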

Summary

This approach is useful for any analyst tasked with benchmarking, but it is particularly important for public agencies issuing policy requirements. Don't impose constraints without knowing whether they are technically feasible! You should be able to demonstrate that it can be done using real-world data.

Looking For More Efficiency Ideas? Contact Me!

Give me a shout if you'd like to learn more about benchmarking analyses or any other modeling difficulties you may be having. No one has more real-world experience with Woodstock than me.
