Monday, February 17, 2025

Constraints and Feasibility

All Constraints Are Equal... But Some Are More Equal Than Others

Apologies to George Orwell, but the way some people view constraints puzzles me. To an LP solver, all constraints are equally important in that a feasible solution must respect all of them simultaneously. However, as modelers we must recognize that in terms of real-world significance, all constraints are NOT equally important. 

Woodstock Constraints Are More Complicated Than Meets the Eye

Consider a simple Woodstock constraint with a 50-period planning horizon:

oaHABITAT >= 5000 1.._LENGTH

This statement represents not a single constraint but 50 of them with 50 accounting variables:

oaHABITAT[1] >= 5000
oaHABITAT[2] >= 5000
oaHABITAT[3] >= 5000

         ...

oaHABITAT[48] >= 5000
oaHABITAT[49] >= 5000
oaHABITAT[50] >= 5000

What we see as oaHABITAT in Woodstock is really an array of accounting variables, indexed by planning period. For each accounting variable, there is a target value to achieve and a planning period in which to do it. Two questions come to mind for just this single output: how important is the target value, and are later planning periods as important as earlier ones? You should be asking the same questions of the other constraints in your model.
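If it helps to see the expansion mechanically, here is a short Python sketch (not Woodstock code; the names are purely illustrative) that builds the same 50 rows the matrix generator would create from that one statement:

# Not Woodstock syntax, just an illustration of what the matrix generator does:
# one aggregate statement becomes one constraint row per planning period.
NUM_PERIODS = 50
TARGET = 5000

rows = [f"oaHABITAT[{t}] >= {TARGET}" for t in range(1, NUM_PERIODS + 1)]

print(len(rows))    # 50 separate constraints, each with its own accounting variable
print(rows[0])      # oaHABITAT[1] >= 5000
print(rows[-1])     # oaHABITAT[50] >= 5000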

How To Determine Relative Importance

Relative importance should be fairly easy to discern:

  1. Legal requirements
  2. Contractual requirements
  3. Policy requirements
  4. Financial requirements
  5. Desirable outcomes

If you are required by law to maintain some condition and you can reasonably model it, those are the constraints you should verify first. Next, you may have contractual requirements (say, a volume of wood products) that must be met. Organizational policy requirements come next, followed by any financial obligations your organization may have. Finally, constraints that represent desirable rather than necessary outcomes should come last. If you formulate a model with all the constraints in place and it solves, great! But if it doesn't solve, you need to determine where the problem is by going back to your hierarchy and running scenarios. Don't just slap goals on everything and call it good!

Mitigating Infeasibility

Last time, I suggested that a goal formulation could be used to determine the minimum shortfalls in potential habitat. For example, in this case we could use:

*OBJECTIVE
_MIN _PENALTY(_ALL) 1.._LENGTH
*CONSTRAINTS
oaHABITAT >= 5000 1.._LENGTH _GOAL(G1,1)

If the solution meets or exceeds 5000, the goal penalty is zero; otherwise, the penalty is 1 per hectare of shortfall. If we graph oaHABITAT over the planning horizon, we see this outcome:

Significant Shortfalls in Early Periods with Reversal

We know there is no solution that reduces the total shortfall area across the planning horizon. However, this is a situation where the timing of the shortfall matters as much as the amount. By Period 10, the shortfall is gone, but it returns in Period 16 for four periods. Perhaps it would be better to have more shortfall early on and then no shortfalls at all. How could we effect that outcome?

We use discounting in financial models to reflect the time value of money: revenues in the future are worth less to us now than current revenues. If we turn that logic on its head, we could use lower penalties early on and higher penalties later to show that shortfalls in the future are more problematic. Unfortunately, Woodstock doesn't offer an easy way to automatically increase goal weights, but we can write constraints directly using a FOREACH loop:

FOREACH xx in (1..50)
oaHABITAT >= 5000 xx _GOAL(Gxx,xx)
ENDFOR

We have to write explicit planning periods, goal IDs and weights, but luckily, we can just use an increasing series of numbers from 1 to 50. Now, every subsequent period increases the penalty on shortfalls. While the total amount of shortfall may increase, the span of periods with shortfalls should be as short as possible:

Higher Shortfalls Early, No Reversals

The second scenario (Series 2) increases the total shortfall over the planning horizon from 12,845 to 12,857, but it does so without reversals:

A Tradeoff Between Total Shortfalls and Reversals
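To see the same trade-off outside of Woodstock, here is a toy goal-programming LP in Python (scipy). The budget, target and period count are invented; the only point is that period-increasing penalty weights shift an unavoidable shortfall toward the early periods without changing its total:

# Toy goal-programming LP (not a Woodstock model): a fixed "habitat budget"
# must be spread over 6 periods, so 6000 units of shortfall are unavoidable.
# Changing the penalty weights does not change the total shortfall, but it
# does change WHEN the shortfall occurs.
import numpy as np
from scipy.optimize import linprog

T, TARGET, BUDGET = 6, 5000, 24000          # 6 periods, invented habitat budget

def solve(weights):
    # Variables: h[0..T-1] (habitat achieved), s[0..T-1] (shortfall slack).
    c = np.concatenate([np.zeros(T), np.asarray(weights, dtype=float)])
    # h[t] + s[t] >= TARGET   ->   -h[t] - s[t] <= -TARGET
    A_ub = np.hstack([-np.eye(T), -np.eye(T)])
    b_ub = np.full(T, -TARGET, dtype=float)
    # Only BUDGET units of habitat can be supplied across the whole horizon.
    A_ub = np.vstack([A_ub, np.concatenate([np.ones(T), np.zeros(T)])])
    b_ub = np.append(b_ub, BUDGET)
    bounds = [(0, TARGET)] * T + [(0, None)] * T
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[T:]                         # shortfall in each period

print("uniform weights   :", solve([1] * T))
print("increasing weights:", solve(range(1, T + 1)))
# Both runs total 6000 units of shortfall (the minimum possible), but the
# increasing weights concentrate it in the earliest periods instead of letting
# the solver place it wherever it likes.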

Supposing that habitat is a legal requirement, and we know of a strategy to eliminate shortfalls within 10 planning periods, we should adjust the constraints to reflect these achievable habitat values and remove the goal formulation. Any additional constraints will have to maintain these habitat levels.

Summary

In this post we considered the relative importance of different kinds of constraints as well as the importance of the timing of outcomes. Rather than relying on blanket goal formulations and default weights, we applied some logic to the situation to devise an improved solution. Unfortunately, not every situation will work in your favor. It may turn out that increasing weights over time has no effect on shortfalls, but at least by trying you know this to be true.

Contact Me!

If you'd like some help with a tricky formulation, or you'd like to engage in a custom training for your team to improve their analytical skills, give me a shout. And again, if there's something you'd like me to cover in a future blog post, let me know!

Thursday, February 6, 2025

Let's Talk About Infeasibility, Shall We?

I Think Many of You Are Dealing with It Improperly

In my work as a consultant, I'm often presented with models that aren't working as the client intended. That means I get to see how the model has been formulated and how the analysis has been done. And frankly, it's a bit shocking sometimes what passes for a preferred alternative among a bunch of scenarios. Clearly, some of you don't really understand infeasibility and the proper way to deal with it.

What Do Feasibility and Infeasibility Mean?

In every constrained linear optimization model (i.e., LP, MIP), there must be a set of constraints in place that define a convex feasible region. Convexity implies that the region has no indentations, so any local optimum is also the global optimum. In the following example, the blue lines represent a convex feasible region for a minimization problem because there is only one minimum point (global optimum). The orange region is not convex because two local optima are visible:
Feasible Regions for 2 Minimization Problems

If we have a convex set of constraints (red lines and direction of inequality) and an objective function (dashed line), we can search the corner points of the feasible region (pale green) and locate the minimum corner point, which is the global optimum. We have a feasible and optimal solution.

Locating the optimal solution
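Here is a minimal numeric counterpart in Python (scipy's linprog, which uses the HiGHS solver). The coefficients are made up, but the solver does exactly what the picture shows: it lands on a corner of the feasible region:

# A tiny 2-variable LP (illustrative numbers only): minimize cost over a
# convex feasible region.  The optimum sits at a corner point.
from scipy.optimize import linprog

c = [1, 2]                      # minimize x + 2y
A_ub = [[-1, -1]]               # x + y >= 3, rewritten as -x - y <= -3
b_ub = [-3]
bounds = [(0, 4), (0, 4)]       # 0 <= x <= 4, 0 <= y <= 4

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
print(res.status, res.x, res.fun)   # 0 (optimal), corner x = [3. 0.], objective = 3.0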

Suppose we add another constraint. Now, there is no feasible region because there is no overlapping area where the constraints can be met simultaneously. The example is patently infeasible:

Patently Infeasible Problem

If you look at the graphic, you might be wondering why someone would propose a constraint that so obviously cannot be met. Good question. However, some of you are doing this all the time. Years ago, I worked on a project for the US Forest Service. They had 9 structural stage conditions that they wanted constrained to be in fixed proportions of the forest. Think about that - how do you maintain a static structure in a dynamic forest? The youngest stands grow out of earlier structural stages into later ones. The only way to replace them is through harvesting. Alas, the earliest structural stage lasts 20 years, but it takes more than 120 years to reach the latest structural stage. You don't need an LP to determine that the desired outcome is patently infeasible.
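The numeric counterpart is just as blunt: add one conflicting row to the toy LP above and the solver immediately reports infeasibility. Again, the numbers are invented; the behavior is the point:

# Continuing the toy example: one extra requirement and the feasible region
# vanishes.  x + y >= 3 can never hold at the same time as x + y <= 2.
from scipy.optimize import linprog

c = [1, 2]
A_ub = [[-1, -1],               # x + y >= 3
        [ 1,  1]]               # x + y <= 2: directly conflicts with the row above
b_ub = [-3, 2]
bounds = [(0, 4), (0, 4)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
print(res.status, res.message)  # status 2: the solver declares the problem infeasible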

Ah, But What About Goals?

Yes, goal formulations are very useful when trying to mitigate infeasibility, but you have to use them correctly. Many of you are not.

How Do Goals Work?

Let's say you have a constraint requiring 5000 acres of some type of habitat:

oaCaribou >= 5000 1.._LENGTH

If you know the characteristics of the habitat, you should be able to determine if at least 5000 acres currently exists in the forest. If it doesn't, your constraint is patently infeasible. The more interesting question is, can we improve the situation over time? If so, we need to know by how much and how long it will take. Finally, once we achieve the desired acreage, can we sustain it?

From the models I have reviewed over the years, I know at least some of you are addressing this problem by applying goals. A goal applied to a constraint simply introduces slack and/or surplus variables to eliminate the infeasibility. For our example, the habitat constraint becomes this:

oaCaribou + Slack = 5000 1.._LENGTH

If oaCaribou is less than 5000 in any period, the variable Slack takes on the value of the shortfall. But since Slack is a free variable (not tied to land or activities), it could take on any value greater than zero and less than or equal to 5000. In other words, the model could harvest qualifying habitat and simply make Slack larger. To avoid this outcome, we need to penalize the use of Slack so that it only is used when there is no alternative. Said another way, we want the solver to enforce the constraint unless it is impossible, and then only to the degree required. If you think about it, this is a minimization problem. The correct way to deal with the goal is to minimize Slack, subject to the constraint.

*OBJECTIVE
_MIN Slack 1.._LENGTH
*CONSTRAINTS
oaCaribou + Slack = 5000 1.._LENGTH

If we solve this LP, we will know the total acres of shortfall across the entire planning horizon (objective function), as well as the shortfalls in each planning period (Slack). Now that you know where the shortfalls are, you can adjust the RHS values for each period down to where the constraint is just feasible. You know you can't improve the objective function which means the sum of the shortfalls cannot be improved. How do we know this? If there were a way to reduce the sum of shortfalls, then the solver would have found it. Now, there may well be alternative optimal solutions where the shortfalls are different in different planning periods, but the sum across all periods will never be less. We will return to this idea in a future blog post.
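For readers who want to experiment outside of Woodstock, here is a toy version of the minimize-the-slack formulation in Python (scipy). The per-period supply caps are invented, but the structure is the same: the optimal slack values tell you which per-period targets are actually achievable:

# Toy version of the "minimize the slack" goal formulation (illustrative data,
# not a real forest model).  caribou[t] is capped by what the forest can supply
# in each period; Slack[t] picks up whatever shortfall remains.
import numpy as np
from scipy.optimize import linprog

TARGET = 5000
supply_cap = [3200, 3800, 4400, 5000, 5600, 6200]    # invented achievable habitat per period
T = len(supply_cap)

# Variables: caribou[0..T-1], then Slack[0..T-1]; minimize total Slack.
c = np.concatenate([np.zeros(T), np.ones(T)])
A_eq = np.hstack([np.eye(T), np.eye(T)])             # caribou[t] + Slack[t] = TARGET
b_eq = np.full(T, TARGET, dtype=float)
bounds = [(0, cap) for cap in supply_cap] + [(0, None)] * T

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
print("shortfall by period:", res.x[T:])             # [1800. 1200. 600. 0. 0. 0.]
print("minimum total      :", res.fun)               # 3600: no plan can do better
# The per-period shortfalls tell you what RHS values the real constraints can carry.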

So Why Do So Many of You Leave Goals in Place?

I've reviewed models that are supposed to be preferred alternatives from a bunch of scenarios that were run, and they look like this:

*OBJECTIVE
_MAX oqTotal - _PENALTY(_ALL) 1.._LENGTH
*CONSTRAINTS
oaPL_OG - 0.045 * oaPL >= 0 1.._LENGTH _GOAL(G1,9999)
oaSB_OG - 0.045 * oaSB >= 0 1.._LENGTH _GOAL(G2,9999)
oaSW_OG - 0.045 * oaSW >= 0 1.._LENGTH _GOAL(G3,9999)
oaMX_OG - 0.045 * oaMX >= 0 1.._LENGTH _GOAL(G4,9999)
oaOC_OG - 0.045 * oaOC >= 0 1.._LENGTH _GOAL(G5,9999)
oaHW_OG - 0.045 * oaHW >= 0 1.._LENGTH _GOAL(G6,9999)

First off, there's our shining example of a patently infeasible problem, made that much worse by embedding the penalty function for the goals in the objective function and by using arbitrarily large goal weights. The use of goals implies that meeting the target is generally achievable, but sometimes you miss. How is that working out? Let's look at the goal summary that Woodstock produces as part of the SCHEDULE section:

; Constraint: oaMX_OG - 0.045 * oaMX >= 0 1.._LENGTH
; Period    Period    Above    Below    
     5         5               1687.7        ; C1R11
     6         6               1687.7        ; C1R12
     7         7               1687.7        ; C1R13
     8         8               1684.3        ; C1R14
     9         9               1463.2        ; C1R15
    10        10                501.6        ; C1R16
    11        11                463.8        ; C1R17
    12        12               1687.7        ; C1R18
    13        13               1687.7        ; C1R19
    14        14               1687.7        ; C1R20
; Constraint: oaSW_OG - 0.045 * oaSW >= 0 1.._LENGTH
; Period    Period    Above    Below    
     6         6               8412.8        ; C9R6
     .         .                   .         .   .
     .         .                   .         .   .
     .         .                   .         .   .
    20        20               8412.8        ; C9R20   

Clearly, the constraints are only working for the first few periods before shortfalls appear. In the case of oaSW_OG, you are always 8412.8 ha short from period 6 onward. So why not adjust the RHS values and accept that your policy objective is set too high?

And what about the objective function value, which is supposed to represent maximized total harvested volume? Well, it works out to be -3.38583928E11, or negative 338.5 billion. Is that in cubic meters? No, because it includes the sum of penalties across all the constraints, some of which are measured in hectares and scaled by an arbitrary value of 9999. The objective function is meaningless other than to show that the sum of penalties completely swamps any actual harvest volume.
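A quick back-of-the-envelope calculation (with hypothetical volume and shortfall figures; only the 9999 weight comes from the model) shows why: once you multiply even a modest total goal violation by 9999, the harvest volume disappears into rounding error:

# Hypothetical figures chosen only to illustrate the scale mismatch.
harvest_volume = 4.0e8          # hypothetical total harvest (m3) over the horizon
goal_weight = 9999              # the arbitrary weight used in the constraints above
violation_total = 3.4e7         # hypothetical sum of goal violations (ha-periods)

objective = harvest_volume - goal_weight * violation_total
print(f"{objective:.4e}")       # roughly -3.4e11: the penalty term swamps the volume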

Please Stop Abusing _GOAL

If you work for a public lands agency and your policy directives are never achievable, why is your agency keeping them in place? Did you not do any analysis ahead of time to determine whether such policies were feasible? If you are approving management plans based on model solutions that still incorporate goals, why are you doing that? If you work for a consultant or timber investment management organization, you probably have fewer policy directives to consider. Still, you need to be honest about your assumptions. If you apply goals to constraints on minimum product harvest volumes or cash-flow positivity, you're simply masking the fact that you're violating those constraints. And if every constraint in the model has goals applied, do you really know whether any of them would be feasible absent the goals?

If you're answering "I don't know" or simply "no" to any of these questions, I'd suggest you're not doing your job as well as you should. If constraints truly are infeasible, you should be honest about it and your model should reflect the true state of things. Otherwise, someone like me will call BS and do the whole "the emperor has no clothes" thing. We will continue this discussion about constraints and infeasibility next time.

Contact Me! 

If you would be interested in a training session on how to properly do harvest scheduling analysis using goals and other techniques, let me know. And if you have any comments about this blog post, or ideas about future blog posts, add them to the comments section.

Thursday, December 5, 2024

Why are MIP models difficult to solve (or not)?

Introduction

I recently joined a conversation about why a mixed-integer programming (MIP) problem is so much harder to solve than a regular linear programming (LP) problem. Clearly, it is more problematic than just rounding the fractional quantities up or down to the nearest integer. However, it isn't just the variability in solution time that many find confounding - the solutions themselves are often counter-intuitive. To be honest, the most valuable solutions from modeling are the counter-intuitive ones because it means you've learned something new and unexpected from your model. And to me, that's the whole point of modeling.

MIP models do not have to be large to be difficult to solve. In fact, some of the most difficult MIP problems known are trivially small. For this discussion, however, let's stick to forestry and concepts we are familiar with.

Districting

8 Districts

Suppose I have 8 districts in my forest, and I want to know if activity is occurring in each district. I want my model to report that using a binary variable, where 0 = no activity, and 1 = activity. The problem is fairly easy to formulate as a MIP: set up binary variables that each represent harvest in a district. 
*VARIABLE
  vbCut(D1) _BINARYARRAY
  vbCut(D2) _BINARYARRAY
           .
           .
           .
  vbCut(D8) _BINARYARRAY

To make the binary variable useful, I need to relate it to an output in the model (harvest area). I need a constraint that will force the binary variable to equal 1 if I harvest any amount of area in the district. In the general case, I write the constraint as:

oaHARV(xx) - bigM * vbCut(xx) <= 0 1.._LENGTH

In this case, I use a theme-based output where district is one of the themes. The bigM coefficient has to be greater than any amount of harvest area in a planning period. I don't know the amount of harvest in any period, but I do know that if I harvested every acre in a district, bigM would have to be at least that large. In reality, since the districts are different sizes, I can set bigM slightly bigger than the largest district:

FOREACH xx IN (_TH11)
oaHARV(xx) - 15,000 * vbCut(xx) <= 0 1.._LENGTH
ENDFOR

By itself, this doesn't do very much. The binary variables could take on values of 0 or 1 and it wouldn't matter. I need an additional constraint to ensure that binary variables only take on values of 1 when harvesting occurs in the district.

vbCut(*) <= 8 1.._LENGTH

Now, the model will correctly report the districts being operated in each period. There is no effect on the solution because the same choices exist as in the LP relaxation (binary requirements relaxed). Interestingly, the solver took longer to prove optimality for the LP than the MIP.

; LP relaxation
; Elapsed time for solver 0.36s
; GNU reported :
;
; Status:     OPTIMAL
; Objective:  OBJ1MIN = -309600006.1 (MINimum)
; Elapsed time for solver 0.23 s
; GNU reported :
;
; Status:     INTEGER OPTIMAL
; Objective:  OBJ1MIN = -309600006.1 (MINimum)
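If you want to poke at this behaviour without a Woodstock licence, the same structure can be written as a small MIP in Python (scipy's milp interface, which calls HiGHS). The district areas, period count and limits below are invented, but the three ingredients mirror the formulation above: a continuous harvest variable per district and period, a binary indicator linked to it with big-M, and a cap on how many indicators can equal 1 in any period:

# Toy district MIP (invented data): maximize harvest subject to big-M linking
# constraints and a limit on how many districts may be open per period.
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

areas = np.array([1200.0, 900.0, 1500.0, 800.0])    # hypothetical district sizes (ac)
D, T, MAX_OPEN = len(areas), 3, 1                    # 4 districts, 3 periods, 1 open/period

nh = D * T                                           # h[d,t] variables first, then open[d,t]
n = 2 * nh
def h(d, t): return d * T + t                        # index of harvest variable
def o(d, t): return nh + d * T + t                   # index of binary indicator

c = np.zeros(n)
c[:nh] = -1.0                                        # maximize harvest = minimize its negative

rows, lo, up = [], [], []

# Big-M link: h[d,t] - area[d] * open[d,t] <= 0
for d in range(D):
    for t in range(T):
        row = np.zeros(n)
        row[h(d, t)] = 1.0
        row[o(d, t)] = -areas[d]
        rows.append(row); lo.append(-np.inf); up.append(0.0)

# At most MAX_OPEN districts open in any period
for t in range(T):
    row = np.zeros(n)
    for d in range(D):
        row[o(d, t)] = 1.0
    rows.append(row); lo.append(0.0); up.append(MAX_OPEN)

# Total harvest from a district over the horizon cannot exceed its area
for d in range(D):
    row = np.zeros(n)
    for t in range(T):
        row[h(d, t)] = 1.0
    rows.append(row); lo.append(0.0); up.append(areas[d])

constraints = LinearConstraint(np.array(rows), lo, up)
integrality = np.concatenate([np.zeros(nh), np.ones(nh)])        # last nh vars are binary
bounds = Bounds(np.zeros(n), np.concatenate([np.repeat(areas, T), np.ones(nh)]))

res = milp(c, constraints=constraints, integrality=integrality, bounds=bounds)
print("total harvest:", -res.fun)                    # 3600.0 with MAX_OPEN = 1
print("open pattern :\n", res.x[nh:].round().reshape(D, T))   # rows = districts, cols = periods

Lowering MAX_OPEN is exactly the RHS change discussed next: the formulation is untouched, yet the search space the branch-and-bound algorithm must explore changes dramatically.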

No Additional Constraints, Just Change RHS

When you change the RHS values in a regular LP model, there usually isn't a big difference in solution time. For example, if you limit the total acres of commercial thinning to 500 acres and then solve it with a limit of 450, you don't see a big difference in solution time. In a MIP formulation, such a change can cause a huge swing in solution times. In the following examples, I reduce the number of open districts in each period from 8 (vbCut(*) <= 8) to 7, 6, 4, 2 and finally 1 district per period, just by changing the RHS value. All were solved with the HiGHS solver to a 0.1% gap:



Notice that the constraint on open districts really only starts to bite when 3 or more districts become unavailable. Let's look at which districts are open in the first 10 planning periods for the 6, 4 and 2 districts scenarios:


Not every district was accessed in the 4-district scenario. Presumably, enough harvest volume was available in period 3 that only 3 districts were opened. So, what do you think happened in the 1-district scenario?


Were you anticipating that some districts would never be opened? Clearly, the even-flow constraint became limiting when only 1 district could be opened at a time. By opening the magenta or red districts, the objective function value would be worse than if they were left unharvested. Even though the harvest choices within districts are continuous variables, it isn't as simple as choosing the largest or smallest districts to balance the harvest flows. (Neither of the districts left unharvested was the smallest or the largest district.)

Additional Constraints

Suppose that in addition to limiting the number of open districts, there was also a policy requirement that you cannot enter the same district in the next period. This type of requirement is common in places like Canada (caribou ranges) but I also used it in a project in California. Basically, I need to restrict temporally adjacent binary variables from summing to more than 1:

FOREACH xx IN (_TH11)
vbCut(xx) + vbCut(xx)[-1] <= 1 2.._LENGTH
ENDFOR

If vbCut(xx) = 1 in period 2, then vbCut(xx) must be 0 in both period 1 and period 3. In period 4, vbCut(xx) can be 1 again, or it can remain 0, allowing vbCut(xx) to be 1 in period 5, and so on through the last planning period. Thus, we have our policy requirement: no district is entered in two consecutive periods.
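A cheap sanity check is to verify a proposed schedule against the rule directly. Here is a small Python helper (hypothetical data) that flags any district entered in two consecutive periods:

# Plain Python, hypothetical schedule: flag consecutive entries to a district.
def consecutive_entries(open_schedule):
    """open_schedule[d][t] = 1 if district d is entered in period t (0-based indices)."""
    return [(d, t)
            for d, periods in enumerate(open_schedule)
            for t in range(1, len(periods))
            if periods[t] + periods[t - 1] > 1]

schedule = [[1, 0, 1, 0],       # fine: entries in periods 1 and 3
            [0, 1, 1, 0]]       # violation: entered in periods 2 and 3
print(consecutive_entries(schedule))   # [(1, 2)] -> district 2 breaks the rule in period 3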

If we apply this constraint to the 8, 4 and 2 districts scenarios, we get the following results:

Adding the sequential access constraints to the scenarios dramatically increased solution times, but on a percentage basis, the biggest impact was on the easiest scenario to solve and the smallest on the most difficult. In terms of the sequence of entries into districts, we observe these results:


8-District Scenario


4-District Scenario


2-District Scenario

You'll note that the constraints would have no impact on the single district scenario, since it never exhibited any sequential access to districts originally. Also, note that in the original 2-district scenario, there was only one instance where a district was entered sequentially. However, when you compare district entries from that scenario with the no-sequential entry scenario, there are multiple changes to assigned planning periods:


> Sequential OK    < Sequential Not OK

Discussion

I hope this post has been successful in dispelling any myths about MIP vs LP. With MIP, we see that:
  • It isn't just "LP with a few integer/binary variables". It is a different beast.
  • A problem can be made much more difficult to solve by just changing a RHS value.
  • Adding constraints can increase solution time by a great deal without greatly affecting the objective function value.
  • Even small problems can be hard.
  • Integer solutions can be quite counter-intuitive.

Contact Me!

If you'd like to explore all-or-nothing decisions in your planning models, give me a shout so we can discuss it further. It may require that you invest in a better solver, but it will certainly test your understanding of outputs and constraints.

Monday, November 25, 2024

Is Your Conceptual Model Logically Sound?

Introduction

In the last few posts, I've been talking about flaws in your conceptual model that can make your model run slowly (i.e., it is an inefficient use of your time). Making poor choices about landscape themes can blow up the size of your model dramatically and make it nigh on impossible to properly do QA (or even figure out what is happening).

However, it is also possible to have a reasonable implementation of a flawed conceptual model. This is a model that, while semantically and syntactically correct, doesn't pass the smell test for how the real world works. For this exercise, I'm going to present snippets from real client models that illustrate both kinds of problems, without disclosing where they came from.

Exhibit 1 - 9 Theme Model


Exponential Growth in Columns


Almost 12 GB of RAM required

We see from the exponential growth in the number of columns by planning period that there's something undesirable going on. For a model with only 9 themes and 20 planning periods, that is a very large number of columns, and it requires a lot of computer memory (almost 12 GB). Where to start?

Landscape Woes

The model identifies harvest settings, which makes this essentially a stand-based model. They used _INDEXing in the LANDSCAPE section to associate net area and harvest method proportions with each unit, rather than using themes and the AREAS section.

1431 _INDEX(yOpArea=56.0,yGroundProp=0.00,yLineProp=1.00,yHeliProp=0.00)
1433 _INDEX(yOpArea=96.9,yGroundProp=0.00,yLineProp=0.28,yHeliProp=0.72)
1434 _INDEX(yOpArea=62.3,yGroundProp=0.05,yLineProp=0.33,yHeliProp=0.62)

I dislike this approach because you cram too much information into a single attribute (stand ID), and so when you want to identify sub-groups, you need a bunch of *AGGREGATE definitions. This just makes things unwieldy.

*AGGREGATE ag1 ; Primary management base
*AGGREGATE ag2 ; Secondary management base
*AGGREGATE agNoSBI
*AGGREGATE agSBI

Worse, as we see above, there are clearly two distinct divisions in the land base: primary versus secondary, and the presence or absence of SBI data. This could have been much more efficiently handled as two themes with 2 attributes each:

*THEME Landbase
  P Primary
  S Secondary
*THEME SBI data
  Y SBI available
  N SBI not available

There is another theme that apparently is used to assign a scenario name to every development type (DevType) in the model using action aTH7. Woodstock already has a scenario management system, so the utility of this "feature" eludes me. However, it adds unnecessary decision variables to an already large LP matrix because everything needs to be relabeled from NA to something else. If it isn't a real decision, it shouldn't be modeled as an action.

Name That Theme!

When I train new analysts, I like to emphasize the idea that landscape themes should be pure, encapsulating only a single type of data, such as stand status vs rotation vs stand origin vs stocking vs planting stock. Unfortunately, the modeler did the complete opposite here, concatenating 5 different kinds of data into a single code:

NA Unknown or as initialized for all existing forest
CC Clearcut      
OR Overstory Removal
CT Commercial Thin
ST Seed tree cut
SW Shelterwood
SL Selection Cut                 
NA Natural
NP Natural with PCT    
PL Planted
PP Planted with PCT      
TP Tube Plant (nonstocked)                              
RP Regular Plant (nonstocked)  

Those of you who've had me as an instructor probably remember me asking how many possible DevTypes can be represented by x themes with y attributes each. Here we only have 5 potential themes with 2-6 attributes each, but the model listed 172 valid combinations. Here's a sample:

SW20
CC_PL_NA
CC_PL_ST
OR_NA_SW
OR_NP_NA 
OR_NP_ST 

And, of course, you need to define a bunch of *AGGREGATES to identify the areas represented by the basic data types, like planted versus natural stands. Again, a better approach would be to use themes to explicitly identify silviculture state (cutover, stocked, shelterwood), stand origin (NA, PL), precommercial thin (Y, N), commercial thin (Y, N), and planting stock (NA, TP, RP). It takes a single DevType mask to identify any of these things, versus hundreds of lines of code devoted to attribute listings.

Overall Impression


LP Matrix Statistics for the Model

Let's review a few things. First, the number of active DevTypes in this model is 316,572. For those who forgot their introductory training, Woodstock only allows 500,000 active DevTypes. The reason there are so many is the combination of retaining the original harvest unit ID after final harvest and the concatenated attributes we just discussed. There's just an explosion of new development types after each action. It almost appears that the motivation in formulating this model was to minimize the number of themes and to make the model as opaque as possible.

How to fix it? Jettison the whole thing and start over.

Exhibit 2 - 11 Theme Model


No Combinatorial Explosion of Decision Variables



Fast Generation Time with Low Memory Requirements

By all accounts, this model seems to be well formulated because it generates the matrix quickly, and the number of decision variables for existing and regenerated DevTypes is quite reasonable. Unlike Exhibit 1, the landscape section in this model is well thought out, with clearly defined themes and a reasonable number of attributes per theme. Clearcutting is the only harvest action with 3 regeneration options: natural regen, planting or direct seeding.

Transitions Section

Where I see potential problems arising is in the TRANSITIONS section. These are basically the responses to different DevTypes after final harvest using natural regeneration, with the extraneous global attributes stripped out:

*SOURCE BFDOM      
*TARGET BFDOM 35
*TARGET HRDMX 15            
*TARGET SBMX1 15
*TARGET PJMX1 10                
*TARGET CONMX 25                 

*SOURCE HRDMX 
*TARGET HRDMX 20
*TARGET HRDOM 20
*TARGET PODOM 45
*TARGET CONMX 15

*SOURCE SBMX1 
*TARGET SBMX1 35
*TARGET PJDOM 10
*TARGET SBDOM 25
*TARGET PJMX1 15
*TARGET CONMX 15          

*SOURCE PJMX1 
*TARGET PJMX1 25
*TARGET PJDOM 30
*TARGET CONMX 15
*TARGET HRDMX 15
*TARGET SBMX1 15                 

*SOURCE CONMX 
*TARGET CONMX 34
*TARGET HRDMX 33
*TARGET SBMX1 33                      

*SOURCE HRDOM 
*TARGET HRDOM 20
*TARGET HRDOM 20
*TARGET PODOM 45
*TARGET CONMX 15

*SOURCE PODOM  
*TARGET PODOM 55
*TARGET HRDMX 15
*TARGET HRDOM 15
*TARGET CONMX 15                                              

*SOURCE PJDOM 
*TARGET PJDOM 45
*TARGET PJMX1 20
*TARGET CONMX 25
*TARGET HRDMX 10

*SOURCE SBDOM  
*TARGET SBDOM 35
*TARGET HRDMX 15
*TARGET SBMX1 15
*TARGET PJMX1 10
*TARGET CONMX 25

I suspect most of you are not used to seeing transition matrices that are not 1:1 (100%) but it was one of the earliest features of Woodstock to allow multiple outcomes after an action. The idea is if you harvest a mixed wood stand, it is quite likely that, after harvest, parts of the stand may be dominated by hardwoods and others by softwood species, even though a majority of the area contains both.

The potential problem I see here is that over time, we could see an overall shift from one forest type to another. Now, if this is a landscape-level model, that may not be overly problematic. That said, even at the landscape level I have trouble believing that all of these possible shifts make sense. Let's turn the transition matrix into a graphic to see what I mean. Within 3 rotations, a single DevType spawns 9 different types:


BFDOM Type Traced Through 3 Transitions

I see two issues with these transitions. First, BFDOM type will continually decrease with time because you'll notice there is no transition back to it from other types. Second, do the silvics of these species groups really allow such conversions, say, from black spruce muskeg to upland hardwood dominant?

What is even more alarming is when I see these types of transitions not in a landscape-level analysis, but in a spatially explicit harvest schedule. Remember, LP is a deterministic model. These percentages may represent probabilities in a landscape-level analysis, but in a harvest scheduling model, they represent exact proportions. It really doesn't make much sense to think a BFDOM stand would split into five different types after harvesting. What would be the source of the regen for the other species? Poplar regenerates largely from root suckers, but if they were not present in the harvested stand, how would they appear later? For other species, what would be the seed source? Unless there's an adjacent stand with the right species present, it is very unlikely that a new species will just appear.
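The first issue is easy to quantify. Since no other type transitions back to BFDOM, the only BFDOM after a harvest is the 35% of BFDOM that stays put, so its area decays geometrically with each rotation. A few lines of Python (using the percentages from the transitions above and a hypothetical starting area) make the drift obvious:

# Simple arithmetic: only the 35% of BFDOM that stays BFDOM survives each harvest.
retained = 0.35
area = 1000.0                   # hypothetical hectares of BFDOM today
for rotation in range(1, 4):
    area *= retained
    print(f"after rotation {rotation}: {area:7.1f} ha of BFDOM remain")
# after rotation 1: 350.0 ha; rotation 2: 122.5 ha; rotation 3: about 42.9 ha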

Harvest scheduling models should represent management intent, not necessarily what might happen on a particular harvest block. We don't plan for plantation failures but when they occur, we mitigate them with additional planting or other silvicultural amendments. Clearly, we can't predict them because we would avoid the outcome by doing something different. When developing a conceptual model, it is important to keep this in mind.

How To Fix It? Be careful using multiple outcomes.

There are issues with multiple transitions that affect the solver's ability to produce an optimal solution. Fractional quantities that get progressively smaller over time can be particularly bad with longer planning horizons. But even if that isn't a problem, you do need to recognize a simplification of the real world in your model. If the silvics or logic of your model is more complex than reality, chances are it isn't going to make sense to your colleagues or management team. What you'll end up with is an efficient but very bad model.

Contact Me!

If your model is behaving badly and you have no idea why, maybe it is time to consider an overhaul. I can work with you to determine requirements, suggest formulation efficiencies, and help you with ways to automate the process going forward. Give me a shout!

Monday, November 11, 2024

Improve Model Efficiency - Part 5

Introduction

Previously, one of my blog posts was about how the outputs you create can cause performance issues in a Woodstock model. Specifically, the use of inventory-based outputs can seriously degrade matrix generation time because of the overhead they incur. Often, these outputs can be avoided through the use of REGIMES. 

I also wrote a blog post recently on conceptual models. This time, I want to continue considering conceptual models by going even more basic. Let's talk about conceptual models and what questions the model is supposed to answer. Poorly conceived models are doomed to be slow from the outset.

What Kind of Model is it?

In the hierarchical planning framework, forest planning models can be strategic, tactical or operational. Strategic planning models are focused on the concepts of sustainability, capital planning, silviculture investment, volume allocation and return on investment. The questions they answer are of the how, what and when variety, such as "when should we harvest?", "what stand should we harvest?" and "how should we manage these stands (thin or not)?", etc. Because it is strategic, many details of management are omitted or unavailable. And yet, clients often make choices that pretend to provide detail or certainty where none is warranted. By focusing on the questions at hand, you can pare back the conceptual model and in turn, improve model performance.

A Bad Conceptual Model

Years ago, a client approached me and said that he was told by a university professor (who shall remain nameless) that his stand-based Woodstock model was too large to solve because it relied on a Model II formulation. The professor said that a Model I formulation would be a better choice because it would not suffer the excessive size problem. Of course, this was nonsense! I explained that a properly formulated model with the same yield tables, actions and timing choices would yield the same solution, regardless of the formulation. Depending on the solver, a Model II formulation could be slower due to the higher number of transfer rows relative to a Model I formulation. Regardless, the problem had to be due to the way the client defined his actions. 

Timing Choices

It didn't take very long to start finding the culprits. First, I checked the ACTIONS section:

*ACTION aCC Y Clearcut
*OPERABLE aCC
.MASK() _AGE >= 40

Next, I checked the CONTROL and LIFESPAN sections:

CONTROL
*LENGTH 40 ; 5-year periods

LIFESPAN
.MASK() 500

Nothing ever died in this model because the oldest permitted age class would be 2500 years! Every existing development type (DevType) older than 40 years generated a decision variable in every planning period, even though the existing yield tables stopped at age 150. This resulted in a bunch of decision variables representing very old stands and all with the same volume coefficients. Even if you could solve it, these decision variables represent poor choices and are never chosen.

Transitions

Because this was a stand-based model, I checked the TRANSITIONS section. Sure enough, they retained the StandID after the clearcut. 

TRANSITIONS
*CASE aCC
*SOURCE .MASK(_TH10(EX))
*TARGET .MASK(_TH10(RG)) 100

This prevented Woodstock from pooling acres into one of the regen DevTypes represented by a total of 12 (yes, twelve!) yield tables. Instead of ending up with 12 DevTypes with multiple ages in each, they ended up with thousands and thousands of DevTypes that overwhelmed the matrix generator. The client asked about *COMPRESSTIME. I said it could eliminate some decision variables for each DevType, but the real problem was the excessive number of development types. By replacing the StandID theme with a generic ID (000000) associated with regen yield tables, the combinatorial explosion of DevTypes was averted.

ACTIONS 
*ACTION aCC Y Clearcut
*OPERABLE aCC
.MASK() _AGE >= 40 AND _AGE <= 150

TRANSITIONS
*CASE aCC
*SOURCE .MASK(_TH10(EX))
*TARGET .MASK(_TH10(RG),_TH12(000000)) 100

The revised model ran in minutes and was trivial to solve. There was no need for *COMPRESSTIME.

What About the Model I Issue?

The reason the professor's Model I ran and the Woodstock model didn't had nothing to do with model formulation. The professor's model excluded a lot of choices because they were enumerated by hand. So even though the yield tables and DevTypes were the same in both models, the choices represented in the two models were different. After the changes were implemented and the Woodstock model ran, the Woodstock model yielded a slightly better solution. Not because Model II is a better formulation, but because it contained choices that the Model I model lacked. This is yet another reason why you should avoid (when possible) software switches like *COMPRESSTIME that blindly omit every 2nd, 3rd, 4th,... choice.

Model Results Got You Stumped? Contact Me!

If you have a slow, cumbersome planning model, give me a call. I can review your model and make some suggestions for improvement. If the changes are extensive, we can design a training session around them. No one has more years of Woodstock experience.

Monday, November 4, 2024

Improve Model Efficiency - Part 4

Start with a Good Conceptual Model

A lot of my consulting work involves reviews of client models. Usually, the client is experiencing either slow performance, or there is some aspect of the model that is making it infeasible, and they want me to show them how to fix it. In many cases, the problem isn't so much in the Woodstock syntax as it is in the logic that was used to construct it (i.e., the conceptual model). Woodstock training programs largely focus on syntax and how to get a model working. But other than the problem statement provided in the training handbook, there is little guidance in how to develop an efficient conceptual model. So, let's talk about that.

Years ago, we used to spend a lot more time training analysts on how to model specific kinds of silviculture. Often, we would have students draw flow-charts on the whiteboard that represent the actions, outputs and transitions of their forest models. It was a good exercise because it decomposes a series of activities into discrete events and outcomes. However, new analysts usually struggle with the process, and flow-charts can end up looking like this:


Poorly-conceived conceptual model

Consider Time

Every Woodstock model has a planning horizon divided into planning periods. The length of a planning period x the number of planning periods = the planning horizon. But how do you determine the length of the planning period? Usually, it is based on the desired resolution for the harvest level, or annual allowable cut (AAC). For most of my clients building strategic planning models, a planning period is 1 year in length. The model results then report harvests, expenses and revenues on an annual basis.

Another consideration, however, are the yield tables available. In more northern locations, tree growth is sufficiently slow that annual estimates are unreliable and so growth is incremented in 5- or 10-year time steps. While you can still use annual planning periods with these yields, you need to rely on some form of linear interpolation or spline-fitting functions which introduce biases to the estimates. In my opinion, it is best to match planning period lengths to the natural increments of your growth model.

Consider Your Actions

Once you have settled on your planning period length, next you need to consider the actions in your model. The first consideration is whether the action is a choice. Clearly, a final harvest, such as a clearcut, is a choice. Do I need a different action if I'm harvesting in conifer stands versus deciduous stands? That depends on your reporting needs. Differentiating harvest costs by species group is easily handled with different outputs and doesn't require two harvest actions. However, suppose a product is differentiated by how it is harvested (e.g., tree length operation versus cut-to-length operation where there is a price differential). In this case, you WILL need to have different final harvest actions.

Reforestation is almost always a policy requirement, but whether planting is a choice depends on the existence of alternatives. If you are using 5-year or decadal planning periods, many of the reforestation actions that occur in the forest can be collapsed into a single decision variable. Defining a planting action to occur at _AGE = 0 is unnecessary. You could just as easily consider it part of the clearcut action and assume the transition to the planted regen condition.

If you always plant a stand following harvest, regardless of different site preparation steps or planting densities, you may not need a decision variable for planting. Instead, you could rely on REGIMES. Many of my clients have different treatment and cost structures for different site conditions, but these are all handled through prescriptions in the REGIMES section. The important thing to remember is that there is a single decision variable for each alternative. 

Why is this important? Every action results in two outcomes: either the action is performed, or it isn't. If you model something that is required through actions, you need to add constraints to force the outcome you want. This is very inefficient because you add to the number of decision variables and non-zero elements, and constraints add rows.

Consider Your Transitions

The trickiest part of any conceptual model is predicting how a future stand will behave following a final harvest. If everything gets planted to a single species, it is straightforward with a single 100% transition to a regenerated development type (DevType). But what about plantation failures? Shouldn't we model those? You may have a good handle on how often plantation failures occur, but I'm betting you can't predict which harvest areas will fail. Why? Because if you could predict them, you'd change your site preparation methods to avoid the failure.

Instead, transitions should focus on outcomes that reflect your management intent and that you can predict with certainty. If 2% of plantations fail on average, you can account for the area and cost to correct them with in-fill planting, without a transition to a failed state that would then require an action to correct it. Similarly, some stands regenerate in an over-stocked condition, and require precommercial thinning (PCT). Again, you can account for the area and cost of PCT without explicitly recognizing the transition to an overstocked state and the subsequent PCT to correct it. Your management intent is not to produce defective stands. You shouldn't bother modeling things that you do not have adequate data for.

A large number of my clients build "stand-based models", which feature the stand-ID from inventory as a theme in the model. But stand-based is only relevant for existing stands - once they are final harvested, they transition to a stratum based on a handful of stand characteristics like site quality, species, etc. But time and time again, I encounter models where they do not void the stand-ID after the final harvest in the TRANSITIONS section. This results in a combinatorial explosion of future decision variables that are completely unnecessary. The example below is from a model with a lot of commercial thinning options.


Column generation without collapsed stand-ID

For future yields, the stand-ID provides nothing to the objective function or constraints - it just increases the number of DevTypes that Woodstock has to keep track of. If you collapse the stand-ID after final harvest, you'll get the same answer in far less time. The example below shows that about 20% of the decision variables in later periods can be eliminated by collapsing the stand-ID.


Column generation with collapsed stand-ID

Yes, I've heard the arguments many times that you NEED the stand-ID for future stands, so you know the exact location of harvests 40 years into the future. Forgive my cynicism, but I doubt most of you follow a plan exactly for the first year, never mind the first decade or more.

Discussion

If you are running a model that you developed years ago, or worse, you inherited from someone else years ago, and that model is slow and cumbersome, maybe it is time to toss the baby with the bath water and start over. Starting fresh does require time, but how much time are you wasting waiting for a model that generates unnecessary decision variables, has transitions that are impossible to trace, and so on. 

Afraid of tossing the baby out with the bath water? Contact me!

If you need help revamping your model, or looking to start over from scratch, I'm more than happy to help. Give me a shout! Nobody knows more about Woodstock modeling!


Thursday, October 24, 2024

Spatial Planning - Operational Feasibility in Map Data

Introduction

In this article, we look at the issue of map data with an eye toward operational feasibility. If GIS map preparation is sloppy, you can end up with duplicates, overlapping polygons and/or gaps in the coverage. Moreover, we can make the situation even worse when we intersect these layers in GIS or try to group stands together using analysis area control.

Intersection Considerations

Best practices suggest that GIS layers represent similar types of information, such as stand boundaries, soil series or harvest units. A simple intersection of these types of information can result in a very large number of new polygons due to near-coincident lines. This can be a problem if available yield tables only represent a portion of those new combinations of attributes. So how do you avoid this? 


A harvest unit may incorporate multiple stands.



Stands that can be harvested together have similar ages. Outliers are not operable.

Coverages are hierarchical in nature. Property boundaries, surveyed features like rights-of-way, and so on should be considered inviolate, and they trump any other coverage. A clipping function should be applied to ensure that only stands within the boundary extents are counted.

Stand boundaries have an inherent fuzziness to them such that sliver removal procedures should not cause too much change overall. For soil series, I recommend that instead of a topological polygon intersection, clients perform a spatial overlay, assigning the stand polygons the majority soil type within each boundary. That way, there is only one soil type per stand, with consistency in site quality and productivity. In other words, only one yield table per stand.

Finally, consider what I call my "1%/1000 acre rule": if a development type is less than 1% of the total land base or less than 1000 acres (whichever is smaller), it is too small for consideration in the model. For example, if you are modeling a property of 500,000 acres, does it make sense to spend resources on yield table generation for 150 acres (0.03%)? Even if it contains the most valuable timber, its contribution to the objective function is going to be negligible. For small projects, the thresholds may not work as-is, but the concept still holds. You can, of course, choose different thresholds, but you shouldn't just accept every possible permutation of thematic attributes that is presented to you.
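For what it's worth, the rule is trivial to encode when you are screening a strata list. A small Python helper (hypothetical acreages; the function name is mine) might look like this:

# The "1%/1000 acre rule": the threshold is min(1% of the land base, 1000 acres).
def worth_modeling(devtype_acres, landbase_acres):
    threshold = min(0.01 * landbase_acres, 1000.0)
    return devtype_acres >= threshold

print(worth_modeling(150, 500_000))   # False: below the 1000-acre cutoff
print(worth_modeling(150, 40_000))    # False: below 1% (400 acres) of the land base
print(worth_modeling(600, 40_000))    # True:  600 acres clears the 400-acre bar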

Harvest Blocking Considerations

For harvest unit boundaries, focus on retaining stand boundaries as much as possible when you do layout. An arbitrary straight line between two similar stands often will not cause problems. However, if the stands are very different in age, the solver may find it difficult to schedule the resulting harvest unit due to operability criteria. Why would this happen?

If you are using Woodstock's analysis area control feature (AAControl), you need to be aware that a harvest unit or block can only be assigned a single action (or prescription, if using Regimes). While the scheduling is at the block level, operability remains at the DevType level. That means if you combine stands where the operability conditions for each member polygon do not overlap (due to age or activity differences), the entire unit is never feasible to harvest. Some examples, you ask? How about when you include an area of reforestation in a unit that should be harvested? Or when you combine two stands for commercial thinning where one portion must be fertilized and the other does not need it?

A Software Fix for Sloppy Work

When you set up analysis area units (AAUnits) in Woodstock, you want the model to choose a single activity for the entire unit, preferably all in one planning period. Early on when clients tried AAControl, a flurry of tech-support calls came in complaining that units were never being cut. 

The culprit was sloppy layout, with incompatible stands grouped together. The solution was a software fix where units can be cut without 100% of the area treated. With this new approach, the units were scheduled but some of the areas within those units were not harvested. Ultimately, Remsoft set the default threshold for operability of AAUnits at 20% of the total area. Users can change that with higher or lower limits up to 100%. 

Frankly, I think this default is way too low. Sure, the sloppy units don’t create tech support calls anymore, but if you only harvest a minor fraction of a unit, what is the point of using AAControl? Worse, I don’t recall reading anything about what happens to the stands that were not scheduled. So I decided to investigate this operational feasibility issue for myself.

How to Control Operability of AAUnits

You activate AAControl and specify the threshold for operability of harvest units in Woodstock in the Control Section.

*AACONTROL ON ; minimum operable area is 20% of the unit area
*AACONTROL ON aCC ; schedule only the aCC action using AAControl
*AACONTROL ON 99% ; minimum operable area is 99% of the unit area

Remember, the default operability threshold is 20%. If you specify a threshold of 100%, all member polygons must be operable at the same time for Woodstock to create a decision variable for the unit. However, if some part of the unit is always inoperable, the unit will never be harvested. Conversely, if you accept the default, at least 20% of the area of a unit must be operable in order for Woodstock to generate a harvest choice for it. But that choice only includes the operable stands! The inoperable ones are omitted if that choice is part of the optimal solution. Look at the schedule section for omitted DevTypes:

  0.017831 aCC 4 _Existing _RX(rxV) |Aaunit:U343| ;B982.2 100.0%  
 31.863753 aCC 4 _Existing _RX(rxV) |Aaunit:U343| ;B982.6 100.0%    
  0.034987 aCC 4 _Existing _RX(rxV) |Aaunit:U343| ;B982.3 100.0%     
 68.775675 aCC 4 _Existing _RX(rxV) |Aaunit:U343| ;B982.4 100.0%     
  1.688875 aCC 4 _Existing _RX(rxV) |Aaunit:U343| ;B982.7 100.0%     
  5.554339 aCC 4 _Existing _RX(rxV) |Aaunit:U343| ;B982.5 100.0%      
  0        aCC 4 _Existing _RX(rxV) |Aaunit:U343| ;B982.1 100.0% Area = 0.77953
                                                              not operable in 4

The last record shows that 0.77953 acres of unit U343 is not operable in period 4, so it was not harvested, even though 100% of it was scheduled as part of unit U343!

What Happens to the Inoperable Parts?

The example above comes from a solution where I set the AAControl threshold to 99%. Since the unit is larger than 100 acres and the inoperable portion is less than 1 acre, clearly the unit meets the threshold for operability.

The inoperable areas are part of a larger development type (DevType) that is split between two harvest units (U343 and U338). If we scan for the inoperable DevType acres later in the harvest schedule, we only find the acres in unit U338, not the residual from U343. The residual part of U343 never appears in the schedule section again. Note that this is not due to a constraint where only harvest units count. Also, I cannot say for sure that this happens in every model, but I have seen this behavior many times over the years.

If the number of acres that remain unscheduled is small, in terms of operational feasibility, it may not be a source of concern. For example, I’ve seen lots of spatial data with small fragments of older stands scattered among large areas of recent clearcuts, so it isn’t uncommon. However, at some point you do need to harvest those areas. A model that continues to produce unharvested fragments may be problematic from an operational feasibility standpoint.

How to Avoid “Forgotten Areas”

We just saw that including stands of very different ages can be problematic for scheduling final harvests. Clearly, it is best to group similar stands together in harvest units. If that is not possible, at least grow them all long enough that the youngest and oldest cohorts are both within the operability window. It may make perfect sense to hold an old stand fragment well beyond its normal rotation for operational feasibility. To make that happen, the old stand must be a significant part of the harvest unit, and you must also set a high operability threshold.

However, similar issues can arise with commercial thinning even if the ages are similar. For example, suppose that you generated thinned yield tables with defined age (14-17) and volume removal (> 50 tons/ac) requirements. The first stand is currently 15 years old but doesn’t reach the minimum volume bar until age 17. The second stand is currently 16 years old and is OK to thin immediately. But you created a unit that includes some of both stands. Now, when scheduling a thinning, there is no operability window to treat both stands simultaneously: by the time the first stand is operable in 2 years (age 17), the other stand will be too old (age 18).

17 109.37 rCT 2 _Existing _RX(rxTH) |Aaunit:AA099| ;B82.3 100.0%
16   0    rCT 2 _Existing _RX(rxTH) |Aaunit:AA099| ;B82.2 100.0% Area = 36.8500
                                                              not operable in 2
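You can catch this kind of mismatch before the model ever runs by intersecting the operability windows of the member stands. A small Python sketch (ages and windows are illustrative, and the helper name is mine) shows why this unit can never be thinned, and why relaxing the hardwired volume limit, as discussed below, restores a common window:

# Illustrative ages and operability windows only.
def common_windows(stands, horizon_years=10):
    """stands: list of (current_age, min_operable_age, max_operable_age)."""
    return [t for t in range(horizon_years + 1)
            if all(lo <= age + t <= hi for age, lo, hi in stands)]

# Stand 1: age 15 but volume-limited until 17.  Stand 2: age 16, window 14-17.
print(common_windows([(15, 17, 17), (16, 14, 17)]))   # [] -> no period treats both stands
# Drop the hardwired volume limit (stand 1 window back to 14-17):
print(common_windows([(15, 14, 17), (16, 14, 17)]))   # [0, 1] -> thin now or next year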

What's The Solution? Don’t Be Sloppy!

More care and attention to operability windows can avoid many problems. Don’t hardwire minimum volume limits when generating yield tables. What do I mean by this? Suppose you are generating yield tables for stands thinned from below to some basal area limit and you only generate a yield table if at least 50 tons/ac is produced. That works fine for harvesting stands, but when you use AAControl, that might cause some of your units to no longer be viable.

Instead, generate the tables accepting whatever volume results from the thinning. You can always add a minimum volume requirement to the *OPERABLE statement in Woodstock to avoid low-volume thinning. But, if there is no yield table for the low-volume stands, you can’t create one after the fact. Also, don’t set the threshold too high! Although the DevTypes in the unit determine the operability for an AAUnit, another portion of the unit can raise the average volume above the threshold for operational feasibility.


Magenta-colored stands do not meet operability threshold

For example, in the figure above, the magenta-colored polygons in the unit shown do not meet the desired 20 MBF/ac threshold. However, the blue areas exceed the bar by a significant amount. In fact, the overall average for the unit easily exceeds 20 MBF/ac. So, rather than set the minimum volume needed to 20 MBF/ac, maybe set it at 15 so the low-volume portions of the block can still be harvested. You can still apply a constraint requiring at least 20 MBF/ac harvested on average.

Suggestions for Better Operational Feasibility

Be careful with GIS overlays!

  • Avoid the creation of near-coincident lines and slivers.
  • Use spatial overlays to assign soil attributes to stand polygons.
  • Follow stand boundaries as much as possible when delineating harvest units.
  • Split stands only when necessary and into significant portions, not slivers.

Review proposed units to make sure they make sense!

  • Don’t combine incompatible stands within the same unit (age 5 + age 200 is not a good combo)
  • Provide yield tables and timing choices for member DevTypes that overlap in operability.
  • Remember that DevType operability determines unit operability, not the weighted average of the unit. Choose appropriate thresholds.
  • Don’t accept the Woodstock default of 20% for AAControl! If you can't harvest the majority of a harvest block, then the block is poorly laid out. Fix it!

Having troubles with spatial planning? Contact Me!

If you are concerned about the operational feasibility of your harvest planning models, give me a shout. We can schedule a review and work on some improvements that will provide more confidence in your results. 
