Monday, April 29, 2024

Regimes - Why Aren't People Using Them?

Introduction

If memory serves me correctly, Remsoft introduced the Regimes module back in 2009. At first, it did not generate a lot of interest, despite criticisms from some circles that Model I formulations were better than Woodstock's native Model II. It was an extra module to license, and frankly, the syntax wasn't the easiest to wrap your head around. But times have changed!

Regimes is now part of the base Woodstock license. And while the syntax isn't the easiest thing to master, I've come to appreciate how Ugo was able to incorporate Model I structure into Woodstock's inherent Model II framework. As a user of FORPLAN, I remember how meshing Model I and Model II in a single model resulted in a very clumsy interface. It was no surprise that few people used FORPLAN v.2. But it has been 15 years since Woodstock Regimes was released, and I still don't see a lot of clients using it. The poll I conducted confirms this:

Poll conducted week of April 22, 2024

Why Use Regimes?

In Woodstock, when you specify an action, you provide the operability criteria (development type + condition) that allow the interpreter to generate a decision variable for each DevType-condition combination. If there are 5 operable DevTypes and 5 conditions (e.g., ages) at which each DevType can be harvested, the interpreter will generate 25 decision variables. The beauty of Regimes is that it allows you to create decision variables for a sequence of events, rather than just a single activity.
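As a minimal sketch of the single-action case (the action code and operability window are hypothetical):

ACTIONS
*ACTION aCC Y Clearcut
*OPERABLE aCC
  .MASK() _AGE >= 25 AND _AGE <= 29
; 5 operable DevTypes x 5 eligible ages = 25 decision variables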

So, for what kinds of forest management decisions are Regimes suited? I've seen many models in the US South where commercial thinning is always followed by a fertilization one year later. While the timing of the thinning is a decision, the fertilization is not. Before Regimes, modeling CT + fert required inventory outputs with somewhat unintuitive DevType masks to make it work and report the fertilization acres in the correct period:

*OUTPUT oaFERT Fertilized area (ac)  ; hypothetical output name, for illustration
*SOURCE .MASK(_THx(14)) @AGE(16) _INVENT _AREA
; add 1 to age because of the age counter

Additionally, relying on an inventory-based output makes your model run slower.

*REGIME rThin
*OPERABLE rThin
   .MASK(_THx(00)) _AGE >= 13 AND _AGE <= 18

*PRESCRIPTION rxTF (thin and fert)
   _RXPERIOD _ACTION   _ENTRY 
       0       aTH    _INITIAL
       1       aFT     _FINAL

Using a regime and prescription, reporting acres treated relies only on action-based outputs and so your model runs faster.
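For example, the fertilized acres can now be reported with a simple action-based output in place of the inventory-based one above (the output name is hypothetical; rThin and aFT come from the regime above):

*OUTPUT oaFERT Fertilized area (ac)
*SOURCE .MASK() rThin(aFT) _AREA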

What About Reforestation?

The poll numbers are interesting for a few reasons.

Two-thirds of the respondents have specific planting actions in their models. From a conceptual modeling perspective, this implies that there are alternatives for each harvested area (timing, species, planting density, etc.).

No one simply attributes reforestation acres and costs to harvest actions. That is notable: there are many Woodstock users in Canada, and few of them use 1-year planning periods, so folding planting into the harvest action would be easy to justify. Again, this implies reforestation alternatives for each harvested area.

Roughly one-third of the respondents are using regimes to model reforestation. (Disclosure: respondents work for client organizations for which I developed models recently).

Over the last 20 years, I've witnessed a shift from silvicultural evaluation models to highly prescriptive models. By this I mean that 20 years ago, a lot of the models I built evaluated multiple planting densities for each harvested area. As a result, there needed to be multiple planting actions to represent the alternatives. And because planting was a choice, there needed to be constraints to enforce reforestation policies.

These days, however, while there may be different reforestation prescriptions across the forest, there is generally only one prescription for each harvest area. In that case, there really isn't a need for separate planting decision variables. Instead, one or more prescriptions in a reforestation regime that begins with final harvest is all that is required. Fewer decision variables = faster matrix generation and solution. As an added bonus, the reforestation constraints may not be required either.
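As a minimal sketch (all codes are hypothetical), such a regime might look like this:

*REGIME rRegen
*OPERABLE rRegen
  .MASK() _AGE >= 25

*PRESCRIPTION rxRegen Clearcut then plant
  _RXPERIOD _ACTION   _ENTRY
      0       aCC    _INITIAL
      1       aPL     _FINAL

Here the clearcut timing is the only true decision; planting follows automatically one period later, with no separate planting variable and no reforestation constraint needed.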

So Why Haven't Things Changed?

While I was at Remsoft, we always taught reforestation as a distinct activity from harvesting in the Modeling Fundamentals course. I suspect that many of the analysts we taught just replicated that conceptual modeling framework in the models they developed. And my experience has been that organizations rarely revamp models once developed.

Because models are rarely scrapped and rebuilt, I suspect most analysts these days inherited models built by some predecessor (who probably has now left the organization). There is never time to redesign a model so the same structure persists, with annual updates to the AREAS file (to account for depletions) and yield tables.

Ready for a Change? Contact Me!

If your models haven't been updated in a while, maybe it is time to rethink things. Together, we can design a new conceptual framework that is efficient and easier to debug if problems arise. And if you'd like some targeted training (on, say, Regimes or conceptual modeling), we can do that too!

Sunday, April 21, 2024

What To Do When Things Go Wrong

Introduction

I’ve been using Woodstock for a long time… a very, VERY long time. So, you might think I never get stumped or encounter oddball results. And you would be wrong. One of the things I miss most about working in a team environment is getting help from a colleague on a Woodstock model that is giving me grief. I recall working on a model build project for a client. The land base was relatively small but there were lots of details. This model required the use of the ALLOCATION and REGIMES modules, and the OUTPUTS section was quite extensive. A second set of eyes could have saved me hours of frustration.

The model tracked a lot of silvicultural costs, and many of those costs occurred in the same planning period. Since this was a new model build, I could define my own outputs rather than using inherited code. I did not want to simply create an output for each cost and then sum them, because all those nodes would slow down matrix generation. Instead, I relied on summary coefficients in the YIELDS section to streamline the OUTPUTS section. The resulting matrix generation was very quick. As a test, I compared the values of the client's summary outputs to my new outputs based on summary yield coefficients. Not surprisingly, I got the same answers, and so I sent off the initial model to the client for review. I got back a simple question.

Why is the Objective Function Value Different than the Calculated NPV?

Sure enough, the objective function value (ordREV – ocdCOST) from the solver was about $5000 higher than my calculated NPV. I reviewed all of my outputs and verified that each one was triggered by the correct action. I also made sure that the yield coefficients were correct. However, I did not find any errors, and my attempts at various rewrites of the output specifications yielded no new insights. I was stumped. So I thought back through my process to see where things stopped working.

I ran one-line AREAS files to test various regime combinations, and these all produced consistent results between the objective function and output calculation. Next, I wrote a constraint to force the choice of a different prescription, and that too worked as expected. It was only when I incorporated a forced activity in the LPSCHEDULE that the results differed. The actions were carried out properly but deliveries in the AO Detailed report looked odd – all of the forced activities that were part of REGIMES had no development types associated with them.

I sent my simplified model off to Remsoft technical support and within a day I received back an email from Ugo. It turns out that there was a bug in Woodstock. I had created the LPSCHEDULE section using the Spatial Woodstock menu option and it wrote the LPSCHEDULE in the newer “Treatments” syntax:

.MASK() Age Area RegimeCode(Period,_RX(RxCode),Transition#)

Woodstock did not handle deliveries properly from the LPSCHEDULE, hence the different objective function values. As a workaround, Ugo told me that if I used the older “Stanley” syntax, the LPSCHEDULE would be processed properly:

.MASK() Age Area RegimeCode Period _RX(RxCode)
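For concreteness, here is how a single record might look in each syntax (the mask, age, area, and transition number are hypothetical; rThin and rxTF are placeholder regime and prescription codes):

? ? PINE 25 100.0 rThin(1,_RX(rxTF),1)   ; "Treatments" syntax
? ? PINE 25 100.0 rThin 1 _RX(rxTF)      ; "Stanley" syntax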

Ugo also told me that the bug had been fixed in the subsequent release. Problem solved! But the experience taught me a few important lessons.

Lessons Learned

Don’t waste a lot of time on something that seems to be eluding you. If you have a colleague you can ask for help, do it. Often, they can spot a problem in code that your brain refuses to see. It is also better than thinking you’ve gone insane.

Check ALL your work. I should have spotted the objective function issue before sending the model to the client. But I assumed the model was working properly because it had worked fine before I added any forced choices (i.e., the LPSCHEDULE).

Don’t assume it is necessarily your fault - software bugs are a fact of life. Once I identified the issue with my model, I created a small instance of the model that replicated the problem and sent it off to Remsoft technical support with a detailed explanation of the problem. That will get you faster solutions than if you just send a large, complicated model with no explanations as to what the problem is or what you have tried.

Do You Need a Second Pair of Eyes? Contact Me!

If your model is misbehaving (calculation errors, memory errors, etc.) then by all means contact Remsoft tech support. Otherwise, if your model is slow, or you want to implement something new and you are not sure how to do it, give me a shout. There are several alternative engagements we could undertake to keep you and your team productive.

Monday, April 15, 2024

Improve Model Efficiency - Part 3

Think Carefully About Outputs

When I am contacted about potential work, the problem is often related to slow Woodstock model performance. While performance can be related to either matrix generation and solution, or schedule execution and report writing, most of the concern is for the former. In both cases, however, the root cause is usually the same: the OUTPUTS section. Poor performance can be due to a number of factors, including:

  1. the type of outputs that are defined,
  2. the number of outputs in the model, and
  3. the level of detail required in reports.

Let’s consider each of these factors.

Output Types

All Woodstock modelers know that there are two main types of outputs: action-based and inventory-based. Action-based outputs are those that are triggered by an action you’ve defined. During matrix generation, Woodstock enumerates the timing choices associated with actions, and the total number of output coefficients going into the matrix is simply the # of timing choices times the number of yield coefficients to be reported.

With inventory-based outputs, there is no action trigger. Rather, the matrix generator enumerates every development type class that potentially exists in each planning period, and the total number of output coefficients going into the matrix is the # of development type classes times the number of yield coefficients associated with each development type. While both timing choices and development type classes increase in number as each period is evaluated, there are always fewer timing choices than development type classes. As a result, the execution time devoted to inventory-based outputs will be significantly greater than that devoted to action-based outputs.
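To put illustrative numbers on it (these are made up purely for the arithmetic): a DevType with 6 timing choices for a harvest action and 3 reported yield coefficients contributes 18 matrix coefficients, whereas 500 potential DevType classes in a period, each with 3 yield coefficients, contribute 1,500 coefficients for a single inventory-based output in that period alone. Multiply that across 20 or more planning periods and the gap becomes dramatic.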

Output Nodes

Another factor to consider for both types of outputs is the # of nodes associated with them. An output node is simply one triggering action and yield coefficient combination needed to report the output properly; the node number is the count of those combinations. Consider this syntax:

*OUTPUT oaCC Clearcut area (ac)   
*SOURCE .MASK() aCC _AREA
*OUTPUT oaTH Thin area (ac)
*SOURCE .MASK() aTH _AREA
*OUTPUT oaHARV Harvest area (ac)
*SOURCE oaCC + oaTH

The node number associated with the first two outputs is 1 because there is just a single triggering action and a single yield coefficient (_AREA = 1). For the output oaHARV, the node number is 2 because you add the two nodes from the summed outputs. But things get much more complicated when you start dealing with prices and costs:

*OUTPUT ocSP Site prep cost ($)
*SOURCE .MASK() rCCPL(aSP) ycSP
… 
*OUTPUT ocPLT Planting cost ($)
*SOURCE .MASK() rCCPL(aPL) ycPLT

*OUTPUT ocRLW Woody release ($)
*SOURCE .MASK() rCCPL(aRLW) ycRLW
*OUTPUT ocSILV Silviculture costs ($)
*SOURCE ocSPC + ocSPB + ocSPM + ocPLT + ocINT + ocRLC + ocRLW

Clearly, the number of nodes can be much higher if there are multiple activities contributing to a single output. Harvest revenues are even worse, because they can originate from multiple harvest actions, each with multiple volume coefficients and multiple prices. Unfortunately, if you have lots of harvest types and lots of products, your node numbers will be high.
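As an illustrative sketch (the revenue coefficients ycrPWD, ycrCNS and ycrSAW are hypothetical, pre-multiplied price-times-volume yields, and the output is split across two *SOURCE statements for readability), a revenue output spanning just two harvest actions and three products already carries a node number of 6:

*OUTPUT orHARV Harvest revenue ($)
*SOURCE .MASK() aCC ycrPWD + .MASK() aCC ycrCNS + .MASK() aCC ycrSAW
*SOURCE .MASK() aTH ycrPWD + .MASK() aTH ycrCNS + .MASK() aTH ycrSAW
; 2 actions x 3 revenue coefficients = 6 nodes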

Number Of Outputs

The sheer number of outputs in a model can be large, even without overkill. However, some of the worst offenders of model overkill typically include multiple versions of the same thing. For example, I’ve seen models with harvest volume reported by species, by ownership class, by tract, by age class, etc. These reporting outputs are usually not constrained and therefore have no impact on matrix generation. However, they can bring schedule execution and reporting to a crawl because each output has to be calculated independently. Even worse, the same outputs may also appear in multiple reports (the _ALL report plus customized reports).

Level of Detail

Most Woodstock modelers are aware of theme-based outputs. These can be very helpful when an output needs to be reported by species, tract, etc., because they reduce the overall number of outputs needed. But there is a tendency among many modelers to include redundant outputs. For example, I’ve seen many instances where an output is reported as a sum of outputs AND as an output of summed yield coefficients:

*OUTPUT oqTOT1 Total clearcut volume (m3)
*SOURCE .MASK() aCC yiPWD + .MASK() aCC yiCNS + .MASK() aCC yiSAW
*OUTPUT oqTOT2 Total clearcut volume (m3)
*SOURCE .MASK() aCC yiTOT

The first version is undesirable because of its extra nodes anyway, but it also serves no purpose unless the yiTOT sum in the YIELDS section underlying the second version is computed incorrectly.
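For reference, here is a minimal sketch of how yiTOT might be built in the YIELDS section, assuming the standard *YC complex-yield header and _SUM function:

YIELDS
*YC .MASK()
yiTOT _SUM(yiPWD, yiCNS, yiSAW)

If the components are verified once here, there is no need to re-verify them with a redundant output.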

Tips To Improve Performance

  • Avoid inventory-based outputs wherever possible. If you use an inventory-based output to report pro-rated activities that are hard to predict, consider using REGIMES instead. For example, precommercial thinning is often used in young stands to correct overstocking. However, since we can’t really predict overstocking very well, we may assume that a fixed proportion of stands receive a PCT at a given age:

    *OUTPUT ocPCT PCT cost ($)
    *SOURCE .MASK() @AGE(16) _INVENT ycPCT

    This approach works and, to be honest, it was the only practical way to model PCT before the REGIMES module came along. But now, it is simpler and more efficient to use a Regime and Prescription to convert the inventory-based output to an action-based one:

    *PRESCRIPTION rxCCPL Clearcut, plant, precommercial thin
         _RXPERIOD _ACTION  _ENTRY   yAREA
             0       aCC   _INITIAL   1.00
             0       aSPC      -      1.00
             1       aPLT      -      1.00
             3       aINT      -      1.00
             6       aRLW      -      1.00
            16       aPCT   _FINAL    0.25


     Instead of relying on the age of the DevType to determine whether a PCT should be done, the prescription states that the PCT occurs 15 periods after planting, and the yAREA coefficient of 0.25 applies it to a fixed 25% of the regime's area.
     
  • Avoid summary outputs. Where possible, use sums of coefficients in the YIELDS section to perform summary operations.

  • Avoid defining unnecessary outputs. Use theme-based outputs as needed but be judicious. For example, many modelers report on acres by age-class using outputs. This is totally unnecessary. The same information can be had from the built-in _CONDITION report. By design, this report details every development type class in every period. Using a Woodstock table filter, an SQL query or a pivot table in Excel, you can glean the same details without bringing your model execution to a crawl by reporting dozens of inventory-based outputs.

    Also, don’t use a theme-based output where there are many thematic attributes, but you only wish to constrain or report on just a few of them. Instead, define specific outputs for the constrained/reported cases (see the sketch after this list). Otherwise, the matrix generator will inject all yield coefficients associated with each attribute into the matrix, bloating the number of non-zero elements and slowing down your LP solver.
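As a sketch of that last point (assuming species is theme 3, and all output and yield names are hypothetical): rather than a theme-based output that expands over every species,

*OUTPUT oqCCSP(_TH3) Clearcut volume by species (m3)
*SOURCE .MASK() aCC yiTOT

define a specific output for the one attribute you actually constrain or report:

*OUTPUT oqccPINE Pine clearcut volume (m3)
*SOURCE .MASK(PINE) aCC yiTOT
; mask restricted to the PINE attribute in theme 3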

Want More Advice? Contact Me!

If you need a model audit or some in-house training to brush up your modeling skills, give me a shout. I’ll be happy to help.

Monday, April 1, 2024

Improve Model Efficiency - Part 2

Avoid Unintended Outcomes in Spatial Planning

Today, we will investigate unintended consequences in forest planning models. In a typical project, analysts work with a project team to learn what kinds of silviculture need to be modeled. Once those are identified, the silviculturist and biometrician will establish the operability criteria for the corresponding actions. Often, the team will also provide you with a list of stands that they plan to harvest in the first period or two.

As you construct the model, you soon realize that some of these preplanned activities break the rules. For example, there is a 13-year-old stand scheduled for thinning, despite the rule requiring that commercial thinning be carried out between ages 15 and 20. Despite the frustration it causes, this type of thing happens all the time. It is just a matter of extenuating circumstances – things that models are ill-equipped to handle at face value.

So, how do you handle this as an analyst? If you lower the threshold for thinning to 13 instead of 15, your problem goes away, but a new one takes its place: you will generate many more potential timing choices without corresponding yield tables for them. To avoid that, you can restrict them with an additional volume requirement. But that is somewhat inefficient, and it violates the general intent to thin at ages 15 or older.

Special Actions and Prescriptions

Instead, you can create a new variant of the action that is only available in the first planning period. There are two ways to accomplish this: you can use _CP = 1 in the operability statement, or you can use *EXCLUDE in the OPTIMIZE section to prevent the special action from being generated in periods 2 onward.
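A minimal sketch of the second approach (the special action code aCCP1 is hypothetical, and this assumes the usual *EXCLUDE syntax in the OPTIMIZE section):

OPTIMIZE
*EXCLUDE
aCCP1 2.._LENGTH  ; never generate the special clearcut after period 1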

Of the two approaches, I prefer the first because you don’t need a new action code. If you are precise in your operability statement, only the stand that has been identified for harvest is affected. And there’s the rub: how do you know that you targeted a particular stand (polygon)? Here’s an example:

Stands Layer with Stand-IDs Labeled

Harvest Units Layer with Stand-IDs Labeled

In the harvest units layer, look for harvest units 215 and 225 (lime green), scheduled for the first planning period. You also notice that they are both from the same stand 5977 (red). Technically, they are not eligible for harvest under the standard clearfell operability, so you write a special *OPERABLE statement to allow those blocks to be harvested. Here is the code:

LANDSCAPE
*THEME StandID
5977
5978
5980
5981
5982
6626
6638
6639
*AGGREGATE CCP1
5977 6639

ACTIONS
*ACTION aCC Y Clearcut
*OPERABLE aCC
? ? ? ? ? ? ? ? ? CCP1 _CP = 1

Sure enough, the units 215 and 225 are harvested in period 1. But you’ve also generated harvest decision variables for units 10 and 1416, because they are also part of the same stand. This is not what was intended – only units 215 and 225 were supposed to be harvested.

Understanding Forced Choices

Remember that Woodstock does not schedule based on geographic information. It only sees pools of area that are eligible to be harvested. Even using an LPSchedule file does not necessarily mean that a particular polygon is harvested. True, the specified area associated with the development type is treated, regardless of any *OPERABLE statement. But technically, the solution only guarantees that the right number of acres is harvested. If you truly want to guarantee a particular polygon is harvested, you need to identify it within the landscape section somehow.

From our example above, a simple fix is to include a theme that identifies forced choices (e.g., CC for clearcuts, TH for commercial thins, NA for not applicable). Then, you could explicitly write the operable statement as follows:

LANDSCAPE
*THEME StandID
5977
5978
5980
5981
5982
6626
6638
6639 
*THEME Planned Activities
CC
TH
NA

ACTIONS
*ACTION aCC Y Clearcut
*OPERABLE aCC
? ? ? ? ? ? ? ? ? 5977 CC _CP = 1

Because the last thematic attribute is CC for harvest units 215 and 225, they are eligible for harvest. The other parts of stand 5977 do not have CC in the last theme and are not eligible for harvest. Since no other polygons have the combination of standID = 5977 and planned activity = CC, you know that only those explicit polygons are harvested.

Another approach might be to include harvest units as a theme. While this will work, it dramatically increases the number of development types. Unless you have a reason to include ALL planning units, this may be overkill. Remember, your goal in modeling is efficiency!

Summary

The main takeaway from this example is to be precise in your operability specifications to avoid unintended consequences. Don’t be misled by the idea that a stand-based or unit-based harvest model is necessarily equivalent to particular polygons. Unless there is a 1:1 correspondence between polygons and development type classes, there may be alternative optimal solutions that include other polygons. When you map out the solution, it may not meet your expectations.

Struggling With Spatial Planning Issues? Contact Me!

If you’d like a new set of eyes to look over your model(s), give me a shout to schedule a model audit or an on-site training review. We can use your own models to ensure concepts are relevant and suggestions are immediately useful to your team.
