Think Carefully About Outputs
When I am contacted about potential work, the problem is often slow Woodstock model performance. Slowness can show up in matrix generation and solution, or in schedule execution and report writing, though most of the concern is about the former. In both cases, however, the root cause is usually the same: the OUTPUTS section. Poor performance can be due to a number of factors, including:
- the type of outputs that are defined,
- the number of outputs in the model, and
- the level of detail required in reports.
Let’s consider each of these factors.
Output Types
All Woodstock modelers know that there are two main types of outputs: action-based and inventory-based. Action-based outputs are those that are triggered by an action you’ve defined. During matrix generation, Woodstock enumerates the timing choices associated with those actions, and the total number of output coefficients going into the matrix is simply the number of timing choices times the number of yield coefficients to be reported.
With inventory-based outputs, there is no action trigger. Rather, the matrix generator enumerates every development type class that potentially exists in each planning period, and the total number of output coefficients going into the matrix is the number of development type classes times the number of yield coefficients associated with each development type. While both timing choices and development type classes increase in number as each period is evaluated, there are always fewer timing choices than development type classes. As a result, the execution time devoted to inventory-based outputs will be significantly greater than that devoted to action-based outputs.
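To put rough numbers on that comparison, here is a back-of-the-envelope sketch in Python. The counts are invented purely for illustration and do not come from any real model; the two functions only restate the products described above.

```python
def action_based_coefficients(timing_choices: int, yield_coefficients: int) -> int:
    """Coefficients contributed by an action-based output."""
    return timing_choices * yield_coefficients


def inventory_based_coefficients(devtype_classes: int, yield_coefficients: int) -> int:
    """Coefficients contributed by an inventory-based output."""
    return devtype_classes * yield_coefficients


# Hypothetical counts for a single planning period (assumptions, not measurements).
timing_choices = 400
devtype_classes = 6000
yield_coefficients = 3

action_total = action_based_coefficients(timing_choices, yield_coefficients)
inventory_total = inventory_based_coefficients(devtype_classes, yield_coefficients)

print(f"action-based:    {action_total} coefficients")
print(f"inventory-based: {inventory_total} coefficients")
```

With these made-up counts the inventory-based output contributes fifteen times as many coefficients, and the gap is repeated in every period.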
Output Nodes
Another factor to consider for both types of outputs is the number of nodes associated with them. The node count of an output is simply the number of triggering action-yield coefficient combinations needed to report the output properly. Consider this syntax:
*OUTPUT oaCC Clearcut area (ac)
*SOURCE .MASK() aCC _AREA
*OUTPUT oaTH Thin area (ac)
*SOURCE .MASK() aTH _AREA
*OUTPUT oaHARV Harvest area (ac)
*SOURCE oaCC + oaTH
The node number associated with the first two outputs is 1 because there is just a single triggering action and a single yield coefficient (_AREA = 1). For the output oaHARV, the node number is 2 because you add the two nodes from the summed outputs. But things get much more complicated when you start dealing with prices and costs:
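The counting rule for the three area outputs can be sketched in a few lines of Python. The dictionary layout and the `node_count` helper are my own illustration, not anything Woodstock exposes: a direct source counts one node per action-yield coefficient pair, and a summed output inherits the nodes of its parts.

```python
# Illustrative model of the three area outputs (layout is an assumption).
outputs = {
    "oaCC": {"sources": [("aCC", "_AREA")]},
    "oaTH": {"sources": [("aTH", "_AREA")]},
    "oaHARV": {"sum_of": ["oaCC", "oaTH"]},
}


def node_count(name: str, outputs: dict) -> int:
    """Count nodes: direct action/yield pairs plus the nodes of summed outputs."""
    spec = outputs[name]
    direct = len(spec.get("sources", []))
    summed = sum(node_count(child, outputs) for child in spec.get("sum_of", []))
    return direct + summed


for name in outputs:
    print(name, node_count(name, outputs))
```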
*OUTPUT ocSP Site prep cost ($)
*SOURCE .MASK() rCCPL(aSP) ycSP
…
*OUTPUT ocPLT Planting cost ($)
*SOURCE .MASK() rCCPL(aPL) ycPLT
…
*OUTPUT ocRLW Woody release ($)
*SOURCE .MASK() rCCPL(aRLW) ycRLW
*OUTPUT ocSILV Silviculture costs ($)
*SOURCE ocSPC + ocSPB + ocSPM + ocPLT + ocINT + ocRLC + ocRLW
Clearly, the number of nodes can be much higher if there are multiple activities contributing to a single output. Harvest revenues are even worse, because they can originate with multiple harvest actions, with multiple volume coefficients and multiple prices. Unfortunately, if you have lots of harvest types and lots of products, your node numbers will be high.
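A quick way to see how fast revenue nodes multiply is to enumerate the combinations. This Python sketch assumes every harvest action can produce every product and that each product's volume and price collapse into one revenue coefficient. The action `aSEL` and the product `PLY` are hypothetical names added for illustration.

```python
from itertools import product

# aCC and aTH appear earlier in this post; aSEL is a hypothetical third harvest action.
harvest_actions = ["aCC", "aTH", "aSEL"]
# PWD, CNS and SAW mirror the yield names used in this post; PLY is hypothetical.
products = ["PWD", "CNS", "SAW", "PLY"]

# One node per (harvest action, product revenue coefficient) combination.
revenue_nodes = list(product(harvest_actions, products))

print(len(revenue_nodes), "nodes for a single revenue output")
```

Three harvest actions and four products already give twelve nodes for a single revenue output, and every additional action or product multiplies the total.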
Number Of Outputs
The sheer number of outputs in a model can be large, even without overkill. However, the worst cases of model overkill typically involve multiple versions of the same thing. For example, I’ve seen models with harvest volume reported by species, by ownership class, by tract, by age class, etc. These reporting outputs are usually not constrained and therefore have no impact on matrix generation. However, they can bring schedule execution and reporting to a crawl because each output has to be calculated independently. Even worse, the same outputs may also appear in multiple reports (_ALL report plus customized reports).
Level of Detail
Most Woodstock modelers are aware of theme-based outputs. These can be very helpful when an output needs to be reported by species, tract, etc., because they reduce the overall number of outputs needed. But there is a tendency among many modelers to include redundant outputs. For example, I’ve seen many instances where an output is reported as a sum of outputs AND as an output of summed yield coefficients:
*OUTPUT oqTOT1 Total clearcut volume (m3)
*SOURCE .MASK() aCC yiPWD + .MASK() aCC yiCNS + .MASK() aCC yiSAW
*OUTPUT oqTOT2 Total clearcut volume (m3)
*SOURCE .MASK() aCC yiTOT
The first version is undesirable anyway because it has three nodes where the second has one, but it really serves no purpose unless the sum in the YIELDS section is done incorrectly in the second version.
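If the worry is that the total in the YIELDS section might be wrong, that can be checked once outside the model rather than by carrying a redundant output. Here is a minimal Python sketch of such a check. The yield values are hypothetical, and how you export them from your own YIELDS section is left open.

```python
# Hypothetical per-age yields (volume per unit area); real values come from YIELDS.
yields = {
    20: {"yiPWD": 30.0, "yiCNS": 15.0, "yiSAW": 5.0, "yiTOT": 50.0},
    30: {"yiPWD": 35.0, "yiCNS": 30.0, "yiSAW": 25.0, "yiTOT": 90.0},
}
COMPONENTS = ("yiPWD", "yiCNS", "yiSAW")


def mismatched_ages(yields: dict, tolerance: float = 1e-6) -> list:
    """Return the ages where yiTOT differs from the sum of its components."""
    bad = []
    for age, row in sorted(yields.items()):
        component_sum = sum(row[name] for name in COMPONENTS)
        if abs(component_sum - row["yiTOT"]) > tolerance:
            bad.append(age)
    return bad


print(mismatched_ages(yields))
```

An empty list means the summed yield agrees with its components at every age, so the single-node version of the output is safe to use.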
Tips To Improve Performance
- Avoid inventory-based outputs wherever possible. If you use an inventory-based output to report pro-rated activities that are hard to predict, consider using REGIMES instead. For example, precommercial thinning is often used in young stands to correct overstocking. However, since we can’t really predict overstocking very well, we may assume that a fixed proportion of stands receive a PCT at a given age:
*OUTPUT ocPCT PCT cost ($)
*SOURCE .MASK() @AGE(16) _INVENT ycPCT
This approach works and, to be honest, it was the only practical way to model PCT before the REGIMES module came along. But now, it is simpler and more efficient to use a Regime and Prescription to convert the inventory-based output to an action-based one:
*PRESCRIPTION rxCCPL Clearcut, plant, commercial thin
_RXPERIOD  _ACTION  _ENTRY    yAREA
0          aCC      _INITIAL  1.00
0          aSPC     -         1.00
1          aPLT     -         1.00
3          aINT     -         1.00
6          aRLW     -         1.00
16         aPCT     _FINAL    0.25
Instead of relying on the age of the DevType to determine if a PCT should be done, the prescription states that PCT is done 15 periods after planting.
- Avoid summary outputs. Where possible, use sums of coefficients in the YIELDS section to perform summary operations.
- Avoid defining unnecessary outputs. Use theme-based outputs as needed but be judicious. For example, many modelers report on acres by age-class using outputs. This is totally unnecessary. The same information can be had from the built-in _CONDITION report. By design, this report details every development type class in every period. Using a Woodstock table filter, an SQL query or a pivot table in Excel, you can glean the same details without bringing your model execution to a crawl by reporting dozens of inventory-based outputs.
Also, don’t use a theme-based output where there are many thematic attributes but you wish to constrain or report on only a few of them. Instead, define specific outputs for the cases you actually constrain or report. Otherwise, the matrix generator will inject all yield coefficients associated with each attribute into the matrix, bloating the number of non-zero elements and slowing down your LP solver.
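As an illustration of the _CONDITION tip above, here is a minimal Python sketch of the pivot-table idea. It assumes the report has been exported to a simple CSV with `period`, `age` and `area` columns. That layout is hypothetical, so adapt the column names and the class width to whatever your export actually contains.

```python
import csv
import io
from collections import defaultdict

CLASS_WIDTH = 10  # assumed age-class width, in the model's own age units

# Hypothetical extract: the real _CONDITION report layout will differ.
SAMPLE = """period,age,area
1,5,120.0
1,12,80.0
1,27,200.0
2,6,120.0
2,13,80.0
2,28,200.0
"""


def area_by_age_class(rows, class_width=CLASS_WIDTH):
    """Sum area by (period, lower bound of age class)."""
    totals = defaultdict(float)
    for row in rows:
        period = int(row["period"])
        lower = (int(row["age"]) // class_width) * class_width
        totals[(period, lower)] += float(row["area"])
    return dict(totals)


table = area_by_age_class(csv.DictReader(io.StringIO(SAMPLE)))
for (period, lower), area in sorted(table.items()):
    print(f"period {period}, ages {lower}-{lower + CLASS_WIDTH - 1}: {area:.1f}")
```

The aggregation runs after the model has finished, so it adds nothing to matrix generation or schedule execution.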
Want More Advice? Contact Me!
If you need a model audit or some in-house training to brush up your modeling skills, give me a shout. I’ll be happy to help.