Introduction
Someone once asked me, “When someone asks you to look at their model and make suggestions, what do you look for?” I thought this might be a good topic for a series of blog posts on modeling best practices to make QA easier.
The first thing I ask a client is what prompted the request. Is the model running too slowly? Does it take a long time to solve? Does the solver struggle with feasibility? In most cases, the answer is, “It takes a long time to generate and solve.” That starts me thinking about efficiency in terms of development types and decision variables. Cardinal rule #1: avoid code that is difficult to understand or hides meaning.
Flawed Characterization
Here is a recent example that I came across. The primary data set is a shapefile where stand boundaries, harvest system, management intensity, and road buffers have been intersected. The resulting polygons had Woodstock theme fields populated according to the LANDSCAPE section attributes.
;*A Th1 Th2 Th3 Th4 Th5 Th6 Age Area
*A 1 Y G NA NA 00000 1 51.238 ; 2 polygons
*A 1 Y C NA NA 00000 1 8.006 ; 1 polygons
*A 1 Y G NA NA 00000 1 46.339 ; 1 polygons
*A 1 Y G NA NA 83901 17 11.346 ; 1 polygons
*A 1 Y C NA NA 83901 17 4.766 ; 1 polygons
*A 1 N G NA NA 83901 17 4.977 ; 19 polygons
*A 1 Y C NA NA 83902 14 53.185 ; 2 polygons
*A 1 Y G NA NA 83902 14 13.172 ; 1 polygons
*A 1 Y C NA NA 83904 18 48.816 ; 1 polygons
*A 1 Y G NA NA 83904 18 6.009 ; 2 polygons
*A 1 N G NA NA 83904 18 7.428 ; 19 polygons
*A 2 Y G NA NA 83905 17 3.222 ; 2 polygons
*A 1 Y C NA NA 83906 17 5.124 ; 1 polygons
*A 1 Y G NA NA 83906 17 10.972 ; 1 polygons
*A 1 Y C NA NA 83907 9 8.334 ; 2 polygons
*A 1 Y G NA NA 83907 9 20.993 ; 1 polygons
*A 1 Y C NA NA 83908 13 47.249 ; 1 polygons
*A 1 Y C NA NA 83909 18 18.29 ; 1 polygons
*A 1 Y G NA NA 83909 18 18.805 ; 2 polygons
*A 1 Y C NA NA 83910 18 7.695 ; 1 polygons
*A 1 Y C NA NA 83914 14 6.881 ; 2 polygons
*A 1 Y G NA NA 83914 14 38.483 ; 1 polygons
*A 1 Y G NA NA 83914 14 5.538 ; 1 polygons
*A 1 Y C NA NA 83914 14 17.36 ; 2 polygons
*A 1 Y G NA NA 83914 14 24.605 ; 2 polygons
*A 2 N G NA NA 83914 14 1.087 ; 1 polygons
*A 1 Y G NA NA 83916 16 7.076 ; 1 polygons
*A 2 N G NA NA 83916 16 1.86 ; 1 polygons
*A 1 Y G NA NA 83917 22 31.772 ; 1 polygons
*A 2 Y C NA NA 83918 16 5.214 ; 1 polygons
*A 2 Y C NA NA 83919 18 6.393 ; 1 polygons
*A 2 Y G NA NA 83919 18 17.887 ; 1 polygons
*A 1 N G NA NA 83919 18 4.168 ; 8 polygons
*A 1 Y G NA NA 84001 16 10.713 ; 2 polygons
Did you spot the problem yet? Look at the records with the most polygons. What do you notice about them? They are the ones with Theme2 = N, and they are a minority of the area. Based on the map examples, one can only conclude that Theme2 = N corresponds to roads. So, why is this a problem?
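If you would rather not eyeball the listing, a quick tally by theme value makes the same pattern visible. The following is a minimal Python sketch, not part of Woodstock: the function names `parse_record` and `summarize_by_theme` are hypothetical, the sample is only a handful of records copied from the listing above, and it assumes every record carries the trailing "; n polygons" comment.

```python
from collections import defaultdict

# A few records copied from the listing above (not the full file).
SAMPLE = """\
;*A Th1 Th2 Th3 Th4 Th5 Th6 Age Area
*A 1 Y G NA NA 00000 1 51.238 ; 2 polygons
*A 1 Y G NA NA 83901 17 11.346 ; 1 polygons
*A 1 Y C NA NA 83901 17 4.766 ; 1 polygons
*A 1 N G NA NA 83901 17 4.977 ; 19 polygons
*A 1 Y C NA NA 83904 18 48.816 ; 1 polygons
*A 1 N G NA NA 83904 18 7.428 ; 19 polygons
*A 2 N G NA NA 83914 14 1.087 ; 1 polygons
*A 2 N G NA NA 83916 16 1.86 ; 1 polygons
*A 1 N G NA NA 83919 18 4.168 ; 8 polygons
"""


def parse_record(line):
    """Split one '*A' record into (themes, age, area, polygon count)."""
    data, _, comment = line.partition(";")
    fields = data.split()
    themes = fields[1:7]               # Th1..Th6
    age = int(fields[7])
    area = float(fields[8])
    polygons = int(comment.split()[0])  # assumes a '; n polygons' comment
    return themes, age, area, polygons


def summarize_by_theme(text, theme_index):
    """Total records, polygons and area for each value of one theme."""
    totals = defaultdict(lambda: {"records": 0, "polygons": 0, "area": 0.0})
    for line in text.splitlines():
        if not line.startswith("*A"):
            continue                    # skip comments and blank lines
        themes, _age, area, polygons = parse_record(line)
        bucket = totals[themes[theme_index]]
        bucket["records"] += 1
        bucket["polygons"] += polygons
        bucket["area"] += area
    return dict(totals)


# theme_index=1 selects Theme2 (zero-based position within Th1..Th6)
summary = summarize_by_theme(SAMPLE, theme_index=1)
for value, t in sorted(summary.items()):
    print(f"Theme2={value}: {t['records']} records, "
          f"{t['polygons']} polygons, {t['area']:.3f} area")
```

Run against the full listing, a tally like this shows at a glance which theme value accounts for many polygons but little area.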
Redundant Development Types
;*A Th1 Th2 Th3 Th4 Th5 Th6 Age Area
*A 1 Y G NA NA 00000 1 51.238 ; 2 polygons
*A 1 Y C NA NA 00000 1 8.006 ; 1 polygons
*A 1 Y G NA NA 00000 1 46.339 ; 1 polygons
*A 1 Y G NA NA 83901 17 11.346 ; 1 polygons
*A 1 Y C NA NA 83901 17 4.766 ; 1 polygons
*A 1 Y C NA NA 83902 14 53.185 ; 2 polygons
*A 1 Y G NA NA 83902 14 13.172 ; 1 polygons
*A 1 Y C NA NA 83904 18 48.816 ; 1 polygons
*A 1 Y G NA NA 83904 18 6.009 ; 2 polygons
*A 2 Y G NA NA 83905 17 3.222 ; 2 polygons
*A 1 Y C NA NA 83906 17 5.124 ; 1 polygons
*A 1 Y G NA NA 83906 17 10.972 ; 1 polygons
*A 1 Y C NA NA 83907 9 8.334 ; 2 polygons
*A 1 Y G NA NA 83907 9 20.993 ; 1 polygons
*A 1 Y C NA NA 83908 13 47.249 ; 1 polygons
*A 1 Y C NA NA 83909 18 18.29 ; 1 polygons
*A 1 Y G NA NA 83909 18 18.805 ; 2 polygons
*A 1 Y C NA NA 83910 18 7.695 ; 1 polygons
*A 1 Y C NA NA 83914 14 6.881 ; 2 polygons
*A 1 Y G NA NA 83914 14 38.483 ; 1 polygons
*A 1 Y G NA NA 83914 14 5.538 ; 1 polygons
*A 1 Y C NA NA 83914 14 17.36 ; 2 polygons
*A 1 Y G NA NA 83914 14 24.605 ; 2 polygons
*A 1 Y G NA NA 83916 16 7.076 ; 1 polygons
*A 1 Y G NA NA 83917 22 31.772 ; 1 polygons
*A 2 Y C NA NA 83918 16 5.214 ; 1 polygons
*A 2 Y C NA NA 83919 18 6.393 ; 1 polygons
*A 2 Y G NA NA 83919 18 17.887 ; 1 polygons
*A 1 Y G NA NA 84001 16 10.713 ; 2 polygons
*A 0 N G NA NA NULL 1 19.52 ; 48 polygons
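The listing above appears to fold every Theme2 = N record into a single development type. As a rough sketch of how that collapse could be scripted outside Woodstock, the Python below sums the area and polygon counts of the Theme2 = N records and appends one combined record. The helper names `collapse_roads` and `format_record` are hypothetical, the input is a small subset of the original records, and the replacement theme values (0, N, G, NA, NA, NULL) and age of 1 are simply copied from the last record in the listing.

```python
# Each record: (themes Th1..Th6, age, area, polygon count).
# Values are a subset copied from the original listing.
records = [
    (("1", "Y", "G", "NA", "NA", "83901"), 17, 11.346, 1),
    (("1", "N", "G", "NA", "NA", "83901"), 17, 4.977, 19),
    (("1", "Y", "C", "NA", "NA", "83904"), 18, 48.816, 1),
    (("1", "N", "G", "NA", "NA", "83904"), 18, 7.428, 19),
    (("2", "N", "G", "NA", "NA", "83914"), 14, 1.087, 1),
    (("2", "N", "G", "NA", "NA", "83916"), 16, 1.86, 1),
    (("1", "N", "G", "NA", "NA", "83919"), 18, 4.168, 8),
]

# Theme values and age given to the combined record, as in the listing.
ROAD_THEMES = ("0", "N", "G", "NA", "NA", "NULL")
ROAD_AGE = 1


def collapse_roads(records):
    """Keep Theme2 != N records as-is; merge Theme2 = N into one record."""
    kept = []
    road_area = 0.0
    road_polygons = 0
    for themes, age, area, polygons in records:
        if themes[1] == "N":            # Theme2 is the second theme
            road_area += area
            road_polygons += polygons
        else:
            kept.append((themes, age, area, polygons))
    if road_polygons:
        # Round to the three decimals used in the source listing.
        kept.append((ROAD_THEMES, ROAD_AGE, round(road_area, 3), road_polygons))
    return kept


def format_record(record):
    """Render a record in the '*A ... ; n polygons' layout used above."""
    themes, age, area, polygons = record
    return f"*A {' '.join(themes)} {age} {area:g} ; {polygons} polygons"


collapsed = collapse_roads(records)
for record in collapsed:
    print(format_record(record))
```

With these seven input records, the five Theme2 = N records reduce to one, and the combined area and polygon count match the final record in the listing.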