9
Database Setup
Overview
Instructions for Deriving Flow Direction Grids
Instructions for Setting Up and Checking Regression
Database Delivered by MTU
Overview
The database required to support threshold runoff calculations nation
wide is complex. The two most significant components of the database
that were not readily available at the start of this project (January,
1999) were projected flow direction grids for each RFC, and a digital database
of terrain and climatic characteristics required to support USGS flood
frequency calculations. Flow direction grids were derived using DEM
and RF1 data provided by NOHRSC, and data layers required for USGS flood
frequency calculations were compiled at Michigan Technological University.
A significant amount of work is required to (1) derive usable and accurate
flow direction grids, and (2) check the database delivered by Michigan
Tech against USGS Water Resources Investigations Report 94-4002 and make
modifications necessary to conform with the specifications required by
the Avenue programs developed at HRL. The threshold runoff Avenue
programs are designed to be fairly general. This requires a database
which conforms to strict specifications that must be carefully checked.
Some of the tasks are tedious but definitely worthwhile.
Deriving Flow Direction Grids
A summary of the flow direction pre-processing and some of the reasoning
behind the revisions to earlier plans is summarized here, followed by more
detailed instructions on how to execute this pre-processing procedure.
Summary of pre-processing steps for importing, projecting, and stream
carving:
1. Convert NOHRSC river line files (modified RF1), elevation raster,
and slope raster files into Arc/Info format.
2. For elevation and slope grids, reassign values of 32768 as
NODATA values.
3. Obtain outline of RFC boundary from NOHRSC
4. Project elevation grid, slope grid, river lines, and RFC boundary
into the Albers projection
5. Buffer the projected RFC boundary file.
6. Use the buffered RFC boundary to create a mask and clip out
portions of the elevation and slope grids to analyze. This reduces
processing times for later steps and helps to ensure that the vector streams
cross over the edge of the elevation grid to the border areas with NODATA
(a requirement for stream carving).
7. Fill the masked elevation grid.
8. Rasterize the RF1 rivers.
9. Carve stream cells into the land surface.
10. Introduce a slight elevation gradient in flat areas to force
flow directions in the correct direction. This involves running an
AML which calls a C program (flatareas5) and several Arc/Info Grid functions.
11. Fill sinks again.
12. Compute flow direction and flow accumulation.
13. Use semi-automated procedures to correct the flow direction
grid as explained below.
A variant of processing steps 1-12 were initially run at Michigan Tech
University for all RFCs. As time has passed, several reasons to refine
this procedure have been discovered; therefore, re-running some of these
steps is required. It should not be a huge burden on resources to
repeat these processing steps because they are automated and the 15 arc-second
grids are relatively small in the DEM world. The computational time
for processing steps 1-12 is on the order of 12 hours (depends on many
factors), but the most computationally expensive processes can be run overnight.
There are several reasons for refining this procedure that revealed
themselves by working with and checking the data at HRL.
- In the original procedure (Spring 1999), the buffer length selected
to clip grids was 30 cell widths outside the RFC boundary or 12000 m.
This buffer is not long enough at the edge of one portion of ABRFC because
it does not allow the processing routines to account for the effects of
the closest RF1 stream outside of the RFC basin. The presence or
absence of this particular river segment just outside ABRFC influences
the assignment of flow directions on a relatively flat area near the RFC
basin divide (this may be an artifact of poor DEM or the filling process).
To avoid this problem, a larger buffer of 50000 m is selected for re-processing.
- Some of the NOHRSC DEMs use more than one code to denote data
outside the study area. The automatic routines used in the first
round of processing assumed that the code 32768 was the only "NODATA" code
used. The code 65436 was also used in NODATA in MARFC. This
means restarting from step 2 is required for some RFCs.
- It is worthwhile to refine the stream carving method so that
instead of setting stream cells to 0 as in the original method, land surface
cells are raised relative to river cells. The current differential
used to raise the land surface cells is 100 m. Setting river cells
to 0 may cause more serious problems than using a constant elevation differential
in situations where two headwater arcs that drain to different basins originate
in neighboring DEM cells. Also, some land cells in coastal areas
are rounded down to zero elevation, thus it became difficult to distinguish
these cells from river cells using the old method. All ocean (or
large lake) cells have the value NODATA.
- For RFCs that border on large bodies of water (i.e. oceans or
Great Lakes) it may be necessary to edit (lengthen) the river arcs along
the coast. This is a manual procedure (explained below), but it goes
pretty quickly.
Latest Procedures (as of Dec. 2, 1999)
Note: These procedures require both Arc/Info 7.0 (or later) and ArcView
3.1 and Spatial Analyst 1.1 or later.
1. Extract the data for an RFC from the CD-ROM delivered by Michigan
Tech. Keep data for different RFCs in separate directories.
2. Run prepro1.aml. To run, follow the instructions
in the header of the prepro1.aml file. This AML may take many
hours to complete since it includes use of the "fill" Grid function.
The buffering step can also be expensive for coastal areas.
There is one interactive step that is required before the filling
occurs. A Grid graphics window will display a mask of the
watershed and the user is required to select the smallest box that
contains the entire area. After this step, it may be desirable
to let this AML to run over night (especially for large RFCs).
Prepro1.aml uses the project command and requires the projection
file geoalb83.prj to exist in the same directory.
3. If the RFC being processed borders large bodies of water (i.e.
oceans or Great Lakes) it will likely be necessary to edit the river arcs
along the coast. This is a manual procedure but it goes pretty quickly.
The steps to do this are as follows:
Load the the elevpmf Grid, projected RF1 Theme (rf1p), and the projected
RFC boundary Theme (rfcbndp) into an ArcView View. These Themes were
created by prepro1.aml.
Load the ThreshR extension.
All of the "rf1p" arcs point upstream rather than downstream.
It is useful to flip the direction of the arcs for for step 6 below.
To do this, make "rf1p" the active Theme and run ThreshR-Utility
--> Flip Arcs. This program will ask you to save the flipped
arcs as a new shapefile. Name the new file "rf1pf.shp." Attribute
information is transferred to rf1pf.shp with the exception of some standard
Arc/Info coverage attributes that are no longer needed in ArcView. For
this attribute transfer to work, the seventh field in the Arc/Info "rf1p"
file should be the Arc/Info User-ID field. This was the case for
the ABRFC data. If this is not the case, then the ThreshR-Utility
--> Flip Arcs script needs to be changed.
Note: I don't know with 100% certainty that all arcs will be pointing
upstream for all RFCs, so please check this before running this program.
An easy way to check this is to changed the Legend Symbol used to a line
with arrows rather than the default single line. If you zoom in,
you will see more arrows.
4. For coastal areas, extend the length of rf1pf lines when necessary:
This is done most easily doing using ArcView. Modify the legend
for the elevation grid (elevpmf) so that zero values have a distinctive
color from NODATA cells and a distinctive color for all cells greater than
zero. Make NODATA cells transparent. The goal is to lengthen
any rf1pf arcs that don't quite make it to the ocean. This is important
for stream carving, especially if a portion of the landscape is assigned
zero elevation. Add the RFC boundary Theme available from the NOHRSC
web site because this provides a clearer definition of the coastline than
the DEM and provides some guidance regarding the appropriate path along
which a river arc should be extended. Sometimes it is also helpful
to have an atlas handy to orient yourself.
Pan and Zoom along the coast to make the required edits. It is
much easier to add new arcs to rf1mod than it is to modify existing arcs.
See ArcView documentaion on how to edit an arc Shapefile.
5. Run prepro2.aml. To run, follow the instructions
in the header of the prepro2.aml file. This aml calls a C
program called flatareas5.c (must be compiled).
6. While prepro2.aml is running, ArcView scripts can be run to
identify potential problem points in the derived flow direction grid.
Potential problem points occur when two separate stream arcs come within
one 400 m grid cell of one another at a point that is far from a junction.
This happens quite frequently when using the RF1 data in conjunction with
a 15 arc-second DEM. The steps to do this are as follows:
- Load the most recent ThreshR Extension file into ArcView
- Add the rf1pf.shp theme that has its arcs pointing correctly downstream.
- Make a copy of the flipped and edited stream coverage (rf1pf.shp)
by selecting Theme --> Convert to Shape File (Add this shapefile to the
View). Make sure that either no arcs or all arcs are selected so that the
entire file will be copied.
- Run ThreshR-Utility
--> ID multi-arc Reaches
The goal here is to eliminate reaches that are composed of multiple
arcs. To do this, a new attribute is added to rf1pf.shp that contains
a unique value for each reach. (It turns out that this program works
for reaches with two arcs but may not work with reaches for 3 arcs so the
run may need to be repeated.) This attribute is used later with the
Avenue "Summarize" request to merge all multi-arc shape files. The
ID Multi-arc Reaches program adds an attribute to the rf1pf.shp called
"merge." There is a record in rf1pf.shp for each arc. For any
river reaches that are composed of multiple arcs, the merge attribute for
each arc in that reach is assigned an common integer value with other arcs
that make up that reach. For single arc reaches, the merge attribute
is 0. (Took 51 minutes for MARFC)
The program prompts the user to identify the river theme and a copy
of the river theme. Note that the river theme is modified by this
program. The original flipped coverage should have fewer records in its
attribute table than the copy when the program is completed (unless all
stream segments were originally only single arcs).
- Add a new integer field to the rf1pf.shp file called merge1, select
all zero records in "merge", and use the Calculate button to assign arcinf_id
to selected records, toggle the selection and assign "merge" plus the maximum
of arcinf_id to the selected records. Finally, run the program ThreshR-->
Merge Multi-arc Reaches to merge lines with a common "merge1" id value.
- Run ThreshR-Utility/Identify
Potential Problems (Time estimate for ABRFC: 14 min )
This program asks the user to:
- identify the river theme (mgline1.shp)
- identify the field containing a unique integer identifier
(Select the field "merge1")
- define how many cells should be used to define
the neighborhood of the outlet where we are not concerned about cells from
different segments being too close to one another. I used a default
value of 13 cells -- number should be odd if you want a
square neighborhood.
- identify the mask grid (rfcmask)
The program creates a grid with ones in the stream and zeros on the
land surface (I.e. "rivone1" this grid is useful for display purposes in
Step 7) and a point shapefile (i.e. "probzo1.shp") which identifies the
locations of potential problems. It is not known for sure if there
are any problems at these points until step 7.
Rivers with a relatively low drainage area threshold.
7. Manually check and edit the flow direction grid derived by
prepro2.aml.
This is the point at which manual checking and editing of the flow direction
grid is required. There are "trick" methods for editing grids in ArcView,
but I have found these methods relatively slow and tedious. I've also found
it difficult to do editing quickly when
using Arc/Info GridTools manually. Therefore, I have written an
AML (editgrid.aml) that makes editing the flow direction grids relatively
quick and painless. The feature that makes this AML so nice is that
the "griddirection" command is used to draw arrows showing the flow
directions. This makes it easy to see where flowdirection grid edits
may be needed.
Before using editgrid.aml, the shapefile probzo1.shp must be converted
to an Arc/Info point coverage and a grid which is the base2 logarithm
(This is needed to use the griddirection command -- see Arc/Info Help for
more explanation) of the flow direction grid must be created.
Arc: shapearc probzo1 probzo1
Grid: fdb2 = log2 ( fdmod )
Grid: fdb2i = int ( fdb2 ) /* convert to integer
grid and delete old grid
Grid: kill fdb2 all
The header of editgrid.aml must be changed to be consistent with the
input file names being used. The procedure for editing grid values
may appear tedious at first, but it becomes fairly easy once you get the
hang of it. When the editgrid.aml program is started, the user is
prompted to type a record number to start with. This means a record
number in the problem points coverage. This option is helpful if
you want to check a portion of the points, take a break and come back to
check the rest.
Before running editgrid.aml, type the command
Grid: gridedit edit fdb2i
For some reason, this command does not always work reliably when executed
within the AML.
For each potential problem point, gridded, streams are displayed in
white, flow direction arrows are displayed in red and digitized (RF1) streams
are displayed in blue. The potential problem points are labelled
and numbered with green. The display automaticall defaults to an
extent that is 10 cells on each side of the problem point in question.
Since this might turn out to be too close to understand the overall flow
picture, the user is first prompted with the question:
"Resize window? (9 moves to edit options):"
The user may type a number, representing the number of cells to define
on each side of the cell extent. (20 is interesting, 50 is getting
a bit big) Typing 9 moves the user to the next option:
"Enter edit value (9 when finished):"
The edit values to consider are (0 = E, 1=SE, 2=S, 3=SW, 4=W, 5=NW,
6=N, 7=NE).
If you quit the program normally, your changes will be saved and
the arrows you see on the screen will reflect your changes when you rerun
the program. If you exit out of the AML due to a mis-typed key stroke
and a subsequent error message, this is no big deal. Just restart
the AML and start with the last record you had previously worked on.
You should be able to scroll up in your terminal window to find the last
record in the problem points shapefile that you looked at before exiting.
If you exit out of AML due to an error, you may get the following message
when attempting to restart the AML.
"Cursor OUT_CUR already declared"
AML MESSAGE - Stopping execution of AML file due to ERROR condition
To correct this problem, type the following and then restart editgrid.aml.
Grid: cursor out_cur remove
Sometimes a cluster of problem points will exist in the same area.
Flow direction corrections only need to be made once for these areas.
Part of the reason that this program is so inflexible is that AML is very
quirky and it sometimes takes a while to debug. Avenue is far superior.
(Good example of the types of problems encountered -- MARFC, point 40
Why problems -- MARFC, 499 9)
Now that a good flow direction grid has been created, convert fdb2i
back into the 1, 2, 4, 8, 16, . . . . codes that Arc/Info can understand.
Grid: fdr = exp2 ( fdb2i )
Grid: fd = int ( fdr )
Grid: kill fdr all
Setting Up and Checking Regression Database
Delivered by MTU
To enable ThreshR Scripts work in all RFCs and make different types
of computations depending on the location of interest, strict ThreshR database
specifications must be met. Michigan Tech has collected and supplied
nearly all of the required data sources; however, there are numerous checks
and modifications that must be made to this database in order for the ThreshR
software to run correctly. This is a setup cost that is necessary
due to the complexity of the database, and the fact that the database specifications
have evolved over time. There may have been some way to reduce this
with more hands on interaction with the contractor. Up until now,
I have not had time to document some of the more tedious issues that I
have identified. The checks and modifications described here must
be made for each RFC.
Tasks required to complete ThreshR setup:
- Compute flow accumulation from the edited flow direction grid (output
file should be named fa).
- Compute flow length from the edited flow direction grid (output file
should be named fld).
- Delineate the RFC boundary (call this rfcbound.shp) using ArcView
and clicking on an outlet point.
- Create a point file of HRAP center points called (hrappts.shp).
This file should cover the entire RFC. The extension coord.avx can
be used to create this file. With a View active, click on HRAP -->
Create HRAP Center points with a View active and choose to create a new
point coverage in geographic coordinates based on HRAP extent. Specifying
HRAP extent requires the user to specify the lower left HRAPX and HRAPY
and the number of columns and rows. To determine the appropriate
geographic extent for an RFC, the program xmrgtoasc.c can be run on an
existing gridded threshold runoff file. The location of the gridded
runoff files for ABRFC on the HRL-NHDRs is /nonawips/usr/apps/ffg/files/abrfc/grro.
More
information on coord.avx and xmrgtoasc.c.
- Rename "elevpm" (created by prepro1.aml) to "dem" using the Arc RENAME
command.
- Make sure the slope grid is available (not delivered with the ThreshR
database in Dec. 1999). If not, derive a slope Grid using the ArcView
Spatial Analyst function Surface--> Derive slope. Make sure the Analysis-->
Properties extent and grid size are set to same as "dem" before doing this.
The Derive Slope function returns slope in degrees. To get the slope
in percent, use Analysis-->Map Calculator to take the Tangent of the slope
grid. The ArcView "Tan" function assumes input units are radians,
so degrees must be converted to radians before applying the Tan function.
The name required for the slope grid by the ThreshR software is "slopeff".
Determine the file source name for the output of the tangent Map Calculation
by choosing Theme--> Properties. Copy the source Grid to "slopeff"
by using the File--> Manage Data Sources.
- (Keep this for now.) Make sure that all source file names
conform to an 8.3 naming conventions. Although neither UNIX systems
or modern PC operating systems (e.g. Windows 98) are limited to this filename
length, it is desirable to store data on a CD-ROM media which can be read
from multiple platforms. In my experience extrac characters in file
names that don't conform to the 8.3. naming convention standard become
garbled on our HP UNIX systems. Thus, to make the data files more
universally usable, all data files used by ThreshR should conform to the
DOS 8.3 naming convention.
- The statekey.shp and regions.shp files were not delivered with
all of the correct fields. Look at the files from ABRFC or MBRFC
as models to add the correct fields to the attribute tables for these Shapefiles.
When the Region shapes and State shapes for different regions were combined,
some polygon slivers were created at the intersection of shape boundaries.
To clean up the database, these slivers should be merged with the approprate
larger neighbors. Take a look at the AML mergereg.aml under the CBRFC
directory to see how this can be easily done.
aaa.mergethemes; regions.shp should have reg_name (30 char), reg_num
(2 int), abbr (2 char) To get stuff from the Tape -- see the file /fs/gis/threshr/cbrfc/tar.txt
for an example.
regequat.txt -- ulimit = char
- Revise the parcode.txt and regequat.txt using the files from
ABRFC and MBRFC as models. These files need to be converted to dBase
format. The reason for this change is that dBase files can be edited
more easily within ArcView. The original choice of using .txt files
was because dBase files have limitations on field names; however, this
can be alleviated in the future by using aliases within ArcView.
Field names in the original parcode.txt and regequat.txt table designs
may need to be shortened. It is probably easiest to make the field
name changes first using a text editor (e.g. Nedit) and then convert the
files to dBase format so that they can be edited directly within ArcView.
In the new version of ThreshR, these files must be named "parcode.dbf"
and "regequat.dbf.
- Check all of the numbers in regequat.dbf against USGS Report
94-4002, and be sure to check the "Errata Sheet." The region
numbers in regions.shp should be checked to see if they are consistent
with the region numbers in regequat.dbf.
- A critical field in parcode.dbf is the "conversion" field.
This field is used to tell the program whether or not a units conversion
and/or adjustment factor on the source data is required to calculate peak
flows correctly. The codes "a","s","m", and "d" are used to indicate
add, subtract, multiply and divide respectively. A number indicating
a multiplicative or additive factor must follow each of these codes, separated
by an underscore. Multiple operations can be made on the same parameter
by separating operations with an underscore. Example: the conversion
required when computing mean basin elevation in New Mexico (ELV_NM) is
"m_3.281_d_1000" which means multiply the parameter value by 3.281 and
divide by 1000. This converts mean basin elevation from meters to
thousands of ft. The conversion codes are interpreted and applied
by the script "threshr.qxparams." A checker using USGS Report 94-4002
must be careful because subtractions are sometimes reported in the text,
but not reflected in the printed equations and sometimes they are reported
in the printed equations but not stated in the text.
- An important field in regequat.dbf is the "special" field.
The "special" field contains the code 0 if the corresponding regression
equations conform to the standard format (see Functionality
description). For a particular state or region with a unique
form of the regression equation, the special function code must be a unique
integer not only within that RFC but across the entire United States.
- For specialized data layers in individual states in vector format,
convert all of the Arc/Info Coverages into Shapefiles and eliminate unnecessary
fields. This is done because there is no need to keep two types of
vector data in the ThreshR database. Make sure that there is a field
in the Shapefile that contains the parameter code that is represented by
that layer. For example, in Kansas, there is a polygon Shapefile
that specifies soil permeability. The code for this attribute in
parcode.dbf is SLPM_KS, so the field name containing the actual permeability
values must have this name. For the case of Minnesota, where the
parameter of interest is percent streams and ponds, the parameter code
is STRP_MN. The field named strp_mn in the streams and ponds shapefile
strp_mn.shp contains the value 100 indicating that any area inside these
polygons is 100 percent water.
- Several problems requiring fixes were found in setting up the
database for ABRFC and MBRFC. To give some more examples of things
to look for, here are some of the example problems found.
parcode.txt was missing parameters for Wyoming and Montana
The "params" field in the statekey shapefile was missing for MBRFC.
BASL was listed as a parameter for Wyoming in regequat.txt, but this
is incorrect because there is another BASL parameter that has different
units.
Some parameter names in parcode.dbf were found to be inconsistent with
parameter names in reqequat.dbf.
Some of the GIS data filenames were inconsistent with filenames listed
in parcode.dbf (Example: ro_mn and mar_mn).
Some information in USGS 94-4002 was not interpreted correctly.
(Example South Dakota Sand Hills: "For streams in the Sand Hills
area, use 0.4 times the discharges given by the Western Region.")
Thus, equations for the Sand Hills region can be included in the regequat.dbf
table.
Precipitation data for Montana has the wrong units.
CBRFC
Multiplied forest percent in Idaho (frp_id) by 100.
Precipitation data for california have the wrong units -- should be
in inches. Currently assuming this data set is from PRISM giving
the units in hundredths of mm.
Exponent for NM Nothwest Plateau Q50 incorrect.Also NM southwest mountain,
Q5
Notes on regression equations for individual states:
Iowa: The Grid "stral_ia" is incorrect. As a temporary
solution, baseline runs for Q2 are made using the regression equations
from Carpenter and Georgakakos, 1993 (p. 20) is used. A linefile
of first order streams in Iowa is needed.
Colorado: Have not yet included the equations for the Eastern
Colorado Plains
Texas: The playa area in the panhandle where regression equations
do not apply is included in regions 3 and 4 on the region map provided
in the regression database.
Nebraska: Using the database available, no distinction can be
made between total drainage area and contributing drainage area.
Solving this problem may have to wait until NED comes out.
Region names are totally wrong in Wyoming
|