OAX SBN Data
------------

Tarballs of OAX SBN data have been provided. Each tarball contains an
hour's worth of data as collected from an AWIPS SBN data stream.

Tarball names use time-stamps indicating the most recent time-stamp of
any file in that data set. For example, a tarball named "0303_1339.tgz"
contains data collected from roughly 1239 - 1339Z on March 3.

The uncompressed data is about 50% larger than the tarball.

Directions for use
------------------

1) Find a directory to hold the data from the tarball(s) and be sure
   that the file system has sufficient space to hold the unpacked data.

2) Unpack the tarball(s). You do not need to unpack all of them; unpack
   as many as you want.

   The data is unpacked from the tarballs into subdirectories. The names
   of these subdirectories are time-stamps (MMDD_HHMM) indicating the
   month, day, hour, and minute when the data was collected. Each of the
   four fields is a two-digit integer. For example, a source data
   subdirectory named "0303_1339" (extracted from a tarball named
   "0303_1339.tgz") indicates the data was collected on March 3 between
   1239 - 1339Z.

   The ingest script searches the root of the source data directory for
   all subdirectories that use this naming convention. As a result, do
   not rename these subdirectories after extracting the data from the
   tarballs.

   Say you have a directory "/data/source" which will be the root of
   your data source tree. Unpack the tarball 0303_1339.tgz into
   /data/source. You will then have one subdir in /data/source:
   "0303_1339". You may unpack other tarballs into /data/source if you
   wish.

3) Run the script "ingestData-spam_edex.sh" to ingest data from your
   data source tree. The data is read "in situ" by the script, so no
   data is physically copied to a "target" location.

   The script calls the executable "spam_edex" to send messages to qpid.
   "spam_edex" must be in your PATH. Also, since spam_edex depends on
   the qpid libraries, your LD_LIBRARY_PATH must reference these
   libraries.
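
The requirements described so far (time-stamped subdirectories under the
data root, spam_edex on the PATH) can be checked before starting an
ingest. The following is a minimal sketch, not part of the distribution:
the function names list_data_dirs and have_spam_edex are hypothetical,
and the MMDD_HHMM pattern is an assumption based on the naming
convention described above.

```shell
# Pre-flight sketch (hypothetical helpers, not shipped with the script).

# Print the names of the subdirectories of $1 that look like MMDD_HHMM,
# e.g. 0303_1339. Other entries are silently skipped.
list_data_dirs() {
    for d in "$1"/*/; do
        [ -d "$d" ] || continue          # no subdirectories at all
        name=$(basename "$d")
        case $name in
            [0-9][0-9][0-9][0-9]_[0-9][0-9][0-9][0-9]) echo "$name" ;;
        esac
    done
}

# Return 0 if spam_edex is found on PATH, non-zero otherwise.
have_spam_edex() {
    command -v spam_edex >/dev/null 2>&1
}
```

For example, "list_data_dirs /data/source" lists the data sets the
ingest would be expected to see, and "have_spam_edex || echo missing"
reports a PATH problem. On Linux, running ldd on the spam_edex
executable and looking for "not found" entries is one way to check
whether LD_LIBRARY_PATH covers the qpid libraries.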
   To simulate the ingest of a real AWIPS SBN data stream, the script
   "ingests" the data in time sequence according to the actual
   time-stamps of the data files. The rate of ingest is adjustable as
   described below.

   Using the example above, you would run (with the default speed of 1):

       ingestData-spam_edex.sh /data/source

4) Detailed discussion of usage:

   USAGE: ingestData-spam_edex.sh <root of data source directory> [speed]

   - The first argument is mandatory and must be a directory name.
     "root of data source directory" is the absolute path of the
     directory that contains one or more subdirectories, each holding
     one hour of raw SBN data.

   - The second argument is optional and sets the speed (rate) of data
     reading. Speed must be an integer with 1 <= speed <= 10. If
     omitted, speed defaults to 1.

   The speed is the divisor applied to 60 (seconds): 60 / speed is the
   number of seconds of real time used to process one minute of raw
   source data. Reasons to accelerate the rate of data ingest include
   performance testing, system loading, impatience, etc.

   For example, if speed is 1, the script will take 60 seconds of real
   time to process/ingest one minute of raw source data. This attempts
   to simulate the actual SBN ingest data rate. If speed is 2, the
   script will take 30 seconds of real time to process one minute of raw
   source data. And so on.

   At low speed settings, the ingest (copying) may not be uniform over
   each time interval: the copying starts at the beginning of the
   interval and may finish well ahead of the end of the interval.

   In theory, if speed is set to 10, the script will take only 6 seconds
   to process/ingest one minute of raw source data. In practice, the
   speed is limited by system resources and by the time needed simply to
   traverse the algorithm in the script. On most machines, a true speed
   of 10 may not be achievable.
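
The speed arithmetic described above can be sketched in shell. The
function name seconds_per_minute is hypothetical and is not taken from
ingestData-spam_edex.sh. The sketch uses integer division, which
truncates for speeds that do not divide 60 evenly (7, 8, and 9); how the
script itself handles those cases is not stated here.

```shell
# Sketch: seconds of real time spent per minute of raw source data.
# Usage: seconds_per_minute [speed]     (speed defaults to 1)
seconds_per_minute() {
    speed=${1:-1}
    case $speed in
        [1-9]|10)
            # 60 seconds divided by the speed, using integer arithmetic
            echo $((60 / $speed))
            ;;
        *)
            echo "speed must be an integer from 1 to 10" >&2
            return 1
            ;;
    esac
}
```

With this sketch, "seconds_per_minute 2" prints 30 and
"seconds_per_minute 10" prints 6, matching the examples in the text.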

   EXAMPLE: ingestData-spam_edex.sh /data/source 2

   In this example, the script will ingest raw data files from subdirs
   in /data/source with speed=2. A speed of 2 means that each minute of
   raw source data will be copied to the endpoints in 30 seconds (so
   data ingest will be double the rate of actual SBN ingest from the
   SBNCP).

   Here is what /data/source should look like:

   ls -l /data/source
   drwxr-xr-x 13 root root 4096 Mar  3 13:39 0303_1339
   drwxr-xr-x 13 root root 4096 Mar  3 14:43 0303_1441
   drwxr-xr-x 13 root root 4096 Mar  3 15:45 0303_1545
   drwxr-xr-x 13 root root 4096 Mar  3 16:48 0303_1647
   drwxr-xr-x 13 root root 4096 Mar  3 17:52 0303_1750

   Each of the above directories (0303_1339, 0303_1441, etc.) contains
   about one hour's worth of raw SBN data. Each directory must have
   subdirs named 'sat', 'radar', etc., which contain the SBN data for
   that datatype. That is:

   ls -l /data/source/0303_1339
   drwxrwxrwx 2 root root  4096 Mar  3 13:30 airep/
   drwxrwxrwx 2 root root  4096 Mar  3 13:39 binlightning/
   drwxrwxrwx 2 root root  8192 Mar  3 13:35 bufrua/
   drwxrwxrwx 2 root root 98304 Mar  3 13:36 grib1/
   drwxrwxrwx 2 root root  4096 Mar  3 13:38 grib2/
   drwxrwxrwx 2 root root  8192 Mar  3 13:38 metar/
   drwxrwxrwx 2 root root 12288 Mar  3 13:35 pirep/
   drwxrwxrwx 2 root root 65536 Mar  3 13:39 radar/
   drwxrwxrwx 2 root root  4096 Mar  3 13:32 sat/
   drwxrwxrwx 2 root root 28672 Mar  3 13:38 sfcobs/
   drwxrwxrwx 2 root root  8192 Mar  3 13:39 taf/

   Note that you can stop the script at any time: use Ctrl-C if it is
   running interactively, or kill (with the pid of the script process)
   if it is running in the background.

5) A log file is created at $HOME/.SBN.ingest.log. Tail the log file to
   track the progress of the ingest. A scratch dir is created in
   $HOME/.SBN.

6) Assumptions:

   - The local machine uses a 24-hour clock. You should not need to set
     GMT on your local machine.

   - Each data dir contains no more than one hour's worth of data. This
     is the default.

   - System load will affect the rate of data reading.
     On a busy system or a system with "limited" resources, high rates
     of data reading may not be possible. Thus, for "high" rates of data
     reading, the speed argument may be more theoretical than real.

spam_edex and associated enhancements to the ingestData script were made
by David Friedman (Keane/Raytheon AWIPS Team).

JW - updated 10/28/10