7. Troubleshooting

Techniques for troubleshooting various aspects of AvnFPS are addressed in other sections of this manual. The purpose of this section is to gather some of the more likely fault scenarios and troubleshooting techniques into a single location.

7.1. Overall Server Health

A number of server processes keep AvnFPS running. A series of status lights on the TAF Monitor GUI reports the status of most of these processes. Keep-alive messages are sent among the servers every 30 seconds to set the colors of these status lights. Section 1 of the System Administration Manual, "System Overview," contains a detailed description of each server. Server hosts may vary between AWIPS releases to support load-balancing efforts. For AWIPS OB9.2, the servers reside on a single host, px2f.

7.1.1. Diagnostic Tools

  • If all the server status lights are red, check for

    • Large (> 60 s) clock skew between client and server hosts, nominally between the px2f machine and the AWIPS workstations.

    • Failed name/event server (avnserver)

    • A network failure

  • The command ps -ejHf | grep avnpython lists server processes and any component threads. In this case, indentation in the rightmost column indicates hierarchy. A sample listing follows:

    px2-wfo:user: ps -ejHf | grep avnpython
    fxa 18202     1 18201 18201 0 Apr28 ? 00:00:00  avnpython /awips/adapt/avnfps/OB9.2/py/avninit.py px2f
    fxa 18203 18202 18201 18201 0 Apr28 ? 00:06:46   avnpython /awips/adapt/avnfps/OB9.2/py/avnserver.py -d -n px2f
    fxa 18224 18202 18201 18201 0 Apr28 ? 00:02:10   avnpython /awips/adapt/avnfps/OB9.2/py/avndrs.py -d -n px2f
    fxa 18239 18202 18201 18201 0 Apr28 ? 00:17:02   avnpython /awips/adapt/avnfps/OB9.2/py/avndis.py -d -n px2f
    fxa 18250 18202 18201 18201 0 Apr28 ? 00:00:55   avnpython /awips/adapt/avnfps/OB9.2/py/avnxs.py -d -n px2f
    		

    Notes:

    • The command given above shows only the actual processes. To see the threads spawned by those processes, pass the -efL flags to ps instead: ps -efL | grep avnpython.

    • Some comments follow:

      PID     Comments                                "Kill"-able?
      18202   avninit daemon                          Yes
      18203   avnserver process                       Yes
      18224   Data Request Server (avndrs) process    Yes
      18239   Data Ingest Server (avndis) process     Yes
      18250   Transmit Server (avnxs) process         Yes

    • Threads of a process will not accept signals from a kill command. You must identify the process that owns the thread and kill that top-level process. Once a parent process terminates, all of its threads will terminate as well.

  • The command netstat -a | grep 9090 can be used to diagnose connections among the various servers. The name/event server uses port 9090 to disseminate information about the various servers that are up and running. Here's an example:

    Checking netstat on px2

    px2-wfo:user: netstat -a | grep 9090
    tcp        0      0 px2f-wfo:9090              px2f-wfo:55812             ESTABLISHED
    tcp        0      0 px2f-wfo:9090              px2f-wfo:55819             ESTABLISHED
    tcp        0      0 px2f-wfo:55819             px2f-wfo:9090              ESTABLISHED
    tcp        0      0 px2f-wfo:55812             px2f-wfo:9090              ESTABLISHED
    tcp        0      0 px2f-wfo:9090              lx3-wfo:40638              ESTABLISHED
    
    

    Notes:

    • The columns are Protocol, Receive Queue, Send Queue, Local Address, Foreign Address, and State.

    • Five connections to and from the name/event server can be seen in the px2 listing.

      • The first four established connections show connections between the servers on px2 and the name/event server.

      • There is an additional connection to a process on lx3. Most likely, this is an instance of the AvnWatch GUI.
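The connection listing can also be summarized per remote host, which makes a missing server or an unexpected client easier to spot. The following is a minimal sketch, assuming the six-column netstat format described in the notes above. The function name avn_peers is hypothetical and is not part of AvnFPS.

```shell
# avn_peers [PORT]: read `netstat -a` output on stdin and count the
# ESTABLISHED connections whose local address is on the name/event
# server port (default 9090), grouped by remote host.
# Hypothetical helper, not part of AvnFPS.
avn_peers() {
    awk -v port="${1:-9090}" '
        # $4 = local address, $5 = foreign address, $6 = state
        $6 == "ESTABLISHED" && $4 ~ (":" port "$") {
            host = $5
            sub(/:[^:]*$/, "", host)    # strip the remote port number
            count[host]++
        }
        END { for (h in count) print h, count[h] }
    ' | sort
}

# Usage: netstat -a | grep 9090 | avn_peers
```

Applied to the sample listing, this reports one connection from lx3-wfo and two from px2f-wfo.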

7.1.2. Stopping/Starting Servers

avninit

Starting the AvnFPS servers is somewhat complicated because of the inter-relationships among them. The utility avninit was developed to handle these complexities. avninit is a persistent process that runs on px2 or, in case of failover, px1, and attempts to restart servers as needed. If you fix a file or network problem that is preventing a server from starting, avninit will attempt to restart the downed server. However, avninit will attempt at most 10 restarts of a failed server in a one-hour span.
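To make the restart limit concrete, the sketch below checks whether another restart would still fit under a limit of 10 per hour. It only illustrates the rule stated above: the function name restart_allowed is hypothetical, and the sliding one-hour window is an assumption, since avninit's actual bookkeeping is not described here.

```shell
# restart_allowed NOW TS...: succeed if fewer than 10 of the given restart
# timestamps (epoch seconds) fall within the hour ending at NOW.
# Illustration only; the sliding window is an assumption about avninit.
restart_allowed() {
    now=$1
    shift
    recent=0
    for ts in "$@"; do
        if [ $((now - ts)) -lt 3600 ] && [ "$ts" -le "$now" ]; then
            recent=$((recent + 1))
        fi
    done
    [ "$recent" -lt 10 ]
}

# Usage: restart_allowed "$(date +%s)" 1240900000 1240900300 && echo "may restart"
```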

Environment
Important: All AvnFPS servers must run as user fxa; do not try to start them as root. The standard server startup scripts check the user ID before launching the servers.
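A local wrapper script can apply the same kind of guard before it calls any AvnFPS command. This is a minimal sketch; the function name require_user is hypothetical, and the standard startup scripts perform their own check in their own way.

```shell
# require_user EXPECTED [ACTUAL]: succeed only when the invoking user is
# EXPECTED. ACTUAL defaults to the current user name from `id -un`.
# Hypothetical helper, not taken from the AvnFPS startup scripts.
require_user() {
    expected=$1
    actual=${2:-$(id -un)}
    if [ "$actual" != "$expected" ]; then
        echo "must run as $expected, not $actual" >&2
        return 1
    fi
}

# Usage: require_user fxa || exit 1
```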

avnkill

The utility avnkill stops all AvnFPS servers running on a host, including avninit. avnkill is persistent: it sends interrupt signals first, then kill signals. It also tries to clean up ill-behaved child processes that refuse to die when the parent dies. Once all servers have stopped running, restart avninit using the remoteServers.sh start command.
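The interrupt-then-kill escalation can be sketched for a single process as follows. This is not avnkill's implementation; the function name stop_pid and the default grace period of 5 seconds are assumptions chosen for illustration.

```shell
# stop_pid PID [GRACE]: send SIGINT, wait up to GRACE seconds (default 5)
# for the process to exit, then send SIGKILL if it is still present.
# Hypothetical helper illustrating the escalation; not part of AvnFPS.
stop_pid() {
    pid=$1
    grace=${2:-5}
    kill -INT "$pid" 2>/dev/null || return 0    # process already gone
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0  # exited after SIGINT
        sleep 1
        grace=$((grace - 1))
    done
    kill -KILL "$pid" 2>/dev/null || true       # last resort
    return 0
}

# Usage (as user fxa): stop_pid 18239 10
```

A process that ignores or blocks SIGINT is left running for the full grace period before SIGKILL is sent.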

Bouncing Servers

To "bounce" a Data Ingest, Data Request, or Transmission Server, identify the top-level process associated with the application and use the kill command to stop it. Within a few seconds, avninit should launch a new instance.
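Identifying the top-level process can be scripted by filtering ps output. The sketch below assumes the servers appear as "avnpython .../NAME.py" in the command column, as in the sample listing in Section 7.1.1. The function name server_pids is hypothetical.

```shell
# server_pids NAME: read `ps -ef` output on stdin and print the PID of
# each process whose command is avnpython followed by a path ending in
# /NAME.py (for example, NAME = avndis). Without the -L flag, ps lists
# processes only, so every PID printed is a process that can be signalled.
server_pids() {
    awk -v interp="avnpython" -v script="/$1.py" '
        {
            for (i = 1; i < NF; i++) {
                arg = $(i + 1)
                tail = substr(arg, length(arg) - length(script) + 1)
                if ($i == interp && tail == script) { print $2; break }
            }
        }'
}

# Usage (as user fxa): ps -ef | server_pids avndis
# Review the PID that is printed, then stop it with: kill <PID>
```

Matching on whole fields rather than a plain substring keeps the filter from reporting its own awk process, a common pitfall with ps | grep pipelines.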

7.2. Log Files

See Section 6, “Logging,” of the System Administration Manual for complete information on logs. The following information is distilled from that section.

Log files for AvnFPS OB9.2 server processes are stored in the directory tree /data/logs/adapt/avnfps, local to the host computer. The names of the log files are formed from the name of the application and the current day of the week (e.g., avnmenu_Thu and avndis_Fri). All applications use collective logging, which means that log file entries from different instances of an application can be found, interleaved, in a single log file. The following screen capture shows sample directory listings:

GUI logs on a workstation

lx2-wfo:user: pwd
/data/logs/adapt/avnfps
lx2-wfo:user: ls
avnclimate_Fri  avnmenu_Mon  avnmenu_Thu  avnqcstats_Tue  avnsetup_Thu  avnwatch_Mon  avnwatch_Thu
avnclimate_Wed  avnmenu_Sat  avnmenu_Tue  avnsetup_Mon    avnsetup_Tue  avnwatch_Sat  avnwatch_Tue
avnmenu_Fri     avnmenu_Sun  avnmenu_Wed  avnsetup_Sun    avnwatch_Fri  avnwatch_Sun  avnwatch_Wed

Server logs on px1/px2

px2-wfo:user: pwd
/data/logs/adapt/avnfps
px2-wfo:user: ls
avndrs_Fri  avndrs_Sun  avndrs_Wed   avninit_Sun  avninit_Wed    avnserver_Sat  avnserver_Tue
avndrs_Mon  avndrs_Thu  avninit_Fri  avninit_Thu  avnserver_Fri  avnserver_Sun  avnserver_Wed
avndrs_Sat  avndrs_Tue  avninit_Sat  avninit_Tue  avnserver_Mon  avnserver_Thu
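Because the file name depends on the day of the week, a small helper can build the path of the current log for an application. This is a sketch based on the naming rule described above; the function name avn_log_path is hypothetical.

```shell
# avn_log_path APP [DAY]: print the log file path for an application.
# DAY is a three-letter English weekday (Mon..Sun) and defaults to today.
# LC_ALL=C keeps the abbreviation in English regardless of locale.
avn_log_path() {
    day=${2:-$(LC_ALL=C date +%a)}
    printf '%s/%s_%s\n' /data/logs/adapt/avnfps "$1" "$day"
}

# Usage: tail -f "$(avn_log_path avndis)"
```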

It is possible for certain unexpected errors to go undetected in log files. One way to test against this possibility is to launch a GUI process from the command line and watch the standard error and standard output streams. Here is a command that will launch the AvnFPS startup menu from the command line:

lx2-wfo:ncfuser:19: /awips/adapt/avnfps/bin/avnstart.sh avnmenu
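When the output scrolls by too quickly to read, both streams can be saved to a file while still being shown on the terminal. The following is a minimal sketch; the function name run_logged and the output file location are assumptions.

```shell
# run_logged OUTFILE COMMAND [ARGS...]: run COMMAND, showing its standard
# output and standard error on the terminal while saving both to OUTFILE.
# Hypothetical helper, not part of AvnFPS.
run_logged() {
    outfile=$1
    shift
    "$@" 2>&1 | tee "$outfile"
}

# Usage: run_logged /tmp/avnmenu.out /awips/adapt/avnfps/bin/avnstart.sh avnmenu
```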
		

7.3. Data Ingest

Data Ingest Servers (instances of avndis) monitor events in AWIPS and decode various data as they become available. For some data types, the term decode means little more than copying a file. Once avndis has put the data into the AvnFPS directory tree, Data Request Servers (instances of avndrs) serve the data to the various GUIs.

7.3.1. Data Ingest Troubleshooting

  • Log files for avndrs show data being delivered to GUIs as well as data access problems. See if GUIs are connecting to avndrs successfully and receiving data.

  • Log files for avndis show data arriving and being decoded. Errors are usually easy to spot in these logs.
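A quick filter can narrow a long log down to the suspicious entries. The keywords below are assumptions about how problems are worded in the logs, not a documented AvnFPS log format, so adjust them to match the entries actually seen. The function name log_problems is hypothetical.

```shell
# log_problems: read log lines on stdin and print those that mention an
# error, exception, or traceback (case-insensitive), with line numbers.
# The keyword list is an assumption; it is not taken from AvnFPS.
log_problems() {
    grep -n -i -E 'error|exception|traceback'
}

# Usage: log_problems < /data/logs/adapt/avnfps/avndis_Fri
```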

7.3.2. Specific hints for specific data types

Text (TAFs, METARs)
  Database triggers deliver products to /awips/adapt/avnfps/data/text.

Lightning Observations
  Directory /data/fxa/point/binLightning/netcdf is monitored for new and updated files.

Lightning Probability
  Directory /data/fxa/img/SBN/netCDF/LATLON/3hr/LTG is monitored for new and updated files.

Low-Level Wind Shear
  The following directories can be monitored for new files:

  • /data/fxa/point/acarsProfiles/netcdf

  • /data/fxa/point/profiler/netcdf

  • /data/fxa/LDAD/profiler/netCDF

  • /data/fxa/radar/CCCC/VWP, where CCCC = radar-ID

IFPS Grids
  An entry in /etc/cron.d/px2apps issues the job that exports the data from GFESuite ifpServer to /awips/adapt/avnfps/data/grids where the files are monitored by gamin.
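To confirm that data are actually arriving, list the files that changed recently in one of the monitored directories. This is a minimal sketch that relies on the -mmin test provided by GNU find on Linux; the function name recent_files and the 60-minute default are assumptions.

```shell
# recent_files DIR [MINUTES]: list regular files under DIR that were
# modified within the last MINUTES minutes (default 60).
# Hypothetical helper, not part of AvnFPS.
recent_files() {
    find "$1" -type f -mmin "-${2:-60}" 2>/dev/null
}

# Usage: recent_files /data/fxa/point/profiler/netcdf 120
```

An empty result for a directory that normally updates frequently suggests that the problem lies upstream of avndis.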

7.4. Product Transmission

Product transmission is covered in depth in Section 1.4: “Transmission Server” of the System Administration Manual. Here are the recommended steps to follow:

  • Check the avnxs server log file on px2f, normally located in the /data/logs/adapt/avnfps directory. If a forecast prepared by AvnFPS was written to the pending queue, /awips/adapt/avnfps/OB9.2/xmit/pending, and the transmission server attempted to access the file, there should be a corresponding entry starting with the word SUCCESS or FAILNNN, where NNN is the code returned by the system call to handleOUP.pl.

  • If avnxs indicates a handleOUP.pl failure, investigate the handleOUP.pl log file, /data/logs/fxa/yyyymmdd/handleOUP.log, on the px2f machine.
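The two steps above can be sketched as a pair of helpers: one tallies the FAILNNN codes found in an avnxs log, and the other builds the path of the handleOUP.pl log for a given date. Both function names are hypothetical, and the date defaults to the host's current date in its configured time zone, which is an assumption about how the yyyymmdd directory is named.

```shell
# xmit_failures: read avnxs log lines on stdin and print each distinct
# FAILNNN code followed by the number of times it occurs.
xmit_failures() {
    grep -o -E 'FAIL[0-9]+' | sort | uniq -c | awk '{ print $2, $1 }'
}

# handleoup_log [YYYYMMDD]: print the handleOUP.pl log path for a date.
# Defaults to today according to the host clock (an assumption).
handleoup_log() {
    printf '/data/logs/fxa/%s/handleOUP.log\n' "${1:-$(date +%Y%m%d)}"
}

# Usage:
#   xmit_failures < /data/logs/adapt/avnfps/avnxs_Fri
#   less "$(handleoup_log)"
```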