[EHPweb] Chile EQ Web post-mortem update
Lisa A Wald
lisa at usgs.gov
Thu Mar 18 02:29:20 UTC 2010
Chile EQ Web post-mortem update on action items:
1) Processes were high on webservers.
The number of processes was high (upper 400s to 500+) on all the webservers, especially ehp3, which slowed everything down and caused rsyncs to fail (many times on ehp3, and once on ehp1). This was due to the high volume of web traffic, the ongoing latency issue with ehp3, and the missing fourth server, ehp4.
ACTION ITEMS:
a) test rsync using the Blowfish cipher and the null cipher, which provide lighter (or no) encryption but also less overhead - Chris, Eric - COMPLETE
b) test Eric's pushit script that includes error handling - Chris, Eric - COMPLETE, but needs to be modified by Chris
c) move ehp3 to Menlo - Stan & Chris - no progress
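
For reference, the kind of error handling described in items a) and b) could look roughly like the sketch below. This is not Eric's pushit script; the function names, retry counts, and timeout are assumptions for illustration. It builds an rsync-over-ssh command with an optional cipher (passed to ssh with -c) and retries when rsync exits non-zero.

```python
import subprocess
import time


def build_rsync_command(source, dest, cipher=None):
    """Build an rsync-over-ssh command; the cipher, if given, is passed to ssh -c."""
    ssh = "ssh" if cipher is None else f"ssh -c {cipher}"
    return ["rsync", "-a", "--timeout=60", "-e", ssh, source, dest]


def push_with_retries(command, attempts=3, delay=5.0,
                      run=subprocess.run, sleep=time.sleep):
    """Run command until it exits 0 or attempts run out; return True on success."""
    for attempt in range(1, attempts + 1):
        try:
            result = run(command)
            if result.returncode == 0:
                return True
            print(f"attempt {attempt}: exit code {result.returncode}")
        except OSError as err:
            # Raised when the command itself cannot be started.
            print(f"attempt {attempt}: could not start command: {err}")
        if attempt < attempts:
            # Wait a little longer after each failure before trying again.
            sleep(delay * attempt)
    return False
```

The cipher name has to be one that the installed ssh accepts (for example, blowfish-cbc on ssh builds that still include it); a null cipher generally requires a specially patched ssh. The run and sleep parameters exist only so the retry logic can be exercised without touching the network.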
---
2) RSS feed appears to be getting corrupted.
Clients that use the EQ RSS feed are encountering problems that appear to be related to corrupted feeds. Wan entered a ticket for this in Trac (#926) which has been assigned to Jeremy.
ACTION ITEMS:
a) look into RSS feed problems - Jeremy - user error - COMPLETE
---
3) PAGER failures with PDL.
Sometime between 9:03 am and 1:54 pm on Saturday, February 27, the prototype PDL client running on losspager1 stopped transmitting content to ehpmaster. That client is configured to send PAGER content to ehpmaster and ehpbackup. Interestingly (but not usefully), it continued to transmit data to ehpbackup. We saw this same behavior during the Haiti response, leading me to believe that this behavior has something to do with excessive network "load" on the PDL client.
We experienced similar behavior on ehpmaster, which has a PDL client configured to receive PAGER content from losspager1. This client is configured to call two scripts when it receives content:
1) A logging script
2) The PAGER indexer script.
At (at least) three different times on Feb 27, the PDL client failed to call the indexing script when receiving PAGER content. In each case, the client called the script when a later bundle of PAGER content was received.
ACTION ITEMS:
a) test the current production version of PDL in the PAGER development environment; the PDL client must be reliable - Mike, Jeremy - unknown
b) implement a stop/restart every 24 hours and try this when things get hung - Mike - unknown
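
To make the failure mode on ehpmaster concrete: the client is supposed to call both the logging script and the indexer for each bundle of content. The sketch below is not PDL code; it is a hypothetical dispatcher (all names are assumptions) that calls each configured script independently and reports which ones failed, so that a missed indexer call is recorded at the time rather than discovered later.

```python
import subprocess
import sys


def notify_listeners(listeners, content_path, run=subprocess.run, timeout=300):
    """Call every listener command with content_path; return the names that failed.

    listeners maps a name to a command given as a list of arguments.
    """
    failed = []
    for name, command in listeners.items():
        try:
            result = run(command + [content_path], timeout=timeout)
            ok = result.returncode == 0
        except (OSError, subprocess.TimeoutExpired) as err:
            # One listener failing to start or hanging must not block the others.
            print(f"{name}: {err}", file=sys.stderr)
            ok = False
        if not ok:
            failed.append(name)
    return failed
```

A caller could queue the failed names for a retry, or alert someone, instead of waiting for the next bundle of PAGER content to trigger the indexer.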
---
3) Tsunami add-ons did not work as expected.
Either the tsunami warning centers didn't send any updates after the first add-on, or they did send updates but these were not associated with the event because the centers were using their own eventID rather than the authoritative eventID. Manual add-ons for tsunami alerts were not possible, and general information links were placed in the wrong section on the Tsunami tab.
ACTION ITEMS:
a) create an add-on type for general info tsunami links - Jeremy - COMPLETE
b) look into using warning center alerts to trigger add-on links - Jeremy - will not do - COMPLETE
c) add a recenteqs check for superseded events so that add-ons with non-authoritative eventIDs can still be added to the event page - Jeremy - unknown
d) modifications to allow access to the non-authoritative eventID add-ons through the EQintheNews Admin - Jeremy - unknown
e) find out web stats of pageviews for tsunami alerts - Lisa - meeting with Paul 3/18
f) talk to Paul about which links he thinks we should be using - Lisa - meeting with Paul 3/18
g) look at Tsunami Warning Center websites for links we think we should be using - Madeleine - unknown
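
The check described in item c) amounts to mapping a non-authoritative eventID to the authoritative one before attaching the add-on. A minimal sketch, assuming the superseded-event information is available as a simple mapping from an old eventID to the eventID that replaced it (the function name and data shape are assumptions, not how recenteqs stores this):

```python
def resolve_event_id(event_id, superseded_by):
    """Follow superseded_by links until reaching an ID that has no replacement."""
    seen = set()
    current = event_id
    while current in superseded_by:
        if current in seen:
            # Guard against bad data that would otherwise loop forever.
            raise ValueError(f"cycle in superseded events at {current}")
        seen.add(current)
        current = superseded_by[current]
    return current
```

An add-on that arrives with a warning center's own eventID would be attached to whatever eventID this returns; an eventID that was never superseded is returned unchanged.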
4) Aftershock Map procedure needs to be determined.
The aftershock map was requested again, and this time we got the list in a different format than the first or second time. Eric had to write another parser for the input to create the XML file. Gavin will continue to give us an updated list every morning until no longer needed so we can update the map each morning.
ACTION ITEMS:
a) We need to determine a definitive source for the aftershock list and a standard procedure for the product as it is now; have a meeting. - Lisa - meeting with Paul 3/18
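
Once a single list format is agreed on, one parser could replace the per-request ones. The sketch below is not Eric's parser. It assumes, purely for illustration, a CSV list with the column names shown and a made-up XML layout; the real input format and the XML the map expects are not described here.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical column names; the actual list format is still to be agreed on.
REQUIRED = ("time", "latitude", "longitude", "depth", "magnitude")


def aftershocks_to_xml(csv_text):
    """Convert a CSV aftershock list into an XML string, one element per event."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        # Fail loudly if the list arrives in yet another format.
        raise ValueError(f"missing columns: {missing}")
    root = ET.Element("aftershocks")
    for row in reader:
        attrs = {c: (row[c] or "").strip() for c in REQUIRED}
        ET.SubElement(root, "event", attrs)
    return ET.tostring(root, encoding="unicode")
```

Rejecting a list with unexpected columns up front makes a format change visible right away instead of producing a bad map.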
5) The Executive Summary failed and there was a 20-30 min delay getting it out.
The form did not contain validation, so an incorrect user entry caused it to fail.
ACTION ITEMS:
a) Validation will be added to the app. - Matthew - COMPLETE, I think
b) We will recreate the Exec Summary from the ground up during FY10 using a project plan.
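
As an illustration of the validation in item a): the idea is to check each form entry before it is used and to return readable errors instead of letting the app fail. The field names and formats below are assumptions for the sketch, not the actual Executive Summary form.

```python
from datetime import datetime


def validate_summary(form):
    """Return a list of error messages for the submitted form; empty means valid."""
    errors = []
    title = (form.get("title") or "").strip()
    if not title:
        errors.append("title is required")
    try:
        magnitude = float(form.get("magnitude", ""))
        if not 0.0 <= magnitude <= 10.0:
            errors.append("magnitude must be between 0 and 10")
    except (TypeError, ValueError):
        errors.append("magnitude must be a number")
    try:
        datetime.strptime(form.get("origin_time", ""), "%Y-%m-%d %H:%M:%S")
    except (TypeError, ValueError):
        errors.append("origin_time must look like YYYY-MM-DD HH:MM:SS")
    return errors
```

The form would be processed only when the returned list is empty; otherwise the messages go back to the person filling it in.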
Additional action items not related to Chile EQ:
a) The 2004 Sumatra event and 2006 Gulf of Mexico event have the same eventID, and their information keeps getting mixed up on their event pages. Since then, the year has been added to the eventID, so this will not happen again, but we need to manually change these eventIDs to include the year as a prefix, and put redirects in place for the old links. - Jeremy - COMPLETE
b) Organize a workshop for the DB group, Edge, Hydra, two new student hires, Scott, and Volkan to share information about the web architecture and future plans, and the website infrastructure such as the Admin area, Trac ticketing, documentation, and versioning system. Scott and Volkan would travel to Golden for the workshop. - Lisa - will do now that Bill Horton has started
c) Give Gavin PAGER admin privileges in the Admin area. - Lisa - COMPLETE
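
Regarding item a) above, a small check could confirm that no other older events share an eventID before the manual changes are considered finished. This is a sketch under assumptions: events are given as (eventID, year) pairs, and the new ID is simply the year followed by the old ID. The real ID format may differ.

```python
from collections import defaultdict


def prefixed_event_id(event_id, year):
    """Return the event ID with the year added as a prefix."""
    return f"{year}{event_id}"


def colliding_ids(events):
    """Return {event_id: sorted years} for IDs used by events in more than one year.

    events is an iterable of (event_id, year) pairs.
    """
    years_by_id = defaultdict(set)
    for event_id, year in events:
        years_by_id[event_id].add(year)
    return {eid: sorted(years) for eid, years in years_by_id.items() if len(years) > 1}
```

Note that an old link built from a shared eventID is ambiguous by itself, so a redirect for it can point to only one of the two events.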