[EHPweb] QDDS failure
David Oppenheimer
oppen at usgs.gov
Tue Jun 2 23:10:18 GMT 2009
I checked a few QDDS clients (leaves) and they are up and running. However, their notion of the last message id# sent by the 3 hubs is incorrect. When the two (3?) QDDS hubs crashed, they reset their counters to zero, so when I restarted them, their "current message id#" started at zero. Meanwhile, the clients stayed up, and their notion of the hubs' current message id# is still 349,682 for qdds1 and 178,940 for iris. This means that a client will never re-request a missed message from a hub until that hub's message id# exceeds the value it had before the crash. The last time this happened was 9/8/2008, so it would probably take about 9 months for the ids to catch up. That's too long to wait.
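To make the arithmetic behind the "about 9 months" estimate concrete, here is a minimal Python sketch. It is not QDDS code. The function names are hypothetical, and it assumes that a client only re-requests when the hub's id# is above its saved maximum, that the qdds1 counter last started from zero on 9/8/2008, and that message traffic continues at the same average rate.

```python
from datetime import date

def client_would_rerequest(saved_max_id: int, hub_current_id: int) -> bool:
    """Assumed client rule: a gap is only noticed when the hub reports
    an id above the highest id the client believes it has seen."""
    return hub_current_id > saved_max_id

def days_to_catch_up(saved_max_id: int, last_reset: date, crash_day: date) -> float:
    """Estimate how many days a hub restarted at zero needs to pass
    saved_max_id, assuming the same average message rate as before."""
    elapsed_days = (crash_day - last_reset).days
    rate_per_day = saved_max_id / elapsed_days  # average messages per day
    return saved_max_id / rate_per_day

# Values from this message; the reset date is an assumption (see above).
qdds1_saved_max = 349_682
last_reset = date(2008, 9, 8)
crash_day = date(2009, 6, 2)

# Right after the restart the hub is near zero, so clients see no gap.
print(client_would_rerequest(qdds1_saved_max, 0))
print(days_to_catch_up(qdds1_saved_max, last_reset, crash_day))
```

Under these assumptions the hub needs roughly as many days to catch up as it had been running since the last reset, which is where the multi-month wait comes from.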
The solution is for me to send out an email to all QDDS clients asking them to stop QDDS, delete the file called save_max_received, and restart. I'll do that when Reston rejoins the network.
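For clients who want to script the middle step, a minimal Python sketch of removing the saved state file follows. The function name and the directory argument are hypothetical, since the location of save_max_received depends on each client's installation, and stopping and restarting QDDS is left to whatever procedure the site already uses.

```python
from pathlib import Path

def clear_saved_max(qdds_dir: str, filename: str = "save_max_received") -> bool:
    """Delete the saved max-id file in qdds_dir (a hypothetical path).

    Returns True if the file existed and was removed, False otherwise.
    Run this only while QDDS is stopped.
    """
    target = Path(qdds_dir) / filename
    if target.is_file():
        target.unlink()
        return True
    return False
```

A call such as clear_saved_max("/path/to/qdds") would go between stopping and restarting the client, with the path replaced by the site's actual QDDS directory.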
-David
-------------------------------------------------------
David Oppenheimer office:650.329.4792
U.S. Geological Survey fax: 650.329.4732
345 Middlefield Road.-MS 977 email: oppen at usgs.gov
Menlo Park, CA 94025
-----Original Message-----
From: Christopher J Bidwell [mailto:cbidwell at usgs.gov]
Sent: Tuesday, June 02, 2009 3:13 PM
To: David H Oppenheimer; EHP Web
Subject: Re: [EHPweb] QDDS failure
I'm getting alerts that graben and ehzeast are unreachable.
--------------
Thanks,
Chris Bidwell, RHCT
Web Admin
Geologic Hazards Team
303-273-8642
cbidwell at usgs.gov
(Sent via Blackberry)
----- Original Message -----
From: "David Oppenheimer" [oppen at usgs.gov]
Sent: 06/02/2009 02:57 PM MST
To: <ehpweb at geohazards.usgs.gov>
Subject: [EHPweb] QDDS failure
For unknown reasons, all 3 QDDS hubs died. I've successfully restarted QDDS
at qdds1.wr.usgs.gov and dmc.iris.washington.edu. I am unable to ssh into
qdds2.er.usgs.gov. Not sure what to do about that machine. Does anyone have
a contact there who can walk up to the machine?
I don't see anything obvious in the 2 QDDS logfiles that caused their
deaths. This has never happened before.
Thanks to Stan Schwarz and Stan Silverman for notifying me.
-David
-------------------------------------------------------
David Oppenheimer office:650.329.4792
U.S. Geological Survey fax: 650.329.4732
345 Middlefield Road.-MS 977 email: oppen at usgs.gov
Menlo Park, CA 94025
-----Original Message-----
From: ehpweb-bounces at geohazards.usgs.gov
[mailto:ehpweb-bounces at geohazards.usgs.gov] On Behalf Of Eric M Martinez
Sent: Monday, June 01, 2009 4:06 PM
To: ehpweb at geohazards.usgs.gov
Cc: Earle Paul
Subject: Re: [EHPweb] DYFI/PAGER
I've started both indexers back up at this time. EHPMaster has seemed
to stabilize a bit but there is still a massive backup running to
ehpnas which may continue to cause problems.
Thanks,
~Eric.
On Jun 1, 2009, at 4:45 PM, Eric M Martinez wrote:
> I'm shutting down both the DYFI and PAGER indexers for the next 15
> minutes to try to stabilize EHPMaster. Both of these processes have
> been generating quite a significant amount of errors all day and I
> have been fighting to keep them running. Please let me know if you
> know of any outside factors (config changes etc) that could be causing
> this.
>
> Thanks,
> ~Eric.
>
>
>
>
> _______________________________________________
> EHPweb mailing list
> EHPweb at geohazards.usgs.gov
> https://geohazards.usgs.gov/mailman/listinfo/ehpweb