[EHPweb] QDDS failure again

David Oppenheimer oppen at usgs.gov
Wed Jun 3 20:14:58 GMT 2009


All QDDS hubs died again. I just restarted all hubs. 

Anyone know what's going on with the USGS/DOI networking?  

Failure times (GMT):
dmc.iris.washington.edu   16:40   ! reset curr_message_id (again)
qdds2.er.usgs.gov         19:57   ! didn't reset (nor the previous time)
qdds1.wr.usgs.gov         16:21   ! didn't reset

I'm not going to ask QDDS clients to restart, because sometime today the
message id on the IRIS hub will pass the value it had reached yesterday.

This episode illustrates a problem. The QDDS hubs run under non-"quake"
accounts, so I am the only one who can log in and administer them. More
people should be able to do that. Should we set up quake accounts on qdds2
and qdds1 with permission to administer these 2 hubs? We would need to work
with IRIS on how to administer their hub.

-David

-------------------------------------------------------
David Oppenheimer                   office:650.329.4792
U.S. Geological Survey              fax:   650.329.4732
345 Middlefield Road.-MS 977    email: oppen at usgs.gov
Menlo Park, CA 94025


-----Original Message-----
From: David Oppenheimer [mailto:oppen at usgs.gov] 
Sent: Tuesday, June 02, 2009 2:58 PM
To: 'ehpweb at geohazards.usgs.gov'
Subject: QDDS failure

For unknown reasons, all 3 QDDS hubs died. I've successfully restarted QDDS
at qdds1.wr.usgs.gov and dmc.iris.washington.edu. I am unable to ssh into
qdds2.er.usgs.gov. Not sure what to do about that machine. Does anyone have
a contact there who can walk up to the machine?

I don't see anything obvious in the 2 QDDS logfiles that explains why the
hubs died. This has never happened before.

Thanks to Stan Schwarz and Stan Silverman for notifying me.

-David 


-----Original Message-----
From: ehpweb-bounces at geohazards.usgs.gov
[mailto:ehpweb-bounces at geohazards.usgs.gov] On Behalf Of Eric M Martinez
Sent: Monday, June 01, 2009 4:06 PM
To: ehpweb at geohazards.usgs.gov
Cc: Earle Paul
Subject: Re: [EHPweb] DYFI/PAGER

I've started both indexers back up at this time. EHPMaster seems to have
stabilized a bit, but there is still a massive backup running to ehpnas,
which may continue to cause problems.

Thanks,
	~Eric.




On Jun 1, 2009, at 4:45 PM, Eric M Martinez wrote:

> I'm shutting down both the DYFI and PAGER indexers for the next 15
> minutes to try to stabilize EHPMaster. Both of these processes have
> been generating a significant number of errors all day, and I have
> been fighting to keep them running. Please let me know if you know of
> any outside factors (config changes, etc.) that could be causing
> this.
>
> Thanks,
> 	~Eric.
>
> _______________________________________________
> EHPweb mailing list
> EHPweb at geohazards.usgs.gov
> https://geohazards.usgs.gov/mailman/listinfo/ehpweb
