Philip, does your RT130 traffic go through NAT at the cell modem end, the server end, or both? <div><br></div><div>Have you tried using tcpdump on the server side to get a packet trace?</div><div><br></div><div>For example:</div><div> sudo tcpdump -i (serverinterface) host (dasiporhostname)</div>
<div><br></div><div>It's really helpful for debugging these kinds of problems.</div><div><br></div><div>I don't run NAT on any of our 8 cell routers (VZW 3G, dynamic IP); it's all pure routing and stable except for the antenna ice :>)</div>
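<div><br></div><div>As a rough sketch of the same capture, the one-liner above can be built as an argument list that also saves the trace to a file so it can be shared. The interface name, DAS address and file name below are made-up placeholders, not values from this thread.</div>
<pre>
```python
import shlex


def tcpdump_command(interface: str, das_host: str, pcap_path: str) -> list[str]:
    """Build a tcpdump argument list capturing all traffic to/from one DAS."""
    return [
        "sudo", "tcpdump",
        "-i", interface,   # server-side interface that the DAS traffic arrives on
        "-n",              # do not resolve addresses to names
        "-w", pcap_path,   # write raw packets to a file instead of printing them
        "host", das_host,  # same filter as the one-liner above
    ]


# Hypothetical values, for illustration only.
cmd = tcpdump_command("eth0", "192.0.2.10", "das.pcap")
print(shlex.join(cmd))
```
</pre>
<div>The saved file can be read back later with tcpdump -n -r das.pcap.</div>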
<div>-Dave</div><div><br><div class="gmail_quote">On Sat, Feb 23, 2013 at 4:00 AM, <span dir="ltr">&lt;<a href="mailto:anss-netops-request@geohazards.usgs.gov" target="_blank">anss-netops-request@geohazards.usgs.gov</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Send ANSS-netops mailing list submissions to<br>
<a href="mailto:anss-netops@geohazards.usgs.gov">anss-netops@geohazards.usgs.gov</a><br>
<br>
To subscribe or unsubscribe via the World Wide Web, visit<br>
<a href="https://geohazards.usgs.gov/mailman/listinfo/anss-netops" target="_blank">https://geohazards.usgs.gov/mailman/listinfo/anss-netops</a><br>
or, via email, send a message with subject or body 'help' to<br>
<a href="mailto:anss-netops-request@geohazards.usgs.gov">anss-netops-request@geohazards.usgs.gov</a><br>
<br>
You can reach the person managing the list at<br>
<a href="mailto:anss-netops-owner@geohazards.usgs.gov">anss-netops-owner@geohazards.usgs.gov</a><br>
<br>
When replying, please edit your Subject line so it is more specific<br>
than "Re: Contents of ANSS-netops digest..."<br>
<br>
<br>
Today's Topics:<br>
<br>
1. reftek and data stoppages (Philip Crotwell)<br>
2. Re: reftek and data stoppages [USGS] (Ian Billings)<br>
<br>
<br>
----------------------------------------------------------------------<br>
<br>
Message: 1<br>
Date: Fri, 22 Feb 2013 11:41:11 -0500<br>
From: Philip Crotwell &lt;<a href="mailto:crotwell@seis.sc.edu">crotwell@seis.sc.edu</a>&gt;<br>
To: "<a href="mailto:anss-netops@geohazards.usgs.gov">anss-netops@geohazards.usgs.gov</a>"<br>
&lt;<a href="mailto:anss-netops@geohazards.usgs.gov">anss-netops@geohazards.usgs.gov</a>&gt;<br>
Cc: "Thomas J. Owens" &lt;<a href="mailto:owens@seis.sc.edu">owens@seis.sc.edu</a>&gt;<br>
Subject: [ANSS-netops] reftek and data stoppages<br>
Message-ID:<br>
&lt;CAGFrVcWJ+fWpRyqaPVp9nbaR2-=<a href="mailto:fcbFA9%2B8EKNUoyymQ0G-P8A@mail.gmail.com">fcbFA9+8EKNUoyymQ0G-P8A@mail.gmail.com</a>&gt;<br>
Content-Type: text/plain; charset="iso-8859-1"<br>
<br>
Hi all<br>
<br>
We have four stations with reftek 130s on cell modems going into earthworm<br>
via rtpd. I recently moved my server to new hardware and rediscovered an<br>
old problem. About every day or two some of the stations stop sending data<br>
even though the link is ok. I don't know what the initial cause is, maybe<br>
the cell link briefly died, but at the time we check on things, the cell<br>
link is fine but data is just not flowing.<br>
<br>
The odd thing is that clicking the "das discovery" button in the web admin<br>
tool RTCC causes all the stations to start flowing again. The rediscovery<br>
part is that some years ago when we first noticed this, I put in a cron job<br>
to hit the "das discovery" url once every 15 minutes and the problem went<br>
away. Given limited brain cells, I promptly forgot about it. Not until I<br>
switched server machine, and forgot to transfer the cron job, did I<br>
remember the issue.<br>
<br>
It is very puzzling to me that if getting the stations back on line is as<br>
easy as clicking a url, then why in the world can't rtpd do it itself!?!??<br>
Have any of you seen this issue? Any suggestions on ways to deal with it<br>
other than a cron-based das discovery? I should say we run a mixed network<br>
with other stations using either q330s or guralps, and only the refteks seem<br>
to have trouble noticing that the cell link is working.<br>
<br>
One other puzzle is that my understanding is that the rt130s will cache up<br>
to 99 minutes of data in the case of a lost connection. My experience is<br>
that you get the benefit of the cache only in cases of the outage lasting<br>
less than 99 minutes. If the outage is longer, then when the link comes up<br>
the rt130 starts sending real time data and never sends the previous 99<br>
minutes. If however the outage is less than the cache time, it will start<br>
sending the old cached data first. Seems weird that a 98 minute outage<br>
results in no data loss, but a 100 minute outage results in a 100 minute<br>
data loss.<br>
<br>
We have recent, but not the absolute latest versions of firmware, so I<br>
should probably upgrade those just in case. We have stations showing this<br>
issue with firmware as recent as 3.3.1 and I don't see anything in the<br>
release notes that would suggest newer firmware addresses this. RTPD on the<br>
server is the latest version, 2.1.9.0b.<br>
<br>
Here is some output of me running rtpid around the time I hit the "das<br>
discovery" button. I hit the button at 11:29:00 and all the stations had<br>
come back to life and were sending data within 12 seconds of the discovery<br>
action.<br>
<br>
thanks,<br>
Philip<br>
<br>
2013:053-11:27:56 earthworm rtpid[3545] RTPID version 2.1.0.0<br>
2013:053-11:27:56 earthworm rtpid[3545] Options:<br>
2013:053-11:27:56 earthworm rtpid[3545] Host = localhost<br>
2013:053-11:27:56 earthworm rtpid[3545] Port = 2543<br>
2013:053-11:27:56 earthworm rtpid[3545] Retry = nonfatal<br>
2013:053-11:27:56 earthworm rtpid[3545] Log file = rtpid.log<br>
2013:053-11:27:56 earthworm rtpid[3545] Verbose = FALSE<br>
2013:053-11:27:56 earthworm rtpid[3545] Timeout = 60<br>
2013:053-11:27:56 earthworm rtpid[3545] Attempts = 9999<br>
2013:053-11:27:56 earthworm rtpid[3545] Attempting connection: localhost:2543<br>
2013:053-11:27:56 earthworm rtpid[3545] Successful connection: localhost:2543<br>
<br>
---- hit "DAS-DISCOVERY" at 11:29:00 ----<br>
<br>
2013:053-11:29:02 earthworm rtpid[3545] Unit A898 detected<br>
2013:053-11:29:03 earthworm rtpid[3545] Unit A064 detected<br>
2013:053-11:29:06 earthworm rtpid[3545] Unit A872 detected<br>
2013:053-11:29:12 earthworm rtpid[3545] Unit A900 detected<br>
<br>
------------------------------<br>
<br>
Message: 2<br>
Date: Fri, 22 Feb 2013 11:31:54 -0600<br>
From: Ian Billings &lt;<a href="mailto:i.billings@reftek.com">i.billings@reftek.com</a>&gt;<br>
To: Philip Crotwell &lt;<a href="mailto:crotwell@seis.sc.edu">crotwell@seis.sc.edu</a>&gt;,<br>
<a href="mailto:anss-netops@geohazards.usgs.gov">anss-netops@geohazards.usgs.gov</a><br>
Cc: "Thomas J. Owens" &lt;<a href="mailto:owens@seis.sc.edu">owens@seis.sc.edu</a>&gt;<br>
Subject: Re: [ANSS-netops] reftek and data stoppages [USGS]<br>
Message-ID: &lt;<a href="mailto:aec7190bf6de0fefacaf958f291bd3a3@mail.gmail.com">aec7190bf6de0fefacaf958f291bd3a3@mail.gmail.com</a>&gt;<br>
Content-Type: text/plain; charset="windows-1252"<br>
<br>
Philip,<br>
<br>
<br>
<br>
First, the issue with RTP/RTPD links sleeping after an outage. This points<br>
to something at the RTPD server end. If an RTP/RTPD link is declared down, the<br>
130 goes into a sequence of sending server discovery packets to the RTPD<br>
host address every 6-8 sec, in a cycle of 300 sec on and 120 sec with RTP<br>
sleeping, until the link is re-established. When RTPD senses this unconditional<br>
sync, also known as server discovery, it starts to negotiate with the unit's<br>
RTP to bring the link up. I would need to look at the RTPD log file to see<br>
whether the unconditional syncs are coming into RTPD and, if so, what RTPD<br>
responds with. I suspect there is a timeout in the local firewall or<br>
router handling the traffic from the 130 units to the RTPD server, and that<br>
sending the DAS discovery resets this timeout. A server discovery from<br>
RTPD to a 130 is different from what RTPDID sends out and what RTPD sends<br>
to a DAS if it receives an unconditional sync from it. If you could send an<br>
RTPD log file I could at least confirm that server discoveries from the<br>
130s are being seen by RTPD.<br>
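<br>
For anyone reading a packet trace against the cycle described above, here is a<br>
minimal sketch of the timing. It assumes the cycle starts with the 300 sec "on"<br>
phase at the moment the link is declared down, which the message does not state;<br>
the function names are made up for illustration.<br>
<pre>
```python
ON_SEC = 300     # window in which discovery packets are sent (from the message above)
SLEEP_SEC = 120  # window in which RTP sleeps (from the message above)


def in_discovery_window(seconds_since_link_down: float) -> bool:
    """True if the 130 would be sending server discovery packets at this time.

    Assumption: the cycle begins with the "on" phase when the link is
    declared down; the real firmware phase may differ.
    """
    return seconds_since_link_down % (ON_SEC + SLEEP_SEC) < ON_SEC


def packets_per_window(interval_sec: float) -> int:
    """Whole discovery intervals that fit in one "on" window."""
    return int(ON_SEC // interval_sec)


print(in_discovery_window(299), in_discovery_window(300))
print(packets_per_window(6), packets_per_window(8))
```
</pre>
Under that assumption, silent gaps of up to 120 sec in a trace are expected and<br>
do not by themselves mean the discovery packets are being dropped.<br>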
<br>
<br>
<br>
Second, your issue with the 99 minutes of buffered data. The firmware in the<br>
130 is written so that when the RTP/RTPD link is declared down, the<br>
RTPD thread data is saved to RAM for up to the thread's TOSS threshold, in<br>
your case 99 min. However, if the link remains down for longer than the TOSS<br>
threshold, all RTPD thread data saved since the link-down declaration is<br>
deleted from RAM. Again, in your case this happens at 99 min. REF TEK's reasoning,<br>
as it has been explained, is that this frees RAM to handle other<br>
thread data, like the Disk Thread, because the link is unlikely to<br>
re-establish once the TOSS threshold has been met, and the old RTPD thread<br>
data would keep getting older and therefore be of lesser value when the<br>
link is re-established. If you find your link outages last on<br>
average 110 min, then simply increase the TOSS threshold so most data will<br>
be recovered.<br>
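<br>
The all-or-nothing behaviour described above can be written down as a small<br>
model. This is only a sketch of the rule as stated in this thread, not REF TEK<br>
firmware logic; the function name is made up, and what happens when the outage<br>
is exactly equal to the threshold is an assumption.<br>
<pre>
```python
def minutes_missing_from_stream(outage_min: float, toss_threshold_min: float = 99.0) -> float:
    """Minutes of RTPD thread data never delivered in real time after an outage.

    Assumption: the RAM buffer is replayed in full if the link returns before
    the TOSS threshold, and is discarded in full once the threshold is reached.
    """
    if outage_min < toss_threshold_min:
        return 0.0       # buffered data is sent first when the link comes back
    return outage_min    # buffer was tossed; nothing since link-down is sent


for outage in (98, 100, 110):
    print(outage, minutes_missing_from_stream(outage))

# Raising the threshold above the typical outage length recovers the data.
print(minutes_missing_from_stream(110, toss_threshold_min=120))
```
</pre>
The model only covers the real-time RTPD stream; it says nothing about data<br>
written by the Disk Thread.<br>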
<br>
<br>
<br>
Again, please send me an RTPD log file covering a time window in which the link<br>
from the 130 to the central site was known to be stable but there was no<br>
RTP/RTPD connection, as well as the part of the log showing the before and<br>
after results of the user-issued server discovery.<br>
<br>
<br>
<br>
Thanks,<br>
<br>
Ian Billings<br>
<br>
Field Technician<br>
<br>
*REF TEK - A Division of Trimble Navigation*<br>
<br>
<a href="http://support.reftek.com" target="_blank">http://support.reftek.com</a><br>
<br>
Skype ian_billings1<br>
<br>
PH 214 440 1265<br>
<br>
PH 214 440 1289 (Direct)<br>
<br>
<br>
------------------------------<br>
<br>
End of ANSS-netops Digest, Vol 44, Issue 7<br>
******************************************<br>
</blockquote></div><br><br clear="all"><div><br></div>-- <br>Sent from my iNTERNETS!!!
</div>