Philip, does your RT130 traffic go through NAT on the cell modem end, the server end, or both? <div><br></div><div>Have you tried using tcpdump on the server side to get a packet trace?</div><div><br></div><div>For example:</div><div>  sudo tcpdump -i (serverinterface) host (dasiporhostname)</div>

<div><br></div><div>It's really helpful for debugging these kinds of problems.</div><div><br></div><div>I don't run NAT on any of our 8 cell routers (VZW 3G, dynamic IP); it's all pure routing and stable, except for the antenna ice :>)</div>
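Once you have a trace, the interesting part is usually where the packets to or from a DAS stop. Here is a minimal sketch of scanning the trace for silences. It assumes tcpdump's default text output, where each packet line starts with a time of day such as `11:29:02.123456`, and a trace that does not cross midnight. The function name and the 60 s threshold are illustrative choices, not anything from RTPD or tcpdump.

```python
import re

# Default tcpdump text output starts each packet line with a time of day,
# e.g. "11:29:02.123456 IP ...". Lines without one are skipped.
TIMESTAMP = re.compile(r"^(\d{2}):(\d{2}):(\d{2})\.(\d+)")


def find_gaps(lines, min_gap_s=60.0):
    """Return (before, after, seconds) for every silence longer than min_gap_s.

    Assumes the trace does not cross midnight.
    """
    gaps = []
    prev_t = prev_stamp = None
    for line in lines:
        m = TIMESTAMP.match(line)
        if not m:
            continue  # headers, warnings, blank lines
        h, mi, s, frac = m.groups()
        t = int(h) * 3600 + int(mi) * 60 + int(s) + float("0." + frac)
        if prev_t is not None and t - prev_t > min_gap_s:
            gaps.append((prev_stamp, m.group(0), t - prev_t))
        prev_t, prev_stamp = t, m.group(0)
    return gaps
```

You can feed it any iterable of lines, such as an open file holding text saved from tcpdump. Each gap it reports marks the time to look at in the RTPD log.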

<div>-Dave</div><div><br><div class="gmail_quote">On Sat, Feb 23, 2013 at 4:00 AM,  <span dir="ltr"><<a href="mailto:anss-netops-request@geohazards.usgs.gov" target="_blank">anss-netops-request@geohazards.usgs.gov</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Send ANSS-netops mailing list submissions to<br>

        <a href="mailto:anss-netops@geohazards.usgs.gov">anss-netops@geohazards.usgs.gov</a><br>

<br>

To subscribe or unsubscribe via the World Wide Web, visit<br>

        <a href="https://geohazards.usgs.gov/mailman/listinfo/anss-netops" target="_blank">https://geohazards.usgs.gov/mailman/listinfo/anss-netops</a><br>

or, via email, send a message with subject or body 'help' to<br>

        <a href="mailto:anss-netops-request@geohazards.usgs.gov">anss-netops-request@geohazards.usgs.gov</a><br>

<br>

You can reach the person managing the list at<br>

        <a href="mailto:anss-netops-owner@geohazards.usgs.gov">anss-netops-owner@geohazards.usgs.gov</a><br>

<br>

When replying, please edit your Subject line so it is more specific<br>

than "Re: Contents of ANSS-netops digest..."<br>

<br>

<br>

Today's Topics:<br>

<br>

   1. reftek and data stoppages (Philip Crotwell)<br>

   2. Re: reftek and data stoppages [USGS] (Ian Billings)<br>

<br>

<br>

----------------------------------------------------------------------<br>

<br>

Message: 1<br>

Date: Fri, 22 Feb 2013 11:41:11 -0500<br>

From: Philip Crotwell <<a href="mailto:crotwell@seis.sc.edu">crotwell@seis.sc.edu</a>><br>

To: "<a href="mailto:anss-netops@geohazards.usgs.gov">anss-netops@geohazards.usgs.gov</a>"<br>

        <<a href="mailto:anss-netops@geohazards.usgs.gov">anss-netops@geohazards.usgs.gov</a>><br>

Cc: "Thomas J. Owens" <<a href="mailto:owens@seis.sc.edu">owens@seis.sc.edu</a>><br>

Subject: [ANSS-netops] reftek and data stoppages<br>

Message-ID:<br>

        <CAGFrVcWJ+fWpRyqaPVp9nbaR2-=<a href="mailto:fcbFA9%2B8EKNUoyymQ0G-P8A@mail.gmail.com">fcbFA9+8EKNUoyymQ0G-P8A@mail.gmail.com</a>><br>

Content-Type: text/plain; charset="iso-8859-1"<br>

<br>

Hi all<br>

<br>

We have four stations with reftek 130s on cell modems going into earthworm<br>

via rtpd. I recently moved my server to new hardware and rediscovered an<br>

old problem. About every day or two some of the stations stop sending data<br>

even though the link is ok. I don't know what the initial cause is, maybe<br>

the cell link briefly died, but at the time we check on things, the cell<br>

link is fine but data is just not flowing.<br>

<br>

The odd thing is that clicking the "das discovery" button in the web admin<br>

tool RTCC causes all the stations to start flowing again. The rediscovery<br>

part is that some years ago when we first noticed this, I put in a cron job<br>

to hit the "das discovery" url once every 15 minutes and the problem went<br>

away. Given limited brain cells, I promptly forgot about it. Not until I<br>

switched server machine, and forgot to transfer the cron job, did I<br>

remember the issue.<br>
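The cron workaround only needs to request the "das discovery" URL the way the button does. Here is a minimal sketch, assuming a plain HTTP GET is enough. `DISCOVERY_URL` is a hypothetical placeholder because the real RTCC URL is not given in this thread. A crontab schedule such as `*/15 * * * *` would give the 15-minute interval.

```python
import urllib.request

# HYPOTHETICAL placeholder: the actual RTCC "das discovery" URL is not given
# in this thread; substitute the one your RTCC web admin tool uses.
DISCOVERY_URL = "http://localhost:8080/das-discovery"


def trigger_discovery(url=DISCOVERY_URL, timeout_s=10.0):
    """Request the discovery URL once and return the HTTP status code.

    Assumes a plain GET is what the button does. urlopen raises on
    connection errors and on HTTP 4xx/5xx, so a failed run surfaces as an
    exception rather than passing silently.
    """
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return resp.status
```

Keeping the request in a small function makes the workaround easy to carry over to a new server, along with its crontab line.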

<br>

It is very puzzling to me that if getting the stations back on line is as<br>

easy as clicking a url, then why in the world can't rtpd do it itself!?!??<br>

Have any of you seen this issue? Any suggestions on ways to deal with it<br>

other than a cron based das discovery? I should say we run a mixed network<br>

with other stations using either q330s or guralps, and only the refteks seem<br>

to have trouble noticing that the cell link is working.<br>

<br>

One other puzzle: my understanding is that the rt130s will cache up<br>

to 99 minutes of data in the case of a lost connection. My experience is<br>

that you get the benefit of the cache only in cases of the outage lasting<br>

less than 99 minutes. If the outage is longer, then when the link comes up<br>

the rt130 starts sending real time data and never sends the previous 99<br>

minutes. If however the outage is less than the cache time, it will start<br>

sending the old cached data first. Seems weird that a 98 minute outage<br>

results in no data loss, but a 100 minute outage results in a 100 minute<br>

data loss.<br>
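The behavior described above is an all-or-nothing rule. Here is a minimal sketch of it. It assumes the cache survives only when the outage is strictly shorter than the cache time, because the thread does not say what happens at exactly 99 minutes.

```python
def minutes_lost(outage_min: float, cache_min: float = 99.0) -> float:
    """Minutes of data lost for one outage under the observed all-or-nothing rule."""
    if outage_min < cache_min:
        # Short outage: the cached data is sent first when the link comes back.
        return 0.0
    # Long outage: the cache is dropped and the RT130 resumes with real-time
    # data, so the whole outage is lost.
    return float(outage_min)
```

Under this model, `minutes_lost(98)` is 0.0 and `minutes_lost(100)` is 100.0. This is the jump that seems so odd.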

<br>

We have recent, but not the absolute latest versions of firmware, so I<br>

should probably upgrade those just in case. We have stations showing this<br>

issue with firmware as recent as 3.3.1, and I don't see anything in the<br>

release notes that would suggest newer firmware addresses this. RTPD on the<br>

server is the latest version, 2.1.9.0b.<br>

<br>

Here is some output from running rtpid around the time I hit the "das<br>

discovery" button. I hit the button at 11:29:00 and all the stations had<br>

come back to life and were sending data within 12 seconds of the discovery<br>

action.<br>

<br>

thanks,<br>

Philip<br>

<br>

2013:053-11:27:56 earthworm rtpid[3545] RTPID version 2.1.0.0<br>

2013:053-11:27:56 earthworm rtpid[3545] Options:<br>

2013:053-11:27:56 earthworm rtpid[3545]   Host      = localhost<br>

2013:053-11:27:56 earthworm rtpid[3545]   Port      = 2543<br>

2013:053-11:27:56 earthworm rtpid[3545]   Retry     = nonfatal<br>

2013:053-11:27:56 earthworm rtpid[3545]   Log file  = rtpid.log<br>

2013:053-11:27:56 earthworm rtpid[3545]   Verbose   = FALSE<br>

2013:053-11:27:56 earthworm rtpid[3545]   Timeout   = 60<br>

2013:053-11:27:56 earthworm rtpid[3545]   Attempts  = 9999<br>

2013:053-11:27:56 earthworm rtpid[3545] Attempting connection: localhost:2543<br>

2013:053-11:27:56 earthworm rtpid[3545] Successful connection: localhost:2543<br>

<br>

    ---- hit "DAS-DISCOVERY" at 11:29:00 ----<br>

<br>

2013:053-11:29:02 earthworm rtpid[3545] Unit A898 detected<br>

2013:053-11:29:03 earthworm rtpid[3545] Unit A064 detected<br>

2013:053-11:29:06 earthworm rtpid[3545] Unit A872 detected<br>

2013:053-11:29:12 earthworm rtpid[3545] Unit A900 detected<br>
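The "within 12 seconds" figure can be read off the log mechanically. Here is a minimal sketch. It assumes the rtpid line format shown above, with a timestamp of year:day-of-year-HH:MM:SS, a host, `rtpid[pid]`, then `Unit XXXX detected`. That format is inferred only from these sample lines. The trigger time is entered by hand, because the button click itself is not logged.

```python
import re
from datetime import datetime

# Pattern inferred from the sample lines above, e.g.
# "2013:053-11:29:02 earthworm rtpid[3545] Unit A898 detected"
DETECTED = re.compile(
    r"^(\d{4}:\d{3}-\d{2}:\d{2}:\d{2}) \S+ rtpid\[\d+\] Unit (\w+) detected"
)
STAMP_FORMAT = "%Y:%j-%H:%M:%S"  # %j is the zero-padded day of the year


def detection_delays(log_lines, trigger):
    """Map unit ID -> seconds from `trigger` to its first 'detected' line."""
    delays = {}
    for line in log_lines:
        m = DETECTED.match(line)
        if m:
            seen = datetime.strptime(m.group(1), STAMP_FORMAT)
            delays.setdefault(m.group(2), (seen - trigger).total_seconds())
    return delays
```

With the four lines above and a trigger of 2013:053-11:29:00, the delays are 2, 3, 6 and 12 s. The largest belongs to A900.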

-------------- next part --------------<br>

An HTML attachment was scrubbed...<br>

URL: <<a href="http://geohazards.usgs.gov/pipermail/anss-netops/attachments/20130222/1586a6ed/attachment-0001.html" target="_blank">http://geohazards.usgs.gov/pipermail/anss-netops/attachments/20130222/1586a6ed/attachment-0001.html</a>><br>

<br>

------------------------------<br>

<br>

Message: 2<br>

Date: Fri, 22 Feb 2013 11:31:54 -0600<br>

From: Ian Billings <<a href="mailto:i.billings@reftek.com">i.billings@reftek.com</a>><br>

To: Philip Crotwell <<a href="mailto:crotwell@seis.sc.edu">crotwell@seis.sc.edu</a>>,<br>

        <a href="mailto:anss-netops@geohazards.usgs.gov">anss-netops@geohazards.usgs.gov</a><br>

Cc: "Thomas J. Owens" <<a href="mailto:owens@seis.sc.edu">owens@seis.sc.edu</a>><br>

Subject: Re: [ANSS-netops] reftek and data stoppages [USGS]<br>

Message-ID: <<a href="mailto:aec7190bf6de0fefacaf958f291bd3a3@mail.gmail.com">aec7190bf6de0fefacaf958f291bd3a3@mail.gmail.com</a>><br>

Content-Type: text/plain; charset="windows-1252"<br>

<br>

Philip,<br>

<br>

<br>

<br>

First, the issue with RTP/RTPD links sleeping after an outage. This points<br>

to something at the RTPD server end. If an RTP/RTPD link is declared down,<br>

the 130 will start sending server discovery packets to the RTPD host<br>

address every 6-8 s, in a cycle of 300 s on / 120 s RTP sleeping, until<br>

the link is re-established. When RTPD senses this unconditional sync, also<br>

known as server discovery, it will start negotiating with the unit's RTP<br>

to bring the link up. I would need to look at the RTPD log file to see<br>

whether the unconditional syncs are reaching RTPD and, if so, what RTPD<br>

responds with. I suspect there is a timeout in the local firewall or<br>

router handling the traffic from the 130 units to the RTPD server, and<br>

that sending the DAS discovery resets this timeout. A server discovery<br>

from RTPD to a 130 is different both from what RTPDID sends out and from<br>

what RTPD sends to a DAS when it receives an unconditional sync from it.<br>

If you could send an RTPD log file, I could at least confirm that server<br>

discoveries from the 130s are being seen by RTPD.<br>
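The retry cycle described above has a consequence for any stateful device in the path. Here is a minimal sketch of it. It assumes the 6-8 s spacing is uniformly jittered and that the 300 s on / 120 s sleep cycle starts when the link is declared down. The idle-timeout values are hypothetical. The sketch shows only that each sleep phase leaves a silence of roughly 120-128 s. It does not model whether an expired firewall or NAT entry actually blocks the renegotiation. The RTPD log or a packet trace would have to show that.

```python
import random


def discovery_times(duration_s, on_s=300.0, sleep_s=120.0,
                    min_interval_s=6.0, max_interval_s=8.0, seed=0):
    """Send times (s) of server-discovery packets from one 130 whose link is
    down: a packet every 6-8 s for on_s seconds, then sleep_s of silence."""
    rng = random.Random(seed)  # reproducible jitter within the 6-8 s spacing
    times = []
    cycle_start = 0.0
    while cycle_start < duration_s:
        t = cycle_start
        cycle_end = min(cycle_start + on_s, duration_s)
        while t < cycle_end:
            times.append(t)
            t += rng.uniform(min_interval_s, max_interval_s)
        cycle_start += on_s + sleep_s
    return times


def longest_silence(times):
    """Longest gap between consecutive packets (0.0 if fewer than two)."""
    return max((b - a for a, b in zip(times, times[1:])), default=0.0)


def idle_entry_would_expire(times, idle_timeout_s):
    """True if a hypothetical firewall/NAT entry refreshed only by these
    packets would reach its idle timeout at least once."""
    return longest_silence(times) > idle_timeout_s
```

For an hour of retries, `longest_silence` falls between 120 and 128 s. A device with a 60 s idle timeout would therefore lapse once per cycle, while one with a 300 s timeout would not.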

<br>

<br>

<br>

Second, your issue with 99 minutes of buffered data. The firmware in the<br>

130 is written so that when the RTP/RTPD link is declared down, the RTPD<br>

thread data is saved to RAM for up to the thread's TOSS threshold, in your<br>

case 99 minutes. However, if the link remains down for longer than the<br>

TOSS threshold, all of the RTPD thread data saved since the link-down<br>

declaration is deleted from RAM. Again, in your case this happens at 99<br>

minutes. REF TEK's stated rationale is to free up RAM for other thread<br>

data, like the Disk Thread: once the TOSS threshold has been reached, the<br>

link is unlikely to re-establish, and the old RTPD thread data keeps<br>

getting older and is therefore of less value when the link does come<br>

back. If you find your link outages last 110 minutes on average, simply<br>

increase the TOSS threshold so that most of the data will be recovered.<br>
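Rather than using only the average outage, you could choose the TOSS threshold from the spread of observed outages. Here is a minimal sketch under the same all-or-nothing model: an outage is fully recovered only if it is shorter than the threshold. The function name and the coverage target are illustrative. A larger threshold presumably holds more RAM for the RTPD thread, which is the trade-off described above.

```python
import math


def toss_for_coverage(outage_minutes, coverage=0.9):
    """Smallest whole-minute TOSS threshold under which at least `coverage`
    of the observed outages would be fully recovered (outage < threshold)."""
    if not outage_minutes:
        raise ValueError("need at least one observed outage")
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    durations = sorted(outage_minutes)
    # Number of outages that must fit strictly under the threshold.
    k = math.ceil(coverage * len(durations))
    # The threshold must exceed the k-th shortest outage.
    return math.floor(durations[k - 1]) + 1
```

For ten outages of 20, 45, 60, 80, 95, 100, 105, 110, 115 and 140 minutes, 90% coverage calls for a threshold of 116 minutes. A threshold of 96 minutes covers half of them.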

<br>

<br>

<br>

Again, please send me an RTPD log file covering a time window in which the<br>

130-to-central link is known to be stable but there is no RTP/RTPD<br>

connection, as well as the part of the log showing the results before and<br>

after the user-issued server discovery.<br>

<br>

<br>

<br>

Thanks,<br>

<br>

Ian Billings<br>

<br>

Field Technician<br>

<br>

*REF TEK - A Division of Trimble Navigation*<br>

<br>

<a href="http://support.reftek.com" target="_blank">http://support.reftek.com</a><br>

<br>

Skype ian_billings1<br>

<br>

PH 214 440 1265<br>

<br>

PH 214 440 1289 (Direct)<br>

<br>


-------------- next part --------------<br>

An HTML attachment was scrubbed...<br>

URL: <<a href="http://geohazards.usgs.gov/pipermail/anss-netops/attachments/20130222/61d46944/attachment-0001.html" target="_blank">http://geohazards.usgs.gov/pipermail/anss-netops/attachments/20130222/61d46944/attachment-0001.html</a>><br>

<br>

------------------------------<br>

<br>

Subject: Digest Footer<br>

<br>

_______________________________________________<br>

ANSS-netops mailing list<br>

<a href="mailto:ANSS-netops@geohazards.usgs.gov">ANSS-netops@geohazards.usgs.gov</a><br>

<a href="https://geohazards.usgs.gov/mailman/listinfo/anss-netops" target="_blank">https://geohazards.usgs.gov/mailman/listinfo/anss-netops</a><br>

<br>

<br>

------------------------------<br>

<br>

End of ANSS-netops Digest, Vol 44, Issue 7<br>

******************************************<br>

</blockquote></div><br><br clear="all"><div><br></div>-- <br>Sent from my iNTERNETS!!!

</div>