networking-forum.com

All times are UTC - 6 hours [ DST ]



[ 37 posts ], page 1 of 2
PostPosted: Fri Mar 30, 2012 3:10 pm 
New Member

Joined: Fri Mar 30, 2012 3:03 pm
Posts: 13
Certs: CCNP, MCITP
We are troubleshooting connectivity issues across a layer two connection between sites across a provider. Users experience application hangs and timeouts when passing across this specific link.

Packet captures do NOT show packet DROPS, but show RE-TRANSMITS. It is the strangest thing.

I'm attaching the pcap files from the server, the client, and the switchport on the server-side switch. Retransmits start at frame 87.

Any assistance in explaining this behavior is appreciated.

Thanks,
Tarek


Attachments:
File comment: pcap files
cap14-exchserver.zip [87.73 KiB]
Downloaded 133 times
PostPosted: Sat Mar 31, 2012 2:07 am
New Member

Joined: Wed Oct 05, 2011 10:50 pm
Posts: 46
Did you check the client for any errors? It looks like the client does not accept packet 87, since there is a gap of almost 300 ms between packets 87 and 88. When Exchange sends packet 88 (a duplicate of packet 87), the client doesn't seem to accept that either. The client then sends a duplicate of packet 86 in packet 89. The connection times out and is reset.

Packets 229 and 230 are another re-send: the client did not reply to packet 229, so the Exchange server re-sent it after 300 ms as packet 230. The client does respond to packet 229 almost immediately after receiving packet 230, so the connection continues.

So it looks like packets are making it to the client, but the client either does not ACK quickly enough or rejects the packet, and the server starts to re-send. Have you done any debugs on the client side? Is this for everyone at the site or a single client? If it is for everyone, do you add any encapsulation to packets as they traverse the WAN?
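The gap measurement described above can be reproduced from exported capture data. The following is a minimal sketch, assuming the segments all come from one sender and have already been extracted into simple records; the `Segment` type and the sample values are hypothetical and only loosely modelled on frames 86 to 88.

```python
from typing import NamedTuple

class Segment(NamedTuple):
    frame: int     # frame number in the capture
    time: float    # capture timestamp in seconds
    seq: int       # TCP sequence number
    length: int    # TCP payload length in bytes

def find_retransmits(segments):
    """Return (original_frame, repeat_frame, gap_seconds) for each repeated (seq, length)."""
    first_seen = {}
    hits = []
    for seg in segments:
        if seg.length == 0:
            continue  # pure ACKs carry no data, so they are skipped here
        key = (seg.seq, seg.length)
        if key in first_seen:
            original = first_seen[key]
            hits.append((original.frame, seg.frame, round(seg.time - original.time, 3)))
        else:
            first_seen[key] = seg
    return hits

# Hypothetical sample: frame 88 repeats the data of frame 87 about 300 ms later
sample = [
    Segment(86, 10.000, 1000, 0),
    Segment(87, 10.001, 5000, 1460),
    Segment(88, 10.299, 5000, 1460),
]
print(find_retransmits(sample))
```

A gap that clusters around one value (here about 0.3 s) points at a sender timer expiring rather than at random loss.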


PostPosted: Sat Mar 31, 2012 4:30 am
Post Whore

Joined: Wed Jun 17, 2009 11:28 am
Posts: 1579
Location: Longford Ireland
Certs: BSc computer network administration, CCNP, MCSE
Maybe try varying MTU size?

_________________
Good Luck,

David


PostPosted: Sat Mar 31, 2012 4:40 am
Member

Joined: Wed Jun 22, 2011 4:24 am
Posts: 161
Certs: CCNP , CCIP , 530010.
You didn't mention whether you use a Cisco router, or what L2 and L3 setup you have between you and the provider. The logical thing to me is not to look for client-to-router issues but to look at the WAN link. You have connectivity, so L3 is OK unless there is a loop, and I guess there isn't. So check the interface for errors. If you have none, it's something at L2.5. Does your provider run MPLS? I would say that you should check the cabling, but if no errors appear under the interface there is no problem there.
I can't look at the Wireshark captures because I broke my kernel last night :)) Damn it! :p

_________________
Stay the curse !


PostPosted: Sun Apr 01, 2012 9:03 am
New Member

Joined: Fri Mar 30, 2012 3:03 pm
Posts: 13
Certs: CCNP, MCITP
We are using a 3750 at the server end, and a 3560 at client end, SVIs on both switches handle routing between each other. No errors anywhere. Provider runs MPLS. Cisco TAC couldn't explain it after 6 weeks of troubleshooting and provider doesn't care since there are no packet drops.

The server/client connection behaves normally when going across an IPsec VPN connection, which was the only connection between the two sites until now.

This WAN link has been like this ever since it was put in place.

MTU is fine on the switches; it tests up to 9000 across the provider.

It's just strange how the provider isn't touching frames, but the endpoints behave differently, as you can see in the captures.

Thanks for all your input. This has been one of those nightmare problems (two months now, and paying for the circuit) where no one knows why it's broken but everyone still wants you to fix it.


PostPosted: Tue Apr 03, 2012 8:31 am
Junior Member

Joined: Sat Mar 07, 2009 1:36 pm
Posts: 81
Interesting...

There are a lot of retransmissions. Let's get to the basics: a retransmission happens when, from the sender's perspective, the receiver did not ACK the segment or the ACK got lost.
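To make that rule concrete, here is a minimal sketch of a retransmission timer for a single segment. The 0.3 s initial timeout is an assumption chosen to match the roughly 300 ms gaps seen in this thread, and the doubling on each expiry follows the back-off described in RFC 6298; real stacks also adapt the timeout to the measured round-trip time, which is omitted here.

```python
def retransmit_times(send_time, ack_arrival, rto=0.3, max_retries=3):
    """Return the times at which the sender retransmits one segment.

    ack_arrival is the time the ACK reaches the sender, or None if it is lost.
    rto and max_retries are illustrative values, not taken from any real stack.
    """
    times = []
    timeout = rto
    deadline = send_time + timeout
    while len(times) < max_retries and (ack_arrival is None or ack_arrival > deadline):
        times.append(round(deadline, 3))  # timer expired before any ACK arrived
        timeout *= 2                      # back off: double the timeout
        deadline += timeout
    return times

print(retransmit_times(0.0, 0.050))  # ACK in time: no retransmit
print(retransmit_times(0.0, 0.309))  # ACK just after the timer: one retransmit
print(retransmit_times(0.0, None))   # ACK lost: retransmits until retries run out
```

The second call shows the borderline case: an ACK that arrives a few milliseconds after the timer fires still costs one retransmission.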

Hmm, with my little brain, I could see that what the client sends is also received by the server, so maybe it's arriving at the server very late for some reason, like congestion on the path. Just an idea from me.

Let's wait for the experts :)


PostPosted: Thu Apr 05, 2012 8:04 am
New Member

Joined: Fri Mar 30, 2012 3:03 pm
Posts: 13
Certs: CCNP, MCITP
I thought of that as well. But even though the receiving side responds with an ACK, which we can see the sender does receive, the sender continues to retransmit the same packet over and over again.

If the ACK was received a little late, the sender should stop sending that same packet after it receives the ACK, but it doesn't???
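One thing worth checking in the capture is whether the ACK the sender receives actually covers the bytes it keeps retransmitting. TCP ACKs are cumulative, so an ACK only stops the retransmission if its acknowledgment number is at or beyond the end of the segment. A minimal sketch of that check, with a hypothetical function name and allowing for 32-bit sequence wrap-around:

```python
MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits

def ack_covers(seq, length, ack):
    """True if cumulative acknowledgment number `ack` covers every byte of the segment."""
    end = (seq + length) % MOD
    # Serial-number comparison: ack is at or past `end` if the forward distance is under 2**31
    return (ack - end) % MOD < 2 ** 31

print(ack_covers(5000, 1460, 6460))  # ACK for the byte after the segment
print(ack_covers(5000, 1460, 5000))  # ACK that only repeats the old position
```

If the ACKs seen at the sender fail this check, the sender is behaving correctly by retransmitting, and the question becomes why the receiver is not advancing its acknowledgment number.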


PostPosted: Thu Apr 05, 2012 3:02 pm
Junior Member

Joined: Sat Mar 07, 2009 1:36 pm
Posts: 81
tarjall wrote:
If the ACK was received a little late, the sender should stop sending that same packet after it receives the ACK, but it doesn't???

If the ACK comes too late, then it could also be invalid...

What I would do is reduce the maximum segment size and see what happens then.


PostPosted: Thu Apr 05, 2012 10:52 pm
New Member

Joined: Fri Mar 30, 2012 3:03 pm
Posts: 13
Certs: CCNP, MCITP
We have tried lowering MSS to as low as 1200, but saw the same symptoms.
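For reference, the arithmetic behind that test can be sketched as follows, assuming plain IPv4 and TCP headers with no options (20 bytes each). The helper names are hypothetical.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options

def ip_packet_size(mss):
    """Largest IP packet produced by a sender limited to this MSS."""
    return mss + IPV4_HEADER + TCP_HEADER

def mss_for_mtu(mtu):
    """Largest MSS that fits in one unfragmented packet at this MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER

print(ip_packet_size(1200))  # size on the wire (IP layer) with MSS 1200
print(mss_for_mtu(1500))     # MSS for a standard Ethernet MTU
print(mss_for_mtu(9000))     # MSS for the 9000-byte path tested earlier
```

With an MSS of 1200 the IP packets are at most 1240 bytes, well under even a 1500-byte MTU, which is consistent with the symptoms not being a plain MTU problem.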


PostPosted: Sat Apr 07, 2012 1:56 pm
Junior Member

Joined: Sat Mar 07, 2009 1:36 pm
Posts: 81
If you've gotten this far, then I assume you have also tried this from different computers and with different TCP services... I would like to know if and when you find a solution to this interesting issue.

Thanks


PostPosted: Sun Apr 08, 2012 11:02 am
CCIE #24973

Joined: Fri Mar 02, 2007 5:18 am
Posts: 196
Location: Bahrain
Certs: CCNP,CCSP,CCIE (R&S)#24973
Can you check your WAN-facing interfaces for CRC errors? Clear the counters and check again.

Please post back.

_________________
"Nothing Is Limited, Except Our Understanding To The Universe"


PostPosted: Mon Apr 09, 2012 9:42 am
New Member

Joined: Fri Mar 30, 2012 3:03 pm
Posts: 13
Certs: CCNP, MCITP
Same symptoms with different machines.

No errors on any interfaces.

Provider came out and ran RFC 2544 tests and all came back clear.

Strange thing is, if I run IPsec on top of the connection, applications work fine. No retransmits.

I've tried ESP-NULL and AH; things work fine using both.


PostPosted: Mon Apr 09, 2012 2:05 pm
CCIE #24973

Joined: Fri Mar 02, 2007 5:18 am
Posts: 196
Location: Bahrain
Certs: CCNP,CCSP,CCIE (R&S)#24973
Could you please post a simple diagram showing all the devices along the path, for both the IPsec path and the MPLS path?
Sorry to ask, but it will help us help you more clearly.

PostPosted: Fri Apr 13, 2012 2:19 am
Post Whore

Joined: Mon Nov 16, 2009 8:10 pm
Posts: 2523
Location: San Diego, CA
Certs: CCNP, BCNE, Network+, Security+
It looks like these were simultaneous packet captures? I went ahead and combined them all into one and used a display filter of tcp.stream eq 1. This is likely above my head, but one thing that looks odd to me is seeing the same IP address with a source MAC of two different devices.

When the RPC traffic is going back and forth before the retransmits start, when 172.16.36.9 sends requests, the L2 header shows
Code:
Ethernet II, Src: Cisco_c9:3e:90 (e8:b7:48:c9:3e:90), Dst: Cisco_1d:4c:10 (00:1c:f6:1d:4c:10)
.

However, when the retransmits start, the L2 header shows
Code:
Ethernet II, Src: Hewlett-_3c:ab:ad (00:1e:0b:3c:ab:ad), Dst: Cisco_c9:3e:91 (e8:b7:48:c9:3e:91)
for the requests originating from 172.16.36.9.

Both the source and destination MACs change when the "retransmits" begin appearing.

EDIT - I'm even seeing VMWare MACs listed for .7 sometimes.
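A quick way to list every IP address that shows up with more than one source MAC is sketched below. It assumes the merged capture has been exported as (source IP, source MAC) pairs; the function name is hypothetical and the sample pairs reuse addresses quoted in this thread purely as illustration.

```python
from collections import defaultdict

def macs_per_ip(frames):
    """Map each source IP seen with more than one source MAC to its sorted MAC list."""
    seen = defaultdict(set)
    for src_ip, src_mac in frames:
        seen[src_ip].add(src_mac.lower())
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}

# Illustrative pairs, not an export of the real capture
frames = [
    ("172.16.36.9", "e8:b7:48:c9:3e:90"),
    ("172.16.10.7", "00:50:56:9c:5e:e7"),
    ("172.16.36.9", "00:1e:0b:3c:ab:ad"),
]
print(macs_per_ip(frames))
```

Keep in mind that when captures taken on different segments are merged, a routed packet is expected to appear with different MACs at each capture point, because every routed hop rewrites the Ethernet header. A hit from this check is a prompt to ask where each frame was captured, not proof of a fault.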

_________________
Regards,

Steven King
San Diego Cisco User Group - http://www.sdcug.com
"The only time something is impossible is when you think it is." - Kevin Corbin, CCIE #11577


PostPosted: Fri Apr 13, 2012 2:28 am
Post Whore

Joined: Mon Nov 16, 2009 8:10 pm
Posts: 2523
Location: San Diego, CA
Certs: CCNP, BCNE, Network+, Security+
This may not have anything to do with it, but keep in mind that Wireshark can "incorrectly" identify duplicated traffic as retransmits. I don't see at a quick glance any real delays between the requests and responses, and the changes in MAC addresses make me wonder if this is somehow duplicated traffic.
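One common heuristic for telling the two apart is the IPv4 identification field: a frame captured twice carries the same IP ID both times, while a genuine retransmission is usually a newly built packet with a different IP ID. A minimal sketch, assuming each frame has been reduced to a dict with hypothetical `ip_id`, `seq`, and `length` keys:

```python
def classify_repeat(first, second):
    """Classify two frames that were paired as original and 'retransmission'."""
    if (first["seq"], first["length"]) != (second["seq"], second["length"]):
        return "different segment"
    if first["ip_id"] == second["ip_id"]:
        # Same IP identification: most likely one packet captured twice
        return "likely duplicate capture"
    # New IP identification: the sender built a fresh packet for the same bytes
    return "likely retransmission"

original = {"ip_id": 0x1A2B, "seq": 5000, "length": 1460}
print(classify_repeat(original, {"ip_id": 0x1A2B, "seq": 5000, "length": 1460}))
print(classify_repeat(original, {"ip_id": 0x1A2C, "seq": 5000, "length": 1460}))
```

This is only a hint, not a rule: some stacks send a constant IP ID on packets with the DF bit set, in which case the field cannot distinguish the two cases.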

PostPosted: Fri Apr 13, 2012 2:37 am
Post Whore

Joined: Mon Nov 16, 2009 8:10 pm
Posts: 2523
Location: San Diego, CA
Certs: CCNP, BCNE, Network+, Security+
You know what, after looking at it again, I'm betting this is duplicated traffic.

Put them all together, look at tcp.stream eq 3. It looks like the same series of requests and responses, 3 times.

Also it highlights a problem which I won't troubleshoot, but the error code is listed on this site:

http://www.mombu.com/microsoft/microsoft/t-nca-s-fault-context-mismatch-1701756.html

PostPosted: Mon Apr 16, 2012 11:10 am
New Member

Joined: Fri Mar 30, 2012 3:03 pm
Posts: 13
Certs: CCNP, MCITP
Here are the diagrams for all the scenarios we tested:

Direct connections, retransmits happen:
Attachment: basic.JPG [ 24.79 KiB ]

Workstation behind 3560, but routes directly to 3750, retransmits happen:
Attachment: workstation-provider-vlan-3560.JPG [ 24.51 KiB ]

Bypassing provider and plugging directly into 3750, routing is the same as in diagram workstation-provider-vlan-3560, retransmits don't happen and applications work:
Attachment: workstation-provider-vlan-3750.JPG [ 24.63 KiB ]

Adding routers to the mix and lowering MSS. No change/not working:
Attachment: with-routers.JPG [ 31.97 KiB ]

Only scenario where things work across provider:
Attachment: with-routers.-and-vpn.JPG [ 34.13 KiB ]


Thanks

PS: I'd be glad to mail over a VISA giftcard to anyone who can solve this (or lead in correct direction).


PostPosted: Mon Apr 16, 2012 12:35 pm
Post Whore

Joined: Mon Nov 16, 2009 8:10 pm
Posts: 2523
Location: San Diego, CA
Certs: CCNP, BCNE, Network+, Security+
How is this a L2 issue if you're performing inter-VLAN routing? You're sending traffic from VLAN 36 into VLAN 10.

PostPosted: Mon Apr 16, 2012 1:42 pm
Post Whore

Joined: Mon Nov 16, 2009 8:10 pm
Posts: 2523
Location: San Diego, CA
Certs: CCNP, BCNE, Network+, Security+
Here are my notes from OneNote after looking at your captures and diagrams; unfortunately the color coding I used won't carry over. I used the first pic as a reference. Please confirm the MAC addresses/IPs, traffic flow, and the portions where I have a question mark so that I have a better understanding:

Code:
VMs (Exchange Server?)
IP:  172.16.10.7
MAC: 00:50:56:9c:5e:e7
GW: 172.16.10.2 (MAC: 9c:af:ca:64:2c:42?)

Workstation?
IP: 172.16.36.9
MAC: 00:1e:0b:3c:ab:ad?
GW: 172.16.36.5 (MAC: e8:b7:48:c9:3e:91?)

Traffic path from workstation? to VMs(Exchange Server?):
VLAN36, Workstation -> 172.16.36.5 -> VLAN918, 172.22.22.5 -> 172.22.22.4 -> VLAN 10, 172.16.10.2

 Client pcap:
   1. [FRAME 1]Traffic initiated from workstation? to VMs(Exchange Server?)
      a. L2 Header: Ethernet II, Src: Hewlett-_3c:ab:ad (00:1e:0b:3c:ab:ad), Dst: Cisco_c9:3e:91 (e8:b7:48:c9:3e:91 - Workstation Default Gateway?)
   2. Response from VMs(Exchange Server?)
      a. L2 Header: Ethernet II, Src: Cisco_c9:3e:91 (e8:b7:48:c9:3e:91), Dst: Hewlett-_3c:ab:ad (00:1e:0b:3c:ab:ad)
      
<At frame 88 (First retransmit)>
MAC info is the same; seems correct
Retransmits appear to be sent when there is no response in ~299ms or greater

Server pcap:
   1. [FRAME 1]Traffic initiated from workstation? to VMs(Exchange Server?)
      a. L2 header: Ethernet II, Src: Cisco_1d:4c:11 (00:1c:f6:1d:4c:11), Dst: Vmware_9c:5e:e7 (00:50:56:9c:5e:e7)
   2. Response sent to Workstation?
      a. Ethernet II, Src: Vmware_9c:5e:e7 (00:50:56:9c:5e:e7), Dst: Cisco_64:2c:42 (9c:af:ca:64:2c:42  - VMs(Exchange Server?) Default Gateway?)
      
<At frame 72 (First retransmit)>
   1. Retransmit sent due to no response in 307ms to frame 71
      a. L2 header: Ethernet II, Src: Vmware_9c:5e:e7 (00:50:56:9c:5e:e7), Dst: Cisco_64:2c:42 (9c:af:ca:64:2c:42)
   2. At 309ms from frame 71, in frame 73, ACK is sent in response to frame 71


So if you haven't noticed already, I don't see a lot except:

1. In your server pcap, it shows the server sending traffic off subnet via what I assume is its default GW with MAC 9c:af:ca:64:2c:42?
2. However, in the beginning of the capture, it is receiving traffic from the workstation with source MAC 00:1c:f6:1d:4c:11.

Something there doesn't click. I would expect to see your server's default GW MAC be the source when receiving traffic from off-subnet. Can you please tell me what these MAC addresses are?
9c:af:ca:64:2c:42
00:1c:f6:1d:4c:11

It appears that the app is sensitive to roughly 300 ms with no response.

Are you sure this is a network problem? Looking at the time stamps on the server/client pcaps, it would appear that the server receives everything the client sends and vice versa. There is a ~4 second delta between the two, but that's probably just difference in clocks. Either that, or it has nothing to do with the problem anyway because the 4 second delta is present when you don't have retransmits.
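To compare the two captures on one timeline, the clock difference can be estimated from packets that appear in both. This is a minimal sketch, assuming each capture has been reduced to a dict keyed by a packet identity such as (IP ID, TCP sequence number); the function name and sample values are hypothetical.

```python
from statistics import median

def estimate_clock_offset(capture_a, capture_b):
    """Median of (time in b - time in a) over packets present in both captures.

    Each capture maps a packet key, e.g. (ip_id, seq), to its timestamp in seconds.
    """
    shared = capture_a.keys() & capture_b.keys()
    if not shared:
        raise ValueError("no packets in common")
    return median(capture_b[key] - capture_a[key] for key in shared)

# Hypothetical timestamps with a 4 second clock difference
client = {(1, 100): 10.0, (2, 200): 10.5, (3, 300): 11.0}
server = {(1, 100): 14.0, (2, 200): 14.5, (3, 300): 15.0, (4, 400): 16.0}
print(estimate_clock_offset(client, server))
```

The result also contains the one-way network delay, so it is only good for rough alignment. After subtracting it, a packet whose adjusted delay is far above the others stands out as one that was held up in transit.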

This makes me believe the client isn't sending the ACK the server is looking for, so it retransmits. It's not like I see anything additional sent from the client capture and it's retransmitting anyway. Unless I'm missing something.

PostPosted: Mon Apr 16, 2012 2:06 pm
Post Whore

Joined: Mon Nov 16, 2009 8:10 pm
Posts: 2523
Location: San Diego, CA
Certs: CCNP, BCNE, Network+, Security+
These aren't raw captures are they? Does the server or client have multiple NICs? Have you checked NIC bonding?
