Wednesday, March 23, 2011

In Linux, how do I know if an ACK is received to a certain TCP packet?

Long story short: in Linux, how do I ensure that an ACK message is received for a certain TCP packet?

Full story:

I'm debugging an Asterisk/OpenH323 <-> Panasonic IP-GW16 issue.

An H323 connection involves two sessions: H225.0 and H245. These are just two TCP sessions, over which some data are transmitted.

Let's call them Session 1 (for H225.0) and Session 2 (for H245).

Session 1 uses the well-known TCP port 1720, while the port for Session 2 is chosen at runtime.

Control flow goes as follows:

  1. Panasonic calls Asterisk: it opens Session 1 (TCP/1720) to Asterisk and sends a SETUP message over Session 1. The message contains the port number for Session 2 (call it port 2) that Panasonic will listen on.
  2. Asterisk sends Panasonic a CALL PROCEEDING message over Session 1.
  3. Panasonic starts listening on port 2.
  4. Panasonic sends a TCP ACK over Session 1.
  5. Asterisk opens TCP Session 2 to port 2.

The order of steps 2 and 3 matters: Panasonic will not listen on port 2 until it has received the CALL PROCEEDING message from step 2.

But in the OpenH323 code, steps 2 and 5 are only a few lines apart, so Asterisk may try to open Session 2 before Panasonic has started listening.

That's why the connection sometimes works in debug mode and almost never works in a release build.

This is clearly visible in a packet dump. I ran a series of experiments, and in 52 cases out of 52, if step 5 happened before step 4, the connection failed; otherwise, the connection succeeded.

Panasonic sends no other messages except the ACK in step 4, so it seems that the only way Asterisk can know that Panasonic is listening on port 2 is by receiving that ACK.

Of course I could implement a timed wait, but I want a cleaner solution.

So, the question again: after sending a message over a TCP connection in step 2, how do I know whether an ACK has been received for the packet containing the message?

From stackoverflow
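  • In this specific case, I'd say you would find that your tcp_info structure holds a nonzero tcp_info.tcpi_unacked while the ACK is still outstanding; it should drop to zero once everything sent so far has been acknowledged. You'd get the structure via getsockopt() with the TCP_INFO option at level IPPROTO_TCP.

    Note: this is apparently an unstable interface.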

  • The timing sequence seems odd, although if the Panasonic is using a proprietary O/S that might explain it.

    To clarify - AIUI - if the Panasonic was running a "normal" O/S, the ACK sent by it in stage 4 would happen immediately after the Panasonic's software has read() the data from the control TCP socket.

    Note, however, that the OpenH323 call to write() (in step 2) only blocks (assuming that it's not a non-blocking socket!) until the data has been copied into the kernel's send buffer. It can return before the ACK from the Panasonic has been received by the Asterisk server, so the return of write() does not by itself tell you that the ACK has been received.

    Essentially it seems that the Panasonic isn't doing the equivalent of listen() on the second socket until after it has read() the CALL PROCEEDING message. It looks like a race condition: sometimes OpenH323 will try to connect() before the other end is ready.

    When this happens, do you get ECONNREFUSED at the OpenH323 end?

    Alnitak : ok - this was a bit wrong - I'll edit. The ACK would normally be sent once the _O/S_ has received the packet; it doesn't rely on the application calling read() on the data.
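
If the failed attempts do show up as ECONNREFUSED, the race could in principle be tolerated by retrying the connect() for Session 2 a few times. A hedged sketch in C, assuming IPv4 and a blocking socket; connect_with_retry, max_attempts and delay_ms are hypothetical names, not part of OpenH323. A fresh socket is created for each attempt because a socket whose connect() has failed should not be reused.

```c
#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: tries to connect to addr up to max_attempts times,
 * pausing delay_ms between attempts and retrying only on ECONNREFUSED.
 * Returns a connected socket descriptor, or -1 with errno set. */
static int connect_with_retry(const struct sockaddr_in *addr,
                              int max_attempts, int delay_ms)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;
        if (connect(fd, (const struct sockaddr *)addr, sizeof(*addr)) == 0)
            return fd;

        int saved = errno;
        close(fd);  /* discard the socket whose connect() failed */
        if (saved != ECONNREFUSED || attempt == max_attempts) {
            errno = saved;
            return -1;
        }
        poll(NULL, 0, delay_ms);  /* sleep before the next attempt */
    }
    errno = EINVAL;  /* reached only when max_attempts < 1 */
    return -1;
}
```

This is still a form of timed wait, so it is better seen as a fallback than as the cleaner solution the question asks for.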
