[Udpcast] Reliability - MD5sum's don't match

Alain Knaff alain.knaff at lll.lu
Sun Jun 20 00:12:38 CEST 2004


On Saturday 19 June 2004 20:57, Matthew Cooke wrote:
> Hi,
>
> I wonder if someone could help me. I have started testing the
> UDPreceive/udpsend programs for rolling out Linux onto refurbished PCs for
> a charity I work for. The idea is to roll out Linux onto 10-20 refurbished
> PC's (at once) before shipping them to charities/schools/colleges that
> really need them, mainly in Africa.
>
> I am using the command line version because we are not transferring a raw
> disk image; we are transferring a tar'd tree and preparing the machines for
> custom hardware detection (as all the hardware is basically random).
>
> The tar file is about 1.2gig, and I am piping the output through both
> md5sum and tar to extract it. When I tried this on 1 machine everything was
> fine but today I tested it onto 5 machines in parallel and the MD5sum of
> the result on all 5 clients did not match the server version (they did
> however match each other) and the tar file appears to be slightly corrupted
> (though the machines did boot ok).
>
> So my question is "is it possible that the data is corrupted during
> multicasting?"

Is this repeatable? I.e. if you do two _separate_ transfers of the
same file to two sets of 5 machines, do the md5sums of both transfers
match each other?

If so, the problem is probably unrelated to the transfer itself, and
may have something to do with how the tar file is made and transferred
(i.e. are there other intervening steps in transferring the file,
other than UDPcast?).

If, on the other hand, it is not repeatable (different md5sums each
time, or sometimes good transfers and sometimes bad ones), it may be
related to Ethernet packet corruption during the transfer. On good
equipment, this should be extremely rare, though not impossible.
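
The repeatability check above can be sketched as a small script. This
is only an illustration under my own assumptions: the helper names
md5_of_stream and classify are hypothetical and not part of UDPcast,
and the in-memory streams stand in for the real tar data.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=1 << 20):
    """Hash a file-like object chunk by chunk, so a large tar file
    never has to fit in memory."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def classify(server_md5, transfer_md5s):
    """transfer_md5s holds one client-side md5 per separate transfer."""
    if all(m == server_md5 for m in transfer_md5s):
        return "ok"
    if len(set(transfer_md5s)) == 1:
        # Every run is wrong in the same way: look at how the tar
        # file is made or piped, not at the network.
        return "repeatable mismatch"
    # Runs disagree with each other: in-flight corruption is possible.
    return "non-repeatable mismatch"


if __name__ == "__main__":
    server = md5_of_stream(io.BytesIO(b"tar data"))
    run1 = md5_of_stream(io.BytesIO(b"tar dbta"))
    run2 = md5_of_stream(io.BytesIO(b"tar data"))
    print(classify(server, [run1, run2]))
```

On real data, the streams would be the tar file on the server and the
receiver's output on one client from each transfer.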

> and if so "Is there anything I can do to make it work reliably?"

Well, if you are indeed seeing in-flight Ethernet packet corruption,
there is no easy way of making it go away. However, you can make it
very noticeable by choosing a compressed transfer (using lzop
compression). The corruption will still occur, but it will be
detected and lead to an aborted transfer, which in many cases is
preferable to having a lurking, undetected corruption. Of course,
lzop is only feasible if the corruption is not so frequent that a
second, third, or fourth try will fail as well.

From our experience here (several years of transfers in half a dozen
schools), we've only observed such corruption twice. It was detected
because we used compression. After repeating the transfer, the problem
went away.
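
To illustrate why a compressed stream makes corruption noticeable,
here is a minimal sketch. It uses Python's zlib as a stand-in, because
lzop is not in the Python standard library; the detection shown here
comes from zlib's own header check and Adler-32 trailer, and I am not
claiming lzop works identically. The helper names are hypothetical.

```python
import zlib


def flip_bit(data, index, bit=0):
    """Return a copy of data with one bit flipped at byte `index`."""
    damaged = bytearray(data)
    damaged[index] ^= 1 << bit
    return bytes(damaged)


def is_detected_as_corrupt(received):
    """True if decompressing the received stream fails."""
    try:
        zlib.decompress(received)
    except zlib.error:
        return True
    return False


if __name__ == "__main__":
    payload = b"udpcast test payload " * 1000
    compressed = zlib.compress(payload)

    # An intact stream decompresses cleanly.
    print("intact stream flagged:", is_detected_as_corrupt(compressed))

    # Flip one bit at every byte position and count the failures.
    detected = sum(
        is_detected_as_corrupt(flip_bit(compressed, i))
        for i in range(len(compressed))
    )
    print("single-bit flips detected:", detected, "of", len(compressed))
```

The receiver cannot repair the damage this way; it only learns that
the transfer has to be repeated.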

> There seem to be quite a few error correction parameters available, which
> ones should I try first?

Because packet corruption is so rare, unfortunately none of the
options deals with this case.

Most of the available options address packet _loss_ (rather than
corruption), which in our experience is much more frequent.

However, should the problem be confirmed, I'll introduce a new feature
to add CRC checksums to the individual packets. Once available, this
will cause corrupted packets to be rejected, which in turn activates
the packet loss recovery algorithms. [Normally, the kernel is already
supposed to protect the UDP packets with such a checksum, but possibly
it fails under certain rare circumstances...]
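
As a rough sketch of the idea (not UDPcast's actual packet format,
which does not have this feature yet), a per-packet CRC could look
like the code below. The functions frame, unframe and
internet_checksum are hypothetical names of mine. The example also
shows one known limitation of the 16-bit ones' complement checksum
that UDP uses: it does not notice two 16-bit words being swapped,
whereas CRC-32 does. Whether that is what happened in this case is
not known.

```python
import struct
import zlib


def frame(payload):
    """Append a CRC-32 of the payload as a 4-byte trailer."""
    return payload + struct.pack("!I", zlib.crc32(payload))


def unframe(packet):
    """Return the payload, or None if the CRC does not match.
    The caller would treat None exactly like a lost packet."""
    if len(packet) < 4:
        return None
    payload, trailer = packet[:-4], packet[-4:]
    (expected,) = struct.unpack("!I", trailer)
    if zlib.crc32(payload) != expected:
        return None
    return payload


def internet_checksum(data):
    """16-bit ones' complement checksum, as used by UDP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries
    return ~total & 0xFFFF


if __name__ == "__main__":
    payload = b"\x12\x34\xab\xcd" + b"rest of block"
    packet = frame(payload)

    # Corruption that swaps the first two 16-bit words of the payload.
    swapped = payload[2:4] + payload[0:2] + payload[4:]
    damaged = swapped + packet[-4:]

    same = internet_checksum(payload) == internet_checksum(swapped)
    print("UDP-style checksum unchanged:", same)
    print("CRC accepts damaged packet:", unframe(damaged) is not None)
```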

> Any help much appreciated,
> Matt.

Alain




