[Udpcast] udpcast scalability?

Felix Rauch rauch at inf.ethz.ch
Fri Feb 25 00:30:27 CET 2005


On Thu, 24 Feb 2005, Ramon Bastiaans wrote:
[...]
> Because SystemImager's imaging tools don't compress or image the files, we 
> need to cast an entire filesystem (lots of files) over the network. And 
> because udpcast only supports sending/receiving a single file and writing to 
> a single file descriptor, it can only write asynchronously to one file. 
> Because of this we use tar to pipe all files through on both receiver and 
> sender.

The idea is to send a whole partition, if possible. This can be
significantly faster than using tar (see below). We used to clone
whole disks or partitions on our 128-node cluster, which was pretty
fast: About 20 MByte/s over two Fast Ethernet links (we used our own
tool though, not udpcast).
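For reference, a minimal sketch of casting a whole partition with
udpcast (the device name is an assumption; check the udp-sender and
udp-receiver man pages for your version's exact options):

```shell
# Sender: stream the raw partition (assumed here to be /dev/sda2)
udp-sender --file /dev/sda2

# Each receiver: write the stream straight to the local partition.
# --nosync avoids synchronous writes, which is usually much faster.
udp-receiver --file /dev/sda2 --nosync
```

No tar in the pipeline, so the disk only ever reads and writes one
sequential stream.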

> This is when the problem arose. When we didn't use tar, we could get high 
> speeds and the (network/harddisk) hardware seemed to become the limiting 
> factor. But only when using the --nosync writes. Because we use tar (which 
> obviously has no --nosync option), now tar became the bottleneck.

The problem with tar is that it has to deal with each file
individually, which causes many movements of the disk's head. Each
move to the track where the next file or its corresponding inode is
located adds latency, which ultimately reduces throughput.

If you clone a single large stream (such as a whole partition), there
are only very few head movements, and they are mostly just to the next
track on the disk. Hence you get higher throughput than with tar.
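A back-of-the-envelope model makes the effect concrete (the disk
numbers below are illustrative assumptions, not measurements):

```python
def effective_throughput(streaming_mb_s, seek_ms, avg_file_kb):
    """Effective MB/s when every file costs one head repositioning.

    streaming_mb_s: sequential disk throughput (assumed)
    seek_ms:        average seek + rotational latency per file (assumed)
    avg_file_kb:    average file size in KB
    """
    file_mb = avg_file_kb / 1024.0
    transfer_s = file_mb / streaming_mb_s   # time to stream the file's data
    seek_s = seek_ms / 1000.0               # time lost moving the head
    return file_mb / (transfer_s + seek_s)

# Illustrative: a 40 MB/s disk with 10 ms lost per file.
print(effective_throughput(40, 10, 4096))  # one big stream: ~36 MB/s
print(effective_throughput(40, 10, 16))    # many 16 KB files: ~1.5 MB/s
```

With small files the per-file latency dominates completely, which is
why the tar pipeline, not the network, becomes the bottleneck.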

Of course, cloning a whole partition also copies empty blocks, which
is not strictly necessary. You therefore transfer more data than with
tar, but at a higher throughput. Whether this is actually a win
depends on a number of factors, most importantly how full your
partition is.
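Under the same kind of assumptions, the break-even point is simple:
cloning the raw partition wins whenever the fill rate exceeds the
ratio of tar throughput to raw-clone throughput. A hypothetical
sketch (all rates assumed, not measured):

```python
def raw_clone_wins(partition_gb, fill_rate, raw_mb_s, tar_mb_s):
    """Compare wall-clock time of cloning the full partition against
    tar-ing only the used data."""
    raw_time = partition_gb * 1024 / raw_mb_s              # copies empty blocks too
    tar_time = partition_gb * fill_rate * 1024 / tar_mb_s  # used data only, but slower
    return raw_time < tar_time

# E.g. raw clone at 20 MB/s vs tar at 5 MB/s: raw wins once the
# partition is more than 5/20 = 25% full.
print(raw_clone_wins(40, 0.50, 20, 5))  # half full -> raw clone faster (True)
print(raw_clone_wins(40, 0.10, 20, 5))  # nearly empty -> tar faster (False)
```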

- Felix
