Why would I use bup and not rsync/rsnapshot?


Dieter_be

Oct 15, 2010, 8:43:05 AM
to bup-list
Hi,
I have read the bup README and the design article (well, some parts I
just skimmed over; very entertaining, btw).
But I still don't get why one should or would use bup over rsnapshot
or rsync.

Is this more a proof-of-concept kind of thing "because git is so cool"
or are there actually benefits over rsnapshot?
One low hanging fruit with rsnapshot is disk space usage: as soon as a
file is renamed or has even only 1 bit difference, it's stored again.
I'm aware of that. But in terms of transmission efficiency (network
traffic and time consumption), does bup offer anything that rsync
doesn't? (I'm speaking for the sole use case of doing regular backups
of VM images btw)


FWIW, I think the actual filesystem is often a good place to implement
deduplication (because it's transparent, works for any application,
etc). I'm personally looking forward to btrfs which brings
compression, deduplication, snapshotting etc all in the filesystem
itself. Although all that stuff will probably be decoupled from
userspace. (as in: if you want to make backups to a btrfs volume,
your synchronisation program will need to do its own compression and
deduplication, which is not optimal. But that's another story)

FWIW 2: I wrote an (unfinished) rsync benchmarking tool[*], with one
of the testruns measuring the efficiency (in bytes sent, and time
taken) of rsyncing VM image snapshots.
You can easily hack/extend it for more use cases or even different
synchronisation backends, so pull requests welcome ;-)

Dieter

[*] http://dieter.plaetinck.be/rsyncbench_an_rsync_benchmarking_tool

Joe Beda

Oct 15, 2010, 9:16:52 AM
to Dieter_be, bup-list
The disk space savings are non-trivial.  If you are, say, editing metadata on *many* image files, it is nice that you don't have duplicate copies.

Bup is also good in the face of file renames/moves.  rsync/rsnapshot don't handle that case at all.

Note that file system dedup only works at block boundaries. Bup has a stable splitting mechanism that works well in the face of insertions and deletions in the file.  I also don't trust filesystem dedup -- I have friends who have had issues with ZFS's dedup functionality, and ZFS is much more mature than btrfs.
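
To illustrate why a content-defined split survives insertions while fixed block boundaries do not, here is a minimal Python sketch. It is not bup's actual implementation: the window size, the bit mask, the plain byte-sum checksum and the helper name `split` are all assumptions chosen for brevity. A cut is made wherever the sum of the last WINDOW bytes has its low bits all set, so cut points depend only on nearby content and not on the offset in the file.

```python
import random

WINDOW = 64            # bytes covered by the rolling sum (illustrative value)
MASK = (1 << 10) - 1   # cut when the low 10 bits are all ones (roughly 1 KiB chunks)


def split(data):
    """Cut data wherever the sum of the last WINDOW bytes matches MASK."""
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]      # slide the window forward
        # Only test once the window is full, so the decision is purely local.
        if i >= WINDOW - 1 and (rolling & MASK) == MASK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])          # trailing chunk without a cut
    return chunks


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
shifted = b"inserted bytes" + original       # an insertion at the very front

old_chunks = split(original)
new_chunks = split(shifted)
reused = set(old_chunks[1:]) <= set(new_chunks)
print(len(old_chunks), "chunks; all but the first reused:", reused)
```

With fixed-size blocks the same 14-byte insertion would shift every later block boundary, so no block would match its earlier version.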

Joe

Zoran Zaric

Oct 15, 2010, 9:23:54 AM
to Dieter_be, bup-...@googlegroups.com
On 15.10.2010 14:43, Dieter_be wrote:
> Hi,

Hey,

> I have read the bup readme and the design article (well, some parts i
> just skimmed over. very entertaining btw)
> But I still don't get why one should or would use bup over rsnapshot
> or rsync.

bup provides both deduplication and compression.

Say you have 2 Debian servers. Both of them have a bunch of data in
common. (This can be a media library on many computers, your email or
whatever.)

Now you back up the first server. bup's magic happens: files are split up
into chunks, and those are packed together in packfiles; nothing you really
need to bother with. If you want to know the "magic" behind it, please read the
DESIGN document or ask for specific explanations.

When finished, you start a backup on the second server. Here again files
are split up. Duplicate chunks aren't saved a second time; they are just
referenced. This is what deduplication is.
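
The two-server scenario can be sketched with a toy content-addressed store. This is only an illustration of the idea, not bup's storage code: the `ChunkStore` class, its methods and the sample data are hypothetical, and real packfiles are considerably more involved. Each chunk is keyed by its SHA-1 hash, so a chunk that was already saved for the first server is only referenced when the second server is backed up.

```python
import hashlib


class ChunkStore:
    """Toy content-addressed store: each distinct chunk is kept once."""

    def __init__(self):
        self.objects = {}

    def save(self, chunks):
        """Store any new chunks and return the list of keys for this backup."""
        refs = []
        for chunk in chunks:
            key = hashlib.sha1(chunk).hexdigest()
            self.objects.setdefault(key, chunk)   # no-op if already present
            refs.append(key)
        return refs

    def stored_bytes(self):
        return sum(len(chunk) for chunk in self.objects.values())


common = [b"kernel" * 100, b"libc" * 100]         # data both servers share
server1 = common + [b"server1 mail spool"]
server2 = common + [b"server2 web root"]

store = ChunkStore()
refs1 = store.save(server1)
refs2 = store.save(server2)

total_input = sum(len(c) for c in server1 + server2)
print(f"{total_input} bytes backed up, {store.stored_bytes()} bytes stored")
```

The second backup adds only the one chunk that is unique to the second server; the shared chunks show up as identical keys in both reference lists.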

> Is this more a proof-of-concept kind of thing "because git is so cool"
> or are there actually benefits over rsnapshot?
> One low hanging fruit with rsnapshot is disk space usage: as soon as a
> file is renamed or has even only 1 bit difference, it's stored again.
> I'm aware of that. But in terms of transmission efficiency (network
> traffic and time consumption), does bup offer anything that rsync
> doesn't? (I'm speaking for the sole use case of doing regular backups
> of VM images btw)

bup uses an algorithm similar to the one that rsync (which rsnapshot is
built on) uses for efficient transmission. bup uses it for deduplication.

VM images are where bup shines and more or less what it was designed for: you
have huge files that change a little.

I'm sorry, but I don't have too much time at the moment; please skim through
the DESIGN document to read what bup does.

> FWIW 2: I wrote an (unfinished) rsync benchmarking tool[*], with one
> of the testruns measuring the efficiency (in bytes sent, and time
> taken) of rsyncing VM image snapshots.
> You can easily hack/extend it for more use cases or even different
> synchronisation backends, so pull requests welcome ;-)

I'll have a look at your benchmarking tool ASAP to see how we can
possibly use it to benchmark bup as well.

I did some "benchmarks" some time ago [1]. I imported my rsnapshot
backups to bup.

TL;DR:
rsnapshot: 12.6G
bup: 4.6G

> Dieter

Thanks for your interest!
Zoran
>
> [*] http://dieter.plaetinck.be/rsyncbench_an_rsync_benchmarking_tool

[1]
http://groups.google.com/group/bup-list/browse_thread/thread/8a426b233554670/e67926cfd69900fb?lnk=gst&q=importing#e67926cfd69900fb

Dieter Plaetinck

Oct 15, 2010, 9:46:02 AM
to bup-...@googlegroups.com
Okay,
so we can expect much more efficient file storage in comparison to rsnapshot.  That's to be expected.
But my main concern is transmission traffic and duration; which I guess will be comparable to rsync then.
Would be interesting to have some comparison numbers (I personally have no time/interest right now to bench bup, sorry)

Dieter

green

Oct 15, 2010, 10:13:56 AM
to bup-...@googlegroups.com
Dieter Plaetinck wrote at 2010-10-15 07:46 -0600:
> so we can expect much more efficient file storage in comparison to
> rsnapshot. That's to be expected.

> But my main concern is transmission traffic and duration; which I guess
> will be comparable to rsync then.

Except in the case of file renames/moves. Bup
1. saves backup space,
2. saves transfer time, and
3. keeps old snapshots as well.

With rsync transfer time is greater, and backup space is greater unless old
snapshots are removed.


Zoran Zaric

Oct 15, 2010, 10:17:32 AM
to Dieter Plaetinck, bup-...@googlegroups.com
if you tell me the parameters for a benchmark I'll be happy to do it.

just my quick thoughts:

two servers get backed up to another one. I'll do it with Amazon EC2
instances. I'll measure the traffic using NIC counters.

I'll do 3 runs:

Initial backup

a second without changed data

a third with some added data. I'll try to mimic a changing VM Image by
generating a 1G textfile and changing some lines somewhere in the middle.

I'll do all of this for two servers (with different fake VM images) for
both bup and rsnapshot.

I'll measure:
* backup time
* transferred data
* backup space

Any thoughts on that?

Zoran

Dieter Plaetinck

Oct 16, 2010, 4:00:52 AM
to bup-...@googlegroups.com
On Fri, Oct 15, 2010 at 4:17 PM, Zoran Zaric <li...@zoranzaric.de> wrote:
> if you tell me the parameters for a benchmark I'll be happy to do it.
>
> just my quick thoughts:
>
> two servers get backed up to another one. I'll do it with Amazon EC2
> instances. I'll measure the traffic using NIC counters.

NIC counters don't seem 100% accurate.
Look at my rsyncbench tool, where I use tcpdump to match the exact traffic.

> I'll do 3 runs:
>
> Initial backup
>
> a second without changed data
>
> a third with some added data. I'll try to mimic a changing VM Image by
> generating a 1G textfile and changing some lines somewhere in the middle.

You could also use your real images (if you can get them out of bup, that is :)

> I'll do all of this for two servers (with different fake VM images) for
> both bup and rsnapshot.
>
> I'll measure:
>  * backup time
>  * transferred data
>  * backup space
>
> Any thoughts on that?
>
> Zoran

That seems good.

Zoran Zaric

Oct 16, 2010, 6:57:32 AM
to Oliver Dietz, bup-...@googlegroups.com
On 16.10.2010 10:00, Dieter Plaetinck wrote:
> On Fri, Oct 15, 2010 at 4:17 PM, Zoran Zaric <li...@zoranzaric.de> wrote:
>> two servers get backed up to another one. I'll do it with Amazon EC2
>> instances. I'll measure the traffic using NIC counters.
>
> Nic counters don't seem 100% accurate.
> Look at my rsyncbench tool, where I use tcpdump to match the exact traffic

All right, I'll do so.

>> a third with some added data. I'll try to mimic a changing VM Image by
>> generating a 1G textfile and changing some lines somewhere in the middle.
>>
>
> You could also use your real images (if you can get them out of bup, that is
> :)

My upstream connection isn't too good, so uploading some VM images isn't
an option.

Avery Pennarun

Oct 16, 2010, 11:19:13 AM
to Dieter Plaetinck, bup-...@googlegroups.com
On Sat, Oct 16, 2010 at 8:00 AM, Dieter Plaetinck
<dieterp...@gmail.com> wrote:
> On Fri, Oct 15, 2010 at 4:17 PM, Zoran Zaric <li...@zoranzaric.de> wrote:
>> if you tell me the parameters for a benchmark I'll be happy to do it.
>>
>> just my quick thoughts:
>>
>> two servers get backed up to another one. I'll do it with Amazon EC2
>> instances. I'll measure the traffic using NIC counters.
>
> Nic counters don't seem 100% accurate.
> Look at my rsyncbench tool, where I use tcpdump to match the exact traffic

What do you mean "not 100% accurate"? There's no reason I can imagine
that the counters on eth0/eth1/etc. would be anything but accurate.

The downside is that you'll get things like TCP headers and
retransmits thrown into your count, as well as traffic on any other
ports you use at the time. You might argue that those are in fact
*more* accurate than not including them, since if (say) bup sent a
whole bunch of one-byte packets, you're technically paying for more
TCP headers. Of course, neither bup nor rsnapshot nor rsync does
anything like that, so it doesn't matter.

Beware that tcpdump can also drop packets sometimes, though.

When I'm testing stuff, I usually use iptables accounting rules.

iptables -A OUTPUT -p tcp --dport 22 -j ACCEPT
...
iptables -nvL OUTPUT

The rule you added should then have counters for how many bytes
matched that rule. (Note: I haven't tested the above commands, so
there might be a typo or two.)

Have fun,

Avery

Zoran Zaric

Oct 16, 2010, 10:26:52 PM
to Dieter Plaetinck, Avery Pennarun, bup-...@googlegroups.com
Ok, I did my benchmark.

# 1. run - first backup
## bup
time: 219 s
transferred: 165417,61 KB
disk space: 156804 KB

## rsnapshot
time: 60s
transferred: 888294,84 KB
disk space: 953708 KB

# 2. run
## bup
time: 0 s
transferred: 21,65 KB
disk space: 156820 KB

## rsnapshot
time: 4 s
transferred: 1270,8 KB
disk space: 968784 KB

# 3. run - generated a 1G fake-image with data from /dev/urandom on each server
## bup
time: 691 s
transferred: 2208538,07 KB
disk space: 2277004 KB

## rsnapshot
time: 133 s
transferred: 2190208,71 KB
disk space: 3083116 KB

# 4. run
## bup
time: 12 s
transferred: 21,99 KB
disk space: 2281276 KB

## rsnapshot
time: 4 s
transferred: 1239,87 KB
disk space: 3098200 KB

# 5. run - changed 1M in the middle of the fake image file
## bup
time: 106 s
transferred: 2249,64 KB
disk space: 2281920 KB

## rsnapshot
time: 119 s
transferred: 3696,62 KB
disk space: 5212580 KB

# total
## bup
time: 1037 s
transferred: 2376248,96 KB
disk space: 2281920 KB

## rsnapshot
time: 320 s
transferred: 3084710,84 KB
disk space: 5212580 KB

If anyone has questions feel free to ask.

Zoran

Avery Pennarun

Oct 16, 2010, 10:37:49 PM
to Zoran Zaric, Dieter Plaetinck, bup-...@googlegroups.com
On Sun, Oct 17, 2010 at 2:26 AM, Zoran Zaric <li...@zoranzaric.de> wrote:
> # 1. run - first backup
> ## bup
> time: 219 s
> transferred: 165417,61 KB
> disk space: 156804 KB
>
> ## rsnapshot
> time: 60s
> transferred: 888294,84 KB
> 953708 KB

Man, bup is slow. Definitely need to work on that :)

Thanks for all your work!

Have fun,

Avery

green

Oct 16, 2010, 10:44:13 PM
to bup-...@googlegroups.com
Avery Pennarun wrote at 2010-10-16 20:37 -0600:
> On Sun, Oct 17, 2010 at 2:26 AM, Zoran Zaric <li...@zoranzaric.de> wrote:
> > # 1. run - first backup
> > ## bup
> > time: 219 s
> > transferred: 165417,61 KB
> > disk space: 156804 KB
> >
> > ## rsnapshot
> > time: 60s
> > transferred: 888294,84 KB
> > 953708 KB
>
> Man, bup is slow. Definitely need to work on that :)

I suppose that this does not really show bup's advantage when backing up
multiple similar systems and moving/renaming big files or directories between
snapshots.


Zoran Zaric

Oct 16, 2010, 10:52:01 PM
to bup-...@googlegroups.com

Well, two instances of the same image are pretty much similar systems.

Sure, I could have copied and/or moved my fake images around, or transferred
them between the servers, but I think it's pretty clear where bup
shines. Take a look at the rest of the "benchmark".

I think it looks pretty good, both disk-space-wise and transfer-wise.

What would your improvements be?

Zoran

Avery Pennarun

Oct 16, 2010, 10:56:34 PM
to green, bup-...@googlegroups.com
On Sun, Oct 17, 2010 at 2:44 AM, green <greenfr...@gmail.com> wrote:
> I suppose that this does not really show bup's advantage when backing up
> multiple similar systems and moving/renaming big files or directories between
> snapshots.

On the other hand, by the time you've obtained a 9x performance
increase, adding file renames into that might be just bragging :)

Have fun,

Avery

Zoran Zaric

Oct 17, 2010, 1:19:25 AM
to bup-...@googlegroups.com, Dieter_be
Dieter, just FYI:

I wrote an import-rsnapshot command, which just needs some testcases
before I submit the patches. I pushed it to my github repo:

http://github.com/zoranzaric/bup/tree/import-rsnapshot

Feel free to test and use it. It should make the transition from
rsnapshot to bup pretty easy.

Keep in mind that bup's master (and the import-rsnapshot branch) don't
save metadata like permissions, yet.

Rob is working on it and it'll be great.

Zoran

Dieter Plaetinck

Oct 17, 2010, 7:03:12 AM
to Avery Pennarun, bup-...@googlegroups.com
On Sat, Oct 16, 2010 at 5:19 PM, Avery Pennarun <apen...@gmail.com> wrote:
> On Sat, Oct 16, 2010 at 8:00 AM, Dieter Plaetinck
> <dieterp...@gmail.com> wrote:
>> On Fri, Oct 15, 2010 at 4:17 PM, Zoran Zaric <li...@zoranzaric.de> wrote:
>>> if you tell me the parameters for a benchmark I'll be happy to do it.
>>>
>>> just my quick thoughts:
>>>
>>> two servers get backed up to another one. I'll do it with Amazon EC2
>>> instances. I'll measure the traffic using NIC counters.
>>
>> Nic counters don't seem 100% accurate.
>> Look at my rsyncbench tool, where I use tcpdump to match the exact traffic
>
> What do you mean "not 100% accurate"?  There's no reason I can imagine
> that the counters on eth0/eth1/etc. would be anything but accurate.
>
> The downside is that you'll get things like TCP headers and
> retransmits thrown into your count, as well as traffic on any other
> ports you use at the time.  You might argue that those are in fact
> *more* accurate than not including them, since if (say) bup sent a
> whole bunch of one-byte packets, you're technically paying for more
> TCP headers.  Of course, neither bup nor rsnapshot nor rsync does
> anything like that, so it doesn't matter.

Yes, that's pretty much what I meant.  I was also thinking about ARP traffic and basically any traffic not caused by the application you're benchmarking.

> Beware that tcpdump can also drop packets sometimes, though.

Oh, really? I didn't know.

Dieter Plaetinck

Oct 17, 2010, 7:09:39 AM
to Zoran Zaric, Avery Pennarun, bup-...@googlegroups.com
On Sun, Oct 17, 2010 at 4:26 AM, Zoran Zaric <li...@zoranzaric.de> wrote:
> Ok, I did my benchmark.
>
> # 1. run - first backup
> ## bup
> time: 219 s
> transferred: 165417,61 KB
> disk space: 156804 KB
>
> ## rsnapshot
> time: 60s
> transferred: 888294,84 KB
> 953708 KB

What happened here? Is this an initial import from nothing to a full disk image?
Remarkable savings, though.

> # 2. run
> ## bup
> time: 0 s
> transferred: 21,65 KB
> disk space: 156820 KB
>
> ## rsnapshot
> time: 4 s
> transferred: 1270,8 KB
> disk space: 968784 KB

What happened here? A no-op? Or a small change?

> # 3. run - generated a 1G fake-image with data from /dev/urandom on each
> ## bup
> time: 691 s
> transferred: 2208538,07 KB
> disk space: 2277004 KB
>
> ## rsnapshot
> time: 133 s
> transferred: 2190208,71 KB
> disk space: 3083116 KB

I need more details: what did you sync to where, and what did you store, on which nodes?
I'm going to stop commenting here. Please provide more info on what exactly you did at each step.

Zoran Zaric

Oct 17, 2010, 7:19:49 AM
to Dieter Plaetinck, Avery Pennarun, bup-...@googlegroups.com
On 17.10.2010 13:09, Dieter Plaetinck wrote:
> On Sun, Oct 17, 2010 at 4:26 AM, Zoran Zaric <li...@zoranzaric.de> wrote:
>
>> Ok, I did my benchmark.
>>
>> # 1. run - first backup
>> ## bup
>> time: 219 s
>> transferred: 165417,61 KB
>> disk space: 156804 KB
>>
>> ## rsnapshot
>> time: 60s
>> transferred: 888294,84 KB
>> 953708 KB
>>
>
> what happened here? is this an initial import from nothing to full disk
> image?
> remarkable savings though

Exactly.

Zoran

Dieter Plaetinck

Oct 18, 2010, 10:24:49 AM
to bup-...@googlegroups.com
Forwarding to list.
My mistake for sending to Zoran instead of the list. (although I choose to blame gmail, whose replying behavior is kind of odd)

---------- Forwarded message ----------
From: Zoran Zaric <li...@zoranzaric.de>
Date: Mon, Oct 18, 2010 at 1:44 PM
Subject: Re: Why would I use bup and not rsync/rsnapshot?
To: Dieter Plaetinck <dieterp...@gmail.com>


On 18.10.2010 13:18, Dieter Plaetinck wrote:
> Can you provide info on what you did for each single step? Otherwise it's
> really hard to understand what's going on.

Sure.

I did bup backups from the backup server over ssh on the servers:
ssh server1 "bup index -u /; bup save -r backupserver: -n server1 /bin
/boot /etc /home" (the list of directories is not complete; I accidentally
deleted my log...)

0. Configuration

I configured rsnapshot to exclude all pseudo filesystems like /dev /sys
/selinux etc.

1. run
Initial backup of both systems.
I ran the first "rsnapshot hourly" for rsnapshot, and the first bup save
commands.

2. run
Just another backup run, to see how efficiently backups with little to no
changes are stored.

before 3. run
generated a 1G file with data from /dev/urandom on each server, so we
have 2G of unique data, 1G each server.

the command was something like

dd if=/dev/urandom of=/home/testfile bs=1M count=1024

I ran this on both servers.

4. run
another plain backup step with little to no changes

before 5.run
change the 511. 1M block in each testfile.

something like
cd /home
dd if=testfile of=testfile.tmp bs=1M count=512 skip=512
dd if=/dev/urandom of=testfile bs=1M count=1 seek=511
dd if=testfile.tmp of=testfile bs=1M count=512 seek=512
rm testfile.tmp
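
The copy of the second half is needed because dd truncates its output file after the last block it writes unless it is told otherwise (GNU dd has `conv=notrunc` for that). As a rough sketch of the same one-block edit done in place, here is a Python version on a scaled-down file. The sizes, the block index and the temporary file name are stand-ins for the 1G test file, not the values used in the benchmark.

```python
import os
import tempfile

BLOCK = 1024      # stand-in for the 1M block size used in the thread
BLOCKS = 16       # stand-in for the 1024 blocks of the 1G test file
TARGET = 7        # index of the block to overwrite (the thread used 511)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "testfile")
    with open(path, "wb") as f:
        f.write(os.urandom(BLOCK * BLOCKS))
    with open(path, "rb") as f:
        before = f.read()

    # "r+b" opens for update without truncating, so only one block changes.
    with open(path, "r+b") as f:
        f.seek(TARGET * BLOCK)
        f.write(os.urandom(BLOCK))

    with open(path, "rb") as f:
        after = f.read()

print("size unchanged:", len(after) == len(before))
```

Everything before and after the overwritten block is left untouched, which is the kind of small change inside a large file that the fifth run was meant to exercise.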

> Dieter

I hope it's more clear now. I'm sorry i didn't provide the needed
information earlier.

Zoran

Dieter Plaetinck

Oct 18, 2010, 11:15:38 AM
to bup-...@googlegroups.com
So, going from step 2 to 3, you just added a 1GB file to two servers, and you backed up both servers to the backup server in both bup and rsnapshot.
Bup went from 156MB to 2277MB (2121MB diff), and rsnapshot from 968MB to 3083MB (2115MB diff). So this depicts a 100% addition, and both have nearly the same ~2GB transferred, which shows that an "initial import" has very little optimisation.
Clearly git is not able to do any special dedup (which is not surprising, since the image is 100% random; I guess real VM images have some room for dedup) or compression (which surprises me a bit; I thought git stores blobs compressed).

Then you do a "change the 511. 1M block" (what does that mean?) (and have some other changes. This bothers me a bit btw; I'm only interested in syncing raw images, not all kinds of other content that changes, especially not uncontrolled like in your test).
The results being:
bup:       2281 - 2277 = 4MB difference
rsnapshot: 5212 - 3083 = 2129MB difference
Bytes being sent because of the change to the 1GB file:
bup:       2.2MB
rsnapshot: 3.7MB
So, what happens here: 2 images have 1MB changed, so ~2MB transfer is needed, which bup does nicely and rsync is a bit less efficient at; then rsnapshot needs to save the full 1GB files again, causing the big 2GB storage increase.
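
The deltas in the paragraph above can be recomputed from the raw KB figures in Zoran's benchmark post. The snippet below is just that arithmetic; the dictionary layout and variable names are made up for the example, and the numbers are copied from runs 3 and 5.

```python
# Disk usage in KB after runs 3 and 5, copied from the benchmark post.
disk_kb = {
    "bup":       {"run3": 2277004, "run5": 2281920},
    "rsnapshot": {"run3": 3083116, "run5": 5212580},
}
# KB transferred during run 5 (the 1M change in each test file).
sent_kb = {"bup": 2249.64, "rsnapshot": 3696.62}

growth_kb = {}
for tool, usage in disk_kb.items():
    growth_kb[tool] = usage["run5"] - usage["run3"]
    print(f"{tool}: +{growth_kb[tool]} KB on disk, {sent_kb[tool]} KB sent in run 5")
```

The transfer sizes are within a factor of two of each other, while the growth on disk differs by more than two orders of magnitude.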

It isn't really the benchmark I was expecting, but the numbers are clear enough.  I want to see some more bytes-transferred numbers for real VM images though; maybe I'll find the time to do that myself sometime.

How suitable is bup for real-life backups of VM images? (Since I don't care about file metadata, and run it only on Linux, I should be pretty safe, right?) I see the DESIGN document claims there is no "bup restore" but the README even uses "bup restore" in the examples.  It looks like restoring works properly by now, right?

Thanks for your help guys,

Dieter

Avery Pennarun

Oct 18, 2010, 12:25:11 PM
to Dieter Plaetinck, bup-...@googlegroups.com
On Mon, Oct 18, 2010 at 3:15 PM, Dieter Plaetinck
<dieterp...@gmail.com> wrote:
> Clearly git is not able to do any special dedup (which is not surprising,
> since the image is 100% random, I guess real VM images have some room for
> dedup) or compression (which surprises me a bit, I thought git stores blobs
> compressed)

You simply can't compress randomness; try it sometime. If you could
compress it, it wouldn't be random.
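
For anyone who wants to "try it sometime", a quick check with Python's standard zlib module looks roughly like this. The 1 MiB size is arbitrary, and zlib stands in here for whatever compressor the backup tool uses.

```python
import os
import zlib

SIZE = 1024 * 1024                       # 1 MiB of each kind of input

random_data = os.urandom(SIZE)           # like the /dev/urandom test file
zero_data = bytes(SIZE)                  # highly redundant data for contrast

random_packed = len(zlib.compress(random_data))
zero_packed = len(zlib.compress(zero_data))

print(f"random: {SIZE} -> {random_packed} bytes")
print(f"zeros:  {SIZE} -> {zero_packed} bytes")
```

The random input does not shrink at all, which is why a benchmark built on /dev/urandom data says nothing about how well real VM images compress.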

> Then you do a "change the 511. 1M block" (what does that mean?)

He listed the commands; it's obvious from those what he means.

> So, what happens here, 2 images have 1MB changed, so ~2MB transfer is
> needed, which bup does nicely and rsync is a bit less efficient, and then
> rsnapshot needs to save the full 1GB files again, causing the big 2GB
> storage increase.
>
> It isn't really the benchmark I was expecting, but the numbers are clear
> enough.  I want to see some more bytes-transferred numbers for real VM
> images though, maybe I find the time to do that myself, sometime.

The trends persist across any sort of files. Other than the
improvements over rsnapshot that come from renaming, of course, since
presumably you aren't renaming your VM images. As far as bup is
concerned, renames are only the change of a few bytes (the filename)
and not anything else.

> How suitable is bup for real-life backups of VM images? (Since I don't care
> about file metadata, and run it only on Linux, I should be pretty safe,
> right?) I see the DESIGN document claims there is no "bup restore" but the
> README even uses "bup restore" in the examples.  It looks that by now,
> restoring works properly, right?

To paraphrase: Zoran backed up the contents of his VM images, while
you plan to back up the raw VM disk files using your host system.

I'm not quite sure why you don't just test it on your own data; it
will only take a few minutes to set up, and then you'll have the final
answer on *your* data, not a synthetic benchmark.

But anyway, I expect that you'll find the results quite excellent. I
certainly have when I've backed up my VM images. VM disks tend to
have a lot of duplication outside the gzip compression window (about
32-128k) because when you copy a file around and then delete the
original, you end up with chunks of the same file in two totally
different places on the disk. gzip fails badly at compressing such
things - it only really fixes redundancy inside that small window -
and so bup usually compresses better than gzip on even a *single* copy
of a VM disk image, if that VM disk has been busy in the past.
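
The window effect is easy to reproduce with Python's zlib module, which implements the same deflate format that gzip uses and has a window of at most 32 KiB. In this sketch the block sizes are chosen so that one repeat falls inside that window and the other outside it; the random blocks are only stand-ins for file contents that appear twice on a disk image.

```python
import os
import zlib

KIB = 1024
near_block = os.urandom(16 * KIB)   # the repeat starts 16 KiB back: inside the window
far_block = os.urandom(64 * KIB)    # the repeat starts 64 KiB back: outside the window

near = len(zlib.compress(near_block * 2))
far = len(zlib.compress(far_block * 2))

print(f"16 KiB block stored twice: {2 * 16 * KIB} -> {near} bytes")
print(f"64 KiB block stored twice: {2 * 64 * KIB} -> {far} bytes")
```

The nearby duplicate costs almost nothing, while the distant one is stored in full a second time. A chunk-level deduplicator that keys chunks by hash does not care how far apart the two copies are.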

rsnapshot just stores (I think gzipped?) copies of the VM disk, so the
disk space usage of bup should be much less.

As for file transfer, I'd expect bup and rsnapshot to be pretty close
to each other - for the first backup. After that, bup retains
persistent state on the client side (the .idx and .midx files) which
will allow it to do future backups while sending far fewer bytes. And
of course you won't have to store the entire file over again like you
would with rsnapshot.

Basically: just try it. It's better.

(Except for speed. It's getting to be time to rewrite more of bup in
C, I guess. :))

Have fun,

Avery

Dieter Plaetinck

Oct 18, 2010, 3:33:44 PM
to Avery Pennarun, bup-...@googlegroups.com
On Mon, Oct 18, 2010 at 6:25 PM, Avery Pennarun <apen...@gmail.com> wrote:
> On Mon, Oct 18, 2010 at 3:15 PM, Dieter Plaetinck
> <dieterp...@gmail.com> wrote:
>> Clearly git is not able to do any special dedup (which is not surprising,
>> since the image is 100% random, I guess real VM images have some room for
>> dedup) or compression (which surprises me a bit, I thought git stores blobs
>> compressed)
>
> You simply can't compress randomness; try it sometime.  If you could
> compress it, it wouldn't be random.

Makes sense. I'm curious to see what the result of the compression will be on real-life images.

>> Then you do a "change the 511. 1M block" (what does that mean?)
>
> He listed the commands; it's obvious from those what he means.

I'm sorry, I didn't understand the code.  Now I've tried it out and noticed that the testfile gets truncated when you try to write in the middle of it, hence the need for the backup of the 2nd half.  I didn't know that, so I didn't understand that phrase.

>> So, what happens here, 2 images have 1MB changed, so ~2MB transfer is
>> needed, which bup does nicely and rsync is a bit less efficient, and then
>> rsnapshot needs to save the full 1GB files again, causing the big 2GB
>> storage increase.
>>
>> It isn't really the benchmark I was expecting, but the numbers are clear
>> enough.  I want to see some more bytes-transferred numbers for real VM
>> images though, maybe I find the time to do that myself, sometime.
>
> The trends persist across any sort of files.  Other than the
> improvements over rsnapshot that come from renaming, of course, since
> presumably you aren't renaming your VM images.  As far as bup is
> concerned, renames are only the change of a few bytes (the filename)
> and not anything else.

Yeah.

>> How suitable is bup for real-life backups of VM images? (Since I don't care
>> about file metadata, and run it only on Linux, I should be pretty safe,
>> right?) I see the DESIGN document claims there is no "bup restore" but the
>> README even uses "bup restore" in the examples.  It looks that by now,
>> restoring works properly, right?
>
> To paraphrase: Zoran backed up the contents of his VM images, while
> you plan to back up the raw VM disk files using your host system.
>
> I'm not quite sure why you don't just test it on your own data; it
> will only take a few minutes to set up, and then you'll have the final
> answer on *your* data, not a synthetic benchmark.

I was enquiring about the reliability and trustworthiness of bup.
I'm interested in running my own tests and measurements,
but only when I consider using it, i.e. after someone can tell me
"yeah bup should work fine and restoring works" (maybe cumbersome, but working).
Also, in my case I might be able to avoid backing up VM images altogether if I can pull the relevant data
out of my vdi files, but that's another story.

> But anyway, I expect that you'll find the results quite excellent.  I
> certainly have when I've backed up my VM images.  VM disks tend to
> have a lot of duplication outside the gzip compression window (about
> 32-128k) because when you copy a file around and then delete the
> original, you end up with chunks of the same file in two totally
> different places on the disk.  gzip fails badly at compressing such
> things - it only really fixes redundancy inside that small window -
> and so bup usually compresses better than gzip on even a *single* copy
> of a VM disk image, if that VM disk has been busy in the past.
>
> rsnapshot just stores (I think gzipped?) copies of the VM disk, so the
> disk space usage of bup should be much less.

AFAIK rsnapshot does not compress stored files at all (otherwise rsyncing would get pretty hard).

> As for file transfer, I'd expect bup and rsnapshot to be pretty close
> to each other - for the first backup.  After that, bup retains
> persistent state on the client side (the .idx and .midx files) which
> will allow it to do future backups while sending far fewer bytes.  And
> of course you won't have to store the entire file over again like you
> would with rsnapshot.
>
> Basically: just try it.  It's better.
>
> (Except for speed.  It's getting to be time to rewrite more of bup in
> C, I guess. :))

Is it safe to assume that such rewriting or other refactoring won't affect the storage format?

> Have fun,
>
> Avery


Zoran Zaric

Oct 18, 2010, 5:50:08 PM
to Dieter Plaetinck, Avery Pennarun, bup-...@googlegroups.com
On 18.10.2010 21:33, Dieter Plaetinck wrote:
> I was enquiring about the reliability and trustworthiness of bup.
> I'm interested in running my own tests and measurements,
> but only when I consider using it. I.e. after someone can tell me
> "yeah bup should work fine and restoring works" (maybe cumbersome, but
> working)
> Also, in my case I might be able to avoid backing up VM images altogether
> if I can pull the relevant data
> out of my vdi files, but that's another story.

yeah bup should work fine and restoring works ;)

I'm sorry if something I wrote wasn't clear. I'm not a native speaker,
so please excuse my inaccuracies.

Zoran

Aleksandr Milewski

Oct 19, 2010, 4:49:12 PM
to bup-...@googlegroups.com
On 10/18/10 9:25 AM, Avery Pennarun wrote:

> (Except for speed. It's getting to be time to rewrite more of bup in
> C, I guess. :))

Starting with a pack indexer? :D

I'm limping along manually indexing packs on my big backup, which is
mostly OK, since I'm not making really big changes there, just a pack or
two now and then.

-Z

Avery Pennarun

Oct 20, 2010, 5:11:50 PM
to Aleksandr Milewski, bup-...@googlegroups.com

As a matter of fact, the pack indexer would probably be plenty fast
even in python :) But you're right, it does need to be written. I
happen to be on a road trip right now that's interfering with such
things, but I promise it's on my list.

Or if someone else around here is feeling motivated, it's not actually
too hard. Just look for the place where we call 'git index-pack' and
replace it :)

Have fun,

Avery

Jon Dowland

Oct 21, 2010, 3:59:32 AM
to bup-list
I've found this thread particularly interesting because my personal
backup journey had me try (and eventually reject) rsnapshot.

In my case I was backing up a remote $HOME to a low powered NAS. Hard
link trees are simply not an appropriate solution. I think by the time
I gave up on rsnapshot, I had more space used in filesystem overhead
than files themselves.
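
To give a feel for where that overhead comes from, here is a small Python sketch of a hard-link snapshot tree. It only mimics the general idea behind rsnapshot's unchanged-file handling; the directory names, file counts and the `link_snapshot` helper are invented for the example, and the flat layout ignores subdirectories, which cannot be hard-linked and have to be recreated in every snapshot.

```python
import os
import tempfile

FILES = 100       # files in the backed-up tree (illustrative)
SNAPSHOTS = 10    # number of snapshots kept (illustrative)


def link_snapshot(src_dir, dst_dir):
    """Create dst_dir and hard-link every file from src_dir into it."""
    os.mkdir(dst_dir)
    for name in os.listdir(src_dir):
        os.link(os.path.join(src_dir, name), os.path.join(dst_dir, name))


with tempfile.TemporaryDirectory() as root:
    previous = os.path.join(root, "snap.0")
    os.mkdir(previous)
    for i in range(FILES):
        with open(os.path.join(previous, f"file{i}"), "wb") as f:
            f.write(b"unchanged data")

    # Nothing changes between snapshots, yet each one needs its own entries.
    for n in range(1, SNAPSHOTS):
        current = os.path.join(root, f"snap.{n}")
        link_snapshot(previous, current)
        previous = current

    entries = 0
    inodes = set()
    for snap in os.listdir(root):
        for name in os.listdir(os.path.join(root, snap)):
            entries += 1
            inodes.add(os.stat(os.path.join(root, snap, name)).st_ino)

print(entries, "directory entries for", len(inodes), "files' worth of data")
```

With many small files and many retained snapshots, the directory entries can plausibly outweigh the file data itself, which matches the experience described above.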

Dieter Plaetinck

Oct 21, 2010, 7:35:03 AM
to Jon Dowland, bup-list
Right, I think a "bup vs rsnapshot" comparison in the documentation would make sense.
I wouldn't mind writing such a paragraph. Would this be a welcome addition to the README?