Discussion:
EBS snapshot backups from a FreeBSD zfs file system: zpool freeze?
Berend de Boer
2013-07-03 01:10:25 UTC
Hi All,

I'm experimenting with building a FreeBSD NFS server on Amazon AWS
EC2. I've created a zpool with 5 disks in a raidz2 configuration.

How can I make a consistent backup of this using EBS?

On Linux' file systems I can freeze a file system, start the backup of
all disks, and unfreeze. This freeze usually only takes 100ms or so.
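Roughly what I mean, assuming XFS or LVM on the Linux side and the AWS
command line tools (mount point and volume ID are just placeholders):

  # block new I/O on the filesystem and flush everything pending
  fsfreeze -f /data
  # establish the point in time; the actual copy continues in the background
  aws ec2 create-snapshot --volume-id vol-12345678 --description "data backup"
  # let I/O through again; the window is ~100ms plus the API call
  fsfreeze -u /data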

ZFS on FreeBSD does not appear to have such an option. I.e. what I'm
looking for is basically a hardware based snapshot. ZFS should simply
be suspended at a recoverable point for a few hundred ms.

A similar question from 2010 is here:
http://thr3ads.net/zfs-discuss/2010/11/580781-how-to-quiesce-and-unquiesc-zfs-and-zpool-for-array-hardware-snapshots

Absent a "zfs freeze" it seems using FreeBSD/zfs on AWS with EBS is
going to be impossible. Unfortunately that means back to Linux sigh.

--
All the best,

Berend de Boer


------------------------------------------------------
Awesome Drupal hosting: https://www.xplainhosting.com/
Adam Vande More
2013-07-03 01:26:59 UTC
Post by Berend de Boer
Hi All,
I'm experimenting with building a FreeBSD NFS server on Amazon AWS
EC2. I've created a zpool with 5 disks in a raidz2 configuration.
How can I make a consistent backup of this using EBS?
On Linux' file systems I can freeze a file system, start the backup of
all disks, and unfreeze. This freeze usually only takes 100ms or so.
ZFS on FreeBSD does not appear to have such an option. I.e. what I'm
looking for is basically a hardware based snapshot. ZFS should simply
be suspended at a recoverable point for a few hundred ms.
http://thr3ads.net/zfs-discuss/2010/11/580781-how-to-quiesce-and-unquiesc-zfs-and-zpool-for-array-hardware-snapshots
Absent a "zfs freeze" it seems using FreeBSD/zfs on AWS with EBS is
going to be impossible. Unfortunately that means back to Linux sigh.
What is wrong with a simple ZFS snapshot and running the backup against it?
I assume that's how most of us are doing it.
--
Adam Vande More
Berend de Boer
2013-07-03 02:08:13 UTC
Adam> What is wrong with a simple ZFS snapshot and running the
Adam> backup against it?  I assume that's how most of us are doing
Adam> it.

For starters, I suppose very few people are using FreeBSD on AWS, so
"most of us" don't have a choice :-)

But this might simply be a gap in my understanding: what if I want to
take the EBS snapshots of the 5 disks, attach them to another machine,
and mount them?

But perhaps you don't know what an EBS snapshot is? It's not a backup
of your file system, it's a hardware-based backup of a disk at a
single point in time.


I.e. with EBS I can take snapshots of 5 1TB disks, create new disks from
them, and attach them to another machine. In SECONDS.

I'm not looking for a way to take a ZFS snapshot, stream that to S3
for hours, and stream it back and write to another disk for hours.


But in case I didn't get you: could you please let me know if your
approach would allow me to take a consistent backup of my disks and
mount them on another server or use them for recovery purposes?
--
All the best,

Berend de Boer


------------------------------------------------------
Awesome Drupal hosting: https://www.xplainhosting.com/
Jeremy Chadwick
2013-07-03 05:50:47 UTC
Post by Berend de Boer
Adam> What is wrong with a simple ZFS snapshot and running the
Adam> backup against it?  I assume that's how most of us are doing
Adam> it.
For starters, I suppose very few people are using FreeBSD on AWS, so
"most of us" don't have a choice :-)
But this might simply be a gap in my understanding: what if I want to
take the EBS snapshots of the 5 disks, attach them to another machine,
and mount them?
But perhaps you don't know what an EBS snapshot is? It's not a backup
of your file system, it's a hardware-based backup of a disk at a
single point in time.
I.e. with EBS I can take snapshots of 5 1TB disks, create new disks from
them, and attach them to another machine. In SECONDS.
Okay, I think I understand what you're asking. Please correct me:

It sounds to me like the Linux OS images on AWS have utilities or the
capability to create EBS images that are snapshots of the "virtual
disks" that make up the AWS system, and that you can transfer these
to another Linux AWS machine and mount the EBS images, and that this is
being done within Linux itself.

Correct?

If so -- then what you're wanting to ask is: "does FreeBSD have support
for EBS images?" (Meaning this has nothing to do with ZFS) I get the
impression an EBS image is a proprietary Amazon thing, so you would need
to ask Amazon if they have the same utilities for FreeBSD, or ask the
individual(s) responsible for the FreeBSD AWS images if there are such
tools. Again: nothing to do with ZFS.
Post by Berend de Boer
Post by Berend de Boer
On Linux' file systems I can freeze a file system, start the backup of
all disks, and unfreeze.
I'm not looking for a way to take a ZFS snapshot, stream that to S3
for hours, and stream it back and write to another disk for hours.
Except ZFS is a filesystem, yet above you just said "On Linux I can
freeze a filesystem..."

Understand my confusion now?
Post by Berend de Boer
But in case I didn't get you: could you please let me know if your
approach would allow me to take a consistent backup of my disks and
mount them on another server or use them for recovery purposes?
ZFS snapshots will let you take a snapshot of a pool/filesystem
(including incremental, if needed). You can send ZFS snapshots to
another system for use using "zfs send" and "zfs recv".
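For example (a minimal sketch -- pool/dataset names and the target host
are made up):

  # take a recursive snapshot of the dataset
  zfs snapshot -r tank/data@backup1
  # replicate it to another box; later runs can use -i for incrementals
  zfs send -R tank/data@backup1 | ssh backuphost zfs recv -F backup/data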

You can use ZFS snapshots for almost-bare-metal recovery depending on
how you set up your system, but it does not do recovery of things like
boot blocks/bootloaders or other nuances. You would have to do those on
your own (set up the boot blocks, etc.), or do so at a different layer.

For example, referring to virtualisation/hypervisors: it is possible to
take a "snapshot of a disk used by a VM", and then move that snapshot
to another machine where it's added (as a new disk), seen by the OS as
a new disk, and can be used.

And that has nothing to do with ZFS -- that has to do with the VM
software and/or HV software. *Sometimes* OS vendors actually have
utilities (running within the guest itself) that can do these tasks,
hence my paragraph above starting out with "If so --".
--
| Jeremy Chadwick ***@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
Markus Gebert
2013-07-03 08:59:07 UTC
The closest thing we can do in FreeBSD is to unmount the filesystem, take the snapshot, and remount. This has the side effect of closing all open files, so it's not really an alternative.
The other option is to not freeze the filesystem before taking the snapshot, but again you risk leaving things in an inconsistent state, and/or the last few writes you think you made didn't actually get committed to disk yet. For automated systems that create then clone filesystems for new VMs, this can be a big problem. At best, you're going to get a warning that the filesystem wasn't cleanly unmounted.
Actually, sync(2)/sync(8) will do the job on ZFS. It won't stop/pause I/O running in other contexts, but it does guarantee that any commands you ran and completed prior to calling sync will make it to disk in ZFS.
This is because sync in ZFS is implemented as a ZIL commit, so transactions that haven't yet made it to disk via the normal syncing context will at least be committed via their ZIL blocks, which can then be replayed when the pool is imported later, in this case from the EBS snapshots.
And since the entire tree from the überblock down in ZFS is COW, you can't get an inconsistent pool simply by doing a virtual disk snapshot, regardless of how that is implemented.
--Will.
Sorry, yes, this is true. We're not using ZFS to clone and provision new VMs, so I was just thinking about UFS here. And ZFS does have a good advantage here that it seems to actually respect sync requests. I think it was here I reported a few months ago that we were seeing UFS+SUJ not actually doing anything when sync(8) was called.
But for some workloads this still isn't sufficient if you have processes running that could be writing at any time. As an example, we have a database server using ZFS backed storage. Short of shutting down the server, there's no way to guarantee it won't try to write even if we lock all tables, disconnect all clients, etc. mysql has all sorts of things done on timers that occur lazily in the future, including periodic checkpoint writes even if there is no activity.
I know this is a sort of obscure use case, but Linux and Windows both have this functionality that VMWare will use if present (and the guest tools know about it). Linux goes a step further and ensures that it's not in the middle of writing anything to swap during the quiesce period, too. I don't think this would be terribly difficult to implement, a hook somewhere along the write chain that blocks (or queues up) anything trying to write until the unfreeze comes along, but I'm guessing there are all sorts of deadlock opportunities here.
Indeed sync(8) has the disadvantage that you cannot prevent writes between the syscall and the EBS snapshot, so depending on the application, this can make the resulting EBS snapshot useless.

But taking a zfs snapshot is an atomic operation. Why not use that? For example:

1. snapshot the zfs at the same point in time you'd issue that ioctl on Linux
2. take the EBS snapshot at any time
3. clone the EBS snapshot to the new/other VM
4. zpool import the pool there
5. zfs rollback the filesystem to the snapshot taken in step 1 (or clone it and use that)

Any writes that have been issued between the zfs snapshot and the EBS snapshot are discarded, and that way you get exactly the same filesystem data as you would have gotten with the ioctl. Also, taking the zfs snapshot should take much less time, because you don't have to wait for the EBS snapshot to complete before you can resume IO on the filesystem. So you don't even depend on EBS snapshots being quick when using the zfs approach, a big advantage in my opinion.
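In commands, the sketch above would look roughly like this (pool and
dataset names are made up; steps 2/3 are whatever EBS mechanism you use
and happen outside zfs):

  # 1. on the source VM, mark a consistent point
  zfs snapshot -r tank@prebackup
  # 2./3. take the EBS snapshots and create new volumes from them,
  #       attached to the other VM (outside zfs)
  # 4. on the other VM, import the pool from the cloned disks
  zpool import -f tank
  # 5. discard anything newer than the marker
  zfs rollback tank/data@prebackup
  #    ...or leave it alone and work on a clone instead
  zfs clone tank/data@prebackup tank/restore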


Markus
Berend de Boer
2013-07-03 09:15:50 UTC
Markus> 1. snapshot the zfs at the same point in time you'd issue
Markus> that ioctl on Linux
Markus> 2. take the EBS snapshot at any time
Markus> 3. clone the EBS snapshot to the new/other VM
Markus> 4. zpool import the pool there
Markus> 5. zfs rollback the filesystem to
Markus> the snapshot taken in step 1 (or clone it and use that)

That seems like a very good first step!

It's unfortunately not automatic, but for recovery purposes it should
do.

Do you think (yes, I will definitely test this) that ZFS can mount a
file system consisting of a couple of disks (raidz2 setup), and access
it even though every disk might be a backup taken at a slightly
different time?

Obviously I'm going to throw away the mounted state and rollback to my
snapshot, but it has to be able to mount a set of disks which might be
in a terrible state first.


Markus> Also, taking the zfs snapshot should take much less time,
Markus> because you don't have to wait for the EBS snapshot to
Markus> complete before you can resume IO on the filesystem. So
Markus> you don't even depend on EBS snapshots being quick when
Markus> using the zfs approach, a big advantage in my opinion.

You don't have to wait for an EBS snapshot to complete. That can take
hours. EBS simply takes the moment in time you give the command, and
starts the backup from there. Normal I/O to the disk continues (so
uses some kind of COW system I suppose)

--
All the best,

Berend de Boer


------------------------------------------------------
Awesome Drupal hosting: https://www.xplainhosting.com/
Markus Gebert
2013-07-03 09:55:51 UTC
Post by Berend de Boer
Markus> 1. snapshot the zfs at the same point in time you'd issue
Markus> that ioctl on Linux
Markus> 2. take the EBS snapshot at any time
Markus> 3. clone the EBS snapshot to the new/other VM
Markus> 4. zpool import the pool there
Markus> 5. zfs rollback the filesystem to
Markus> the snapshot taken in step 1 (or clone it and use that)
That seems like a very good first step!
It's unfortunately not automatic, but for recovery purposes it should
do.
This is as automatic as you make it to be :-). But yes, the code that does that might not exist yet...
Post by Berend de Boer
Do you think (yes, I will definitely test this) that ZFS can mount a
file system consisting of a couple of disks (raidz2 setup), and access
it even though every disk might be a backup taken at a slightly
different time?
I'm not entirely sure. I've written the scenario above with one disk in mind, which works for sure.

I know that zfs keeps around a certain number of old transactions/uberblocks, so that in case it finds that the newest transaction can't be used on import for some reason, it can roll back to an older transaction (see the -F option of zpool import). This usually means data loss, but I guess that's a non-issue in your scenario, as you'll throw away data newer than your snapshot anyway and the snapshot should be on disk when you take the EBS snapshot.
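I.e. on the receiving VM you'd first try a plain import and only fall
back to the rewind option if needed (pool name made up):

  zpool import -f tank       # normal import of the cloned disks
  zpool import -f -F tank    # if that fails, rewind to an older transaction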

Then again, please test this; I'm not sure whether the old transactions even help in this scenario. And if the time delta gets too big and you do too many writes in the meantime, zfs might not be able to import the pool, if no mutual transaction can be found anymore.

Of course it'd be safest to EBS snapshot all disks at the same exact time, but if I understand you correctly, there is no such functionality and the OS is expected to guarantee some kind of consistency between multiple related disks.
Post by Berend de Boer
Obviously I'm going to throw away the mounted state and rollback to my
snapshot, but it has to be able to mount a set of disks which might be
in a terrible state first.
Markus> Also, taking the zfs snapshot should take much less time,
Markus> because you don't have to wait for the EBS snapshot to
Markus> complete before you can resume IO on the filesystem. So
Markus> you don't even depend on EBS snapshots being quick when
Markus> using the zfs approach, a big advantage in my opinion.
You don't have to wait for an EBS snapshot to complete. That can take
hours. EBS simply takes the moment in time you give the command, and
starts the backup from there. Normal I/O to the disk continues (so
uses some kind of COW system I suppose)
Yes, but a zfs snapshot is near instant. ioctl, wait for sync, mark clean, trigger EBS snapshot, ioctl again to resume IO, sounds like more work. So I wasn't saying EBS snapshots are slow, but the whole process probably isn't as quick as just taking a zfs snapshot. zfs probably will lose some time when importing on the new VM, but at that point you usually don't care.


Markus
Berend de Boer
2013-07-03 11:21:23 UTC
Markus> Of course it'd be safest to EBS snapshot all disks at the
Markus> same exact time, but if I understand you correctly, there
Markus> is no such functionality and the OS is expected to
Markus> guarantee some kind of consistency between multiple
Markus> related disks.

That's exactly the point. I agree with you that the one disk solution
is trivial. It's the multiple disk case that concerns me.


Markus> Yes, but a zfs snapshot is near instant. ioctl, wait for
Markus> sync, mark clean, trigger EBS snapshot, ioctl again to
Markus> resume IO, sounds like more work.

Definitely true. Could take a few seconds if you have a lot of
disks. But a hiccup of not being able to write for a few seconds isn't
noticeable in most situations.

And after you have done your zfs snapshot, you're not done either! You
have to transfer it somewhere, probably compressed, so your (p)bzip2
dominates your CPUs, your network bandwidth is gone, etc. So a backup
starts to become a really high-impact event, while an EBS snapshot
today isn't a big deal. Slightly degraded performance perhaps, but not
much.

--
All the best,

Berend de Boer


------------------------------------------------------
Awesome Drupal hosting: https://www.xplainhosting.com/
Daniel Kalchev
2013-07-03 13:00:25 UTC
Post by Berend de Boer
And after you have done your zfs snapshot, you're not done either! You
have to transfer it somewhere, probably compressed, so your (p)bzip2
dominates your CPUs, your network bandwidth is gone, etc.
The idea that was proposed was to create a local ZFS snapshot that is
not sent anywhere. Because the ZFS snapshot is a ZIL commit, it
can be very fast. Well, how fast depends on some conditions -- I have a
system that sometimes takes minutes for a snapshot, but... I am really
torturing ZFS there.

With the local ZFS snapshot, you then trigger an EBS snapshot of your
disks. That is more or less identical to your server losing power and
then coming back -- you only are sure there is a consistent snapshot of
the filesystem available.

However, whether this suits you or not is another matter. Do you want to
essentially emulate power loss/restart of the server when you revert to
use those snapshots? If so, then you are ok. ZFS has you covered.
Perhaps even without making the snapshot in the first place. But, if you
want your application data consistent on disk, then temporarily stopping
the applications is the only safe way -- FreeBSD/ZFS, Linux or what you
have won't make any difference.
Post by Berend de Boer
So a backup starts to become a really high-impact event, while an EBS
snapshot today isn't a big deal. Slightly degraded performance
perhaps, but not much.
It seems, Amazon uses some sort of ZFS (volume) snapshots in order to
implement the functionality of EBS. Why it would take hours to complete
is hard to understand; perhaps they are actually backing it up somewhere
too, using ZFS send/receive (or equivalents if they don't use ZFS).

Daniel
Berend de Boer
2013-07-03 19:28:28 UTC
Daniel> It seems, Amazon uses some sort of ZFS (volume) snapshots
Daniel> in order to implement the functionality of EBS. Why it
Daniel> would take hours to complete is hard to understand,

It all depends on how big your disks are :-)

The first snapshot takes longer; after that they only take the
differences. Perhaps 1TB takes 20 minutes or so? But diffs are much
faster, usually 2 minutes or so.

--
All the best,

Berend de Boer


------------------------------------------------------
Awesome Drupal hosting: https://www.xplainhosting.com/
Berend de Boer
2013-07-04 09:48:38 UTC
Berend> Do you think (yes, I will definitely test this) that ZFS
Berend> can mount a file system consisting of a couple of disks
Berend> (raidz2 setup), and access it even though every disk might
Berend> be a backup taken at a slightly different time?

Answering my own question: a raidz2 pool of four 128GB disks, with no
writing happening, can be backed up with EBS snapshots, then turned
into volumes (disks) again and mounted on another FreeBSD machine
without apparent ill effects.

The snapshots of the four disks were about 2 seconds apart each.

zfs took perhaps 30 seconds to start on the second server.

No snapshot was available (I had only taken one of a single file system;
no zfs snapshot -r was ever done). I didn't do any rollback.
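For anyone repeating this, the receiving side boils down to something
like the following (pool name is a placeholder; the scrub is just a
paranoid extra check):

  zpool import              # list pools found on the attached volumes
  zpool import -f tank      # import; the pool was last used by the other host
  zpool status -v tank      # look for missing devices / checksum errors
  zpool scrub tank          # optionally verify every block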


Next step is to take the snapshot while writing stuff is going on.

Then what you suggested: taking a snapshot first, and rollback after
mount.

--
All the best,

Berend de Boer


------------------------------------------------------
Awesome Drupal hosting: https://www.xplainhosting.com/
Daniel Kalchev
2013-07-04 11:00:16 UTC
Post by Berend de Boer
Berend> Do you think (yes, I will definitely test this) that ZFS
Berend> can mount a file system consisting of a couple of disks
Berend> (raidz2 setup), and access it even though every disk might
Berend> be a backup taken at a slightly different time?
Answering my own question: a raidz2 pool of four 128GB disks, with no
writing happening, can be backed up with EBS snapshots, then turned
into volumes (disks) again and mounted on another FreeBSD machine
without apparent ill effects.
The snapshots of the four disks were about 2 seconds apart each.
zfs took perhaps 30 seconds to start on the second server.
It apparently had to replay the ZIL.
Post by Berend de Boer
No snapshot was available (I had only taken one of a single file system;
no zfs snapshot -r was ever done). I didn't do any rollback.
As mentioned earlier, doing this is merely emulating (graceful) power
loss of the system, or a sudden reboot if you will -- without unmounting
the file systems.
ZFS guarantees that your file system will be always consistent, at the
cost of losing some data -- minimized by the ZIL if your applications
used synchronous writes.
Post by Berend de Boer
Next step is to take the snapshot while writing stuff is going on.
The snapshot will 'record' the known file system state. This does not
necessarily mean it will record what you think it should, because
applications might hold data in memory that makes their stored state
consistent. ZFS has no way to know any of this (nor does any other
filesystem).
Exactly what happens when your system suddenly reboots. ZFS will
guarantee the file system consistency.
Post by Berend de Boer
Then what you suggested: taking a snapshot first, and rollback after mount.
Taking a snapshot at some point of time will guarantee you have the file
system state at that time.
There will only be a difference, compared to not taking a snapshot, if your
applications do not use synchronous writes where they should, because
the snapshot will make the pending writes synchronous (I believe this is
the case, or it would not be consistent).

If your applications always do sync writes, for example an NFS server,
whether you do the snapshot or not will not make any difference.

But you should test it anyway. :)

Daniel
Berend de Boer
2013-07-07 21:53:28 UTC
Markus> But taking a zfs snapshot is an atomic operation. Why not
Markus> use that? For example:

Markus> 1. snapshot the zfs at the same point in time you'd issue
Markus> that ioctl on Linux
Markus> 2. take the EBS snapshot at any time
Markus> 3. clone the EBS snapshot to the new/other VM
Markus> 4. zpool import the pool there
Markus> 5. zfs rollback the filesystem to
Markus> the snapshot taken in step 1 (or clone it and use that)

OK, various tests later: this does not really work. If you create the
snapshot and make a backup, the snapshot does not show up on the
backup (whatever the reason; perhaps the disks were so inconsistent that
zfs had to roll back).

But the biggest issue is that if writing is going on and you make the
EBS snapshot, you can't really mount it. Maybe zfs would mount it after
hours, but I just gave up when it hadn't mounted after 1.5 hours.

Another interesting thing I've seen was a completely empty drive after
mount!

So clearly EBS snapshots on a mounted multi-drive pool don't work.

--
All the best,

Berend de Boer


------------------------------------------------------
Awesome Drupal hosting: https://www.xplainhosting.com/
Markus Gebert
2013-07-08 10:51:42 UTC
Post by Berend de Boer
Markus> But taking a zfs snapshot is an atomic operation. Why not
Markus> use that? For example:
Markus> 1. snapshot the zfs at the same point in time you'd issue
Markus> that ioctl on Linux
Markus> 2. take the EBS snapshot at any time
Markus> 3. clone the EBS snapshot to the new/other VM
Markus> 4. zpool import the pool there
Markus> 5. zfs rollback the filesystem to
Markus> the snapshot taken in step 1 (or clone it and use that)
OK, various tests later: this does not really work. If you create the
snapshot and make a backup, the snapshot does not show up on the
backup (whatever the reason; perhaps the disks were so inconsistent that
zfs had to roll back).
I was under the impression that metadata operations are always synchronous in zfs, so since 'zfs snapshot' is such an operation it should be on disk as soon as the command completes. But I've never actually confirmed this. So, it could be that zfs snapshots don't get committed to disk immediately after all, or EBS confirmed write/flush commands that were not committed to whatever is considered stable storage in that cloud. I don't know how well-behaved EBS and its snapshots are when it comes to flush commands, write order etc. So all just speculation, stopping here...
Post by Berend de Boer
But the biggest issue is that if writing is going on and you make the
EBS snapshot, you can't really mount it. Maybe zfs would mount it after
hours, but I just gave up when it hadn't mounted after 1.5 hours.
By 'mount' do you mean the import of the pool? Did you use -F on import? In any case, this sounds too long. Was the system doing IO?
Post by Berend de Boer
Another interesting thing I've seen was a completely empty drive after
mount!
That's a bit unspecific. What's empty? Disk full of zeros? Partition full of zeros? Pool without file system on it? Pool with empty filesystems?
Post by Berend de Boer
So clearly EBS snapshots on a mounted multi-drive pool don't work.
I think with a lot of writes and transactions, you can't reliably avoid a scenario where zfs can't find a valid or good-enough mutual transaction across all disks. By avoiding writes while doing the EBS snapshots, you are more likely to end up with something you can actually import. If you can't avoid the writes, you're out of luck. What you really need in that case is the ability to snapshot all EBS disks as a group.

Anyway, with the tools at hand (no IO "freeze" like Linux, no EBS snapshots for groups of disks), I don't think you can accomplish what you originally intended.


Markus
Berend de Boer
2013-07-08 20:06:25 UTC
Markus> By 'mount' do you mean the import of the pool? Did you use
Markus> -F on import? In any case, this sounds too long. Was the
Markus> system doing IO?

Yep, did -F as well; I believe I have seen the most disastrous imports
in that case, i.e. empty disks.

And it did continuous I/O, as best as I could determine, at max disk
performance.
Post by Berend de Boer
Another interesting thing I've seen was a completely empty
drive after mount!
Markus> That's a bit unspecific. What's empty? Disk full of zeros?
Markus> Partition full of zeros? Pool without file system on it?
Markus> Pool with empty filesystems?

With empty file systems.


Markus> What you really need in that case is the ability to
Markus> snapshot all EBS disks as a group.

Which Linux offers.


Markus> Anyway, with the tools at hand (no IO "freeze" like Linux,
Markus> no EBS snapshots for groups of disks), I don't think you
Markus> can accomplish what you originally intended.

Indeed.

I think my best strategy is to replicate frequent ZFS snapshots to a
stand-by server, and take the backups from there.
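Something like this on a timer, i.e. after an initial full send
(names and snapshot labels made up):

  # on the primary: new snapshot, then send only the delta to the stand-by
  zfs snapshot -r tank/data@2013-07-08_2100
  zfs send -R -i tank/data@2013-07-08_2000 tank/data@2013-07-08_2100 | \
      ssh standby zfs recv -F tank/data
  # the stand-by's disks can then be EBS-snapshotted (or dumped any other
  # way) without touching the live pool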

--
All the best,

Berend de Boer


------------------------------------------------------
Awesome Drupal hosting: https://www.xplainhosting.com/
Jeremy Chadwick
2013-07-08 21:01:45 UTC
Post by Berend de Boer
Markus> What you really need in that case is the ability to
Markus> snapshot all EBS disks as a group.
Which Linux offers.
I'm experimenting with building a FreeBSD NFS server on Amazon AWS
EC2. I've created a zpool with 5 disks in a raidz2 configuration.
How can I make a consistent backup of this using EBS?
Therefore, the answer/solution for you at this stage seems to be: use
Linux. Linux does what you need -- it offers you guest-level utilities
that interface with the proprietary storage system back-end (EBS) that
is offered by your choice of hosting vendor (Amazon). So what's the
problem with using Linux? Why sound so apathetic-yet-confrontational
(re: "Linux offers this, FreeBSD doesn't")?

There is absolutely no shame in any way/shape/form in using an OS that
meets your needs/requirements. I can't speak for others, but if that's
Windows, great -- if that's Linux, great -- if that's some proprietary
thing that only 30 people use, great. I really couldn't care less what
someone uses, as long as it allows them to accomplish what they need.

Otherwise, if this is some sort of "deal-breaker" for you and you
absolutely need this functionality on FreeBSD, my advice is to talk to
Amazon. EBS is a closed, black-box-proprietary thing. The userland
utilities they may offer on Linux, depending on how they work, could be
made to work on FreeBSD (and NOT through Linux emulation, thank you very
much). If this is something you want, you should talk to them about it.
They aren't going to know/care unless you tell them it's something that
interests you as a customer. If they respond "that's nice, we have only
a small interest in FreeBSD", then you should be able to take that
response and make decisions based upon it, depending on what your needs
are. ***You*** are responsible for those choices, not anyone here. :-)

So please do not try to make this a "Linux vs. FreeBSD" thing when the
actual limitation here is being indirectly imposed on you by your choice
of hosting vendor.
--
| Jeremy Chadwick ***@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
Freddie Cash
2013-07-08 21:33:22 UTC
Post by Jeremy Chadwick
Post by Berend de Boer
Markus> What you really need in that case is the ability to
Markus> snapshot all EBS disks as a group.
Which Linux offers.
I'm experimenting with building a FreeBSD NFS server on Amazon AWS
EC2. I've created a zpool with 5 disks in a raidz2 configuration.
How can I make a consistent backup of this using EBS?
Therefore, the answer/solution for you at this stage seems to be: use
Linux. Linux does what you need -- it offers you guest-level utilities
that interface with the proprietary storage system back-end (EBS) that
is offered by your choice of hosting vendor (Amazon). So what's the
problem with using Linux? Why sound so apathetic-yet-confrontational
(re: "Linux offers this, FreeBSD doesn't")?
Something else to consider is that this may not be a FreeBSD issue at all,
but a filesystem/storage system issue. Meaning, if you use ZFS on Linux
... EBS backups will not work. Same if you try to use Solaris or Illumos
or any other ZFS-enabled OS that will run in Amazon's cloud.

At which point, it would make more sense taking the discussion upstream to
Illumos to find a way to quiesce a ZFS pool in such a way that EBS backups
would work. Once that is done, then it can filter downstream to FreeBSD,
Linux, and others.
--
Freddie Cash
***@gmail.com
Berend de Boer
2013-07-08 22:31:49 UTC
Freddie> Something else to consider is that this may not be a
Freddie> FreeBSD issue at all, but a filesystem/storage system
Freddie> issue.  Meaning, if you use ZFS on Linux ... EBS backups
Freddie> will not work.  Same if you try to use Solaris or Illumos
Freddie> or any other ZFS-enabled OS that will run in Amazon's
Freddie> cloud.

And you are exactly right. I can only freeze file systems on Linux
that support it. For example, LVM/XFS support it. Not sure about
ext; last time I tried it (Ubuntu 12.04) it didn't.

I didn't dare to use ZFS on Linux, so didn't check, but given my
experience with ZFS so far I doubt they would have added this. And
they would have, as it is a file system specific thing.


Freddie> At which point, it would make more sense taking the
Freddie> discussion upstream to Illumos to find a way to quiesce a
Freddie> ZFS pool in such a way that EBS backups would work.  Once
Freddie> that is done, then it can filter downstream to FreeBSD,
Freddie> Linux, and others.

Great tip. Didn't know exactly if the ZFS implementation in FreeBSD
was forked or not. I see on their home page about submitting patches
:-)

I've been on #zfs but not much feedback there. I'll join their mailing
list and ask this question.
--
Thanks again,

Berend de Boer


------------------------------------------------------
Awesome Drupal hosting: https://www.xplainhosting.com/
Freddie Cash
2013-07-08 22:37:46 UTC
Post by Berend de Boer
Freddie> At which point, it would make more sense taking the
Freddie> discussion upstream to Illumos to find a way to quiesce a
Freddie> ZFS pool in such a way that EBS backups would work. Once
Freddie> that is done, then it can filter downstream to FreeBSD,
Freddie> Linux, and others.
Great tip. Didn't know exactly if the ZFS implementation in FreeBSD
was forked or not. I see on their home page about submitting patches
:-)
The FreeBSD implementation of ZFS isn't 100% identical to the Illumos (aka
"reference") implementation, mainly due to GEOM; however, the FreeBSD ZFS
maintainers try to keep it at feature parity with Illumos (and even push
patches upstream that get added to Illumos).

Same with the Linux implementation of ZFS, although there are more changes
made to that one to shoehorn it into that wonderful mess they call "a
storage stack". :) There are a handful of features available in the
ZFS-on-Linux implementation that aren't anywhere else (like "-o ashift="
for zpool create/add).

All in all, the ZFS-using OS projects try to stay as close to the Illumos
version as is reasonable for the OS.

It certainly would be interesting to have a "zfs freeze" and/or a "zpool
freeze" (depending on where you want to quiesce things), but it may not
play into how ZFS works (wanting to have complete control over the block
devices, meaning no special magic underneath like block-level snapshots).
:) Or, it may be the "next great feature" of ZFS. :)
--
Freddie Cash
***@gmail.com
Jeremy Chadwick
2013-07-09 00:05:08 UTC
Post by Freddie Cash
Post by Berend de Boer
Freddie> At which point, it would make more sense taking the
Freddie> discussion upstream to Illumos to find a way to quiesce a
Freddie> ZFS pool in such a way that EBS backups would work. Once
Freddie> that is done, then it can filter downstream to FreeBSD,
Freddie> Linux, and others.
Great tip. Didn't know exactly if the ZFS implementation in FreeBSD
was forked or not. I see on their home page about submitting patches
:-)
The FreeBSD implementation of ZFS isn't 100% identical to the Illumos (aka
"reference") implementation, mainly due to GEOM; however, the FreeBSD ZFS
maintainers try to keep it at feature parity with Illumos (and even push
patches upstream that get added to Illumos).
Same with the Linux implementation of ZFS, although there are more changes
made to that one to shoehorn it into that wonderful mess they call "a
storage stack". :) There are a handful of features available in the
ZFS-on-Linux implementation that aren't anywhere else (like "-o ashift="
for zpool create/add).
All in all, the ZFS-using OS projects try to stay as close to the Illumos
version as is reasonable for the OS.
It certainly would be interesting to have a "zfs freeze" and/or a "zpool
freeze" (depending on where you want to quiesce things), but it may not
play into how ZFS works (wanting to have complete control over the block
devices, meaning no special magic underneath like block-level snapshots).
:) Or, it may be the "next great feature" of ZFS. :)
On Linux' file systems I can freeze a file system, start the backup of
all disks, and unfreeze. This freeze usually only takes 100ms or so.
I interpret this statement to mean, on Linux:

1. Some command is issued at the filesystem level that causes all I/O
operations (read and write) directed to/from that filesystem to block
(wait) indefinitely, and that all pending queued writes to the disk
are flushed to disk (on FreeBSD we would call this BIO_FLUSH),

2. Some other command is issued (at the Amazon EBS level, whether it be
done via a web page or via CLI commands on the same Linux box -- though
I don't know how that would work unless the CLI tools are on a
completely separate filesystem), where an EBS snapshot is taken (similar
to a filesystem snapshot but at the actual storage level). Possibly,
if this is a Linux command, there's an actual device driver that sits
between the storage layer and EBS which can effectively "halt" or
"control" things in some manner (would not be surprised! VMs often offer
this) -- I'll call this a "shim".

3. Some command is issued at the filesystem level that releases that
block/wait, and all future I/O requests go through.

What this means is that "block-level snapshots" are what would be
necessary -- the key here is that writes pending (scheduled to be
written to the disk) need to be flushed, and that all other I/O is
blocked.
I do not think something like CACHE FLUSH EXT (i.e. the ATA command used
to actually flush disk-level cache to the platters) matters -- EBS,
whether the data is "in its cache" or not has no bearing, it should know
what to do in either case. All this would be because of what EBS would
require/mandate.

On FreeBSD we don't have the Linux equivalent of #1/#3 -- the layer
where this would be done, ideally, is at the GEOM level (ex. "gfreeze"
command would block all I/O and also issue BIO_FLUSH to ensure things
had been written).

Due to the split between GEOM and filesystems (unrelated things per se),
one would have to issue "gfreeze" on the disks that make up the
filesystem, followed by doing the EBS backup/snapshot, followed by
"gfreeze -u" on all the disks. Wishful thinking, and very idealistic,
but that's my take on it.
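In other words, the wishful-thinking version would be something like
this (gfreeze does not exist -- purely illustrative, and the device
names are made up):

  gfreeze xbd1 xbd2 xbd3 xbd4 xbd5      # block new I/O, issue BIO_FLUSH
  # ...trigger the EBS snapshots of the five volumes here...
  gfreeze -u xbd1 xbd2 xbd3 xbd4 xbd5   # let the queued I/O through again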

I have no idea how you'd issue this command to select disks without
there being some risk; i.e. if a 5-disk raidz1, you'd issue that command
5 times (even if just in 1 single command, the kernel still has to
iterate over 5 items linearly), which means there's a chance the
filesystem could have successfully written parts of something to some of
those 5 disks, thus upon an EBS snapshot restore the filesystem is
actually inconsistent (ZFS reporting checksum failures, for example).

I have no idea how at the filesystem level (ex. zfs, not zpool) such
could be accomplished because again BIO_FLUSH is what's needed, and that
would be at the "provider" level (GEOM term) -- I think (kernel folks
please correct me).

I also have no idea how other layers (ex. CAM) would react to such a
"freeze". Likewise, I worry about userland applications; 100ms is a
nice and convenient number... ;-)

On FreeBSD I think what most folks do is avoid all of the above and use
filesystem snapshots exclusively, either ZFS or UFS, although UFS
snapshots... well... don't get me started.

Filesystem snapshots are "supposed" to be fast, but they depend greatly
on a lot of things and how they're implemented. But honestly they're
what most people turn to, rather than doing backups at the "block level"
(e.g. EBS). I've never encountered anything like a "block level" freeze
or snapshot on bare metal (this would have to be done somehow at the
controller level; SANs have this, I believe, but not simple HBAs that
I've worked with).

One can't even do something like extend sync(8) to somehow issue
BIO_FLUSH, because it doesn't guarantee contention between the BIO_FLUSH
and the time things are done -- more writes could enter the queue or
maybe enough that the queue is full + gets processed right then and
there, leading to the same situation.

This whole thing is a mess due to the layers of disconnect between all
the pieces (including on Linux -- it just so happens they have some
interesting way with **very specific filesystems** to accomplish
this task), and if you ask me, a complete disconnect from reality
between the "cloud providers" (Amazon, etc.) and how actual storage and
filesystems *work*. Very naughty assumptions being made on their
part, unless, of course, there is that "shim" I spoke about.
--
| Jeremy Chadwick ***@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
Berend de Boer
2013-07-08 22:25:55 UTC
Jeremy> Therefore, the answer/solution for you at this stage seems
Jeremy> to be: use Linux. Linux does what you need -- it offers
Jeremy> you guest-level utilities that interface with the
Jeremy> proprietary storage system back-end (EBS) that is offered
Jeremy> by your choice of hosting vendor (Amazon).

Ding dong, nothing to do with guest level utilities. Completely
irrelevant, and I've repeated that numerous times here. All Amazon's
guest level tools work perfectly fine on FreeBSD. Amazon has ZERO
Linux/Windows/FreeBSD/whatever specific stuff.


Jeremy> There is absolutely no shame in any way/shape/form in
Jeremy> using an OS that meets your needs/requirements.

And that's why I'm trying to use FreeBSD as it has features that Linux
doesn't have. Trust me, if Linux had everything FreeBSD has, why would
I use FreeBSD? It's even more expensive on AWS!


Jeremy> EBS is a closed, black-box-proprietary thing.

Ding dong, it's just block storage like every other block storage out
there. Sigh.

I fail to see why mentioning something Linux can do on this mailing
list would upset you. You don't want to know or read about it??

--
All the best,

Berend de Boer


------------------------------------------------------
Awesome Drupal hosting: https://www.xplainhosting.com/
Jeremy Chadwick
2013-07-08 23:17:26 UTC
Post by Berend de Boer
Jeremy> Therefore, the answer/solution for you at this stage seems
Jeremy> to be: use Linux. Linux does what you need -- it offers
Jeremy> you guest-level utilities that interface with the
Jeremy> proprietary storage system back-end (EBS) that is offered
Jeremy> by your choice of hosting vendor (Amazon).
Ding dong, nothing to do with guest level utilities. Completely
irrelevant, and I've repeated that numerous times here. All Amazon's
guest level tools work perfectly fine on FreeBSD. Amazon has ZERO
Linux/Windows/FreeBSD/whatever specific stuff.
Post by Berend de Boer
Markus> What you really need in that case is the ability to
Markus> snapshot all EBS disks as a group.
Which Linux offers.
Because what you're saying here, vs. what you said above, is highly
conflicting as I read it. I do not understand how you can say "Amazon's
guest-level tools work on FreeBSD", then immediately say "Amazon has
zero OS-specific stuff", *while* saying "Linux offers {said thing I
want}". These statements are extremely confusing.

I think it would help if you would really start to provide *actual
commands* of what you're doing to accomplish the tasks you want to
accomplish -- both as you do them now on Linux, as well as the
ZFS-related stuff you've done on FreeBSD. To my knowledge, nowhere in
this thread have you actually shown scrollback/proof/etc. of what you've
been doing, just magical one-liners that seem strange (for example, the
"empty disk" thing would imply that EBS isn't even doing storage caching
correctly, which makes no sense and cannot be true else *nothing* would
work!).

It might turn out to be that what you're seeing can be traced down to
user error, or it can be traced back to the controller driver (on
FreeBSD) not behaving how you want (might require SCSI quirks, etc.),
etc... The possibilities are endless, and without hard data, I think
everyone is struggling/throwing their arms up in the air.

The reason I mention storage controller drivers is, for example, on ESXi
(I forget what versions) it was discovered that cache flushing (the OS's
way of telling the storage driver, which then tells the disks, "make
sure everything is written to the platters") was a no-op of sorts --
ESXi would say "yep I've done it" when in fact it hadn't yet.

With virtualisation this is a common situation, **solely because** of
all the abstraction and caching (at so many layers that it's almost
unfathomable -- no joke). I put EBS into the same category. Because
really, all you know ***TRULY*** is "I see disk daX using controller
xyzX", and the rest you have to pray/hope works (from the OS driver
level all the way down to the physical media this stuff is written to on
the back end / within EBS). The more abstraction = the more chances
something will not behave how bare metal expects.
Post by Berend de Boer
Jeremy> There is absolutely no shame in any way/shape/form in
Jeremy> using an OS that meets your needs/requirements.
And that's why I'm trying to use FreeBSD as it has features that Linux
doesn't have. Trust me, if Linux had everything FreeBSD has, why would
I use FreeBSD? It's even more expensive on AWS!
I'm glad you're trying it, but if you're at the stage where you're
saying "Linux can do X/Y/Z", then it's up to **you** to weigh the pros
and cons of moving away from Linux. Nobody here can make that decision
for you, whether it be based on technical need or money or whatever
else.
Post by Berend de Boer
Jeremy> EBS is a closed, black-box-proprietary thing.
Ding dong, it's just block storage like every other block storage out
there. Sigh.
You once again generalise. Every kind of storage back-end is different,
I don't know why you have such trouble understanding this. You're
quite literally telling me that EBS is the same as iSCSI is the same as
an ESXi file-based backing store is the same as a physical disk
(remember your words: "disks are just software"), which is completely
and entirely wrong -- their behaviours differ severely, not just as
solutions/methods, but in *how they behave* when handed I/O
requests.

All that "black magic" that goes on under the hood? It matters. Every
single bit of it.
Post by Berend de Boer
I fail to see that mentioning something Linux can do on this mailing
list would upset you? You don't want to know or read that??
I am one of the most "anti-FreeBSD" people you will ever encounter -- go
ahead, ask anyone on this list, you will find that I am the one who is
usually quick to take my "well then FreeBSD needs to get its sh**
together on a server level" soapbox. And that doesn't come lightly
given that I've been using/working professionally with FreeBSD since the
2.2.x days. Prior to that I ran Linux (0.99pl45 until early 1.3.x).
I'm just trying to give you some perspective into how I am, if it
matters.

What upsets me is that you're saying "Linux has XYZ", in turn creating a
"FreeBSD vs. Linux" conflict, when I personally strive to see that kind
of "OS war" advocacy end. I want to see people use whatever
tool/OS/thing solves their dilemmas. If that thing is Linux, cool. If
it's FreeBSD, cool. If it's Windows, cool. The dilemmas, limitations,
and needs/etc. vary per person, per org, per company.

I am against all forms of OS advocacy. I prefer to keep an open mind.
So while I give you two thumbs up for giving FreeBSD a try, and ZFS too,
if Linux does something that you can't get on FreeBSD, then to me it
seems like the choice is obvious. But that's me -- I love
oversimplifying. ;-)
--
| Jeremy Chadwick ***@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
Berend de Boer
2013-07-03 09:05:03 UTC
Jeremy> It sounds to me like the Linux OS images on AWS have
Jeremy> utilities or the capability to create EBS images that are
Jeremy> snapshots of the "virtual disks" that make up the AWS
Jeremy> system, and that you can transfer these to another Linux
Jeremy> AWS machine and mount the EBS images, and that this is
Jeremy> being done within Linux itself.

Jeremy> Correct?

Not really. EBS is just block storage. To your OS (including FreeBSD)
EBS volumes just appear as disks.

They are just disks for all intents and purposes. EBS has very little
to do with it. Any attached block storage with hardware-based backup
will run into the same problem.


--
All the best,

Berend de Boer


------------------------------------------------------
Awesome Drupal hosting: https://www.xplainhosting.com/
Berend de Boer
2013-07-03 09:07:09 UTC
Kevin> The other option is to not freeze the filesystem before
Kevin> taking the snapshot, but again you risk leaving things in
Kevin> an inconsistent state, and/or the last few writes you think
Kevin> you made didn't actually get committed to disk yet. For
Kevin> automated systems that create then clone filesystems for
Kevin> new VMs, this can be a big problem. At best, you're going
Kevin> to get a warning that the filesystem wasn't cleanly
Kevin> unmounted.

But I have a couple of disks, and the EBS snapshots are taken
slightly one after another, with a few ms between, so I don't think ZFS
will recover from this.

But indeed, you have explained my question very well, thanks!!

--
All the best,

Berend de Boer


------------------------------------------------------
Awesome Drupal hosting: https://www.xplainhosting.com/
Berend de Boer
2013-07-03 09:10:06 UTC
Kevin> I know this is a sort of obscure use case, but Linux and
Kevin> Windows both have this functionality that VMWare will use
Kevin> if present (and the guest tools know about it).

May I correct you? This is not obscure. This is an extremely common
use-case on Linux, using LVM or XFS file systems, and definitely
best practice on Amazon AWS. If you're not doing it this way, you're
doing it wrong.


Kevin> Linux goes a step further and ensures that it's not in the
Kevin> middle of writing anything to swap during the quiesce
Kevin> period, too. I don't think this would be terribly difficult
Kevin> to implement, a hook somewhere along the write chain that
Kevin> blocks (or queues up) anything trying to write until the
Kevin> unfreeze comes along, but I'm guessing there are all sorts
Kevin> of deadlock opportunities here.

Kevin> Either way, I'm not asking that anyone spend time to write
Kevin> this, I'm just trying to reword what the original requestor
Kevin> was talking about.

Heh, I'm asking that actually :-)

Would be great to have this in FreeBSD. Once you have used EBS
snapshots, you really don't want to go back.

--
All the best,

Berend de Boer


------------------------------------------------------
Awesome Drupal hosting: https://www.xplainhosting.com/
Berend de Boer
2013-07-03 19:39:26 UTC
Mark> On the other hand, every time I read about "block storage
Mark> snapshots" -- even if you quiesce the filesystem -- I start
Mark> to get really itchy thinking about the likeliness a high TPS
Mark> database is going to end up with corruption and require
Mark> recovery. :)

That's not how it works: if you freeze the file system at a consistent
point, you can use the roll-forward/backward capabilities of your db
to come back clean.

You can do this even fancier. MySQL or Mongo allow you to flush their
caches as well and put a global lock on the database.

Then you freeze the file system, take the snapshot, unfreeze the file
system, then unlock MySQL.
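As a sketch (paths, volume ID and the exact client invocation are
placeholders -- the point is that the session holding the read lock has
to stay open until the freeze and snapshot are done):

  mysql <<'EOF'
  FLUSH TABLES WITH READ LOCK;
  \! fsfreeze -f /var/db/mysql
  \! aws ec2 create-snapshot --volume-id vol-12345678 --description "mysql backup"
  \! fsfreeze -u /var/db/mysql
  UNLOCK TABLES;
  EOF

On ZFS you'd replace the freeze/unfreeze pair with a single
'zfs snapshot tank/mysql@backup' and do the EBS part afterwards.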

People have been using this for many years, for example a famous
utility for this was mylvmbackup: http://www.lenzg.net/mylvmbackup/

ZFS should work really well here, and people probably use zfs
snapshots in this manner. If performance is an issue, you do this on
the slave obviously.


But I don't get the itchy part: if a disk is just software, I can copy
it. I want to copy it. To another data centre (zone) for
example. That's a trivial operation in EBS, and you can clone huge
disks this way in minutes. Doing a zfs send/recv is just laughably
primitive and slow compared to this. It would take me days to send a
full snapshot this way.


Mark> This really does sound like Amazon needs to provide whatever
Mark> mechanism to communicate between the host and the guest so
Mark> this EBS snapshot can take place.

Again, my request has *nothing* to do with EBS. If you have multiple
disks in your pool, how can you make a backup you can restore from, at
the hardware level?

--
All the best,

Berend de Boer


------------------------------------------------------
Awesome Drupal hosting: https://www.xplainhosting.com/
Gary Palmer
2013-07-03 20:02:41 UTC
Post by Berend de Boer
Again, my request has *nothing* to do with EBS. If you have multiple
disks in your pool, how can you make a backup you can restore from, at
the hardware level?
Other than using SAN (FC or iSCSI), I know of no reason to do backups
at the raw disk level, nor any real demand.

I've worked with people who have done LUN based backups in the past and
they have one drawback - they tend to back up the entire LUN, irrespective
of whether it is an allocated block or not. Modern systems that implement
some kind of TRIM emulation (or cheat and sniff the filesystem block
allocation maps) may alleviate that problem.

However, in the vast majority of cases, people back up from above
the FS, not below. This makes your use case probably more tied to EBS
than you may otherwise think.

Regards,

Gary
Berend de Boer
2013-07-03 21:52:52 UTC
Gary> Other than using SAN (FC or iSCSI), I know of no reason to
Gary> do backups at the raw disk level, nor any real demand.

Probably the hundreds of thousands of businesses that use Amazon AWS
disagree :-)


Gary> I've worked with people who have done LUN based backups in
Gary> the past and they have one drawback - they tend to back up
Gary> the entire LUN, irrespective of whether it is an allocated
Gary> block or not. Modern systems that implement some kind of
Gary> TRIM emulation (or cheat and sniff the filesystem block
Gary> allocation maps) may alleviate that problem.

That's not how EBS does a backup. It only backs up allocated blocks
the first time, and subsequent backups only copy the
changed blocks.


Gary> However, in the vast majority of cases, people back up from
Gary> above the FS, not below. This makes your use case probably
Gary> more tied to EBS than you may otherwise think.

People generally didn't have a choice I would say. Now millions of
servers run on top of block storage.

Disks are just software. That's the new world.

--
All the best,

Berend de Boer


------------------------------------------------------
Awesome Drupal hosting: https://www.xplainhosting.com/
Adam Vande More
2013-07-03 23:09:00 UTC
Post by Berend de Boer
Gary> Other than using SAN (FC or iSCSI), I know of no reason to
Gary> do backups at the raw disk level, nor any real demand.
Probably the hundreds of thousands of businesses that use Amazon AWS
disagree :-)
Well, that would be a SAN backup wouldn't it :) (Not NAS as you cited
earlier)
Post by Berend de Boer
Gary> However, in the vast majority of cases, people back up from
Gary> above the FS, not below. This makes your use case probably
Gary> more tied to EBS than you may otherwise think.
People generally didn't have a choice I would say. Now millions of
servers run on top of block storage.
"People generally" don't use Amazon and especially EBS. Servers have
pretty much always run on block storage.
Post by Berend de Boer
Disks are just software. That's the new world.
No matter how much you wish hard drives to be software, they are still
hard drives.
--
Adam Vande More
Jeremy Chadwick
2013-07-03 23:36:31 UTC
Post by Berend de Boer
Gary> Other than using SAN (FC or iSCSI), I know of no reason to
Gary> do backups at the raw disk level, nor any real demand.
Probably the hundreds of thousands of businesses that use Amazon AWS
disagree :-)
Gary> I've worked with people who have done LUN based backups in
Gary> the past and they have one drawback - they tend to back up
Gary> the entire LUN, irrespective of whether it is an allocated
Gary> block or not. Modern systems that implement some kind of
Gary> TRIM emulation (or cheat and sniff the filesystem block
Gary> allocation maps) may alleviate that problem.
That's not how EBS does a backup. It only backs up allocated blocks
the first time, and subsequent backups only copy the
changed blocks.
Gary> However, in the vast majority of cases, people back up from
Gary> above the FS, not below. This makes your use case probably
Gary> more tied to EBS than you may otherwise think.
People generally didn't have a choice I would say. Now millions of
servers run on top of block storage.
Disks are just software. That's the new world.
I understand what you're trying to say by this statement, but you're
stretching it big time. I was opting to stay out of the thread until I
saw your last line.

It doesn't matter how many layers of I/O abstraction there are,
eventually physical hardware for storage is involved. It doesn't matter
what type of disk (mechanical vs. solid-state vs. something
custom/proprietary) or what type of controller -- it does eventually end
up on bare metal.

I say this well-aware of the relationship between software and hardware
(ex. disk firmware (software) controlling the underlying hardware (drive
motor IC, underlying I/O controller, etc.)).

The problems with these "software solutions" (cloud, etc.) -- I'm not
sure what to call them because it varies -- are many. One of those
problems is that there is a great disconnect between the user of the
"solution" and the actual bare metal. And quite often the topology --
meaning the actual innards/how it all works/what transpires even on a
protocol level -- is never documented or made public to the user. Hell,
my experience in the enterprise world shows that quite often even
support personnel don't know how it works. Why this matters: when it
breaks -- and it will break, believe me -- that information becomes
critical/key to troubleshooting and providing a solution. I've even
encountered one "enterprise-grade storage solution" where when the
product broke (as in all filesystems inaccessible), multiple levels of
support engineers had no idea where the actual problem was because of
how much abstraction there was between the appliance itself and the bare
metal. How many engineers does it take to change a light bulb?
Apparently too many.

As politely as I can: It sounds like you may have spent too much time
with these types of setups, or believe them to be "magical" in some way,
in turn forgetting the realities of bare metal and instead thinking
"everything is software". Bzzt.

And while generally I don't see eye-to-eye with Richard Stallman,
storage **is** the one area where I do:

http://www.guardian.co.uk/technology/2008/sep/29/cloud.computing.richard.stallman

KISS principle goes a long, long way when applied to storage. And no, I
am not saying "get away from this EBS/AWS stuff!" -- I'm simply saying
that your statement "disks are software in the new world" is utter
nonsense.
--
| Jeremy Chadwick ***@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
Berend de Boer
2013-07-03 23:56:35 UTC
Permalink
Jeremy> As politely as I can: It sounds like you may have spent
Jeremy> too much time with these types of setups, or believe them
Jeremy> to be "magical" in some way, in turn forgetting the
Jeremy> realities of bare metal and instead thinking "everything
Jeremy> is software". Bzzt.

Heh. The solution with Amazon is even worse: if things go wrong,
you're screwed. Can't get your disks back. You can't call
anyone. There's no bare metal to touch, and no, they won't let you
into their data centres.

So I'm actually trying to avoid the magic.

The only guarantee I basically have is that if I have made an EBS
snapshot of my disk, I can, one day, restore that, and that this
snapshot is stored in some multi-redundancy (magic!) cloud.

(And obviously you can try to run a mirror in another data centre
using zfs send/recv, yes, will run that too).

If you go with AWS, there are no phone calls to make. Disk gone is
disk gone. So you need to have working backup strategies in place.
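
For readers who haven't driven EBS by hand, the snapshot/restore cycle
being discussed happens entirely on the Amazon side, e.g. via the AWS
command-line tools. A minimal sketch, where every ID and device name is
a placeholder:

    # point-in-time snapshot of an attached EBS volume
    aws ec2 create-snapshot --volume-id vol-12345678 --description "zpool disk 1"

    # later: materialise a new volume from the snapshot and attach it to another instance
    aws ec2 create-volume --snapshot-id snap-87654321 --availability-zone us-east-1a
    aws ec2 attach-volume --volume-id vol-abcdef01 --instance-id i-0123abcd --device /dev/sdf

The open question in this thread is only how to make the instant the
create-snapshot call fires a consistent point for ZFS.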

--
All the best,

Berend de Boer


------------------------------------------------------
Awesome Drupal hosting: https://www.xplainhosting.com/
Jeremy Chadwick
2013-07-04 01:08:15 UTC
Permalink
Post by Berend de Boer
Jeremy> As politely as I can: It sounds like you may have spent
Jeremy> too much time with these types of setups, or believe them
Jeremy> to be "magical" in some way, in turn forgetting the
Jeremy> realities of bare metal and instead thinking "everything
Jeremy> is software". Bzzt.
Heh. The solution with Amazon is even worse: if things go wrong,
you're screwed. Can't get your disks back. You can't call
anyone. There's no bare metal to touch, and no, they won't let you
into their data centres.
So I'm actually trying to avoid the magic.
The only guarantee I basically have is that if I have made an EBS
snapshot of my disk, I can, one day, restore that, and that this
snapshot is stored in some multi-redundancy (magic!) cloud.
(And obviously you can try to run a mirror in another data centre
using zfs send/recv, yes, will run that too).
If you go with AWS, there are no phone calls to make. Disk gone is
disk gone. So you need to have working backup strategies in place.
How is being reliant on EBS (for readers: Amazon Elastic Block Store,
which is advertised as, and I quote, "a virtualised storage service")
"avoiding the magic"? You're still reliant on black-box voodoo. :-)

I think the limiting factor here is more related to your need to use AWS
and its services than using bare metal. I respect that/understand that,
and won't get into a debate about that. So, that said...

As I see them, your choices are these:

- Keep using EBS, doing all of this at a "higher level" (meaning the
Amazon level) by making snapshots of the actual "storage disks"
that are referenced/used by the underlying OS -- FreeBSD, as we have
stated, does not have a way to do this (AFAIK) from within the OS
(meaning "induce an EBS snapshot"). Linux may have that, but no
matter what, it's an Amazon proprietary thing. Are we clear?

You should still be able to use whatever Amazon's EBS or AWS provides
(as a user interface) to make "snapshots" of those disks, at least
that's what I'd assume. I have no familiarity with this, etc..

- Within the OS: raw disk dump. It doesn't matter what the "backing
store" is (e.g. EBS, something across iSCSI, etc.). Example command:
dd if=/dev/da0 of=/some/other/place bs=64k (or you can send it to
stdout and pipe that across ssh, netcat, etc.)

This will read every LBA on the device -- including unused/untouched
space, the partitioning scheme/layout (i.e. MBR/GPT), and the boot
blocks/bootstrap mechanisms -- and will be the size of the disk
itself (e.g. if 1TB, then the resulting file will be 1TB).

From what you've said, this does not work for you because the
immense size (even if piped through gzip) does not allow for incremental
snapshots of changes to the disk, and it takes a long time.

There is no way on FreeBSD or Linux, to my knowledge, to accomplish the
latter at the disk level -- at the filesystem level yes, disk level
no. Most people prefer to do this at the filesystem level (which if
done right is also very fast -- you know this already though).

- Within the OS: UFS+SU (do not use journalling/SUJ with this feature,
it's known to be busted/throws a nastygram) filesystem snapshots.

Commonly accomplished using dump(8), and restore using restore(8).
dump(8) accomplishes snapshot generation by calling mksnap_ffs(8)
(also a utility).

Snapshot generation is usually very fast (commonly a few seconds),
but depends on lots of things which I will not get into.

dump(8) and restore(8) both support incremental snapshots, and also
convenient, restore(8) has an interactive mode where you can navigate
a snapshot and extract individual files.

These are filesystem snapshots, not disk snapshots, and thus do
not include things like the partition table (MBR/GPT) nor the
bootstraps. This matters more if you're trying to do a "bare metal
restore" of a box (i.e. box #0 broke badly, need to turn box #1 into
the role of box #0 in every way/shape/form), so an admin in that
case has to recreate the partition table and reinstall bootstraps
manually. (There are ways to back these up as well via dd, but I
am not going to go into that).

And now some real-world experience: what isn't mentioned/discussed
aside from mailing lists and what not is that this methodology is
unreliable (for example I have avoided it and been a critic of it for
several years). There are problems with the UFS-specific snapshot
"stuff" that have existed for years, where sometimes the snapshot
generation never ends, sometimes it causes the system to lock up, and
lots of other problems. I will not provide all the details -- just
go looking at the mailing lists -stable and -fs over the past several
years and you'll see what I'm talking about.

Likewise real-world experience: these bugs are what drove me away
from using UFS snapshots, and I often boycott them for this reason.

- Within the OS: ZFS and use "zfs snapshot".

These are, of course, ZFS filesystem snapshots. Incrementals are
supported, and these are also usually very fast (few seconds). You
can also use "zfs {send,recv}" to send/receive the snapshots to
another system and have them restored on that system (many admins
really REALLY like this feature).

Likewise, because this is filesystem-based, again this does not back
up the partition tables nor the bootstraps.

There are some "gotchas" with ZFS snapshots but those really depend
on 1) how you're using them, and 2) your type of data. I won't go
into #2, but others here have already mentioned it.

For example, one bug that's been around for 3 years now: if you
prefer to navigate the snapshots as a filesystem and use the default
snapdir=hidden property, "pwd" will return "No such file or
directory" when within a snapshot. There are workarounds for this.

Occasionally I see problems reported by people when using "zfs
{send,recv}" and on (more rare) occasion issues with snapshot
generation entirely. Most of the problems with the latter, however,
have been worked out within stable/9 (so if you go the ZFS route,
PLEASE PLEASE PLEASE run stable/9, not 9.1-RELEASE or earlier).

There are also scripts in ports/sysutils to make management of ZFS
snapshots much easier. Some write their own, others use those
scripts.

Also, because nobody seems to warn others of this: if you go the
ZFS route on FreeBSD, please do not use features like dedup or
compression. I can expand more on this if asked, as they have
separate (and in one case identical/similar) caveats. (I'm always
willing to bend on compression as long as the user knows of the one
problem that still exists today and feels it's okay/acceptable)

- Within the OS: rsync and/or rsnapshot (which uses rsync).

These work at the file level (not filesystem, but file) -- think
"copying all the files". They are known to be reliable, and can
be used in conjunction with systems over a network (to back up from
system X to system Y; default is via SSH).

Naturally, this doesn't back up partition tables or bootstraps either.

rsnapshot provides its "snapshot-like" behaviour using hard links,
which allows for incrementals in how it works (read about it on the
web for further details -- not rocket science). But be aware
"incremental" means "files that have been changed, added, or deleted",
it doesn't mean "store/back up only the portions of a file that changed".
I.e. if your MySQL table that's 2GB had a write done to it between the
last snapshot and now, the incremental is going to back up an entire 2GB.
That may be a drawback depending on what you're doing -- this is for
your sysadmin to figure out.

I have read of some problems relating to rsync when used with ZFS, but
that seems to stem more from the amount of I/O being done and the type
of data being used on the ZFS pool/filesystem, so rsync just happens
to tickle something odd in those cases. I have never personally
encountered this however (that's just me though), as I explain here:

Real-world experience: rsnapshot is what I used for my hosting
organisation of nearly 18 years to back up 8 or 9 servers, nightly,
across a network (gigE LAN). Those servers also used ZFS as their
filesystems (for everything except root/var/tmp/usr), both the source
being copied as well as the filesystem used to store the backups,
and I only once had an issue during the early FreeBSD 8.x days (caused
by a ZFS bug that has since been fixed). I still use rsync/rsnapshot
to this day, even on my local system (which is ZFS-based barring
root/var/tmp/usr -- I choose to use rsnapshot rather than ZFS
snapshots for reasons I will not go into here as they're irrelevant).

However, I would not use this method where ""snapshots"" need to be
done very regularly (i.e. every hour), particularly on filesystems
where there are either a) lots and lots of files, or b) files of
immense size that change often. Filesystem snapshots are a better
choice in that case.

There are certainly other options available which I have not touched on,
but in general the filesystem snapshot choice is probably your best
bet.
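
To make that concrete, here is a minimal sketch of the ZFS variant; the
pool/dataset names, the remote host and the snapshot labels are all made
up for illustration:

    # point-in-time snapshot (cheap, done entirely within the OS)
    zfs snapshot tank/data@2013-07-04

    # first full copy to another machine
    zfs send tank/data@2013-07-04 | ssh backuphost zfs recv -F backup/data

    # subsequent runs: snapshot again and send only the delta
    zfs snapshot tank/data@2013-07-05
    zfs send -i tank/data@2013-07-04 tank/data@2013-07-05 | ssh backuphost zfs recv backup/data

As noted above, this captures only the filesystem contents, not the
partition tables or bootstraps.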

Filesystem snapshots have one other advantage that you might not have
thought of: they're done within the OS, which means if Amazon's EBS
stuff changes in such a way where you lose backwards compatibility or
encounter bugs with it (during EBS snapshot generation), you can still
get access to your data in some manner of speaking.

I hope this has given you some details, avenues of choice, or at least
things to ponder. Choose wisely, and remember: **ALWAYS DO A RESTORE
TEST** when choosing a new backup strategy. I cannot tell you how many
times I encounter people "doing backups" who never test a restore until
that horrible day... only to find their backups were done wrong, or that
the restore process (or even software!) is just utterly broken.
--
| Jeremy Chadwick ***@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
Berend de Boer
2013-07-04 01:40:07 UTC
Permalink
Jeremy> Also, because nobody seems to warn others of this: if
Jeremy> you go the ZFS route on FreeBSD, please do not use
Jeremy> features like dedup or compression.

Exactly the two reasons why I'm experimenting with FreeBSD on AWS.

Please tell me more.

--
All the best,

Berend de Boer


------------------------------------------------------
Awesome Drupal hosting: https://www.xplainhosting.com/
Jeremy Chadwick
2013-07-04 02:15:35 UTC
Permalink
Post by Berend de Boer
Jeremy> Also, because nobody seems to warn others of this: if
Jeremy> you go the ZFS route on FreeBSD, please do not use
Jeremy> features like dedup or compression.
Exactly the two reasons why I'm experimenting with FreeBSD on AWs.
Please tell me more.
dedup has immense and crazy memory requirements; the commonly referenced
model (which is in no way precise, it's just a general recommendation)
is that for every 1TB of data you need 1GB of RAM just for the DDT
(deduplication table) -- understand that ZFS's ARC also eats lots of
memory, so when I say 1GB of RAM, I'm talking about that being *purely
dedicated* to DDT. But as I said the need varies depending on the type
of data you have. When using dedup, the general attitude is "give ZFS
as much memory as possible. Max your DIMM slots out with the biggest
DIMMs the MCH can support".
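
As an aside: if you want a rough idea of what dedup would cost on an
existing pool before turning it on, "zdb -S" simulates building the DDT
and prints a histogram (the pool name below is an example):

    zdb -S tank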

Many problems I have seen on the FreeBSD lists -- and one horror story
on Solaris -- often pertain to people trying dedup. There have been
reported issues with resilvering pools that use dedup, or even simply
mounting filesystems using dedup. The situation when dedup is in use
becomes significantly more complex in a "something is broken" scenario.
The horror story I've heard and retell is this one, and this is me going
off of memory:

There was supposedly an Oracle customer who had been using dedup for
some time, and they began to have problems (I don't remember what; if it
was with ZFS, the controller, disks, or what). Anyway, the situation
was such that the client needed to either resilver their pool, or just
get their data -- but because they were using dedup, they could not.
The system could not be upgraded to have more RAM (which would have
alleviated the pains).

The solution which was chosen was for Oracle to actually ship the
customer an entire bare metal system with a gargantuan amount of RAM
(hundreds of gigabytes; I often say 384GB because that's what sticks in
my mind for some reason, maybe it was 192GB, doesn't matter), just to
recover from the situation.

compression is generally safe to use on FreeBSD, but there are often
surprising changes to certain behaviours that people don't consider: the
most common one I see reported is conflicting information between what
"df", "du", and "zfs list" show. AFAIK this applies to Solaris/Illumos
too, so it's just the nature of the beast. compression doesn't have the
crazy memory requirements of dedup, obviously -- two separate things,
don't confuse the two. :-)

The final item is the one that, still to this day, keeps me from using
either dedup or compression on FreeBSD (well actually I'd never consider
dedup, only compression): system interactivity is destroyed when using
either of these features. The system will regularly stall/lock up
(depending on the I/O, for a few seconds) regularly, even at VGA
console. This problem is specific to the FreeBSD port of ZFS as of this
writing; Solaris/Illumos addressed this long ago. Rather than re-write
it, I recommend you read my post from February 2013 which references my
convo with Bob Friesenhahn in October 2011 (please read all the quoted
material too):

http://lists.freebsd.org/pipermail/freebsd-stable/2013-February/072171.html

Changing the compression scheme does not solve the issue; the less
CPU-intensive schemes (ex. lzjb) help decrease the impact but do not
solve it.

All that said: there are people (often FreeNAS folks using their systems
solely as a dedicated NAS, not as a shell server or desktop or other
things) who do use these features happily and do not care about the last
issue. Cool/great, I'm glad it works for them. But in my case it's not
acceptable. If/when the above issue is addressed (putting the ZFS
writer threads into their own priority/scheduling class), I look forward
to using compression (but never dedup, I don't have the hardware/memory
for that kind of thing).

Otherwise please spend an afternoon looking through freebsd-fs and
freebsd-stable lists over the past 2 years (see web archives) and
reading about different stories/situations. I always, *always* advocate
this.
--
| Jeremy Chadwick ***@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
Berend de Boer
2013-07-04 02:24:48 UTC
Permalink
Jeremy> The solution which was chosen was for Oracle to actually
Jeremy> ship the customer an entire bare metal system with a
Jeremy> gargantuan amount of RAM (hundreds of gigabytes; I often
Jeremy> say 384GB because that's what sticks in my mind for some
Jeremy> reason, maybe it was 192GB, doesn't matter), just to
Jeremy> recover from the situation.

Yeah, well aware of the memory requirements. No problem. I see a lot
of people here who are apparently unaware of AWS and why it is so far
ahead of the pack.

If I have a memory problem, I make an image of my machine, stop the
old one, and boot up on a new machine with more memory. As simple as
that. Takes me 5 minutes.

8GB not enough? 5 minutes later you boot up on 32GB without a sweat
from your holiday vacation spot. 32GB not enough? What about 68GB?
That's not enough? Why not 244GB? By that time your credit card is
sweating, but hardware is trivially upgradable on EC2.

No waiting for hardware to arrive, minimum down time, all customers
remain happy.

--
All the best,

Berend de Boer


------------------------------------------------------
Awesome Drupal hosting: https://www.xplainhosting.com/
Freddie Cash
2013-07-04 02:39:58 UTC
Permalink
Post by Jeremy Chadwick
Post by Berend de Boer
Jeremy> Also, because nobody seems to warn others of this: if
Jeremy> you go the ZFS route on FreeBSD, please do not use
Jeremy> features like dedup or compression.
Exactly the two reasons why I'm experimenting with FreeBSD on AWs.
Please tell me more.
dedup has immense and crazy memory requirements; the commonly referenced
model (which is in no way precise, it's just a general recommendation)
is that for every 1TB of data you need 1GB of RAM just for the DDT
(deduplication table)) -- understand that ZFS's ARC also eats lots of
memory, so when I say 1GB of RAM, I'm talking about that being *purely
dedicated* to DDT.
Correction: 1 GB of *ARC* space per TB of *unique* data in the pool. Each
unique block in the pool gets an entry in the DDT.

You can use L2ARC to store the DDT, although it takes ARC space to track
data in L2ARC, so you can't go crazy (512 GB L2 with only 16 GB ARC is a
no-no).
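
Adding an L2ARC device to an existing pool is a one-liner; the pool and
device names here are examples only:

    zpool add tank cache /dev/ada2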

However, you do need a lot of RAM to make dedupe work, and your I/O does
drop through the floor.
Jeremy Chadwick
2013-07-04 02:51:00 UTC
Permalink
Post by Freddie Cash
Post by Jeremy Chadwick
Post by Berend de Boer
Jeremy> Also, because nobody seems to warn others of this: if
Jeremy> you go the ZFS route on FreeBSD, please do not use
Jeremy> features like dedup or compression.
Exactly the two reasons why I'm experimenting with FreeBSD on AWs.
Please tell me more.
dedup has immense and crazy memory requirements; the commonly referenced
model (which is in no way precise, it's just a general recommendation)
is that for every 1TB of data you need 1GB of RAM just for the DDT
(deduplication table)) -- understand that ZFS's ARC also eats lots of
memory, so when I say 1GB of RAM, I'm talking about that being *purely
dedicated* to DDT.
Correction: 1 GB of *ARC* space per TB of *unique* data in the pool. Each
unique block in the pool gets an entry in the DDT.
You can use L2ARC to store the DDT, although it takes ARC space to track
data in L2ARC, so you can't go crazy (512 GB L2 with only 16 GB ARC is a
no-no).
However, you do need a lot of RAM to make dedupe work, and your I/O does
drop through the floor.
Thanks Freddie -- I didn't know this (re: ARC space per TB of unique
data); wasn't aware that's where the DDT got placed. (Actually makes
sense now that I think about it...)
--
| Jeremy Chadwick ***@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
Daniel Kalchev
2013-07-04 08:02:51 UTC
Permalink
Post by Jeremy Chadwick
Post by Freddie Cash
Post by Jeremy Chadwick
Post by Berend de Boer
Jeremy> Also, because nobody seems to warn others of this: if
Jeremy> you go the ZFS route on FreeBSD, please do not use
Jeremy> features like dedup or compression.
Exactly the two reasons why I'm experimenting with FreeBSD on AWs.
Please tell me more.
dedup has immense and crazy memory requirements; the commonly referenced
model (which is in no way precise, it's just a general recommendation)
is that for every 1TB of data you need 1GB of RAM just for the DDT
(deduplication table)) -- understand that ZFS's ARC also eats lots of
memory, so when I say 1GB of RAM, I'm talking about that being *purely
dedicated* to DDT.
Correction: 1 GB of *ARC* space per TB of *unique* data in the pool. Each
unique block in the pool gets an entry in the DDT.
You can use L2ARC to store the DDT, although it takes ARC space to track
data in L2ARC, so you can't go crazy (512 GB L2 with only 16 GB ARC is a
no-no).
However, you do need a lot of RAM to make dedupe work, and your I/O does
drop through the floor.
Thanks Freddie -- I didn't know this (re: ARC space per TB of unique
data); wasn't aware that's where the DDT got placed. (Actually makes
sense now that I think about it...)
The really bad thing about this is that the DDT actually competes with
everything else in ARC. You don't want to arrive at the point where you
trash the ARC with DDT...

ZFS with dedup is really "handy" for a non-interactive storage box,
such as an archive server. Mine get over 10x dedup ratio and that means
I fit the data in 24 disks instead of 240 disks... Extra RAM and L2ARC
is well worth the cost and the drop in performance.

If you need higher performance from the storage subsystem though, ignore
both dedup and compression -- even if they are bug-free some day.

Which brings us back to AWS. I believe AWS will charge for CPU time too,
which you will happily waste with both dedup and compression. Yet
another reason to avoid it, unless block storage is more expensive
(unlikely).

Daniel
Jeremy Chadwick
2013-07-04 08:22:09 UTC
Permalink
----- Original Message ----- From: "Berend de Boer"
Jeremy> Also, because nobody seems to warn others of this: if
Jeremy> you go the ZFS route on FreeBSD, please do not use
Jeremy> features like dedup or compression.
While dedup is memory- and sometimes CPU-hungry, so HW spec
should be considered before using it, compression is not, and
I've not seen any valid reason not to use it should it
fit your uses.
We actually use compression extensively here and we've
had nothing but positive results from it, so it sounds like
FUD to me.
The problem with the lack of separate and prioritised write threads for
dedup and compression, thus causing interactivity stalls, is not FUD,
it's fact. I explained this in the part of my reply to Berend which you
omitted, which included the proof and acknowledgement from folks who
are in-the-know (Bob Friesenhahn). :/ Nobody has told me "yeah that
got fixed", so there is no reason for me to believe anything has
changed.

If a person considering use of compression on FreeBSD ZFS doesn't mind
that problem, then by all means use it. It doesn't change the fact that
there's an issue, and one that folks should be made aware of up front.
It's not spreading FUD: it's spreading knowledge of a certain behaviour
that differs between FreeBSD and Solaris/Illumos. The issue is a
deal-breaker for me; if it's not for you, great.
--
| Jeremy Chadwick ***@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
Steven Hartland
2013-07-04 08:47:26 UTC
Permalink
----- Original Message -----
From: "Jeremy Chadwick" <***@koitsu.org>
To: "Steven Hartland" <***@multiplay.co.uk>
Cc: "Berend de Boer" <***@pobox.com>; "freebsd-fs" <freebsd-***@freebsd.org>
Sent: Thursday, July 04, 2013 9:22 AM
Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze?
Post by Jeremy Chadwick
----- Original Message ----- From: "Berend de Boer"
Jeremy> Also, because nobody seems to warn others of this: if
Jeremy> you go the ZFS route on FreeBSD, please do not use
Jeremy> features like dedup or compression.
While dedup is memory and sometimes cpu hungry, so HW spec
should be considered before using it, compression is not so
and I've not seen any valid reason not to use it should it
fit your uses.
We actually use compression extensivily here and we've
had nothing but positive results from it so sounds like
FUD to me.
The problem with the lack of separate and prioritised write threads for
dedup and compression, thus causing interactivity stalls, is not FUD,
it's fact. I explained this in the part of my reply to Berend which you
omitted, which included the proof and acknowledgement from folks who
are in-the-know (Bob Friesenhahn). :/ Nobody has told me "yeah that
got fixed", so there is no reason for me to believe anything has
changed.
Do you have any links to the discussion on this, Jeremy, as I'd be interested
to read up on this when I have some spare time?
Post by Jeremy Chadwick
If a person considering use of compression on FreeBSD ZFS doesn't mind
that problem, then by all means use it. It doesn't change the fact that
there's an issue, and one that folks should be made aware of up front.
It's not spreading FUD: it's spreading knowledge of a certain behaviour
that differs between FreeBSD and Solaris/Illumos. The issue is a
deal-breaker for me; if it's not for you, great.
Sounds like it could well be use-case based then, as we've not had any
problems with compression causing interactivity problems. Quite the opposite
in fact: the reduced physical IO that compression results in has improved
interactivity.

So I guess it's like everything, one size doesn't fit all, so tempering
statements about blanket avoiding these features seems like the
way to go :)

Regards
Steve


Jeremy Chadwick
2013-07-04 10:32:27 UTC
Permalink
----- Original Message ----- From: "Jeremy Chadwick"
Sent: Thursday, July 04, 2013 9:22 AM
Subject: Re: EBS snapshot backups from a FreeBSD zfs file system: zpool freeze?
Post by Jeremy Chadwick
----- Original Message ----- From: "Berend de Boer"
Jeremy> Also, because nobody seems to warn others of this: if
Jeremy> you go the ZFS route on FreeBSD, please do not use
Jeremy> features like dedup or compression.
While dedup is memory and sometimes cpu hungry, so HW spec
should be considered before using it, compression is not so
and I've not seen any valid reason not to use it should it
fit your uses.
We actually use compression extensivily here and we've
had nothing but positive results from it so sounds like
FUD to me.
The problem with the lack of separate and prioritised write threads for
dedup and compression, thus causing interactivity stalls, is not FUD,
it's fact. I explained this in the part of my reply to Berend which you
omitted, which included the proof and acknowledgement from folks who
are in-the-know (Bob Friesenhahn). :/ Nobody has told me "yeah that
got fixed", so there is no reason for me to believe anything has
changed.
Do you have an links to the discuss on this Jeremy as I'd be intereted
to read up on the this when I have some spare time?
Warning up front: sorry for the long mail (I did try to keep it terse)
but most of it is demonstrating the problem.

Useful FreeBSD links, specifically the conversations I've had over the
years about this problem, at least the most useful ones. The first one
is probably the most relevant, since it's a statement from Bob himself
explaining it:

http://lists.freebsd.org/pipermail/freebsd-fs/2011-October/012726.html
http://lists.freebsd.org/pipermail/freebsd-fs/2011-October/012752.html
http://lists.freebsd.org/pipermail/freebsd-stable/2013-February/072171.html
http://lists.freebsd.org/pipermail/freebsd-stable/2013-February/072178.html

To be clear (note the date and version): as of September 2011 I was able
to reproduce the problem on stable/8.

While you were writing your mail, I was off actually trying to find out
technical details (specifically the source code changes in OpenSolaris
or later) which fixed it / what Bob alluded to. I really had to jab at
search engines to find anything useful, and wasn't getting anywhere
until I found this:

http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/28192

This mentioned the OpenSolaris bug number 6586537. I then poked about
svnweb and found that this fix was imported into FreeBSD with the "ZFS
version 15" import. Commit log entry:

6586537 async zio taskqs can block out userland commands (142901-09)

Relevant revisions, dates, and branches for this:

r209962: Jul 2010: head: http://svnweb.freebsd.org/base?view=revision&revision=209962
r212668: Sep 2010: stable/8: http://svnweb.freebsd.org/base?view=revision&revision=212668

And that head became stable/9 as of September 2011, I believe.

So my testing as of September 2011 would have included the fix for
6586537. This makes me wonder if 6586537 is truly the issue I've been
describing or not.

It's easy enough to test for on stable/9 today (zfs create, zfs set
compression=on, do the dd and in another window do stuff and see what
happens, then later zfs destroy). So let's see if it's still there
almost 2 years later...

Yup, still there, but it seems improved in some way, possibly due to a
combination of things. This box is actually a C2Q (more powerful than
the one in Sep 2011) too, and is actively doing nothing. Relevant bits:

# zpool list
NAME       SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
backups   1.81T   463G  1.36T    24%  1.00x  ONLINE  -
data      2.72T   694G  2.04T    24%  1.00x  ONLINE  -
# zdb -C | grep ashift
ashift: 12
ashift: 12
# zfs create -o compression=lzjb -o mountpoint=/mnt backups/comptest
# zfs get all backups/comptest | grep compression
backups/comptest compression lzjb local

The "backups" pool is a single disk (WD20EFRX) running at SATA300 with
NCQ, backed by an Intel ICH9 in AHCI mode. The disk is a 4K sector
drive where the gnop trick was used (proof above). I could have used
the "data" pool (raidz1 driven by 3 disks (WD10EFRX) + gnop), but it
wouldn't matter -- the problem is consistent no matter what the pool.
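
For anyone who hasn't seen "the gnop trick": it is the usual way to get
ashift=12 on 4K-sector drives at pool creation time. Roughly, with
example device and pool names:

    gnop create -S 4096 /dev/ada1        # expose the disk as a 4096-byte-sector provider
    zpool create backups /dev/ada1.nop   # pool picks up ashift=12 from the .nop device
    zpool export backups
    gnop destroy /dev/ada1.nop           # remove the shim
    zpool import backups                 # pool reattaches to the raw disk, keeps ashift=12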

I can't demonstrate the problem using "while : ; do date ; sleep 1 ;
done" because sleep 1 isn't granular enough (yes I'm aware FreeBSD
sleep(1) supports more granularity) and because date/strftime doesn't
show microseconds. So off into perl + Time::HiRes we go...

window1# date +%s ; dd if=/dev/zero of=/mnt/bigfile bs=64k
1372932367
^C123977+0 records in
123977+0 records out
8124956672 bytes transferred in 16.437748 secs (494286486 bytes/sec)
window2# perl -e 'use Time::HiRes qw(time sleep); $|=1; while(1) { print time, "\n"; sleep(0.2); }'

Now because even 0.2 seconds probably isn't granular enough, I ended up
pressing Enter in the middle of the running perl output every time I'd
notice that lines weren't coming across at consistent 0.2 second
intervals (I guess I have a good eye for this sort of thing). So blank
lines are me noticing the pauses/delays I've been talking about:

1372932411.90407
1372932412.10415
1372932412.30513
1372932412.50614
1372932412.70713
1372932412.90813
1372932413.10913
1372932413.31013
1372932413.51112

1372932413.71213
1372932413.91315
1372932414.11413
1372932414.31513
1372932414.51615
1372932414.71714
1372932415.00015

1372932415.27278
1372932415.47316
1372932415.67416
1372932415.87514
1372932416.07615
1372932416.27715
1372932416.48115
1372932416.78215

1372932416.98614
1372932417.18717
1372932417.38814
1372932417.58912
1372932417.79016
1372932417.99115

1372932418.40577
1372932418.60617
1372932418.80715
1372932419.00813
1372932419.20913
1372932419.41013
1372932419.64116
1372932419.85516

1372932420.11614
1372932420.31716
1372932420.51813
1372932420.71913
1372932420.92016
1372932421.12115
1372932421.32216
1372932421.58213

1372932421.78316
1372932421.98416
1372932422.18515
1372932422.38613
1372932422.58713
1372932422.80118
1372932423.05617
1372932423.34016

1372932423.54116
1372932423.74215
1372932423.94314
1372932424.14415
1372932424.43316
1372932424.63417
1372932424.85514

1372932425.05613
1372932425.25715
1372932425.45813
1372932425.65913
1372932425.86017
1372932426.18416

1372932426.51216
1372932426.71312
1372932426.91413
1372932427.11515
1372932427.31613
1372932427.74915

1372932428.00214
1372932428.20315
1372932428.40415
1372932428.60514
1372932428.80613
1372932429.00713
1372932429.38115

1372932429.58214
1372932429.78316
1372932429.98417
1372932430.18519
1372932430.38614
1372932430.58713
1372932430.92817

1372932431.12914
1372932431.33012
1372932431.53115
1372932431.73214
1372932431.93313
1372932432.13413

1372932432.48115
1372932432.73414
1372932432.93514
1372932433.13616
1372932433.33713
1372932433.53817
1372932433.73915
1372932433.95151

1372932434.28214
1372932434.48316
1372932434.68414
1372932434.88515
1372932435.08614
1372932435.28712
1372932435.48916
1372932435.84146

1372932436.05013
1372932436.25117
^C

There's a quite consistent pattern if you look closely: about every 8
lines of output. Each line = every 0.2 seconds, so about every 1.5
seconds is where I'd see a pause which would last for about 0.5 seconds.

And no, the above output *was not* being written to a file on ZFS, only
to stdout. :-)

What's interesting: I tried compression=gzip-9, which historically was
worse (I remember this clearly), but the stalls are about the same.
Maybe it's because I'm using /dev/zero rather than /dev/random, but the
issue there is that /dev/random would tax the CPU (entropy, etc.) more.

We didn't use compression at my previous job on Solaris (available CPU
time was very, very important given what the machines did), so I don't
have any context for comparison.

But: I can do this exact same procedure on the /backups filesystem/pool,
without compression of course, and there are no stalls -- just smooth
interactivity.

Now let me circle back to the convo I had with Fabian in 2013...

I have zero experience doing this "sched trace" stuff. I do not speak
Python, but looking at /usr/src/tools/sched/schedgraph.py almost implies
it has some kind of "visual graphing" (via X? I have no clue from the
code) and "borders" and "colour" support -- this is not an X system, so
unless this Python script generates image files somehow (I have no
image libraries installed on my system)...

My kernel does contain:

options KTR
options KTR_ENTRIES=262144
options KTR_COMPILE=(KTR_SCHED)
options KTR_MASK=(KTR_SCHED)

And I can follow the instructions at the top of the Python script and
provide the ktrdump somewhere if needed, but that's about it. I don't
know if that would help or be beneficial in any way -- because even
though I have some familiarity with userland profiling via *_p.a libs,
this is something at a completely different level.

So if someone wants this, I need a bit of hand-holding to know what all
I'm supposed to be doing. The instructions in the Python script make me
a little weary, particularly since it doesn't say to re-set
debug.ktr.mask to 536870912 afterward, so I'm not sure what the
implications are.
Post by Jeremy Chadwick
If a person considering use of compression on FreeBSD ZFS doesn't mind
that problem, then by all means use it. It doesn't change the fact that
there's an issue, and one that folks should be made aware of up front.
It's not spreading FUD: it's spreading knowledge of a certain behaviour
that differs between FreeBSD and Solaris/Illumos. The issue is a
deal-breaker for me; if it's not for you, great.
Sounds like it could well be use case based then, as we've not had any
problems compression causing interactively problems. Quite the opposite
in fact, the reduced physical IO that compression results in improved
interactivity.
So I guess its like everything, one size doesn't fit all, so temporing
statements about blanket avoiding the these features seems like the
way to go :)
While I see the logic in what you're saying, I prefer to publicly
disclose the differences in behaviours between Illumos ZFS and FreeBSD
ZFS.

I'm well-aware of the tremendous and positive effort to minimise those
differences (code-wise) -- I remember mm@ talking about this some time
ago -- but if this is somehow one of them, I do not see the harm in
telling people "FYI, there is this quirk/behavioural aspect specific to
FreeBSD that you should be aware of".

It doesn't mean ZFS on FreeBSD sucks, it doesn't mean it's broken, it
just means it's something that would completely surprise someone out of
the blue. Imagine the thread: "my system intermittently stalls, even at
VGA console... does anyone know what's causing this?" -- I doubt anyone
would think to check ZFS.
--
| Jeremy Chadwick ***@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
Adam Vande More
2013-07-04 23:51:50 UTC
Permalink
Post by Jeremy Chadwick
The issue is a
deal-breaker for me; if it's not for you, great.
I'm not quite clear on why this is a "deal-breaker" for you. The stalls are a
blink of an eyelash here, but at least sort of reproducible. Maybe if you
had a good use case scenario demonstrating the problem it would attract
more attention from those able to fix it. And a PR if one doesn't already
exist.
--
Adam Vande More
Daniel Feenberg
2013-07-04 11:09:13 UTC
Permalink
Post by Jeremy Chadwick
scripts.
Also, because nobody seems to warn others of this: if you go the
ZFS route on FreeBSD, please do not use features like dedup or
compression. I can expand more on this if asked, as they have
separate (and in one case identical/similar) caveats. (I'm always
willing to bend on compression as long as the user knows of the one
problem that still exists today and feels it's okay/acceptable)
Please expand on the problem with compression - we have a lot of very
large, very "fluffy" datasets that compress down about 90% (and are
accessed in pure sequential order) so compression is very attractive to
us.

dan feenberg
NBER
Jeremy Chadwick
2013-07-04 11:11:13 UTC
Permalink
Post by Daniel Feenberg
Post by Jeremy Chadwick
scripts.
Also, because nobody seems to warn others of this: if you go the
ZFS route on FreeBSD, please do not use features like dedup or
compression. I can expand more on this if asked, as they have
separate (and in one case identical/similar) caveats. (I'm always
willing to bend on compression as long as the user knows of the one
problem that still exists today and feels it's okay/acceptable)
Please expand on the problem with compression - we have a lot of
very large, very "fluffy" datasets that compress down about 90% (and
are accessed in pure sequential order) so compression is very
attractive to us.
Please see the rest of the thread; I explain the nuance there. :-)
(Maybe mail delays of some sort are happening somewhere and you haven't
seen the mail yet...)
--
| Jeremy Chadwick ***@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
Bruce Evans
2013-07-03 12:15:50 UTC
Permalink
...
This is because sync in ZFS is implemented as a ZIL commit, so transactions
that haven't yet made it to disk via the normal syncing context will at
least be committed via their ZIL blocks. Which can then be replayed when
the pool is imported later, in this case from the EBS snapshots.
And since the entire tree from the überblock down in ZFS is COW, you can't
get an inconsistent pool simply by doing a virtual disk snapshot,
regardless of how that is implemented.
I'm a little confused about this statement, particularly as a result of
http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/016982.html
UFS is what's being discussed there, but there are some blanket
statements (maybe I'm taking them out of context, not entirely sure)
made by Bruce there that seem to imply that sync(2) may not actually
flush all memory buffers to disk when issued, only that they're
"scheduled" to be flushed.
That was for ffs.
...
So all this makes me wonder: why exactly does sync(2) result in
different behaviour on UFS than it does on ZFS? Do both of these
filesystems not use BIO_write() and friends? Does sync(2) not simply
iterate over all the queued BIO_write()s and BIO_FLUSH them all?
ffs uses the buffer cache, and zfs doesn't go anywhere near the buffer
cache (it calls driver i/o routines fairly directly, via geom). This
alone gives very different behaviour. But zfs is even more different
for sync(2).
Sorry if I'm overthinking this or missing something, but I just don't
understand why sync(2) would flush stuff to disk with one filesystem but
not another.
It is because zfs ignores sync(2)'s request to not wait for the i/o to
complete.

I don't know much else about zfs.

Bruce
David Xu
2013-07-03 06:21:40 UTC
Permalink
Post by Berend de Boer
Hi All,
I'm experimenting with building a FreeBSD NFS server on Amazon AWS
EC2. I've created a zpool with 5 disks in a raidz2 configuration.
How can I make a consistent backup of this using EBS?
On Linux' file systems I can freeze a file system, start the backup of
all disks, and unfreeze. This freeze usually only takes 100ms or so.
ZFS on FreeBSD does not appear to have such an option. I.e. what I'm
looking for is basically a hardware based snapshot. ZFS should simply
be suspended at a recoverable point for a few hundred ms.
http://thr3ads.net/zfs-discuss/2010/11/580781-how-to-quiesce-and-unquiesc-zfs-and-zpool-for-array-hardware-snapshots
Absent a "zfs freeze" it seems using FreeBSD/zfs on AWS with EBS is
going to be impossible. Unfortunately that means back to Linux sigh.
--
All the best,
Berend de Boer
------------------------------------------------------
Awesome Drupal hosting: https://www.xplainhosting.com/
What you need is a tool to create snapshots on the EBS server, and let the
EBS server transfer the snapshot image to another EBS server; it has
nothing to do with FreeBSD. The EBS server, which is based on Linux, can
create snapshots at the block level; AFAIK its device mapper is capable
of creating a snapshot of a disk volume, a class FreeBSD's geom lacks.
But that has nothing to do with your needs, it is only needed on the
EBS server.

Regards,
David Xu
Berend de Boer
2013-07-03 09:17:43 UTC
Permalink
David> What you need is a tool to create snapshot on EBS server,

FYI, EBS is not a server. It is NAS. It's block storage.

--
All the best,

Berend de Boer


------------------------------------------------------
Awesome Drupal hosting: https://www.xplainhosting.com/
David Xu
2013-07-04 02:25:39 UTC
Permalink
Post by Berend de Boer
David> What you need is a tool to create snapshot on EBS server,
FYI, EBS is not a server. It is NAS. It's block storage.
--
All the best,
Berend de Boer
You can call it a NAS, but in practice it is implemented as a server
farm; we have implemented a local EBS farm ourselves. Normally, a command can
be sent to a master server to request a snapshot for its client, and the
client can keep writing to its iSCSI disk without being suspended
by the snapshot. HW snapshots are not a problem, because our clients use a
journaling file system; a snapshot taken in an inconsistent state will be
recovered by the client OS when it is mounted. So our client does not
need to support file system suspending or snapshots, it only needs a
journaling file system.

Regards,
David Xu
Steven Hartland
2013-07-04 00:13:44 UTC
Permalink
----- Original Message -----
Post by Berend de Boer
Hi All,
I'm experimenting with building a FreeBSD NFS server on Amazon AWS
EC2. I've created a zpool with 5 disks in a raidz2 configuration.
How can I make a consistent backup of this using EBS?
On Linux' file systems I can freeze a file system, start the backup of
all disks, and unfreeze. This freeze usually only takes 100ms or so.
ZFS on FreeBSD does not appear to have such an option. I.e. what I'm
looking for is basically a hardware based snapshot. ZFS should simply
be suspended at a recoverable point for a few hundred ms.
http://thr3ads.net/zfs-discuss/2010/11/580781-how-to-quiesce-and-unquiesc-zfs-and-zpool-for-array-hardware-snapshots
Absent a "zfs freeze" it seems using FreeBSD/zfs on AWS with EBS is
going to be impossible. Unfortunately that means back to Linux sigh.
Not been following the thread really so excuse if this has already
been mentioned ;-)

There is a zpool freeze <pool> which stops spa_sync() from doing
anything, so that the only way to record changes is on the ZIL.

The comment in zpool_main.c is: "'freeze' is a vile debugging
abomination", so it's evil, but it might be what you want if you're up to
writing some code.

For more info have a look at ztest.

Regards
Steve

Charles Sprickman
2013-07-04 00:28:44 UTC
Permalink
Post by Steven Hartland
Post by Berend de Boer
Hi All,
I'm experimenting with building a FreeBSD NFS server on Amazon AWS
EC2. I've created a zpool with 5 disks in a raidz2 configuration.
How can I make a consistent backup of this using EBS?
On Linux' file systems I can freeze a file system, start the backup of
all disks, and unfreeze. This freeze usually only takes 100ms or so.
ZFS on FreeBSD does not appear to have such an option. I.e. what I'm
looking for is basically a hardware based snapshot. ZFS should simply
be suspended at a recoverable point for a few hundred ms.
http://thr3ads.net/zfs-discuss/2010/11/580781-how-to-quiesce-and-unquiesc-zfs-and-zpool-for-array-hardware-snapshots
Absent a "zfs freeze" it seems using FreeBSD/zfs on AWS with EBS is
going to be impossible. Unfortunately that means back to Linux sigh.
Not been following the thread really so excuse if this has already
been mentioned ;-)
There is a zpool freeze <pool> which stops spa_sync() from doing
anything, so that the only way to record changes is on the ZIL.
I don't use EC2 or any of the other Amazon "cloud" stuff, but I'd assume
you could even have another chunk of block storage as a dedicated ZIL
and you could pass on snapshotting that. What effect would that have
on the pool while it's "frozen"? Are all writes, sync or not, sent
to ZIL while frozen?
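
For reference, attaching a dedicated log device to an existing pool is
also a one-liner; the pool and device names here are hypothetical:

    zpool add tank log /dev/xbd5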

I do have an interest in this as I do run some ESXi hosts, and in
addition to file-level backups, I do take snapshots with vmware.
Knowing that the "quiesce writes" option is actually doing something
and is not just a no-op when running the FreeBSD guest tools would be
nice.

Any arguments that claim "people don't do this" are, I think, a bit dated:
not only are the Amazon services widely used, but host-your-own
virtualization, be it VMware, Xen, VirtualBox, or Hyper-V, is extremely common.
Post by Steven Hartland
The comment in the zpool_main is: "'freeze' is a vile debugging
abomination" so it's evil but might be what you want if you're up to
writing some code.
Anything going on with this on the Illumos or ZFS on Linux side?

Charles
Post by Steven Hartland
For more info have a look at ztest.
Regards
Steve
Berend de Boer
2013-07-04 00:55:36 UTC
Permalink
Steven> Not been following the thread really so excuse if this has
Steven> already been mentioned ;-)

Not yet.


Steven> There is a zpool freeze <pool> which stops spa_sync() from
Steven> doing anything, so that the only way to record changes is
Steven> on the ZIL.

Pardon my ignorance, I don't really understand this. First of all, is
there a way to recover from a freeze? I.e. do I need to unfreeze?

What you're saying is that it is a one way street only? I just did
this on my semi-production server to see what is going to
happen. Nothing so far.

I had a look at the code, this is what it calls:

static int
zfs_ioc_pool_freeze(zfs_cmd_t *zc)
{
        spa_t *spa;
        int error;

        error = spa_open(zc->zc_name, &spa, FTAG);
        if (error == 0) {
                spa_freeze(spa);
                spa_close(spa, FTAG);
        }
        return (error);
}

And spa_freeze is this:

void
spa_freeze(spa_t *spa)
{
        uint64_t freeze_txg = 0;

        spa_config_enter(spa, SCL_ALL, FTAG, RW_WRITER);
        if (spa->spa_freeze_txg == UINT64_MAX) {
                freeze_txg = spa_last_synced_txg(spa) + TXG_SIZE;
                spa->spa_freeze_txg = freeze_txg;
        }
        spa_config_exit(spa, SCL_ALL, FTAG);
        if (freeze_txg != 0)
                txg_wait_synced(spa_get_dsl(spa), freeze_txg);
}

All nicely undocumented code.


Steven> The comment in the zpool_main is: "'freeze' is a vile
Steven> debugging abomination" so it's evil but might be what you
Steven> want if you're up to writing some code.

Yeah!

But thanks for digging this up, I hadn't expected undocumented
commands for zpool!


Steven> For more info have a look at ztest.

Another undocumented tool. How would I use this in this case?

--
All the best,

Berend de Boer


------------------------------------------------------
Awesome Drupal hosting: https://www.xplainhosting.com/
Will Andrews
2013-07-05 02:36:45 UTC
Permalink
Post by Steven Hartland
Not been following the thread really so excuse if this has already
been mentioned ;-)
There is a zpool freeze <pool> which stops spa_sync() from doing
anything, so that the only way to record changes is on the ZIL.
The comment in the zpool_main is: "'freeze' is a vile debugging
abomination" so it's evil but might be what you want if you're up to
writing some code.
zpool freeze is a debugging-only command, as the comment suggests. It
is not really of much use outside of testing changes to ZIL code.
Once run, the only thing you can do to get normal I/O running again is
to export the pool and import it again.

The point of the command is to ensure that ZIL blocks exist on a pool
when it is exported, so they are guaranteed to have to be replayed on
import. It is used in the STF test suite for the express purpose of
testing the ZIL replay. They write some stuff, freeze the pool, write
some more stuff, export the pool, use zdb to check for ZIL blocks,
then import it and check again, both to see that the changes were
applied, and to see that the ZIL blocks are gone.
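
In shell terms that sequence looks roughly like this; the pool, device
and file names are made up, and the exact zdb flags for dumping the
intent log may differ:

    zpool create testpool da1
    cp /etc/motd /testpool/before     # normal write, synced by spa_sync()
    zpool freeze testpool             # spa_sync() stops; changes now live only in the ZIL
    cp /etc/motd /testpool/after      # recorded only as ZIL blocks
    zpool export testpool
    zdb -e -iv testpool               # exported pool: ZIL blocks should be present
    zpool import testpool             # import replays the ZIL; /testpool/after appears
    zdb -iv testpool                  # the ZIL blocks should now be gone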

Most likely, doing something more like Berend wants requires a
slightly different approach.

--Will.
Berend de Boer
2013-07-08 20:08:57 UTC
Permalink
Will> zpool freeze is a debugging-only command, as the comment
Will> suggests. It is not really of much use outside of testing
Will> changes to ZIL code. Once run, the only thing you can do to
Will> get normal I/O running again is to export the pool and
Will> import it again.

Right, so I might have screwed my server? It's only semi-production,
but you're saying something is wrong with it right now?

Does it not commit stuff to disk anymore? Because I might have screwed
up some EBS snapshot tests, as I tested this "zpool freeze" thing a few
times.

--
All the best,

Berend de Boer


------------------------------------------------------
Awesome Drupal hosting: https://www.xplainhosting.com/