Rebuild (resilvering) time on NX2/ZX is slow

  1. RAIDZ resilvering is very slow in ZFS-based zpools. Basically, it starts with every transaction that has ever happened in the pool and plays them back one by one to the new drive. This is very IO-intensive, especially if you're using hard drives larger than 1TB. From a certain point of view, one might think that RAIDZ's only legitimate use case is all-SSD pools, where rebuilds are faster.
  2. RAID-Z2 is slower than RAID-Z1, but it offers a very nice balance between capacity and redundancy. Wider RAID-Z2 VDEVs are more space-efficient, but resilver operations also take longer, by roughly a factor of two because of the double parity. Because RAID-Z2 can tolerate the loss of two drives, longer resilver times are a reasonable tradeoff.
  3. The other reason could be the type of files: many small files can take longer than a few large files.

Rebuild speed
After a drive goes bad and is replaced, the data from the bad drive needs to be regenerated onto the new drive. This process is typically called rebuild, but ZFS calls it resilvering. There are two significant metrics:

  • Rebuild speed, measured in megabytes per second.
  • Rebuild time, the amount of time required to rebuild all the missing data.

and two significant considerations:

  1. Rebuild speed for traditional RAID is much faster. However, traditional RAID has to rebuild both used and free blocks.
  2. Rebuild speed for ZFS RAIDZ is slower. However, RAIDZ only needs to rebuild blocks that actually hold data. RAIDZ does not rebuild the empty blocks, so it completes rebuilds faster when the pool has significant free space on it.
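
The tradeoff between the two metrics and the two considerations above can be sketched with some simple arithmetic. The following is only a rough model, not a measurement: the function names, the drive size, and the rebuild speeds are hypothetical values chosen for illustration. It assumes a constant rebuild speed and decimal units (1 TB = 1,000,000 MB).

```python
def rebuild_hours(data_tb, speed_mb_s):
    """Hours needed to rebuild data_tb terabytes at speed_mb_s megabytes per second."""
    megabytes = data_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return megabytes / speed_mb_s / 3600


def compare(drive_tb, fill, raid_mb_s, raidz_mb_s):
    """Return (traditional RAID hours, RAIDZ hours) for one replaced drive.

    Traditional RAID rebuilds the whole drive (used and free blocks);
    RAIDZ rebuilds only the used fraction `fill` (0.0 to 1.0).
    """
    raid = rebuild_hours(drive_tb, raid_mb_s)
    raidz = rebuild_hours(drive_tb * fill, raidz_mb_s)
    return raid, raidz


def break_even_fill(raid_mb_s, raidz_mb_s):
    """Fill fraction below which RAIDZ finishes sooner, under this simple model."""
    return raidz_mb_s / raid_mb_s


if __name__ == "__main__":
    # Hypothetical numbers: 4 TB drive, 100 MB/s traditional RAID, 50 MB/s RAIDZ.
    for fill in (0.25, 0.50, 0.90):
        raid, raidz = compare(4, fill, raid_mb_s=100, raidz_mb_s=50)
        print(f"fill {fill:.0%}: traditional RAID {raid:.1f} h, RAIDZ {raidz:.1f} h")
    print(f"break-even fill: {break_even_fill(100, 50):.0%}")
```

In this model, a RAIDZ rebuild at half the speed of a traditional RAID rebuild still finishes sooner as long as the pool is less than half full. Real resilver speed varies with file sizes and pool load, so treat the result as an estimate only.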
Tags: ZFS
Roger Beck
2020-04-13 12:54