Benchmarks
ZFS on 3 OS

(December 2012)


Introduction

The CISM manages several mass storage systems running an old, no longer updated OS (Solaris 10 10/09 s10x_u8wos_08a X86 with ZFS pool version 10, or Solaris 10 10/08 s10x_u6wos_07b X86 with ZFS pool version 15). The installation and configuration of new servers is getting more and more difficult, because hardware such as recent disk controllers is not supported by a 4-year-old OS. ZFS has many valuable features we appreciate, such as data integrity, RAID support, virtual storage pools, ...
The IT company Transtec rented us a storage system so that we could evaluate the performance and the portability of ZFS on different operating systems.

The Hardware

A SuperMicro X9DR3-F/i with 128GB DDR3 RAM and 2 x 6-core Xeon E5-2620 @ 2GHz (HyperThreading off, power settings tuned for performance).
We called this server "tdemo".
The disk controller: LSI SAS2308_2 (BIOS 7.25.00.00, Firmware 13.00.57.00)
The disks: 2 x 60GB SSD (ATA-INTELSSDSC2CT06-300i)
           18 x 2TB SAS Toshiba (TOSHIBA-MK2001TRKB-0106)
           1 x 100GB SSD (ATA-INTELSSDSA2BZ10-0362)
           1 x 500GB SATA Hitachi (ATA-HITACHIHDS7250S-A20A), the disk on which we installed the 3 OS.

The 3 Operating Systems

  1. OpenIndiana SunOS tdemo 5.11 oi_151a5 i86pc i386 i86pc (released Sept 14, 2011), with ZFS pool version 28
  2. FreeBSD 9.1-RC3 #0 r242324: Tue Oct 30 00:58:57 UTC 2012 amd64, with ZFS pool version 28
  3. GNU/Linux CentOS 6.3, Linux tdemo.cism.ucl.ac.be 2.6.32-279.14.1.el6.x86_64 #1 SMP Tue Nov 6 23:43:09 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux, with Native ZFS on Linux v0.6.0-rc12 (produced at Lawrence Livermore National Laboratory), with ZFS pool version 28

The ZFS configuration

We chose a raidz2 pool configuration, with two sets of 8 disks plus 2 spare disks. The 100GB SSD was configured as a cache device and the two 60GB SSDs were set up as a mirror for the log.
see (for OpenIndiana):
root@tdemo:~# zpool create storage raidz2 c4t5000039438C81EEAd0 c4t5000039438C81F06d0 c4t5000039438C82B52d0 c4t5000039438C82DB2d0 c4t5000039438C820CEd0 c4t5000039438C821DAd0 c4t5000039438C824FEd0 c4t5000039438C830CAd0
root@tdemo:~# zpool add storage spare c4t5000039438C833BAd0
root@tdemo:~# zpool add storage raidz2 c4t5000039438C8215Ad0 c4t5000039438C8244Ad0  c4t5000039438C8351Ad0 c4t5000039438C82266d0 c4t5000039438C82442d0 c4t5000039438C82636d0 c4t5000039438C82992d0 c4t5000039438C83516d0
root@tdemo:~# zpool add storage spare c4t5000039438C83522d0
root@tdemo:~# zpool add storage cache c3t5001517BB29DFF50d0
root@tdemo:~# zpool add storage log mirror c3t5001517BB2A1DEA4d0 c3t5001517BB29CF723d0
root@tdemo:~# zfs set compression=on storage
root@tdemo:~# zpool status storage
  pool: storage
 state: ONLINE
  scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        storage                    ONLINE       0     0     0
          raidz2-0                 ONLINE       0     0     0
            c4t5000039438C81EEAd0  ONLINE       0     0     0
            c4t5000039438C81F06d0  ONLINE       0     0     0
            c4t5000039438C82B52d0  ONLINE       0     0     0
            c4t5000039438C82DB2d0  ONLINE       0     0     0
            c4t5000039438C820CEd0  ONLINE       0     0     0
            c4t5000039438C821DAd0  ONLINE       0     0     0
            c4t5000039438C824FEd0  ONLINE       0     0     0
            c4t5000039438C830CAd0  ONLINE       0     0     0
          raidz2-1                 ONLINE       0     0     0
            c4t5000039438C8215Ad0  ONLINE       0     0     0
            c4t5000039438C8244Ad0  ONLINE       0     0     0
            c4t5000039438C8351Ad0  ONLINE       0     0     0
            c4t5000039438C82266d0  ONLINE       0     0     0
            c4t5000039438C82442d0  ONLINE       0     0     0
            c4t5000039438C82636d0  ONLINE       0     0     0
            c4t5000039438C82992d0  ONLINE       0     0     0
            c4t5000039438C83516d0  ONLINE       0     0     0
        logs
          mirror-2                 ONLINE       0     0     0
            c3t5001517BB2A1DEA4d0  ONLINE       0     0     0
            c3t5001517BB29CF723d0  ONLINE       0     0     0
        cache
          c3t5001517BB29DFF50d0    ONLINE       0     0     0
        spares
          c4t5000039438C833BAd0    AVAIL  
          c4t5000039438C83522d0    AVAIL  
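
Compression being enabled on the pool, its actual effect on the data written by the benchmarks below can be inspected afterwards through the standard compressratio property, for instance:

zfs get compression,compressratio storage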


The IOZONE benchmark

Iozone is a filesystem benchmark tool (we used version 4.1.3). The benchmark generates and measures a variety of file operations, but we decided to limit the operations to the (re-)write (-i 0) and (re-)read (-i 1) of a large 16GB file (-s16g), with 16KB or 256KB blocks (-r16k or -r256k) and 1, 10 or 20 threads (-t 1, -t 10 and -t 20).
We repeated these tests with the SSD cache (used for read optimization) and without it (zpool offline storage c3t5001517BB29DFF50d0).
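For the next run, the cache device is simply brought back with the corresponding online command:

zpool online storage c3t5001517BB29DFF50d0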


iozone -i 0 -i 1 -s16g -r16k  -t 1  /storage
iozone -i 0 -i 1 -s16g -r16k  -t 1  /storage (cache offline)
iozone -i 0 -i 1 -s16g -r16k  -t 10 /storage
iozone -i 0 -i 1 -s16g -r16k  -t 10 /storage (cache offline)
iozone -i 0 -i 1 -s16g -r16k  -t 20 /storage
iozone -i 0 -i 1 -s16g -r16k  -t 20 /storage (cache offline)

fig1: write, 16KB blocks
(OI = OpenIndiana, FBSD = FreeBSD 9.1, Linux = GNU/Linux; the "w/o cache" columns were run with the ssd cache offline)

            OI  OI w/o cache    FBSD  FBSD w/o cache    Linux  Linux w/o cache
  T=1      601           505     491             457      575              622
  T=10    2342          2173    1088            1123     1414             1435
  T=20    2315          2579     992             951     1132             1151

fig2: rewrite, 16KB blocks

            OI  OI w/o cache    FBSD  FBSD w/o cache    Linux  Linux w/o cache
  T=1      593           702     653             612      717              720
  T=10     629           523     519             502      360              349
  T=20     257           236     245             233      274              249

fig3: read, 16KB blocks

            OI  OI w/o cache    FBSD  FBSD w/o cache    Linux  Linux w/o cache
  T=1     3061          2469    2534            2486     2017             2028
  T=10    4074          4026    3612            3583     3293             3291
  T=20    3738          4058    2910            3108     3282             3375

fig4: reread, 16KB blocks

            OI  OI w/o cache    FBSD  FBSD w/o cache    Linux  Linux w/o cache
  T=1     2783          3154    2504            2684     2177             2189
  T=10    4785          4063    2872            2856     3297             3200
  T=20    4057          3825    3437            3070     3301             3476

iozone -i 0 -i 1 -s16g -r256k  -t 1  /storage
iozone -i 0 -i 1 -s16g -r256k  -t 1  /storage (cache offline)
iozone -i 0 -i 1 -s16g -r256k  -t 10 /storage
iozone -i 0 -i 1 -s16g -r256k  -t 10 /storage (cache offline)
iozone -i 0 -i 1 -s16g -r256k  -t 20 /storage
iozone -i 0 -i 1 -s16g -r256k  -t 20 /storage (cache offline)
fig5: write, 256KB blocks (same column abbreviations as above)

            OI  OI w/o cache    FBSD  FBSD w/o cache    Linux  Linux w/o cache
  T=1      746           841     720             712      569              477
  T=10    2900          3072    1005             940     1812             1613
  T=20    3340          3518    1147            1155     1646             1644

fig6: rewrite, 256KB blocks

            OI  OI w/o cache    FBSD  FBSD w/o cache    Linux  Linux w/o cache
  T=1      827           812    1117             835      824              904
  T=10    2875          2640    1017             987     2011             1860
  T=20    3146          3159     966             910     1997             2091
fig7: read, 256KB blocks

            OI  OI w/o cache    FBSD  FBSD w/o cache    Linux  Linux w/o cache
  T=1     1347          1422    4827            4884     4561             4762
  T=10    3957          3401    4080            3542     4155             3640
  T=20    4086          4077    3338            3255     4348             4023

fig8: reread, 256KB blocks

            OI  OI w/o cache    FBSD  FBSD w/o cache    Linux  Linux w/o cache
  T=1     1272          1399    4877            4929     4734             4680
  T=10    5482          5470    3411            3655     4273             5211
  T=20    4119          3959    3337            3326     4427             4570

Observations:

For the (re-)write operations, OpenIndiana is the best, especially under heavy I/O load (t=10 or t=20 threads); GNU/Linux comes second.
The read benchmarks show good results for GNU/Linux with large blocks (-r256k).
OpenIndiana performs poorly for single-threaded (re-)reads, but ranks first or second when the number of threads is high (t=10 or t=20).
The benefit of the SSD cache for the read operations is not clear: around 3% of speedup, and not consistently.

The NFS performance with IOZONE

For this test, we used 1, 4 or 10 Linux clients from the Green Cluster (node001 to node011, Scientific Linux 5.8), all connected to the same Gigabit Ethernet switch.
Iozone can evaluate the I/O performance of several clients at once with the -+m filename option, where filename is the client list:

       -+m filename
              Used to specify a filename that will  be  used  to  specify  the
              clients in a distributed measurement. The file contains one line
              for each client. The fields are space delimited. Field 1 is  the
              client  name.  Field  2 is the working directory, on the client,
              where Iozone will run. Field 3 is the path to the executable Io-
              zone on the client.


We shared the ZFS pool under OpenIndiana with:
zfs set sharenfs=rw,anon=0,root=@10.141.0.0/255.255.0.0 storage
or under GNU/Linux with:
zfs set sharenfs='rw=10.141.0.0/16,no_root_squash,async' storage
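
From any client in the 10.141.0.0/16 network, the export can then be checked with, for instance:

showmount -e 10.141.0.111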

On the Linux clients, /etc/fstab contains:
10.141.0.111:/storage /mnt/tdemo/storage nfs rsize=32768,wsize=32768,hard,async,noauto,noatime,nodiratime,noacl 0 0

iozone -e -s 16g -r16k -t  [1|4|10] -i 0 -i 1 -+m clientlist /mnt/tdemo/storage
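
The clientlist file contains one line per client, as described in the man page excerpt above. For this setup it looks, for instance, like the following (the iozone path on the clients is just an example):

node001 /mnt/tdemo/storage /usr/local/bin/iozone
node002 /mnt/tdemo/storage /usr/local/bin/iozone
node003 /mnt/tdemo/storage /usr/local/bin/iozone
node004 /mnt/tdemo/storage /usr/local/bin/iozone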


Results, in [MB/s]:

               1 client    4 clients   10 clients     1 client    4 clients   10 clients
              OI server    OI server    OI server    Lx server    Lx server    Lx server
  Write              54           82           96           53           57           82
  Rewrite            55           81           88           55           57           60
  Read              101          111          111         2493         2494        23924
  Reread           2376         8351        21291         2528         2527        24370

On a Gigabit Ethernet link, the theoretical maximum throughput is 128 MB/s; counting about 20% loss due to normal TCP/IP overhead, the practical maximum is around 100 MB/s. For the write operations, OpenIndiana reaches 75% and Linux 64% of the theoretical throughput, which is very good. The extremely high values for read and reread are explained by the NFS cache on the client side. The poor read performance with the OpenIndiana server was already observed in the previous local iozone benchmarks.
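These percentages presumably correspond to the best 10-client write rates divided by the raw link throughput:

  96 MB/s / 128 MB/s = 75%   (OpenIndiana server)
  82 MB/s / 128 MB/s ≈ 64%   (GNU/Linux server)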

The ZFS portability between OS

Is a zpool created under OpenIndiana seen and properly imported under GNU/Linux or FreeBSD, or any other combination (OI <--> Lx, OI <--> FBSD, Lx <--> FBSD)?

A zpool created in OI cannot be imported into Linux or FreeBSD because of an "incompatible" version, even though the 3 OS all use ZFS pool version 28.
This is due to the Illumos ZFS feature flags concept: http://blog.delphix.com/csiden/files/2012/01/ZFS_Feature_Flags.pdf. In this context the "zpool version" becomes a legacy concept, and the number is set to 5000 on existing pools during a "zpool upgrade" run.

[root@tdemo ~]# zpool import
   pool: storage
     id: 13020041761373380056
  state: UNAVAIL
 status: The pool is formatted using an incompatible version.
 action: The pool cannot be imported.  Access the pool on a system running newer software, or recreate the pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-A5
 config:

    storage                                         UNAVAIL  newer version

 [...]
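(On the system that owns the pool, the reported pool version can be read at any time with the standard version property: "zpool get version storage".)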

The zpools created with GNU/Linux or FreeBSD are properly imported into OpenIndiana, except for the cache and spare disks, which are not recognized because of a different identifier name (c3t5001517BB29DFF50d0 in OI is scsi-SATA_INTEL_SSDSA2BZ1CVLV230400CJ100AGN in Linux and gptid/c491847f-7c62-22ce-b875-d4e31f58c9df in FreeBSD).

But the data integrity is guaranteed, and the cache and spares can be added back by hand.
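For instance, after importing a pool created under Linux or FreeBSD into OpenIndiana, the cache and the spares can be re-attached with the OI device names listed earlier (same physical disks, simply identified differently under each OS):

zpool add storage cache c3t5001517BB29DFF50d0
zpool add storage spare c4t5000039438C833BAd0 c4t5000039438C83522d0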

There is no problem for a zpool import between GNU/Linux <--> FreeBSD.

The sas2ircu and lsiutil.i386 tools

LSI offers two utilities, available for download for Solaris, Linux and FreeBSD: sas2ircu and lsiutil.
With these tools, we can easily find the S/N (or SAS address) of the disks and the bay number they sit in.

root@tdemo:/root/bin # ./sas2ircu LIST
LSI Corporation SAS2 IR Configuration Utility.
Version 11.00.00.00 (2011.08.11)
Copyright (c) 2009-2011 LSI Corporation. All rights reserved.


         Adapter      Vendor  Device                       SubSys  SubSys
 Index    Type          ID      ID    Pci Address          Ven ID  Dev ID
 -----  ------------  ------  ------  -----------------    ------  ------
   0     SAS2308_2     1000h    87h   00h:04h:00h:00h      1000h   3030h
SAS2IRCU: Utility Completed Successfully.

root@tdemo:/root/bin # ./sas2ircu 0 DISPLAY
LSI Corporation SAS2 IR Configuration Utility.
Version 11.00.00.00 (2011.08.11)
Copyright (c) 2009-2011 LSI Corporation. All rights reserved.

Read configuration has been initiated for controller 0
------------------------------------------------------------------------
Controller information
------------------------------------------------------------------------
  Controller type                         : SAS2308_2
  BIOS version                            : 7.25.00.00
  Firmware version                        : 13.00.57.00
  Channel description                     : 1 Serial Attached SCSI
  Initiator ID                            : 0
  Maximum physical devices                : 1023
  Concurrent commands supported           : 10240
  Slot                                    : 2
  Segment                                 : 0
  Bus                                     : 4
  Device                                  : 0
  Function                                : 0
  RAID Support                            : No
------------------------------------------------------------------------
IR Volume information
------------------------------------------------------------------------
------------------------------------------------------------------------
Physical device information
------------------------------------------------------------------------
Initiator at ID #0

Device is a Hard disk
  Enclosure #                             : 2
  Slot #                                  : 0
  SAS Address                             : 5003048-0-01bb-624c
  State                                   : Ready (RDY)
  Size (in MB)/(in sectors)               : 57241/117231407
  Manufacturer                            : ATA
  Model Number                            : INTEL SSDSC2CT06
  Firmware Revision                       : 300i
  Serial No                               : CVMP229501YD060AGN
  GUID                                    : N/A
  Protocol                                : SATA
  Drive Type                              : SATA_SSD
Device is a Hard disk
  Enclosure #                             : 2
  Slot #                                  : 1
  SAS Address                             : 5003048-0-01bb-624d
  State                                   : Ready (RDY)
  Size (in MB)/(in sectors)               : 57241/117231407
  Manufacturer                            : ATA
  Model Number                            : INTEL SSDSC2CT06
  Firmware Revision                       : 300i
  Serial No                               : CVMP23420040060AGN
  GUID                                    : N/A
  Protocol                                : SATA
  Drive Type                              : SATA_SSD

Device is a Hard disk
  Enclosure #                             : 2
  Slot #                                  : 2
  SAS Address                             : 5000039-4-38c8-3516
  State                                   : Ready (RDY)
  Size (in MB)/(in sectors)               : 1907729/3907029167
  Manufacturer                            : TOSHIBA
  Model Number                            : MK2001TRKB
  Firmware Revision                       : 0106
  Serial No                               : 8230A17IFM13
  GUID                                    : N/A
  Protocol                                : SAS
  Drive Type                              : SAS_HDD

[...]



root@tdemo:/root/bin # ./sas2ircu 0 LOCATE 2:10 ON

The red LED of the disk in enclosure 2, slot 10 starts blinking.
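The LED is switched off again the same way:

root@tdemo:/root/bin # ./sas2ircu 0 LOCATE 2:10 OFF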


[root@tdemo]:[13]:[~/LSIUtil Kit 1.63/Solaris]:$ ./lsiutil.i386

LSI Logic MPT Configuration Utility, Version 1.63, June 4, 2009

1 MPT Port found

     Port Name         Chip Vendor/Type/Rev    MPT Rev  Firmware Rev  IOC
 1.  mpt_sas0          LSI Logic 0086 05         200      0d003900     0

Select a device:  [1-1 or 0 to quit] 1

 1.  Identify firmware, BIOS, and/or FCode
 2.  Download firmware (update the FLASH)
 4.  Download/erase BIOS and/or FCode (update the FLASH)
 8.  Scan for devices
10.  Change IOC settings (interrupt coalescing)
13.  Change SAS IO Unit settings
16.  Display attached devices
20.  Diagnostics
21.  RAID actions
23.  Reset target
42.  Display operating system names for devices
43.  Diagnostic Buffer actions
45.  Concatenate SAS firmware and NVDATA files
59.  Dump PCI config space
60.  Show non-default settings
61.  Restore default settings
66.  Show SAS discovery errors
69.  Show board manufacturing information
97.  Reset SAS link, HARD RESET
98.  Reset SAS link
99.  Reset port
 e   Enable expert mode in menus
 p   Enable paged mode
 w   Enable logging

Main menu, select an option:  [1-99 or e/p/w or 0 to quit] 99

Resetting port...

In the kernel dmesg, we can see:

Dec 14 09:19:25 tdemo scsi: [ID 365881 kern.info] /pci@0,0/pci8086,3c04@2/pci1000,3030@0 (mpt_sas0):
Dec 14 09:19:25 tdemo   mpt0 Firmware version v13.0.57.0 (?)
Dec 14 09:19:25 tdemo scsi: [ID 365881 kern.info] /pci@0,0/pci8086,3c04@2/pci1000,3030@0 (mpt_sas0):
Dec 14 09:19:25 tdemo   mpt0: IOC Operational.

Robustness

For these tests, we removed two disks from the server bays and set up a 2x7 raidz2 + 2 spares ZFS pool configuration. The two disks kept on the shelf are used for the hot-swap tests.
We removed one or more disks in order to see whether the failure is automatically detected and whether a spare is activated as a replacement.

Briefly, we see in the 3 subsections below that:

  1. ZFS under OpenIndiana does not detect a disk failure directly, but after a "zpool scrub" the error is seen and the spare is put in place automatically.
  2. ZFS under FreeBSD is automatically aware of a disk error (disk 'REMOVED', pool 'DEGRADED'), but the spare disk has to be activated by hand (zpool replace).
  3. ZFS under GNU/Linux also needs a "zpool scrub" to take the disk error into account; then the spare has to be activated by hand (zpool remove & replace).

Even if the spare activation is not automatic under FreeBSD or Linux, it can be monitored and scripted, as the kernel detects the failure and reports the sas_addr of the faulted disk; a minimal sketch follows.
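A sketch of such a check, here for the GNU/Linux case (pool name taken from this setup; only the reporting part is shown, the actual "zpool replace" is left to the administrator, and this is not a tested production script):

#!/bin/sh
# Force a scrub so that ZFS notices the missing disk (needed under OI and Linux),
# then report the pool health and the faulted devices for a manual replacement.
POOL=storage
zpool scrub "$POOL"
sleep 300    # give the scrub some time to hit the missing device
if ! zpool status -x "$POOL" | grep -q "is healthy"; then
    echo "pool $POOL needs attention:"
    zpool status "$POOL" | egrep "DEGRADED|UNAVAIL|REMOVED|FAULTED"
fi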

OpenIndiana

I removed the disk from bay 12 (labeled 5000039438c81eea).
Dec 14 14:32:51 tdemo scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci8086,3c04@2/pci1000,3030@0 (mpt_sas0):
Dec 14 14:32:51 tdemo   mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31170000
Dec 14 14:32:51 tdemo scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci8086,3c04@2/pci1000,3030@0 (mpt_sas0):
Dec 14 14:32:51 tdemo   mptsas_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31170000
A "zpool status" shows it as  'ONLINE' but the kernel dmesg shows:
Dec 14 14:33:35 tdemo scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3c04@2/pci1000,3030@0/iport@f0/disk@w5000039438c81eea,0 (sd17):
Dec 14 14:33:35 tdemo   drive offline
Dec 14 14:33:35 tdemo scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3c04@2/pci1000,3030@0/iport@f0/disk@w5000039438c81eea,0 (sd17):
Dec 14 14:33:35 tdemo   drive offline
Then a "zpool scrub storage" changes the 'ONLINE' into 'UNAVAIL':
       NAME                         STATE     READ WRITE CKSUM
        storage                      DEGRADED     0     0     0
          raidz2-0                   DEGRADED     0     0     0
            spare-0                  UNAVAIL      0     0     0
              c4t5000039438C81EEAd0  UNAVAIL      0     0     0  cannot open
[...]
       spares
          c4t5000039438C830CAd0      INUSE     currently in use
          c4t5000039438C83522d0      AVAIL  
dmesg:
Dec 14 14:34:06 tdemo fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major
Dec 14 14:34:06 tdemo EVENT-TIME: Fri Dec 14 14:34:06 CET 2012
Dec 14 14:34:06 tdemo PLATFORM: X9DR3-F, CSN: 1234567890, HOSTNAME: tdemo
Dec 14 14:34:06 tdemo SOURCE: zfs-diagnosis, REV: 1.0
Dec 14 14:34:06 tdemo EVENT-ID: c7cecf37-4b42-4c58-e666-bba09d044eba
Dec 14 14:34:06 tdemo DESC: A ZFS device failed.  Refer to http://illumos.org/msg/ZFS-8000-D3 for more information.
Dec 14 14:34:06 tdemo AUTO-RESPONSE: No automated response will occur.
Dec 14 14:34:06 tdemo IMPACT: Fault tolerance of the pool may be compromised.
Dec 14 14:34:06 tdemo REC-ACTION: Run 'zpool status -x' and replace the bad device.

The fault manager daemon (fmd) is a rich feature of (Open)Solaris/OpenIndiana. fmd receives telemetry information about problems detected by the system software, diagnoses these problems, and initiates proactive self-healing activities such as disabling faulty components. When appropriate, the fault manager also sends a message to syslogd.
It can be trapped by SNMP or a Nagios agent.
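A monitoring check can, for example, simply parse the list of active faults reported by fmd (empty when everything is fine):

fmadm faulty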

"format" instruction shows the removed disk as "type unknown":
       4. c4t5000039438C81EEAd0 <drive type unknown>
          /pci@0,0/pci8086,3c04@2/pci1000,3030@0/iport@f0/disk@w5000039438c81eea,0
There are 19 entries.

I insert a new disk (disk18):
Dec 14 14:41:25 tdemo scsi: [ID 583861 kern.info] sd14 at mpt_sas1: unit-address w5000039438c8351a,0: w5000039438c8351a,0
Dec 14 14:41:25 tdemo genunix: [ID 936769 kern.info] sd14 is /pci@0,0/pci8086,3c04@2/pci1000,3030@0/iport@f0/disk@w5000039438c8351a,0
Dec 14 14:41:25 tdemo genunix: [ID 408114 kern.info] /pci@0,0/pci8086,3c04@2/pci1000,3030@0/iport@f0/disk@w5000039438c8351a,0 (sd14) online
and "format" detects 20 entries... no need of an  LSIutil reset.

FreeBSD:

As the 500GB Hitachi disk containing the OS sits in bay 23, at the end of the SATA chain, and FreeBSD records its / as /dev/da21s4a in /etc/fstab, the system cannot boot with the two disks disk18 and disk19 removed.
So we put all the disks back into the bays.
With "gpart list", we can determine the gptid of disk18 and disk19 and create a 2x7 raidz2 + 2 spares zpool.

zpool create -f storage raidz2 gptid/9405aeb5-57d0-ab40-97d0-506842ca2756 gptid/7ed3ee32-f475-7c46-b5f4-96c06977dea5 gptid/caf99fbd-beaf-7a43-a5f4-021acaed081e gptid/f2c05018-e106-944b-9e2e-e9b8505d3849 gptid/9604ccc7-9226-7b49-b163-e81b2679889b gptid/cec867eb-4c44-6d44-90d8-559de51529b4 gptid/97e6f52c-f424-2f4c-a37e-3974710d3b1f
zpool add -f storage spare gptid/04034a3c-aea7-b64d-9c92-50ce1e3e844b
zpool add -f storage raidz2 gptid/23542ce8-6c03-aa4c-b852-847d491e03de gptid/f1698532-5e87-2342-a273-d1cdf4aed9d5 gptid/e52b91be-879b-7545-81c9-189713806764 gptid/881dc14b-1b47-a341-baf0-7b2c967fb772 gptid/5672be7f-a220-a546-92d9-59a7bfe40e60 gptid/84d83cb9-9993-174c-a244-a2ee558d81d9 gptid/703ed4d5-5c09-3645-bba4-50a3e4a09fa0
zpool add -f storage spare gptid/1afa5104-d70e-9848-97e6-1840b74ed9df

The remaining disks are:
da16: gptid/3a876b92-df40-6145-87b8-b12d2879ce4c
da17: gptid/eda06dff-8e9c-ac47-8975-69ae4fbab0b6
da18 (55G) gptid/8ba5e745-d877-9a43-b07f-4bc4d7a1c9b5
da19 (55G) gptid/b80b4ba6-b16d-0340-bec1-f1df69d720b0
da20 (93G) gptid/2eac9798-78b4-294c-800c-468e72859461

I remove the disk from bay 19, which is not a member of the storage pool:
mps0: mpssas_alloc_tm freezing simq
mps0: mpssas_remove_complete on handle 0x001e, IOCStatus= 0x0
mps0: mpssas_free_tm releasing simq
(da16:(pass18:mps0:0:mps0:0:28:28:0): lost device - 0 outstanding, 1 refs 0): passdevgonecb: devfs entry is gone
(da16:mps0:0:28:0): removing device entry
When I put it back:
da16 at mps0 bus 0 scbus0 target 28 lun 0
da16: <TOSHIBA MK2001TRKB 0106> Fixed Direct Access SCSI-5 device
da16: 600.000MB/s transfers
da16: Command Queueing enabled
da16: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
We can see that the hot-swap detection is dynamic.

Now I do the same test with disk12, which is a spare member of the storage zpool.
mps0: mpssas_alloc_tm freezing simq
mps0: mpssas_remove_complete on handle 0x0017, IOCStatus= 0x0
mps0: mpssas_free_tm releasing simq
(da15:(pass17:mps0:0:mps0:0:27:27:0): lost device - 0 outstanding, 2 refs 0): passdevgonecb: devfs entry is gone
(da15:mps0:0:27:0): removing device entry
 "zpool status" give directly the correct 'REMOVED' status, without the need of a "zpool scrub storage"
    spares
      gptid/04034a3c-aea7-b64d-9c92-50ce1e3e844b    AVAIL  
      16399849032075385311                          REMOVED   was /dev/gptid/1afa5104-d70e-9848-97e6-1840b74ed9df
When I put disk12 back, it stays in the 'REMOVED' state:
da15 at mps0 bus 0 scbus0 target 27 lun 0
da15: <TOSHIBA MK2001TRKB 0106> Fixed Direct Access SCSI-5 device
da15: 600.000MB/s transfers
da15: Command Queueing enabled
da15: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
still  'REMOVED':
    spares
      gptid/04034a3c-aea7-b64d-9c92-50ce1e3e844b    AVAIL  
      16399849032075385311                          REMOVED   was /dev/gptid/1afa5104-d70e-9848-97e6-1840b74ed9df
So I have to "zpool remove" it before adding it back:
root@tdemo:/root # zpool replace storage gptid/1afa5104-d70e-9848-97e6-1840b74ed9df
cannot replace gptid/1afa5104-d70e-9848-97e6-1840b74ed9df with gptid/1afa5104-d70e-9848-97e6-1840b74ed9df: device is reserved as a hot spare
root@tdemo:/root # zpool remove storage gptid/1afa5104-d70e-9848-97e6-1840b74ed9df
root@tdemo:/root # zpool add -f storage spare gptid/1afa5104-d70e-9848-97e6-1840b74ed9df

For a third test with FreeBSD, I decided to remove disk6, a member of raidz2-1 of the storage zpool.
The zpool is directly tagged as 'DEGRADED', with the disk 'REMOVED':
      raidz2-1                                      DEGRADED     0     0     0
        gptid/23542ce8-6c03-aa4c-b852-847d491e03de  ONLINE       0     0     0
        4717764212357227262                         REMOVED      0     0     0  was /dev/gptid/f1698532-5e87-2342-a273-d1cdf4aed9d5
[...]
Unfortunately, a spare disk does not replace it automatically, so I have to do it by hand:
root@tdemo:/root # zpool replace storage gptid/f1698532-5e87-2342-a273-d1cdf4aed9d5 gptid/04034a3c-aea7-b64d-9c92-50ce1e3e844b

      raidz2-1                                        DEGRADED     0     0     0
        gptid/23542ce8-6c03-aa4c-b852-847d491e03de    ONLINE       0     0     0
        spare-1                                       REMOVED      0     0     0
          4717764212357227262                         REMOVED      0     0     0  was /dev/gptid/f1698532-5e87-2342-a273-d1cdf4aed9d5
          gptid/04034a3c-aea7-b64d-9c92-50ce1e3e844b  ONLINE       0     0     0
        gptid/e52b91be-879b-7545-81c9-189713806764    ONLINE       0     0     0
        gptid/881dc14b-1b47-a341-baf0-7b2c967fb772    ONLINE       0     0     0
        gptid/5672be7f-a220-a546-92d9-59a7bfe40e60    ONLINE       0     0     0
        gptid/84d83cb9-9993-174c-a244-a2ee558d81d9    ONLINE       0     0     0
        gptid/703ed4d5-5c09-3645-bba4-50a3e4a09fa0    ONLINE       0     0     0
    spares
      1527390717547328893                             INUSE     was /dev/gptid/04034a3c-aea7-b64d-9c92-50ce1e3e844b
      gptid/1afa5104-d70e-9848-97e6-1840b74ed9df      AVAIL  
I take the removed disk6 out of the zpool:
zpool offline storage gptid/f1698532-5e87-2342-a273-d1cdf4aed9d5
zpool detach storage gptid/f1698532-5e87-2342-a273-d1cdf4aed9d5
I can insert it back, even into another free bay (bay 21):
da9 at mps0 bus 0 scbus0 target 21 lun 0
da9: <TOSHIBA MK2001TRKB 0106> Fixed Direct Access SCSI-5 device
da9: 600.000MB/s transfers
da9: Command Queueing enabled
da9: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
And set it as a spare disk:
zpool add -f storage spare gptid/f1698532-5e87-2342-a273-d1cdf4aed9d5

GNU/Linux:

I removed one disk (WWN: 5000039438C81EE8); this is the first disk of raidz2-0.
The kernel sees that it has been removed:

sd 0:0:12:0: [sdm] Synchronizing SCSI cache
sd 0:0:12:0: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
mpt2sas0: removing handle(0x0017), sas_addr(0x5000039438c81eea)

But the scsi-35000039438c81ee8 device remains labeled as 'ONLINE' until the next "zpool scrub storage". Then it is set to 'UNAVAIL' and the storage pool becomes 'DEGRADED'.

No spare automatically takes over for the removed disk; I have to do the operation by hand.
I remove one spare from the pool and replace the "UNAVAIL" disk with it:

zpool remove storage scsi-35000039438c830c8
zpool replace storage scsi-35000039438c81ee8  scsi-35000039438c830c8

The zpool is no longer in a degraded state.

When I put a "new" disk (WWN: 5000039438C83518) into a free bay, the kernel discovers it:

scsi 0:0:22:0: Direct-Access     TOSHIBA  MK2001TRKB       0106 PQ: 0 ANSI: 5
scsi 0:0:22:0: SSP: handle(0x0021), sas_addr(0x5000039438c8351a), phy(33), device_name(0x5000039438c8351a)
scsi 0:0:22:0: SSP: enclosure_logical_id(0x5003048001bb627f), slot(21)
scsi 0:0:22:0: qdepth(254), tagged(1), simple(1), ordered(0), scsi_level(6), cmd_que(1)
sd 0:0:22:0: Attached scsi generic sg12 type 0
sd 0:0:22:0: [sdm] Spinning up disk...................ready
sd 0:0:22:0: [sdm] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
sd 0:0:22:0: [sdm] Write Protect is off
sd 0:0:22:0: [sdm] Mode Sense: d3 00 00 08
sd 0:0:22:0: [sdm] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdm: sdm1 sdm9
sd 0:0:22:0: [sdm] Attached SCSI disk

I can add it as a spare:

zpool add -f storage spare scsi-35000039438c83518

zpool status
[...]
    spares
      scsi-35000039438c83520                         AVAIL  
      scsi-35000039438c83518                         AVAIL  

errors: No known data errors

Attention:
The results presented on this page are given for information only and for private use. The CISM declines any responsibility if some benchmarks cannot be reproduced.



Last Update :  2013, Jan 14