Benchmarks: ZFS on 3 OS
(December 2012)
Introduction
The CISM manages several mass storage systems running an old, no longer updated OS (Solaris 10 10/09 s10x_u8wos_08a X86 with ZFS pool version 10, or Solaris 10 10/08 s10x_u6wos_07b X86 with ZFS pool version 15). The installation and configuration of new servers becomes more and more difficult because hardware such as disk controllers is not supported by a 4-year-old OS. ZFS has many valuable features we appreciate, such as data integrity, RAID support and virtual storage pools.
The Transtec IT company rented us a storage system so that we could evaluate the performance and the portability of ZFS on different operating systems.
The Hardware
A SuperMicro X9DR3-F/i with 128GB DDR3 RAM and 2x6-core Xeon E5-2620 @ 2GHz (HyperThreading OFF, power settings set to performance). We called this server "tdemo".
The disk controller: LSI SAS2308_2 (BIOS 7.25.00.00, firmware 13.00.57.00)
The disks:
- 2x 60GB SSD (ATA-INTELSSDSC2CT06-300i)
- 18x 2TB SAS Toshiba (TOSHIBA-MK2001TRKB-0106)
- 1x 100GB SSD (ATA-INTELSSDSA2BZ10-0362)
- 1x 500GB SATA Hitachi (ATA-HITACHIHDS7250S-A20A), the disk on which we installed the 3 OS
The 3 Operating Systems
- OpenIndiana: SunOS tdemo 5.11 oi_151a5 i86pc i386 i86pc (release Sept 14, 2011), ZFS pool version 28
- FreeBSD 9.1-RC3 #0 r242324: Tue Oct 30 00:58:57 UTC 2012, amd64, ZFS pool version 28
- GNU/Linux CentOS 6.3: Linux tdemo.cism.ucl.ac.be 2.6.32-279.14.1.el6.x86_64 #1 SMP Tue Nov 6 23:43:09 UTC 2012 x86_64 GNU/Linux, with Native ZFS on Linux (produced at Lawrence Livermore National Laboratory) v0.6.0-rc12, ZFS pool version 28
The ZFS configuration
We chose a raidz2 pool configuration with two sets of 8 disks and 2 spare disks. The 100GB SSD was configured as a cache device and the two 60GB SSDs were set up as a mirrored log device.
See (for OpenIndiana):
root@tdemo:~# zpool create storage raidz2 c4t5000039438C81EEAd0 c4t5000039438C81F06d0 c4t5000039438C82B52d0 c4t5000039438C82DB2d0 c4t5000039438C820CEd0 c4t5000039438C821DAd0 c4t5000039438C824FEd0 c4t5000039438C830CAd0
root@tdemo:~# zpool add storage spare c4t5000039438C833BAd0
root@tdemo:~# zpool add storage raidz2 c4t5000039438C8215Ad0 c4t5000039438C8244Ad0 c4t5000039438C8351Ad0 c4t5000039438C82266d0 c4t5000039438C82442d0 c4t5000039438C82636d0 c4t5000039438C82992d0 c4t5000039438C83516d0
root@tdemo:~# zpool add storage spare c4t5000039438C83522d0
root@tdemo:~# zpool add storage cache c3t5001517BB29DFF50d0
root@tdemo:~# zpool add storage log mirror c3t5001517BB2A1DEA4d0 c3t5001517BB29CF723d0
root@tdemo:~# zfs set compression=on storage
root@tdemo:~# zpool status storage
  pool: storage
 state: ONLINE
  scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        storage                    ONLINE       0     0     0
          raidz2-0                 ONLINE       0     0     0
            c4t5000039438C81EEAd0  ONLINE       0     0     0
            c4t5000039438C81F06d0  ONLINE       0     0     0
            c4t5000039438C82B52d0  ONLINE       0     0     0
            c4t5000039438C82DB2d0  ONLINE       0     0     0
            c4t5000039438C820CEd0  ONLINE       0     0     0
            c4t5000039438C821DAd0  ONLINE       0     0     0
            c4t5000039438C824FEd0  ONLINE       0     0     0
            c4t5000039438C830CAd0  ONLINE       0     0     0
          raidz2-1                 ONLINE       0     0     0
            c4t5000039438C8215Ad0  ONLINE       0     0     0
            c4t5000039438C8244Ad0  ONLINE       0     0     0
            c4t5000039438C8351Ad0  ONLINE       0     0     0
            c4t5000039438C82266d0  ONLINE       0     0     0
            c4t5000039438C82442d0  ONLINE       0     0     0
            c4t5000039438C82636d0  ONLINE       0     0     0
            c4t5000039438C82992d0  ONLINE       0     0     0
            c4t5000039438C83516d0  ONLINE       0     0     0
        logs
          mirror-2                 ONLINE       0     0     0
            c3t5001517BB2A1DEA4d0  ONLINE       0     0     0
            c3t5001517BB29CF723d0  ONLINE       0     0     0
        cache
          c3t5001517BB29DFF50d0    ONLINE       0     0     0
        spares
          c4t5000039438C833BAd0    AVAIL
          c4t5000039438C83522d0    AVAIL
The IOZONE benchmark
Iozone is a filesystem benchmark tool (we used version 4.1.3). The benchmark generates and measures a variety of file operations, but we decided to limit the operations to the (re-)write (-i 0) and (re-)read (-i 1) of a large 16GB file (-s16g), with 16KB or 256KB records (-r16k or -r256k) and 1, 10 or 20 threads (-t 1, -t 10, -t 20).
We repeated these tests with the SSD cache (used for read optimization) and without it (zpool offline storage c3t5001517BB29DFF50d0).
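To bring the cache device back for the next run, the usual counterpart of the offline command should be enough (a short sketch, reusing the OpenIndiana device name from above):
zpool online storage c3t5001517BB29DFF50d0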
iozone -i 0 -i 1 -s16g -r16k -t 1 /storage
iozone -i 0 -i 1 -s16g -r16k -t 1 /storage (cache offline)
iozone -i 0 -i 1 -s16g -r16k -t 10 /storage
iozone -i 0 -i 1 -s16g -r16k -t 10 /storage (cache offline)
iozone -i 0 -i 1 -s16g -r16k -t 20 /storage
iozone -i 0 -i 1 -s16g -r16k -t 20 /storage (cache offline)

Results with 16KB records (-r16k):

write    | OpenIndiana | OI w/o ssd cache | FreeBSD 9.1 | FBSD w/o ssd cache | GNU/Linux | GNU/Linux w/o ssd cache
T=1      |         601 |              505 |         491 |                457 |       575 |                     622
T=10     |        2342 |             2173 |        1088 |               1123 |      1414 |                    1435
T=20     |        2315 |             2579 |         992 |                951 |      1132 |                    1151

rewrite  | OpenIndiana | OI w/o ssd cache | FreeBSD 9.1 | FBSD w/o ssd cache | GNU/Linux | GNU/Linux w/o ssd cache
T=1      |         593 |              702 |         653 |                612 |       717 |                     720
T=10     |         629 |              523 |         519 |                502 |       360 |                     349
T=20     |         257 |              236 |         245 |                233 |       274 |                     249

read     | OpenIndiana | OI w/o ssd cache | FreeBSD 9.1 | FBSD w/o ssd cache | GNU/Linux | GNU/Linux w/o ssd cache
T=1      |        3061 |             2469 |        2534 |               2486 |      2017 |                    2028
T=10     |        4074 |             4026 |        3612 |               3583 |      3293 |                    3291
T=20     |        3738 |             4058 |        2910 |               3108 |      3282 |                    3375

reread   | OpenIndiana | OI w/o ssd cache | FreeBSD 9.1 | FBSD w/o ssd cache | GNU/Linux | GNU/Linux w/o ssd cache
T=1      |        2783 |             3154 |        2504 |               2684 |      2177 |                    2189
T=10     |        4785 |             4063 |        2872 |               2856 |      3297 |                    3200
T=20     |        4057 |             3825 |        3437 |               3070 |      3301 |                    3476
iozone -i 0 -i 1 -s16g -r256k -t 1 /storage
iozone -i 0 -i 1 -s16g -r256k -t 1 /storage (cache offline)
iozone -i 0 -i 1 -s16g -r256k -t 10 /storage
iozone -i 0 -i 1 -s16g -r256k -t 10 /storage (cache offline)
iozone -i 0 -i 1 -s16g -r256k -t 20 /storage
iozone -i 0 -i 1 -s16g -r256k -t 20 /storage (cache offline)

Results with 256KB records (-r256k):

write    | OpenIndiana | OI w/o ssd cache | FreeBSD 9.1 | FBSD w/o ssd cache | GNU/Linux | GNU/Linux w/o ssd cache
T=1      |         746 |              841 |         720 |                712 |       569 |                     477
T=10     |        2900 |             3072 |        1005 |                940 |      1812 |                    1613
T=20     |        3340 |             3518 |        1147 |               1155 |      1646 |                    1644

rewrite  | OpenIndiana | OI w/o ssd cache | FreeBSD 9.1 | FBSD w/o ssd cache | GNU/Linux | GNU/Linux w/o ssd cache
T=1      |         827 |              812 |        1117 |                835 |       824 |                     904
T=10     |        2875 |             2640 |        1017 |                987 |      2011 |                    1860
T=20     |        3146 |             3159 |         966 |                910 |      1997 |                    2091

read     | OpenIndiana | OI w/o ssd cache | FreeBSD 9.1 | FBSD w/o ssd cache | GNU/Linux | GNU/Linux w/o ssd cache
T=1      |        1347 |             1422 |        4827 |               4884 |      4561 |                    4762
T=10     |        3957 |             3401 |        4080 |               3542 |      4155 |                    3640
T=20     |        4086 |             4077 |        3338 |               3255 |      4348 |                    4023

reread   | OpenIndiana | OI w/o ssd cache | FreeBSD 9.1 | FBSD w/o ssd cache | GNU/Linux | GNU/Linux w/o ssd cache
T=1      |        1272 |             1399 |        4877 |               4929 |      4734 |                    4680
T=10     |        5482 |             5470 |        3411 |               3655 |      4273 |                    5211
T=20     |        4119 |             3959 |        3337 |               3326 |      4427 |                    4570
Observations:
For the (re-)write operations, OpenIndiana is the best, especially under heavy I/O load (t=10 or t=20 threads); GNU/Linux comes second.
The read benchmarks show good results for GNU/Linux with large records (-r256k).
OpenIndiana is weak for single-threaded (re-)read operations, but is first or second when the number of threads is high (t=10 or t=20).
The advantage of the SSD cache for the read operations is not clear: around 3% of speedup, but not consistently.
The NFS performance with IOZONE
For this test, we used 1, 4 or 10 Linux clients from the Green cluster (node001 to node011, Scientific Linux 5.8), all on the same Gigabit Ethernet switch.
Iozone offers the ability to evaluate the I/O performance of several clients with the -+m filename option, where filename is the client list:
-+m filename
    Used to specify a filename that will be used to specify the clients in a
    distributed measurement. The file contains one line for each client. The
    fields are space delimited. Field 1 is the client name. Field 2 is the
    working directory, on the client, where Iozone will run. Field 3 is the
    path to the Iozone executable on the client.
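For illustration, a client list for four Green cluster nodes could presumably look like the lines below (the working directory matches the NFS mount point used further down; the path to the iozone binary is an assumption and depends on the client installation):
node001 /mnt/tdemo/storage /usr/bin/iozone
node002 /mnt/tdemo/storage /usr/bin/iozone
node003 /mnt/tdemo/storage /usr/bin/iozone
node004 /mnt/tdemo/storage /usr/bin/iozone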
We share the ZFS pool under OpenIndiana:
zfs set sharenfs=rw,anon=0,root=@10.141.0.0/255.255.0.0 storage
or under GNU/Linux:
zfs set sharenfs='rw=10.141.0.0/16,no_root_squash,async' storage
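To check from a client that the export is actually visible, the standard showmount query should be enough (a small sketch, using the server address from the fstab entry below):
showmount -e 10.141.0.111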
On the Linux clients, /etc/fstab contains:
10.141.0.111:/storage /mnt/tdemo/storage nfs rsize=32768,wsize=32768,hard,async,noauto,noatime,nodiratime,noacl 0 0
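Since the entry uses the noauto option, the share has to be mounted by hand on each client before a run, presumably simply with:
mount /mnt/tdemo/storage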
iozone -e -s 16g -r16k -t [1|4|10] -i 0 -i 1 -+m clientlist /mnt/tdemo/storage
Results in [MB/s]:
        | 1 client, OI server | 4 clients, OI server | 10 clients, OI server | 1 client, Lx server | 4 clients, Lx server | 10 clients, Lx server
Write   |                  54 |                   82 |                    96 |                  53 |                   57 |                    82
Rewrite |                  55 |                   81 |                    88 |                  55 |                   57 |                    60
Read    |                 101 |                  111 |                   111 |                2493 |                 2494 |                 23924
Reread  |                2376 |                 8351 |                 21291 |                2528 |                 2527 |                 24370
On a Gigabit Ethernet link, the theoretical maximum throughput is 128 MB/s. We can count on roughly a 20% loss due to normal TCP/IP overhead, so the practical maximum is around 100 MB/s. For the write operations with 10 clients, OpenIndiana reaches 75% and Linux 64% of the theoretical maximum, which is very good. The extra-high values for read and re-read are explained by the NFS cache on each client side. The poor read performance with the OpenIndiana server was already observed in the previous local iozone benchmarks.
The ZFS portability between OS
Is a zpool created under OpenIndiana seen and properly mounted under GNU/Linux or FreeBSD, or any other combination (OI <--> Lx, OI <--> FBSD, Lx <--> FBSD)?
A zpool created under OI cannot be imported into Linux or FreeBSD due to an "incompatible" version, even though the 3 OS all have ZFS pool version 28.
This is due to the illumos ZFS feature flags concept: http://blog.delphix.com/csiden/files/2012/01/ZFS_Feature_Flags.pdf.
In this context the "zpool version" becomes a legacy concept, and the number is set to 5000 on existing pools during a "zpool upgrade" run.
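To see which version a pool actually reports, the standard property query can be used (a small sketch; on a feature-flags pool the reported number would be 5000):
zpool get version storage
zpool upgrade -v     (lists the pool versions/features supported by the running ZFS)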
[root@tdemo ~]# zpool import
   pool: storage
     id: 13020041761373380056
  state: UNAVAIL
 status: The pool is formatted using an incompatible version.
 action: The pool cannot be imported. Access the pool on a system running
         newer software, or recreate the pool from backup.
    see: http://zfsonlinux.org/msg/ZFS-8000-A5
 config:

        storage     UNAVAIL   newer version
        [...]
Zpools created with GNU/Linux or FreeBSD are properly imported into OpenIndiana, except for the cache and spare disks, which are not recognized because of the different identifier names (c3t5001517BB29DFF50d0 in OI is scsi-SATA_INTEL_SSDSA2BZ1CVLV230400CJ100AGN in Linux and gptid/c491847f-7c62-22ce-b875-d4e31f58c9df in FreeBSD). But data integrity is guaranteed, and the cache and spare devices can be added back by hand.
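A minimal sketch of that manual step after an import into OpenIndiana, reusing the OpenIndiana device names from the configuration section (on another system, the names reported by "format" for the SSD and the spares would have to be used instead):
zpool add storage cache c3t5001517BB29DFF50d0
zpool add storage spare c4t5000039438C833BAd0 c4t5000039438C83522d0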
There is no problem for a zpool import between GNU/Linux <--> FreeBSD.
The sas2ircu and lsiutil.i386 tools
LSI offers two utilities, available for download for Solaris, Linux and FreeBSD: sas2ircu and lsiutil.
With these tools, we can easily find the S/N (or SAS address) of the disks and the bay number they sit in.
root@tdemo:/root/bin # ./sas2ircu LIST
LSI Corporation SAS2 IR Configuration Utility.
Version 11.00.00.00 (2011.08.11)
Copyright (c) 2009-2011 LSI Corporation. All rights reserved.

 Adapter     Vendor   Device                          SubSys   SubSys
 Index       Type     ID      ID     Pci Address      Ven ID   Dev ID
 -----       ------   ------  ------ ---------------  -------  ------
   0      SAS2308_2   1000h   87h    00h:04h:00h:00h  1000h    3030h
SAS2IRCU: Utility Completed Successfully.
root@tdemo:/root/bin # ./sas2ircu 0 DISPLAY
LSI Corporation SAS2 IR Configuration Utility.
Version 11.00.00.00 (2011.08.11)
Copyright (c) 2009-2011 LSI Corporation. All rights reserved.

Read configuration has been initiated for controller 0
------------------------------------------------------------------------
Controller information
------------------------------------------------------------------------
  Controller type                         : SAS2308_2
  BIOS version                            : 7.25.00.00
  Firmware version                        : 13.00.57.00
  Channel description                     : 1 Serial Attached SCSI
  Initiator ID                            : 0
  Maximum physical devices                : 1023
  Concurrent commands supported           : 10240
  Slot                                    : 2
  Segment                                 : 0
  Bus                                     : 4
  Device                                  : 0
  Function                                : 0
  RAID Support                            : No
------------------------------------------------------------------------
IR Volume information
------------------------------------------------------------------------
------------------------------------------------------------------------
Physical device information
------------------------------------------------------------------------
Initiator at ID #0

Device is a Hard disk
  Enclosure #                             : 2
  Slot #                                  : 0
  SAS Address                             : 5003048-0-01bb-624c
  State                                   : Ready (RDY)
  Size (in MB)/(in sectors)               : 57241/117231407
  Manufacturer                            : ATA
  Model Number                            : INTEL SSDSC2CT06
  Firmware Revision                       : 300i
  Serial No                               : CVMP229501YD060AGN
  GUID                                    : N/A
  Protocol                                : SATA
  Drive Type                              : SATA_SSD

Device is a Hard disk
  Enclosure #                             : 2
  Slot #                                  : 1
  SAS Address                             : 5003048-0-01bb-624d
  State                                   : Ready (RDY)
  Size (in MB)/(in sectors)               : 57241/117231407
  Manufacturer                            : ATA
  Model Number                            : INTEL SSDSC2CT06
  Firmware Revision                       : 300i
  Serial No                               : CVMP23420040060AGN
  GUID                                    : N/A
  Protocol                                : SATA
  Drive Type                              : SATA_SSD

Device is a Hard disk
  Enclosure #                             : 2
  Slot #                                  : 2
  SAS Address                             : 5000039-4-38c8-3516
  State                                   : Ready (RDY)
  Size (in MB)/(in sectors)               : 1907729/3907029167
  Manufacturer                            : TOSHIBA
  Model Number                            : MK2001TRKB
  Firmware Revision                       : 0106
  Serial No                               : 8230A17IFM13
  GUID                                    : N/A
  Protocol                                : SAS
  Drive Type                              : SAS_HDD
[...]
root@tdemo:/root/bin # ./sas2ircu 0 LOCATE 2:10 ON
The red LED of disk 10 starts blinking.
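To stop the locate LED afterwards, the same subcommand with OFF should do it (a small sketch):
./sas2ircu 0 LOCATE 2:10 OFF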
[root@tdemo]:[13]:[~/LSIUtil Kit 1.63/Solaris]:$ ./lsiutil.i386

LSI Logic MPT Configuration Utility, Version 1.63, June 4, 2009

1 MPT Port found

     Port Name         Chip Vendor/Type/Rev    MPT Rev  Firmware Rev  IOC
 1.  mpt_sas0          LSI Logic 0086 05         200      0d003900     0

Select a device:  [1-1 or 0 to quit] 1

 1.  Identify firmware, BIOS, and/or FCode
 2.  Download firmware (update the FLASH)
 4.  Download/erase BIOS and/or FCode (update the FLASH)
 8.  Scan for devices
10.  Change IOC settings (interrupt coalescing)
13.  Change SAS IO Unit settings
16.  Display attached devices
20.  Diagnostics
21.  RAID actions
23.  Reset target
42.  Display operating system names for devices
43.  Diagnostic Buffer actions
45.  Concatenate SAS firmware and NVDATA files
59.  Dump PCI config space
60.  Show non-default settings
61.  Restore default settings
66.  Show SAS discovery errors
69.  Show board manufacturing information
97.  Reset SAS link, HARD RESET
98.  Reset SAS link
99.  Reset port
 e   Enable expert mode in menus
 p   Enable paged mode
 w   Enable logging

Main menu, select an option:  [1-99 or e/p/w or 0 to quit] 99

Resetting port...
In the kernel dmesg, we can see:
Dec 14 09:19:25 tdemo scsi: [ID 365881 kern.info] /pci@0,0/pci8086,3c04@2/pci1000,3030@0 (mpt_sas0):
Dec 14 09:19:25 tdemo   mpt0 Firmware version v13.0.57.0 (?)
Dec 14 09:19:25 tdemo scsi: [ID 365881 kern.info] /pci@0,0/pci8086,3c04@2/pci1000,3030@0 (mpt_sas0):
Dec 14 09:19:25 tdemo   mpt0: IOC Operational.
Robustness
For these tests, we removed two disks from the server bays and set up a 2x7 raidz2 + 2 spares ZFS pool configuration. The two disks left on the shelf were used for hot-swap tests.
We removed one or more disks in order to see whether the failure is automatically detected and whether a spare is activated as a replacement.
Briefly, we see in the 3 subsections below:
- ZFS under OpenIndiana does not detect a disk failure directly, but after a "zpool scrub" the error is seen and the spare is put in place automatically.
- ZFS under FreeBSD is automatically aware of a disk error (disk 'REMOVED', pool 'DEGRADED'), but the spare disk has to be activated by hand (zpool replace).
- ZFS under GNU/Linux also needs a "zpool scrub" to take the disk error into account. Then the spare has to be activated by hand (zpool remove & replace).
Even if the spare activation is not automatic under FreeBSD or Linux, this can be monitored and scripted, as the kernel detects the failure and gives the sas_addr of the faulted disk; see the sketch below.
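A minimal monitoring sketch along those lines (the mail recipient is an assumption, and the spare replacement itself is left as a comment since the device names to use differ between the three OS):

#!/bin/sh
# Poll the pool health and warn when it is no longer healthy.
# "zpool status -x <pool>" prints "... is healthy" when nothing is wrong.
POOL=storage
MAILTO=root                      # assumption: local mail delivery is configured
while true; do
    STATUS=$(zpool status -x "$POOL" 2>&1)
    case "$STATUS" in
        *"is healthy"*) ;;       # nothing to do
        *)
            echo "$STATUS" | mail -s "ZFS pool $POOL needs attention on $(hostname)" "$MAILTO"
            # Under FreeBSD or Linux the spare then has to be attached by hand, e.g.:
            #   zpool replace $POOL <faulted-device> <spare-device>
            ;;
    esac
    sleep 300
done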
OpenIndiana
I removed the disk from bay 12 (labeled 5000039438c81eea):
Dec 14 14:32:51 tdemo scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci8086,3c04@2/pci1000,3030@0 (mpt_sas0):
Dec 14 14:32:51 tdemo   mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31170000
Dec 14 14:32:51 tdemo scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci8086,3c04@2/pci1000,3030@0 (mpt_sas0):
Dec 14 14:32:51 tdemo   mptsas_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31170000
A "zpool status" shows it as 'ONLINE' but the kernel dmesg shows:
Dec
14 14:33:35 tdemo scsi: [ID 107833 kern.warning] WARNING:
/pci@0,0/pci8086,3c04@2/pci1000,3030@0/iport@f0/disk@w5000039438c81eea,0
(sd17):
Dec 14 14:33:35 tdemo drive offline
Dec 14 14:33:35 tdemo scsi: [ID 107833 kern.warning] WARNING:
/pci@0,0/pci8086,3c04@2/pci1000,3030@0/iport@f0/disk@w5000039438c81eea,0
(sd17):
Dec 14 14:33:35 tdemo drive offline
The a "zpool scrub storage" change the "ONLINE" into
"UNAVAIL"
NAME
STATE READ WRITE CKSUM
storage
DEGRADED 0
0 0
raidz2-0
DEGRADED 0
0 0
spare-0
UNAVAIL 0
0 0
c4t5000039438C81EEAd0 UNAVAIL
0 0 0 cannot open
[...]
spares
c4t5000039438C830CAd0
INUSE currently in use
c4t5000039438C83522d0 AVAIL
dmesg:
Dec 14 14:34:06 tdemo fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major
Dec 14 14:34:06 tdemo EVENT-TIME: Fri Dec 14 14:34:06 CET 2012
Dec 14 14:34:06 tdemo PLATFORM: X9DR3-F, CSN: 1234567890, HOSTNAME: tdemo
Dec 14 14:34:06 tdemo SOURCE: zfs-diagnosis, REV: 1.0
Dec 14 14:34:06 tdemo EVENT-ID: c7cecf37-4b42-4c58-e666-bba09d044eba
Dec 14 14:34:06 tdemo DESC: A ZFS device failed. Refer to http://illumos.org/msg/ZFS-8000-D3 for more information.
Dec 14 14:34:06 tdemo AUTO-RESPONSE: No automated response will occur.
Dec 14 14:34:06 tdemo IMPACT: Fault tolerance of the pool may be compromised.
Dec 14 14:34:06 tdemo REC-ACTION: Run 'zpool status -x' and replace the bad device.
The fault manager daemon (fmd) is a rich feature of (Open)Solaris/OpenIndiana. fmd receives telemetry information relating to problems detected by the system software, diagnoses these problems, and initiates proactive self-healing activities such as disabling faulty components. When appropriate, the fault manager also sends a message to syslogd. It can be trapped by SNMP or by a Nagios agent.
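As an illustration, a very small Nagios-style check could rely on fmadm, the command-line interface to fmd (a sketch following the usual Nagios exit-code convention):

#!/bin/sh
# Report CRITICAL if the fault manager currently lists faulty resources.
FAULTS=$(fmadm faulty 2>/dev/null)
if [ -n "$FAULTS" ]; then
    echo "CRITICAL: fmd reports faults"
    echo "$FAULTS"
    exit 2
fi
echo "OK: no faults reported by fmd"
exit 0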
"format" instruction shows the removed disk as "type unknown":
4.
c4t5000039438C81EEAd0 <drive type unknown>
/pci@0,0/pci8086,3c04@2/pci1000,3030@0/iport@f0/disk@w5000039438c81eea,0
There are 19 entries.
I inserted a new disk (disk18):
Dec 14 14:41:25 tdemo scsi: [ID 583861 kern.info] sd14 at mpt_sas1: unit-address w5000039438c8351a,0: w5000039438c8351a,0
Dec 14 14:41:25 tdemo genunix: [ID 936769 kern.info] sd14 is /pci@0,0/pci8086,3c04@2/pci1000,3030@0/iport@f0/disk@w5000039438c8351a,0
Dec 14 14:41:25 tdemo genunix: [ID 408114 kern.info] /pci@0,0/pci8086,3c04@2/pci1000,3030@0/iport@f0/disk@w5000039438c8351a,0 (sd14) online
and "format" detects 20 entries... no need of an LSIutil reset.
FreeBSD:
As the 500GB Hitachi disk containing the OS is placed in bay 23, at the end of the SATA chain, and FreeBSD refers to its root filesystem as /dev/da21s4a in /etc/fstab, it cannot boot with the two disks disk18 and disk19 removed. So we put all the disks back into the bays.
With "gpart
list", we can determine the gptid of the disk18 and disk19 and
create a 2x7 raidz2 + 2 spare zpool.
zpool
create -f storage raidz2 gptid/9405aeb5-57d0-ab40-97d0-506842ca2756
gptid/7ed3ee32-f475-7c46-b5f4-96c06977dea5
gptid/caf99fbd-beaf-7a43-a5f4-021acaed081e
gptid/f2c05018-e106-944b9e2e-e9b8505d3849
gptid/9604ccc7-9226-7b49-b163-e81b2679889b
gptid/cec867eb-4c44-6d44-90d8-559de51529b4
gptid/97e6f52c-f424-2f4c-a37e-3974710d3b1f
zpool add -f storage spare gptid/04034a3c-aea7-b64d-9c92-50ce1e3e844b
zpool add -f storage raidz2 gptid/23542ce8-6c03-aa4c-b852-847d491e03de
gptid/f1698532-5e87-2342-a273-d1cdf4aed9d5
gptid/e52b91be-879b-7545-81c9-189713806764
gptid/881dc14b-1b47-a341baf0-7b2c967fb772
gptid/5672be7f-a220-a546-92d9-59a7bfe40e60
gptid/84d83cb9-9993-174c-a244-a2ee558d81d9
gptid/703ed4d5-5c09-3645-bba4-50a3e4a09fa0
zpool add -f storage spare gptid/1afa5104-d70e-9848-97e6-1840b74ed9df
Remaining disks:
da16: gptid/3a876b92-df40-6145-87b8-b12d2879ce4c
da17: gptid/eda06dff-8e9c-ac47-8975-69ae4fbab0b6
da18 (55G): gptid/8ba5e745-d877-9a43-b07f-4bc4d7a1c9b5
da19 (55G): gptid/b80b4ba6-b16d-0340-bec1-f1df69d720b0
da20 (93G): gptid/2eac9798-78b4-294c-800c-468e72859461
I remove the disk from bay 19, which is not a member of the storage pool:
mps0: mpssas_alloc_tm freezing simq
mps0: mpssas_remove_complete on handle 0x001e, IOCStatus= 0x0
mps0: mpssas_free_tm releasing simq
(da16:mps0:0:28:0): lost device - 0 outstanding, 1 refs
(pass18:mps0:0:28:0): passdevgonecb: devfs entry is gone
(da16:mps0:0:28:0): removing device entry
When I put it back:
da16 at mps0 bus 0 scbus0 target 28 lun 0
da16: <TOSHIBA MK2001TRKB 0106> Fixed Direct Access SCSI-5 device
da16: 600.000MB/s transfers
da16: Command Queueing enabled
da16: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
We can see that hot-plug is handled dynamically.
Now I do the same test with disk12, which is a spare member of the storage zpool:
mps0: mpssas_alloc_tm freezing simq
mps0: mpssas_remove_complete on handle 0x0017, IOCStatus= 0x0
mps0: mpssas_free_tm releasing simq
(da15:mps0:0:27:0): lost device - 0 outstanding, 2 refs
(pass17:mps0:0:27:0): passdevgonecb: devfs entry is gone
(da15:mps0:0:27:0): removing device entry
"zpool status" give directly the correct 'REMOVED' status,
without the need of a "zpool scrub storage"
spares
gptid/04034a3c-aea7-b64d-9c92-50ce1e3e844b
AVAIL
16399849032075385311
REMOVED was /dev/gptid/1afa5104-d70e-9848-97e6-1840b74ed9df
When I put back the disk12, it stays into 'REMOVED' state
da15 at mps0 bus 0 scbus0 target 27 lun 0
da15: <TOSHIBA MK2001TRKB 0106> Fixed Direct Access SCSI-5 device
da15: 600.000MB/s transfers
da15: Command Queueing enabled
da15: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
still 'REMOVED':
        spares
          gptid/04034a3c-aea7-b64d-9c92-50ce1e3e844b    AVAIL
          16399849032075385311                          REMOVED   was /dev/gptid/1afa5104-d70e-9848-97e6-1840b74ed9df
So I have to "zpool remove" it before adding it back:
root@tdemo:/root # zpool replace storage gptid/1afa5104-d70e-9848-97e6-1840b74ed9df
cannot replace gptid/1afa5104-d70e-9848-97e6-1840b74ed9df with gptid/1afa5104-d70e-9848-97e6-1840b74ed9df: device is reserved as a hot spare
root@tdemo:/root # zpool remove storage gptid/1afa5104-d70e-9848-97e6-1840b74ed9df
root@tdemo:/root # zpool add -f storage spare gptid/1afa5104-d70e-9848-97e6-1840b74ed9df
For a third test with FreeBSD, I decided to remove disk6, a member of raidz2-1 of the storage zpool.
The zpool is directly tagged as 'DEGRADED', with the disk 'REMOVED':
          raidz2-1                                      DEGRADED     0     0     0
            gptid/23542ce8-6c03-aa4c-b852-847d491e03de  ONLINE       0     0     0
            4717764212357227262                         REMOVED      0     0     0  was /dev/gptid/f1698532-5e87-2342-a273-d1cdf4aed9d5
        [...]
Unfortunately, a spare disk does not replace it automatically, so I have to do it by hand:
root@tdemo:/root # zpool replace storage gptid/f1698532-5e87-2342-a273-d1cdf4aed9d5 gptid/04034a3c-aea7-b64d-9c92-50ce1e3e844b

          raidz2-1                                        DEGRADED     0     0     0
            gptid/23542ce8-6c03-aa4c-b852-847d491e03de    ONLINE       0     0     0
            spare-1                                       REMOVED      0     0     0
              4717764212357227262                         REMOVED      0     0     0  was /dev/gptid/f1698532-5e87-2342-a273-d1cdf4aed9d5
              gptid/04034a3c-aea7-b64d-9c92-50ce1e3e844b  ONLINE       0     0     0
            gptid/e52b91be-879b-7545-81c9-189713806764    ONLINE       0     0     0
            gptid/881dc14b-1b47-a341-baf0-7b2c967fb772    ONLINE       0     0     0
            gptid/5672be7f-a220-a546-92d9-59a7bfe40e60    ONLINE       0     0     0
            gptid/84d83cb9-9993-174c-a244-a2ee558d81d9    ONLINE       0     0     0
            gptid/703ed4d5-5c09-3645-bba4-50a3e4a09fa0    ONLINE       0     0     0
        spares
          1527390717547328893                             INUSE     was /dev/gptid/04034a3c-aea7-b64d-9c92-50ce1e3e844b
          gptid/1afa5104-d70e-9848-97e6-1840b74ed9df      AVAIL
I then take the removed disk6 definitively out of the zpool:
zpool offline storage gptid/f1698532-5e87-2342-a273-d1cdf4aed9d5
zpool detach storage gptid/f1698532-5e87-2342-a273-d1cdf4aed9d5
I can insert it back, even in another free bay (bay 21):
da9 at mps0 bus 0 scbus0 target 21 lun 0
da9: <TOSHIBA MK2001TRKB 0106> Fixed Direct Access SCSI-5 device
da9: 600.000MB/s transfers
da9: Command Queueing enabled
da9: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
And set it as a spare disk:
zpool add -f storage spare gptid/f1698532-5e87-2342-a273-d1cdf4aed9d5
GNU/Linux:
I removed one disk (SATA label WWN: 5000039438C81EE8), the first disk of raidz2-0.
The kernel sees that it has been removed:
sd 0:0:12:0: [sdm] Synchronizing SCSI cache
sd 0:0:12:0: [sdm] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
mpt2sas0: removing handle(0x0017), sas_addr(0x5000039438c81eea)
But scsi-35000039438c81ee8 remains labeled 'ONLINE' until the next "zpool scrub storage". Then it is set to 'UNAVAIL' and the storage pool becomes 'DEGRADED'.
No spare is automatically used to replace the removed disk; I have to do the operation by hand.
I remove one spare and replace the 'UNAVAIL' disk with it:
zpool remove storage scsi-35000039438c830c8
zpool replace storage scsi-35000039438c81ee8 scsi-35000039438c830c8
The zpool is no longer in a degraded state.
When I put a "new" disk (WWN: 5000039438C83518) into a free bay, the
kernel discovers it:
scsi
0:0:22:0: Direct-Access TOSHIBA
MK2001TRKB 0106 PQ: 0 ANSI: 5
scsi 0:0:22:0: SSP: handle(0x0021), sas_addr(0x5000039438c8351a),
phy(33), devi
ce_name(0x5000039438c8351a)
scsi 0:0:22:0: SSP: enclosure_logical_id(0x5003048001bb627f), slot(21)
scsi 0:0:22:0: qdepth(254), tagged(1), simple(1), ordered(0),
scsi_level(6), cm
d_que(1)
sd 0:0:22:0: Attached scsi generic sg12 type 0
sd 0:0:22:0: [sdm] Spinning up disk...................ready
sd 0:0:22:0: [sdm] 3907029168 512-byte logical blocks: (2.00 TB/1.81
TiB)
sd 0:0:22:0: [sdm] Write Protect is off
sd 0:0:22:0: [sdm] Mode Sense: d3 00 00 08
sd 0:0:22:0: [sdm] Write cache: enabled, read cache: enabled, doesn't
support D
PO or FUA
sdm: sdm1 sdm9
sd 0:0:22:0: [sdm] Attached SCSI disk
I can add it as a spare:
zpool add -f storage spare scsi-35000039438c83518
zpool status
[...]
        spares
          scsi-35000039438c83520    AVAIL
          scsi-35000039438c83518    AVAIL

errors: No known data errors
Attention:
The results presented on this page are given for information only and for private use. The CISM declines any responsibility in case some benchmarks are not reproducible.