After checking the status of one of my zpools today, I was faced with the following:
root@server: zpool status -v myPool
  pool: myPool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 3h6m with 0 errors on Tue Sep 28 11:15:11 2010
config:

        NAME           STATE     READ WRITE CKSUM
        myPool         ONLINE       0     0     0
          raidz1       ONLINE       0     0     0
            c6t7d0     ONLINE       0     0     0
            c6t8d0     ONLINE       0     0     0
            spare      ONLINE       0     0     0
              c6t9d0   ONLINE      54     0     0
              c6t36d0  ONLINE       0     0     0
            c6t10d0    ONLINE       0     0     0
            c6t11d0    ONLINE       0     0     0
            c6t12d0    ONLINE       0     0     0
        spares
          c6t36d0      INUSE     currently in use
          c6t37d0      AVAIL
          c6t38d0      AVAIL

errors: No known data errors
From what I can see, c6t9d0 has encountered 54 read errors. It seems as though the pool has automatically resilvered onto the spare disk c6t36d0, which is now in use.
My question is: where exactly does this leave me? Yes, the 'action' field tells me to determine whether or not the disk needs replacing, but is this disk currently still in use? Can I replace or remove it?
Any explanation would be much appreciated as I'm quite new to this stuff :)
Update: After following the advice from c10k Consulting, i.e. detaching:
zpool detach myPool c6t9d0
and adding as a spare:
zpool add myPool spare c6t9d0
It appears as though all is well. The new status of my zpool is:
root@server: zpool status -v myPool
  pool: myPool
 state: ONLINE
 scrub: resilver completed after 3h6m with 0 errors on Tue Sep 28 11:15:11 2010
config:

        NAME         STATE     READ WRITE CKSUM
        myPool       ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c6t7d0   ONLINE       0     0     0
            c6t8d0   ONLINE       0     0     0
            c6t36d0  ONLINE       0     0     0
            c6t10d0  ONLINE       0     0     0
            c6t11d0  ONLINE       0     0     0
            c6t12d0  ONLINE       0     0     0
        spares
          c6t37d0    AVAIL
          c6t38d0    AVAIL
          c6t9d0     AVAIL

errors: No known data errors
Thanks for your help, c10k Consulting :)
-
zpool remove myPool c6t37d0
zpool replace myPool c6t9d0 c6t37d0
This will make one of your hot spares (c6t37d0) usable as a normal disk, and then replace the bad disk (c6t9d0) with the now-free disk (c6t37d0). Once everything is happy, physically replace c6t9d0 and then:
zpool add myPool spare c6t9d0
And you will be back to a happy setup with 3 available hot spares.
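To make the ordering clearer, here is that first route as one consolidated sequence with comments. This is only a sketch using the device names above; the resilver kicked off by the replace should be allowed to finish before the replaced slot is added back as a spare.
# Free c6t37d0 from the spares list so it can be used as a regular vdev member
zpool remove myPool c6t37d0
# Swap the failing disk out of the raidz1 vdev; this starts a resilver onto c6t37d0
zpool replace myPool c6t9d0 c6t37d0
# Watch progress and wait for the resilver to complete
zpool status -v myPool
# Once the bad disk has been physically replaced, return that slot to the spares list
zpool add myPool spare c6t9d0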
Or you can simply pull c6t9d0 and let c6t36d0 take its place by issuing:
zpool detach myPool c6t9d0
And then replace c6t9d0 and re-add it as a spare.
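For that detach route, the rough sequence would be the following sketch (assuming the spare c6t36d0 has already finished resilvering and the new disk goes back into the same c6t9d0 slot):
# Drop the failing disk; the in-use spare c6t36d0 becomes a permanent pool member
zpool detach myPool c6t9d0
# ...physically swap the drive in that slot, then add it back as a hot spare...
zpool add myPool spare c6t9d0
# Confirm the pool is healthy and three spares are available again
zpool status -v myPool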
JT.WK: Thanks very much for the response. Could you tell me what c6t36d0 is currently doing? Why can I not 'zpool replace' the damaged disk with c6t36d0? Cheers
c10k Consulting: ZFS is using it for read/write by the looks of things (the ZFS tools are sadly lacking sometimes in what they tell you), and it looks like it's maintaining c6t9d0 as a read/write mirror too. As for whether you can simply replace c6t9d0 with c6t36d0: `zpool detach myPool c6t9d0` will do just that ;-)
JT.WK: Cheers mate - will give the detach a go. Thanks again
From c10k Consulting
-
Sounds like you are just scratching the surface in terms of managing ZFS storage.
I suggest these two links; I think you will pick up some additional data points to get you going:
For managing zpools: http://docs.huihoo.com/opensolaris/solaris-zfs-administration-guide/html/ch04s04.html
General ZFS Admin reference: http://www.filibeto.org/~aduritz/truetrue/solaris10/zfsadminguide-html/toc.html
There are many more but these 2 stuck out in my mind for your particular topic.
JT.WK: thanks mate - will be sure to look into them :)
From mxmader
-
(I only wanted to comment, but don't have the points.) Just in case JT.WK wanted to know where in the docs this is, this is very similar to "Oracle® Solaris ZFS Administration Guide" page 88 "Activating and Deactivating Hot Spares in Your Storage Pool". I'm still new to ZFS, and the Admin Guide helps me a lot.
JT.WK: Thanks Scott - much appreciated.
From Scott McClenning