Thursday, February 3, 2011

Zpool disk failure - Where am I at?

After checking the status of one of my zpools today, I was faced with the following:

root@server: zpool status -v myPool


  pool: myPool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 3h6m with 0 errors on Tue Sep 28 11:15:11 2010
config:

        NAME           STATE     READ WRITE CKSUM
        myPool         ONLINE       0     0     0
          raidz1       ONLINE       0     0     0
            c6t7d0     ONLINE       0     0     0
            c6t8d0     ONLINE       0     0     0
            spare      ONLINE       0     0     0
              c6t9d0   ONLINE      54     0     0
              c6t36d0  ONLINE       0     0     0
            c6t10d0    ONLINE       0     0     0
            c6t11d0    ONLINE       0     0     0
            c6t12d0    ONLINE       0     0     0
        spares
          c6t36d0      INUSE     currently in use
          c6t37d0      AVAIL   
          c6t38d0      AVAIL   

errors: No known data errors

From what I can see, c6t9d0 has encountered 54 read errors (the READ column shows 54, while WRITE and CKSUM are both 0). It seems to have been resilvered automatically onto the spare disk c6t36d0, which is now in use.

My question is: where exactly do I stand? Yes, the 'action' line tells me to determine whether the disk needs replacing, but is this disk still in use? Can I replace or remove it?

Any explanation would be much appreciated as I'm quite new to this stuff :)
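
A rough sketch of how the error counters in the output above could be pulled out automatically. The function name `devices_with_errors` is made up for illustration, and the parsing assumes the plain five-column layout shown in this post (NAME STATE READ WRITE CKSUM with whole-number counters); other ZFS releases may format the counters differently.

```shell
#!/bin/sh
# Print every device row of `zpool status` output whose READ, WRITE or
# CKSUM counter is non-zero. The status text is read from stdin, so the
# function can be tried on saved output without touching a live pool.
# Assumption: counters are plain integers, as in the listing above.
devices_with_errors() {
    awk '
        # Device rows have five fields: NAME STATE READ WRITE CKSUM.
        # The header and the spares rows are skipped because their
        # third to fifth fields are not numeric.
        NF == 5 && $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ {
            if ($3 + $4 + $5 > 0) print $1, $3, $4, $5
        }
    '
}
```

With the status shown above, `zpool status -v myPool | devices_with_errors` should print only the c6t9d0 row, with its 54 read errors.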


Update: After following the advice from c10k Consulting, i.e. detaching:

zpool detach myPool c6t9d0

and adding as a spare:

zpool add myPool spare c6t9d0

It appears as though all is well. The new status of my zpool is:

root@server: zpool status -v myPool
  pool: myPool
 state: ONLINE
 scrub: resilver completed after 3h6m with 0 errors on Tue Sep 28 11:15:11 2010
config:

        NAME         STATE     READ WRITE CKSUM
        myPool       ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c6t7d0   ONLINE       0     0     0
            c6t8d0   ONLINE       0     0     0
            c6t36d0  ONLINE       0     0     0
            c6t10d0  ONLINE       0     0     0
            c6t11d0  ONLINE       0     0     0
            c6t12d0  ONLINE       0     0     0
        spares
          c6t37d0    AVAIL   
          c6t38d0    AVAIL   
          c6t9d0     AVAIL   

errors: No known data errors

Thanks for your help c10k consulting :)
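
For reference, the two commands from the update can be wrapped in a small helper with a dry-run switch. This is only a sketch: `retire_to_spare` and the `DRY_RUN` variable are hypothetical names, not part of ZFS, and it assumes a spare has already finished resilvering in place of the disk being detached. If the disk is genuinely failing, it should be physically replaced before it is added back as a spare, as the answer below points out.

```shell
#!/bin/sh
# Hypothetical helper: detach a faulted disk whose hot spare has already
# taken over, then return that device to the pool as a spare.
# With DRY_RUN set to a non-empty value the commands are printed, not run.
retire_to_spare() {
    pool=$1
    disk=$2
    run=${DRY_RUN:+echo}

    # Detaching the original disk makes the in-use spare a permanent member.
    $run zpool detach "$pool" "$disk" || return 1

    # Add the device (ideally a fresh disk in the same slot) back as a spare.
    $run zpool add "$pool" spare "$disk"
}

# Example: show what would be executed without touching the pool.
DRY_RUN=1 retire_to_spare myPool c6t9d0
```

Running the example line prints the `zpool detach` and `zpool add` commands instead of executing them, which makes it easy to double-check the device names first.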

  • zpool remove myPool c6t37d0

    zpool replace myPool c6t9d0 c6t37d0

    This will make one of your hot spares (c6t37d0) usable as a normal disk and then replace the bad disk (c6t9d0) with the now-free disk (c6t37d0). Once everything is happy, physically replace c6t9d0 and then:

    zpool add myPool spare c6t9d0

    And you will be back to a happy setup with 3 available hot spares.

    Or you can simply pull c6t9d0 and let c6t36d0 take its place by issuing:

    zpool detach myPool c6t9d0

    And then replace c6t9d0 and re-add it as a spare.

    JT.WK : Thanks very much for the response. Could you tell me what c6t36d0 is currently doing? Why can I not 'zpool replace' the damaged disk with c6t36d0? Cheers
    c10k Consulting : ZFS is using it for reads and writes by the looks of things (the ZFS tools are sadly lacking sometimes in what they tell you), and it looks like it's maintaining c6t9d0 as a read/write mirror too. As for whether you can simply replace c6t9d0 with c6t36d0: `zpool detach myPool c6t9d0` will do just that ;-)
    JT.WK : Cheers mate - will give the detach a go. Thanks again
  • Sounds like you are just scratching the surface in terms of managing ZFS storage.

    Suggest using these 2 links and I think you will pick up some additional data points to get you going:

    For managing zpools: http://docs.huihoo.com/opensolaris/solaris-zfs-administration-guide/html/ch04s04.html

    General ZFS Admin reference: http://www.filibeto.org/~aduritz/truetrue/solaris10/zfsadminguide-html/toc.html

    There are many more but these 2 stuck out in my mind for your particular topic.

    JT.WK : thanks mate - will be sure to look into them :)
    From mxmader
  • (I only wanted to comment, but don't have the points.) Just in case JT.WK wanted to know where in the docs this is, this is very similar to "Oracle® Solaris ZFS Administration Guide" page 88 "Activating and Deactivating Hot Spares in Your Storage Pool". I'm still new to ZFS, and the Admin Guide helps me a lot.

    JT.WK : Thanks Scott - much appreciated.
