Tips & Tricks Sun Clusters

by Olivier Duquesne aka DaffyDuke

Case study: a failing disk

Status: the failing disk (c2t0d0) was unplugged and replugged, the machine was rebooted without SDS, and drvconfig + disks + devlinks were run => after the reboot, "iostat -En" reported the disk as OK.

But the metadevices were in a sorry state.

Faulty disk c2t0d0: /pci@6,4000/scsi@4,1/sd@0,0  (sd75)  corrupt label - wrong magic number
It was unplugged, then replugged, followed by drvconfig + disks + devlinks; a reboot is now needed.

cmapqlf01:root} metastat d50
d50: Mirror
    Submirror 0: d51
      State: Needs maintenance 
    Submirror 1: d52
      State: Okay         
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 308826 blocks

d51: Submirror of d50
    State: Needs maintenance 
    Invoke: metareplace d50 c2t0d0s0 <new device>
    Size: 308826 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c2t0d0s0                   0     No    Maintenance  

d52: Submirror of d50
    State: Okay         
    Size: 308826 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c2t1d0s0                   0     No    Okay         
cmapqlf01:root} metadb -i
        flags           first blk       block count
     a m   c luo        16              1034            /dev/dsk/c1t0d0s7
     a     c luo        1050            1034            /dev/dsk/c1t0d0s7
     a     c luo        2084            1034            /dev/dsk/c1t0d0s7
     a     c luo        16              1034            /dev/dsk/c1t1d0s7
     a     c luo        1050            1034            /dev/dsk/c1t1d0s7
     a     c luo        2084            1034            /dev/dsk/c1t1d0s7
    M      c            unknown         unknown         /dev/dsk/c2t0d0s7
    M      c            unknown         unknown         /dev/dsk/c2t0d0s7
    M      c            unknown         unknown         /dev/dsk/c2t0d0s7
 o - replica active prior to last mddb configuration change
 u - replica is up to date
 l - locator for this replica was read successfully
 c - replica's location was in /etc/lvm/
 p - replica's location was patched in kernel
 m - replica is master, this is replica selected as input
 W - replica has device write errors
 a - replica is active, commits are occurring to this replica
 M - replica had problem with master blocks
 D - replica had problem with data blocks
 F - replica had format problems
 S - replica is too small to hold current data base
 R - replica had device read errors
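
The legend above marks W, M, D, F, S and R as problem flags. A minimal sketch, assuming the column layout shown in this transcript (the function name `bad_replicas` is made up for illustration), that picks the affected slices out of `metadb` output:

```sh
# List state database replicas whose flags contain an error letter.
# Reads `metadb` (or `metadb -i`) output on standard input.
bad_replicas() {
    awk '
        # Replica lines end with a device path; header and legend lines do not.
        $NF ~ /^\/dev\// {
            flags = ""
            # Everything before "first blk", "block count" and the path is flags.
            for (i = 1; i <= NF - 3; i++) flags = flags $i
            # W, M, D, F, S, R are the problem flags listed in the legend.
            if (flags ~ /[WMDFSR]/ && !seen[$NF]++) print $NF
        }
    '
}
```

On a live system this would be fed with `metadb | bad_replicas`; each slice it prints is a candidate for `metadb -d`, as done next in these notes.
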
cmapqlf01:root} metadb -d /dev/dsk/c2t0d0s7
cmapqlf01:root} metadb
        flags           first blk       block count
     a m   c luo        16              1034            /dev/dsk/c1t0d0s7
     a     c luo        1050            1034            /dev/dsk/c1t0d0s7
     a     c luo        2084            1034            /dev/dsk/c1t0d0s7
     a     c luo        16              1034            /dev/dsk/c1t1d0s7
     a     c luo        1050            1034            /dev/dsk/c1t1d0s7
     a     c luo        2084            1034            /dev/dsk/c1t1d0s7
cmapqlf01:root} metastat d20       
d20: Mirror
    Submirror 0: d21
      State: Needs maintenance 
    Submirror 1: d22
      State: Needs maintenance 
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 1231713 blocks

d21: Submirror of d20
    State: Needs maintenance 
    Invoke: metareplace d20 c1t0d0s4 <new device>
    Size: 1231713 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t0d0s4                   0     No    Maintenance  

d22: Submirror of d20
    State: Needs maintenance 
    Invoke: after replacing "Maintenance" components:
                metareplace d20 c1t1d0s4 <new device>
    Size: 1231713 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t1d0s4                   0     No    Last Erred   

cmapqlf01:root} metareplace -e d20 c1t0d0s4 
metareplace: cmapqlf01: c1t0d0s4: is mounted on /var

cmapqlf01:root} metareplace -e d20 c1t1d0s4 
metareplace: cmapqlf01: d20: c1t1d0s4: component in invalid state to replace - Replace "Maintenance" components first

cmapqlf01:root} metadetach -f d20 d22
metadetach: cmapqlf01: d20: operation would result in no readable submirrors

cmapqlf01:root} metadetach -f d20 d21
d20: submirror d21 is detached
cmapqlf01:root} metaclear d20
metaclear: cmapqlf01: d20: attempted to clear mirror with submirror(s) in invalid state

cmapqlf01:root} metastat d20
d20: Mirror
    Submirror 1: d22
      State: Needs maintenance 
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 1231713 blocks

d22: Submirror of d20
    State: Needs maintenance 
    Invoke: after replacing "Maintenance" components:
                metareplace d20 c1t1d0s4 <new device>
    Size: 1231713 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t1d0s4                   0     No    Last Erred   


cmapqlf01:root} metaclear -f d20
d20: Mirror is cleared
cmapqlf01:root} metastat d20
metastat: cmapqlf01: d20: unit not set up

cmapqlf01:root} metastat d21
d21: Concat/Stripe
    Size: 1231713 blocks
    Stripe 0:
        Device              Start Block  Dbase
        c1t0d0s4                   0     No   

cmapqlf01:root} metastat d22
d22: Concat/Stripe
    Size: 1231713 blocks
    Stripe 0:
        Device              Start Block  Dbase
        c1t1d0s4                   0     No   

cmapqlf01:root} metainit d20 -m d21
d20: Mirror is setup
cmapqlf01:root} metastat d20
d20: Mirror
    Submirror 0: d21
      State: Okay         
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 1231713 blocks

d21: Submirror of d20
    State: Okay         
    Size: 1231713 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t0d0s4                   0     No    Okay         

cmapqlf01:root} df -k
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/dsk/c1t0d0s0     771110  352347  364786    50%    /
/dev/dsk/c1t0d0s3    1018382  178193  779087    19%    /usr
/proc                      0       0       0     0%    /proc
fd                         0       0       0     0%    /dev/fd
mnttab                     0       0       0     0%    /etc/mnttab
/dev/dsk/c1t0d0s4     578351  365642  154874    71%    /var
swap                 4653176      24 4653152     1%    /var/run
swap                 4653160       8 4653152     1%    /tmp
/dev/dsk/c1t0d0s5    4492386 1979687 2467776    45%    /sybase

cmapqlf01:root} metastat d50      
d50: Mirror
    Submirror 0: d51
      State: Needs maintenance 
    Submirror 1: d52
      State: Okay         
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 308826 blocks

d51: Submirror of d50
    State: Needs maintenance 
    Invoke: metareplace d50 c2t0d0s0 <new device>
    Size: 308826 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c2t0d0s0                   0     No    Maintenance  

d52: Submirror of d50
    State: Okay         
    Size: 308826 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c2t1d0s0                   0     No    Okay         

cmapqlf01:root} metadetach -f d50 d51
d50: submirror d51 is detached
cmapqlf01:root} metaclear d50
d50: Mirror is cleared

cmapqlf01:root} metastat d51   
d51: Concat/Stripe
    Size: 308826 blocks
    Stripe 0:
        Device              Start Block  Dbase
        c2t0d0s0                   0     No   

cmapqlf01:root} metastat d52
d52: Concat/Stripe
    Size: 308826 blocks
    Stripe 0:
        Device              Start Block  Dbase
        c2t1d0s0                   0     No   

iostat -En shows disk c2t0d0 as OK,
but format asks for it to be labeled. => all the metadevices must first be removed from it.

cmapqlf01:root} metaclear -f d51
d51: Concat/Stripe is cleared

cmapqlf01:root} metastat d60
d60: Mirror
    Submirror 0: d61
      State: Needs maintenance 
    Submirror 1: d62
      State: Okay         
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 308826 blocks

d61: Submirror of d60
    State: Needs maintenance 
    Invoke: metareplace d60 c2t0d0s1 <new device>
    Size: 308826 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c2t0d0s1                   0     No    Maintenance  

d62: Submirror of d60
    State: Okay         
    Size: 308826 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c2t1d0s1                   0     No    Okay         

cmapqlf01:root} metadetach -f d60 d61
d60: submirror d61 is detached
cmapqlf01:root} metaclear d60
d60: Mirror is cleared
cmapqlf01:root} metaclear -f d61
d61: Concat/Stripe is cleared
cmapqlf01:root} metastat d62
d62: Concat/Stripe
    Size: 308826 blocks
    Stripe 0:
        Device              Start Block  Dbase
        c2t1d0s1                   0     No   

Same for d70: d71 needs maintenance

cmapqlf01:root} metadetach -f d70 d71
d70: submirror d71 is detached

cmapqlf01:root} metaclear d70
d70: Mirror is cleared

cmapqlf01:root} metaclear -f d71
d71: Concat/Stripe is cleared

cmapqlf01:root}  metastat d72
d72: Concat/Stripe
    Size: 2050461 blocks
    Stripe 0:
        Device              Start Block  Dbase
        c2t1d0s3                   0     No   

Same for d80: d81 needs maintenance
cmapqlf01:root} metadetach -f d80 d81
d80: submirror d81 is detached
cmapqlf01:root} metaclear d80
d80: Mirror is cleared
cmapqlf01:root} metaclear -f d81
d81: Concat/Stripe is cleared

Same for d90: d91 needs maintenance
cmapqlf01:root} metadetach -f d90 d91
d90: submirror d91 is detached
cmapqlf01:root} metaclear d90
d90: Mirror is cleared
cmapqlf01:root} metaclear -f d91
d91: Concat/Stripe is cleared
cmapqlf01:root} metastat d92
d92: Concat/Stripe
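
The same detach-and-clear pattern was applied to every mirror with a submirror on c2t0d0. As a rough sketch (the function names are invented here, and the parsing assumes the `metastat` layout shown in this transcript), the affected pairs can be extracted and the commands printed for review rather than executed:

```sh
# Print "mirror submirror" for every submirror reported as "Needs maintenance".
# Reads `metastat` output on standard input.
maintenance_pairs() {
    awk '
        /^d[0-9]+: Mirror/      { mirror = $1; sub(/:$/, "", mirror) }
        /^ +Submirror [0-9]+:/  { name = $3 }
        /^ +State:/ {
            # Only the State line right after a "Submirror N:" line counts.
            if (name != "" && /Needs maintenance/) print mirror, name
            name = ""
        }
    '
}

# Print (do not run) the detach and clear commands used in these notes.
# Only meaningful when a mirror has exactly one bad submirror.
teardown_cmds() {
    while read -r mirror submirror; do
        printf 'metadetach -f %s %s\n' "$mirror" "$submirror"
        printf 'metaclear %s\n' "$mirror"
        printf 'metaclear -f %s\n' "$submirror"
    done
}
```

Something like `metastat d50 | maintenance_pairs | teardown_cmds` would print the three commands for d50. A mirror such as d20 above, where both submirrors are flagged, needs to be handled by hand.
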

Reformatting disk c2t0d0

cmapqlf01:root} metainit d51 1 1 /dev/dsk/c2t0d0s0
d51: Concat/Stripe is setup
cmapqlf01:root} metainit d61 1 1 /dev/dsk/c2t0d0s1
d61: Concat/Stripe is setup
cmapqlf01:root}  metainit d71 1 1 /dev/dsk/c2t0d0s3
d71: Concat/Stripe is setup
cmapqlf01:root} metainit d81 1 1 /dev/dsk/c2t0d0s4
d81: Concat/Stripe is setup
cmapqlf01:root} metainit d91 1 1 /dev/dsk/c2t0d0s5
d91: Concat/Stripe is setup

metainit d50 -m d51
metattach d50 d52

Same for d60, d70, d80, d90, then wait for the resync to finish.
Redo the newfs on d70 and d80, which may have been damaged?
Edit vfstab to bring back all the mirrors except those for / and swap.
A reboot is planned....
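
The rebuild sequence for the five mirrors can be sketched as a small command generator. This is only an illustration: `rebuild_cmds` is a made-up name, and the commands are printed, not run. One point worth checking before running anything like it: `metattach` resyncs the newly attached submirror from the mirror's existing contents, so the submirror passed to `metainit -m` should be the one holding the data to keep.

```sh
# Print the commands that recreate a one-way mirror and attach the second side.
#   $1: mirror, $2: submirror used to create it, $3: submirror attached afterwards
# The attached submirror ($3) is overwritten by the resync.
rebuild_cmds() {
    mirror=$1 base=$2 attach=$3
    printf 'metainit %s -m %s\n' "$mirror" "$base"
    printf 'metattach %s %s\n' "$mirror" "$attach"
}

# Same order as in these notes: dN1 first, then dN2 attached.
for n in 50 60 70 80 90; do
    rebuild_cmds "d$n" "d$((n + 1))" "d$((n + 2))"
done
```
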

Before that: give sybase ownership of the devices behind the mirrors and behind the submirrors of the
raw devices that sybase will use

cmapqlf01:root} ls -l /dev/ASE*
total 12
lrwxrwxrwx   1 sybase   sybase        16 Sep 30 16:19 log_rdev01 -> /dev/md/rdsk/d90
lrwxrwxrwx   1 root     other         18 Oct  7 09:18 master -> /dev/rdsk/c2t0d0s0
lrwxrwxrwx   1 sybase   sybase        16 Sep 30 16:08 master_1 -> /dev/md/rdsk/d50
lrwxrwxrwx   1 root     other         18 Oct  7 09:19 sybsystem -> /dev/rdsk/c2t0d0s1
lrwxrwxrwx   1 sybase   sybase        16 Sep 30 16:17 sybsystem_1 -> /dev/md/rdsk/d60
lrwxrwxrwx   1 sybase   sybase        17 Sep 30 16:29 user_rdev01 -> /dev/md/rdsk/d110

cmapqlf01:root} metastat -p d50
d50 -m d51 d52 1
d51 1 1 c2t0d0s0
d52 1 1 c2t1d0s0

cmapqlf01:root} ls -l /dev/rdsk/c2t0d0s0
lrwxrwxrwx   1 root     root          46 Aug 20 18:56 /dev/rdsk/c2t0d0s0 -> ../../devices/pci@6,4000/scsi@4,1/sd@0,0:a,raw

cmapqlf01:root} ls -l /devices/pci@6,4000/scsi@4,1/sd@0,0:a,raw
crw-r-----   1 sybase   sybase    32,600 Oct  7 09:21 /devices/pci@6,4000/scsi@4,1/sd@0,0:a,raw

cmapqlf01:root} ls -l /dev/rdsk/c2t1d0s0
lrwxrwxrwx   1 root     root          46 Aug 20 18:56 /dev/rdsk/c2t1d0s0 -> ../../devices/pci@6,4000/scsi@4,1/sd@1,0:a,raw

cmapqlf01:root} ls -l /devices/pci@6,4000/scsi@4,1/sd@1,0:a,raw
crw-r-----   1 root     sys       32,608 Aug 20 18:56 /devices/pci@6,4000/scsi@4,1/sd@1,0:a,raw

cmapqlf01:root} chown sybase:sybase /devices/pci@6,4000/scsi@4,1/sd@1,0:a,raw

cmapqlf01:root} ls -l /devices/pci@6,4000/scsi@4,1/sd@1,0:a,raw
crw-r-----   1 sybase   sybase    32,608 Aug 20 18:56 /devices/pci@6,4000/scsi@4,1/sd@1,0:a,raw

cmapqlf01:root} metastat -p d60
d60 -m d61 d62 1
d61 1 1 c2t0d0s1
d62 1 1 c2t1d0s1

cmapqlf01:root} ls -l  /dev/rdsk/c2t0d0s1 /dev/rdsk/c2t1d0s1
lrwxrwxrwx   1 root     root          46 Aug 20 18:56 /dev/rdsk/c2t0d0s1 -> ../../devices/pci@6,4000/scsi@4,1/sd@0,0:b,raw
lrwxrwxrwx   1 root     root          46 Aug 20 18:56 /dev/rdsk/c2t1d0s1 -> ../../devices/pci@6,4000/scsi@4,1/sd@1,0:b,raw

cmapqlf01:root} ls -l /devices/pci@6,4000/scsi@4,1/sd@0,0:b,raw /devices/pci@6,4000/scsi@4,1/sd@1,0:b,raw    
crw-r-----   1 sybase   sybase    32,601 Aug 20 18:56 /devices/pci@6,4000/scsi@4,1/sd@0,0:b,raw
crw-r-----   1 root     sys       32,609 Aug 20 18:56 /devices/pci@6,4000/scsi@4,1/sd@1,0:b,raw
cmapqlf01:root} chown sybase:sybase /devices/pci@6,4000/scsi@4,1/sd@1,0:b,raw

cmapqlf01:root} metastat -p d90
d90 -m d91 d92 1
d91 1 1 c2t0d0s5
d92 1 1 c2t1d0s5

cmapqlf01:root} ls -l /dev/rdsk/c2t0d0s5 /dev/rdsk/c2t1d0s5 
lrwxrwxrwx   1 root     root          46 Aug 20 18:56 /dev/rdsk/c2t0d0s5 -> ../../devices/pci@6,4000/scsi@4,1/sd@0,0:f,raw
lrwxrwxrwx   1 root     root          46 Aug 20 18:56 /dev/rdsk/c2t1d0s5 -> ../../devices/pci@6,4000/scsi@4,1/sd@1,0:f,raw

cmapqlf01:root} ls -l /devices/pci@6,4000/scsi@4,1/sd@0,0:f,raw /devices/pci@6,4000/scsi@4,1/sd@1,0:f,raw
crw-r-----   1 root     sys       32,605 Aug 20 18:56 /devices/pci@6,4000/scsi@4,1/sd@0,0:f,raw
crw-r-----   1 root     sys       32,613 Aug 20 18:56 /devices/pci@6,4000/scsi@4,1/sd@1,0:f,raw

cmapqlf01:root} chown sybase:sybase /devices/pci@6,4000/scsi@4,1/sd@0,0:f,raw /devices/pci@6,4000/scsi@4,1/sd@1,0:f,raw

cmapqlf01:root} ls -l /devices/pci@6,4000/scsi@4,1/sd@0,0:f,raw /devices/pci@6,4000/scsi@4,1/sd@1,0:f,raw
crw-r-----   1 sybase   sybase    32,605 Aug 20 18:56 /devices/pci@6,4000/scsi@4,1/sd@0,0:f,raw
crw-r-----   1 sybase   sybase    32,613 Aug 20 18:56 /devices/pci@6,4000/scsi@4,1/sd@1,0:f,raw

cmapqlf01:root} metastat -p d110
d110 -m d111 d112 1
d111 1 1 c3t0d0s1
d112 1 1 c3t1d0s1

cmapqlf01:root} ls -l /dev/rdsk/c3t0d0s1 /dev/rdsk/c3t1d0s1
lrwxrwxrwx   1 root     other         45 Sep 29 17:59 /dev/rdsk/c3t0d0s1 -> ../../devices/pci@1f,4000/scsi@3/sd@0,0:b,raw
lrwxrwxrwx   1 root     other         45 Sep 29 17:59 /dev/rdsk/c3t1d0s1 -> ../../devices/pci@1f,4000/scsi@3/sd@1,0:b,raw

cmapqlf01:root} ls -l /devices/pci@1f,4000/scsi@3/sd@0,0:b,raw /devices/pci@1f,4000/scsi@3/sd@1,0:b,raw
crw-r-----   1 root     sys       32,  1 Sep 29 17:59 /devices/pci@1f,4000/scsi@3/sd@0,0:b,raw
crw-r-----   1 root     sys       32,  9 Sep 29 17:59 /devices/pci@1f,4000/scsi@3/sd@1,0:b,raw
cmapqlf01:root} chown sybase:sybase /devices/pci@1f,4000/scsi@3/sd@0,0:b,raw /devices/pci@1f,4000/scsi@3/sd@1,0:b,raw

cmapqlf01:root} ls -l /dev/md/rdsk/d50
lrwxrwxrwx   1 root     other         37 Sep 30 11:01 /dev/md/rdsk/d50 -> ../../../devices/pseudo/md@0:0,50,raw

cmapqlf01:root} ls -l /devices/pseudo/md@0:0,50,raw
crw-r-----   1 sybase   sybase    85, 50 Oct  7 09:16 /devices/pseudo/md@0:0,50,raw

cmapqlf01:root} ls -l /dev/md/rdsk/d60
lrwxrwxrwx   1 root     other         37 Sep 30 11:01 /dev/md/rdsk/d60 -> ../../../devices/pseudo/md@0:0,60,raw

cmapqlf01:root} ls -l /devices/pseudo/md@0:0,60,raw
crw-r-----   1 sybase   sybase    85, 60 Sep 30 11:01 /devices/pseudo/md@0:0,60,raw

cmapqlf01:root} ls -l /dev/md/rdsk/d90
lrwxrwxrwx   1 root     other         37 Sep 30 11:01 /dev/md/rdsk/d90 -> ../../../devices/pseudo/md@0:0,90,raw

cmapqlf01:root} ls -l /devices/pseudo/md@0:0,90,raw
crw-r-----   1 sybase   sybase    85, 90 Sep 30 11:01 /devices/pseudo/md@0:0,90,raw

cmapqlf01:root} ls -l /dev/md/rdsk/d110
lrwxrwxrwx   1 root     other         38 Sep 30 11:01 /dev/md/rdsk/d110 -> ../../../devices/pseudo/md@0:0,110,raw
cmapqlf01:root} ls -l /devices/pseudo/md@0:0,110,raw
crw-r-----   1 sybase   sybase    85,110 Sep 30 11:01 /devices/pseudo/md@0:0,110,raw
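
The walk above (mirror, then each submirror slice, then the node under /devices) can be shortened a little. A minimal sketch, assuming mirrors built from single-slice submirrors as in the `metastat -p` output shown here (`raw_devices` is a hypothetical helper name):

```sh
# Turn `metastat -p dNN` output into the raw device paths to check.
# Reads the output on standard input.
raw_devices() {
    awk '
        # Mirror line, e.g. "d50 -m d51 d52 1": print the raw metadevice.
        $2 == "-m" { print "/dev/md/rdsk/" $1; next }
        # Simple concat line, e.g. "d51 1 1 c2t0d0s0": print the raw slice.
        NF >= 4    { print "/dev/rdsk/" $NF }
    '
}
```

For example, `metastat -p d50 | raw_devices | xargs ls -lL` would list the owner of each underlying device node, since `ls -L` follows the symbolic links.
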

=> everything belongs to sybase!
And the correct links to the mirrors are restored:

cmapqlf01:root} ls -l /dev/ASE01
total 12
lrwxrwxrwx   1 sybase   sybase        16 Sep 30 16:19 log_rdev01 -> /dev/md/rdsk/d90
lrwxrwxrwx   1 sybase   sybase        16 Sep 30 16:08 master -> /dev/md/rdsk/d50
lrwxrwxrwx   1 root     other         18 Oct  7 09:18 old_master -> /dev/rdsk/c2t0d0s0
lrwxrwxrwx   1 root     other         18 Oct  7 09:19 old_sybsystem -> /dev/rdsk/c2t0d0s1
lrwxrwxrwx   1 sybase   sybase        16 Sep 30 16:17 sybsystem -> /dev/md/rdsk/d60
lrwxrwxrwx   1 sybase   sybase        17 Sep 30 16:29 user_rdev01 -> /dev/md/rdsk/d110

Adding a replica

cmapqlf01:root} metadb
        flags           first blk       block count
     a m  pc luo        16              1034            /dev/dsk/c1t0d0s7
     a    pc luo        1050            1034            /dev/dsk/c1t0d0s7
     a    pc luo        2084            1034            /dev/dsk/c1t0d0s7
     a    pc luo        16              1034            /dev/dsk/c1t1d0s7
     a    pc luo        1050            1034            /dev/dsk/c1t1d0s7
     a    pc luo        2084            1034            /dev/dsk/c1t1d0s7

cmapqlf01:root} metadb -a /dev/dsk/c2t1d0s7

cmapqlf01:root} metadb
        flags           first blk       block count
     a m  pc luo        16              1034            /dev/dsk/c1t0d0s7
     a    pc luo        1050            1034            /dev/dsk/c1t0d0s7
     a    pc luo        2084            1034            /dev/dsk/c1t0d0s7
     a    pc luo        16              1034            /dev/dsk/c1t1d0s7
     a    pc luo        1050            1034            /dev/dsk/c1t1d0s7
     a    pc luo        2084            1034            /dev/dsk/c1t1d0s7
     a        u         16              1034            /dev/dsk/c2t1d0s7
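
Solaris Volume Manager needs a majority (half + 1) of the state database replicas to boot into multi-user mode, so it helps to see how they are spread across disks. A small sketch under the same layout assumptions as the `metadb` output above (both function names are invented for this example):

```sh
# Count replicas per slice. Reads `metadb` output on standard input.
replica_counts() {
    awk '$NF ~ /^\/dev\// { n[$NF]++ } END { for (d in n) print d, n[d] }' | sort
}

# Number of replicas that must be available for a majority.
#   $1: total number of replicas
majority_needed() {
    echo $(( $1 / 2 + 1 ))
}
```

With the seven replicas listed above, `majority_needed 7` gives 4, while six of the seven sit on the two c1 disks.
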

So, reboot....
With all the mirrors except / (root), it works.
If we only swap the # comments in vfstab for / (d40 instead of c1t0d0s0) =>
then the boot fails (and we have to boot from CD-ROM again to revert the change
in vfstab).

After another reboot on c1t0d0s0, we fix it:

metadetach d40 d42
metaclear d40 

(d40 disappears, leaving the two submirrors)

metainit d40 -m d41
metaroot d40
lockfs -fa
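
`metaroot` is documented to update both /etc/vfstab and /etc/system, which is probably why editing vfstab by hand was not enough earlier. Before rebooting, the vfstab side can be checked with a short sketch like this one (`root_device` is a hypothetical helper; it relies on the standard vfstab field order, where the mount point is the third field):

```sh
# Print the "device to mount" field of the vfstab entry whose mount point is /.
# Reads vfstab-formatted text on standard input; commented lines are skipped.
root_device() {
    awk '$1 !~ /^#/ && $3 == "/" { print $1 }'
}
```

After `metaroot d40`, `root_device < /etc/vfstab` would be expected to print the d40 block metadevice rather than the c1t0d0s0 slice.
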

After the reboot (OK): metattach d40 d42 (resync)

A boot from the mirror still needs to be tested again....