Hello,
I built myself a NAS (running Debian 8.8) with my data on a RAID5 array (md0) made up of four partitions: sdb1, sdc1, sdd1, sde1.
The disks are consumer-grade Seagate 6 TB SATA models.
I'm seeing rather "funky" behavior from disk sdd: it has decided that its name is now a "has-been" and that from now on it will be called sdf!
sdf is still seen by fdisk and hdparm and returns its information (size, serial number, etc.), so the disk is presumably not dead.
It is amusing, but it bothers me because it obviously switches md0 into degraded mode, since one of its members disappears.
After a reboot, the mischievous disk takes back its original identity (sdd) and can be added to md0 without triggering a resynchronization (I assume that is because there were no writes to md0 while it was degraded).
But less than two or three hours after that reboot, sdf shows up again!
Since the problem seems to be recurring, I've come to ask for your advice…
Should the disk be replaced?
If not, is there a trick to stop it from changing ID?
Should the motherboard be replaced?
Obi-Wan Kenobi?
I started the test "smartctl -t long -C /dev/sdf" and should have the result tonight.
Thanks in advance for your suggestions.
# hello
Posted by ouafnico (personal website). Rated 4.
Weird problem you have there!
Did you see anything in dmesg?
In my opinion, your disk loses power, restarts, and comes back as sdf because the old device has not really "disappeared".
Otherwise, silly question: is there no way to use UUIDs with mdadm (from memory, no)?
[^] # Re: hello
Posted by gzgtrhe. Rated 1.
I'm at the office without access to the machine and will post the dmesg tonight (late).
I had a look and I seemed to see errors relating to md0 and sdd.
Good question, I'll check the mdadm documentation.
But assuming it is allowed, is it possible to change this without destroying the RAID array?
Otherwise, isn't it possible to use udev to pin a hard disk's ID based on its UUID?
I'll search on Google…
[^] # Re: hello
Posted by 42nodid. Rated 1.
Hi,
Apparently, creating your RAID with UUIDs is not supported. But listing the partitions in your mdadm.conf would not be mandatory (see http://www.linuxpedia.fr/doku.php/expert/mdadm).
If the device identifiers are absent, mdadm scans the superblocks of all accessible disks to build the array; at least that is what I understand from the answer here: https://unix.stackexchange.com/questions/52321/using-uuids-with-mdadm
I would have liked to run a test… but I don't have the hardware available.
Good luck.
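To illustrate the idea above of identifying the array by UUID rather than by member device names, here is a minimal Python sketch. It assumes an ARRAY line in mdadm.conf syntax; the sample line, its UUID, and the helper name `array_line_by_uuid` are made up for illustration and do not come from the poster's machine.

```python
def array_line_by_uuid(scan_line: str) -> str:
    """Return an ARRAY line that keeps only the md device and its UUID."""
    tokens = scan_line.split()
    if len(tokens) < 2 or tokens[0].upper() != "ARRAY":
        raise ValueError("not an ARRAY line")
    device = tokens[1]
    # Keep the UUID=... field, drop everything else (including devices=...).
    uuid = next((t for t in tokens[2:] if t.lower().startswith("uuid=")), None)
    if uuid is None:
        raise ValueError("no UUID= field found")
    return f"ARRAY {device} {uuid}"


# Hypothetical ARRAY line; the UUID below is invented.
sample = (
    "ARRAY /dev/md0 metadata=1.2 name=nas:0 "
    "UUID=01234567:89abcdef:01234567:89abcdef "
    "devices=/dev/sdb1,/dev/sdc1,/dev/sdd1,/dev/sde1"
)
print(array_line_by_uuid(sample))
```

With such a line, a member that comes back as sdf1 instead of sdd1 would still be matched through its superblock, provided mdadm is allowed to scan all partitions.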
[^] # Re: hello
Posted by gzgtrhe. Rated 1.
I have a laptop with VMware Workstation. That will let me test…
But with the three-day weekend, I won't have access to it until Tuesday evening.
I'll look at the two links you posted.
Thanks!
[^] # Re: hello
Posted by Anonymous. Rated 2.
Come to think of it, I sometimes get this kind of thing with my network card, which goes from eth1 to eth2 on a reboot or some other event.
So I modify the udev rules so that my card is always eth0, even when something unknown goes wrong. With networking it works…
[^] # Re: hello
Posted by dzamlo. Rated 1.
I think it is possible using /dev/disk/by-uuid. But letting mdadm find the disks by itself seems to me the best solution.
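As a rough illustration of why a directory such as /dev/disk/by-uuid helps here: its entries are symlinks that point at whatever kernel name the device currently has, so the persistent name stays valid when sdd becomes sdf. The Python sketch below resolves such symlinks; the function name `persistent_names` is an assumption, and the demo runs on a throwaway directory rather than on real devices.

```python
import os
import tempfile


def persistent_names(by_dir: str) -> dict:
    """Map each symlink in by_dir to the basename of the device it points to."""
    mapping = {}
    for name in sorted(os.listdir(by_dir)):
        path = os.path.join(by_dir, name)
        if os.path.islink(path):
            mapping[name] = os.path.basename(os.path.realpath(path))
    return mapping


# Demo on a temporary directory that mimics the /dev/disk/by-uuid layout.
with tempfile.TemporaryDirectory() as root:
    open(os.path.join(root, "sdf1"), "w").close()  # stand-in for /dev/sdf1
    fake_by_uuid = os.path.join(root, "by-uuid")
    os.mkdir(fake_by_uuid)
    os.symlink("../sdf1", os.path.join(fake_by_uuid, "hypothetical-uuid"))
    demo = persistent_names(fake_by_uuid)

print(demo)
```

On a real system one would call `persistent_names("/dev/disk/by-uuid")` instead of using the temporary directory.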
# Log
Posted by gzgtrhe. Rated 1.
So here is the result of the test "smartctl -t long -C /dev/sdf".
root@nas:~ # smartctl -a /dev/sdf
smartctl 6.4 2014-10-07 r4002 x86_64-linux-3.16.0-4-amd64
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: ST6000VN0021-1ZA17Z
Serial Number: Z4D3X0H8
LU WWN Device Id: 5 000c50 090b88ed7
Firmware Version: SC61
User Capacity: 6,001,175,126,016 bytes [6.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3 T13/2161-D revision 3b
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 1.5 Gb/s)
Local Time is: Fri Jun 2 23:46:43 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 41) The self-test routine was interrupted
by the host with a hard or soft reset.
Total time to complete Offline
data collection: ( 575) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 630) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x10bd) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 083 064 006 Pre-fail Always - 215668399
3 Spin_Up_Time 0x0003 086 084 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 22
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 088 060 045 Pre-fail Always - 623069511
9 Power_On_Hours 0x0032 092 092 000 Old_age Always - 7069
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 22
183 Runtime_Bad_Block 0x0032 092 092 000 Old_age Always - 8
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 094 094 000 Old_age Always - 4295032861
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 047 038 040 Old_age Always In_the_past 53 (Min/Max 21/57 #175)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 096 096 000 Old_age Always - 9011
193 Load_Cycle_Count 0x0032 096 096 000 Old_age Always - 9213
194 Temperature_Celsius 0x0022 053 062 000 Old_age Always - 53 (0 19 0 0 0)
195 Hardware_ECC_Recovered 0x001a 024 019 000 Old_age Always - 215668399
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 198 000 Old_age Always - 32
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 6754 (91 225 0)
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 23345377867
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 198936941895
SMART Error Log Version: 1
ATA Error Count: 1
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 1 occurred at disk power-on lifetime: 7053 hours (293 days + 21 hours)
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
04 51 00 00 00 00 00 Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
00 00 00 00 00 00 00 ff 10d+12:24:27.106 NOP [Abort queued commands]
b0 d4 00 82 4f c2 00 00 10d+12:24:06.934 SMART EXECUTE OFF-LINE IMMEDIATE
b0 d0 01 00 4f c2 00 00 10d+12:24:06.376 SMART READ DATA
ec 00 01 00 00 00 00 00 10d+12:24:06.369 IDENTIFY DEVICE
ec 00 01 00 00 00 00 00 10d+12:24:06.369 IDENTIFY DEVICE
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
1 Extended captive Interrupted (host reset) 90% 7053 -
2 Short offline Completed without error 00% 7030 -
3 Extended offline Interrupted (host reset) 00% 6905 -
4 Short offline Completed without error 00% 6904 -
5 Short offline Completed without error 00% 6880 -
6 Short offline Completed without error 00% 6856 -
7 Short offline Completed without error 00% 6832 -
8 Short offline Completed without error 00% 6808 -
9 Extended offline Interrupted (host reset) 90% 6802 -
10 Short offline Completed without error 00% 6782 -
11 Short offline Completed without error 00% 6758 -
12 Extended offline Completed without error 00% 6745 -
13 Short offline Completed without error 00% 6734 -
14 Short offline Completed without error 00% 6710 -
15 Short offline Completed without error 00% 6686 -
16 Short offline Completed without error 00% 6685 -
17 Extended offline Interrupted (host reset) 00% 6675 -
18 Short offline Completed without error 00% 6643 -
19 Short offline Completed without error 00% 6619 -
20 Short offline Completed without error 00% 6595 -
21 Short offline Completed without error 00% 6571 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
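For readers who want to pick the notable rows out of an attribute table like the one above, here is a small Python sketch. It parses a few rows copied from that output and flags two things: any attribute whose WHEN_FAILED column is not "-", and a non-zero UDMA_CRC_Error_Count, which counts CRC errors on the SATA interface and is commonly read as a hint toward cabling, connector, or port trouble rather than the platters. The function names are assumptions, and the interpretation is a heuristic, not a diagnosis.

```python
SAMPLE = """\
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
190 Airflow_Temperature_Cel 0x0022 047 038 040 Old_age Always In_the_past 53 (Min/Max 21/57 #175)
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x003e 200 198 000 Old_age Always - 32
"""


def parse_attributes(text: str) -> dict:
    """Parse smartctl attribute rows into a dict keyed by attribute name."""
    rows = {}
    for line in text.splitlines():
        # Ten columns: ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE,
        # UPDATED, WHEN_FAILED, RAW_VALUE (the last may contain spaces).
        parts = line.split(None, 9)
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        rows[parts[1]] = {
            "value": int(parts[3]),
            "worst": int(parts[4]),
            "thresh": int(parts[5]),
            "when_failed": parts[8],
            "raw": parts[9],
        }
    return rows


def findings(rows: dict) -> list:
    """Return human-readable notes for rows worth a second look."""
    notes = []
    for name, row in rows.items():
        if row["when_failed"] != "-":
            notes.append(f"{name}: threshold crossed ({row['when_failed']})")
    crc = rows.get("UDMA_CRC_Error_Count")
    if crc and int(crc["raw"].split()[0]) > 0:
        notes.append(f"UDMA_CRC_Error_Count: {crc['raw']} interface CRC errors")
    return notes


rows = parse_attributes(SAMPLE)
for note in findings(rows):
    print(note)
```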
[^] # Re: Log
Posted by gzgtrhe. Rated 1.
And the (truncated) dmesg output.
[ 1.346622] md: md0 stopped.
[ 1.349973] md: bind
[ 1.350107] md: bind
[ 1.350262] md: bind
[ 1.350386] md: bind
[ 1.350399] md: kicking non-fresh sdd1 from array!
[ 1.350402] md: unbind
[ 1.369737] md: export_rdev(sdd1)
[ 1.445777] raid6: sse2x1 10049 MB/s
[ 1.446300] EXT4-fs (sda1): re-mounted. Opts: user_xattr,errors=remount-ro,commit=300,barrier
[ 1.513821] raid6: sse2x2 12339 MB/s
[ 1.581884] raid6: sse2x4 13218 MB/s
[ 1.581885] raid6: using algorithm sse2x4 (13218 MB/s)
[ 1.581886] raid6: using ssse3x2 recovery algorithm
[ 1.661979] xor: using function: prefetch64-sse (15506.000 MB/sec)
[ 1.665195] md: raid6 personality registered for level 6
[ 1.665197] md: raid5 personality registered for level 5
[ 1.665198] md: raid4 personality registered for level 4
[ 1.665653] md/raid:md0: device sdb1 operational as raid disk 0
[ 1.665655] md/raid:md0: device sde1 operational as raid disk 3
[ 1.665656] md/raid:md0: device sdc1 operational as raid disk 1
[ 1.665862] md/raid:md0: allocated 0kB
[ 1.665936] md/raid:md0: raid level 5 active with 3 out of 4 devices, algorithm 2
[ 1.666042] RAID conf printout:
[ 1.666044] --- level:5 rd:4 wd:3
[ 1.666045] disk 0, o:1, dev:sdb1
[ 1.666046] disk 1, o:1, dev:sdc1
[ 1.666046] disk 3, o:1, dev:sde1
[ 1.666251] created bitmap (44 pages) for device md0
[ 1.667120] md0: bitmap initialized from disk: read 3 pages, set 1 of 89423 bits
[ 1.691878] md0: detected capacity change from 0 to 18003117735936
[ 1.700797] md0: p1
[ 3.144729] EXT4-fs (md0p1): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,commit=60,barrier
[ 5119.583230] md: bind
[ 5119.651398] RAID conf printout:
[ 5119.651402] --- level:5 rd:4 wd:3
[ 5119.651405] disk 0, o:1, dev:sdb1
[ 5119.651407] disk 1, o:1, dev:sdc1
[ 5119.651409] disk 2, o:1, dev:sdd1
[ 5119.651411] disk 3, o:1, dev:sde1
[ 5119.651531] md: recovery of RAID array md0
[ 5119.651534] md: minimum guaranteed speed: 1000 KB/sec/disk.
[ 5119.651536] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[ 5119.651539] md: using 128k window, over a total of 5860389888k.
[ 5120.297413] md: md0: recovery done.
[ 5120.434313] RAID conf printout:
[ 5120.434318] --- level:5 rd:4 wd:4
[ 5120.434322] disk 0, o:1, dev:sdb1
[ 5120.434324] disk 1, o:1, dev:sdc1
[ 5120.434326] disk 2, o:1, dev:sdd1
[ 5120.434329] disk 3, o:1, dev:sde1
[ 5585.595365] EXT4-fs (md0p1): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,commit=60,barrier
[13275.995059] ata5.00: exception Emask 0x52 SAct 0x20000000 SErr 0x400c01 action 0x6 frozen
[13275.995081] ata5.00: irq_stat 0x08000000, interface fatal error
[13275.995095] ata5: SError: { RecovData Proto HostInt Handshk }
[13275.995111] ata5.00: failed command: READ FPDMA QUEUED
[13275.995125] ata5.00: cmd 60/00:e8:00:4f:06/01:00:e6:00:00/40 tag 29 ncq 131072 in
res 40/00:e8:00:4f:06/00:00:e6:00:00/40 Emask 0x52 (ATA bus error)
[13275.995159] ata5.00: status: { DRDY }
[13275.995169] ata5: hard resetting link
[13276.315487] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[13276.317272] ata5.00: configured for UDMA/133
[13276.317278] ata5: EH complete
[13305.669128] ata5.00: exception Emask 0x52 SAct 0x4000 SErr 0x400c01 action 0x6 frozen
[13305.669146] ata5.00: irq_stat 0x08000000, interface fatal error
[13305.669159] ata5: SError: { RecovData Proto HostInt Handshk }
[13305.669172] ata5.00: failed command: READ FPDMA QUEUED
[13305.669185] ata5.00: cmd 60/80:70:00:16:f1/00:00:e5:00:00/40 tag 14 ncq 65536 in
res 40/00:70:00:16:f1/00:00:e5:00:00/40 Emask 0x52 (ATA bus error)
[13305.669215] ata5.00: status: { DRDY }
[13305.669224] ata5: hard resetting link
[13305.989604] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[13305.991407] ata5.00: configured for UDMA/133
[13305.991414] ata5: EH complete
[13438.778600] ata5.00: exception Emask 0x52 SAct 0x400000 SErr 0x400c01 action 0x6 frozen
[13438.778641] ata5.00: irq_stat 0x0c000000, interface fatal error
[13438.778674] ata5: SError: { RecovData Proto HostInt Handshk }
[13438.778708] ata5.00: failed command: READ FPDMA QUEUED
[13438.778742] ata5.00: cmd 60/00:b0:00:10:21/01:00:e6:00:00/40 tag 22 ncq 131072 in
res 40/00:b0:00:10:21/00:00:e6:00:00/40 Emask 0x52 (ATA bus error)
[13438.778847] ata5.00: status: { DRDY }
[13438.778876] ata5: hard resetting link
[13439.098927] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[13439.100929] ata5.00: configured for UDMA/133
[13439.100936] ata5: EH complete
[13442.361819] ata5: limiting SATA link speed to 1.5 Gbps
[13442.361824] ata5.00: exception Emask 0x52 SAct 0x8 SErr 0x400c01 action 0x6 frozen
[13442.361876] ata5.00: irq_stat 0x08000000, interface fatal error
[13442.361907] ata5: SError: { RecovData Proto HostInt Handshk }
[13442.361938] ata5.00: failed command: READ FPDMA QUEUED
[13442.361970] ata5.00: cmd 60/00:18:00:06:2a/01:00:e6:00:00/40 tag 3 ncq 131072 in
res 40/00:18:00:06:2a/00:00:e6:00:00/40 Emask 0x52 (ATA bus error)
[13442.362069] ata5.00: status: { DRDY }
[13442.362097] ata5: hard resetting link
[13442.682094] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[13442.684378] ata5.00: configured for UDMA/133
[13442.684385] ata5: EH complete
[32837.723041] Peer 62.235.169.7:34364/57170 unexpectedly shrunk window 654345481:654345545 (repaired)
[43158.662052] perf interrupt took too long (2505 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[44421.529747] Peer 91.179.23.164:31553/32950 unexpectedly shrunk window 1077027531:1077027595 (repaired)
[47876.484438] ata5.00: exception Emask 0x52 SAct 0x800000 SErr 0x400c01 action 0x6 frozen
[47876.484497] ata5.00: irq_stat 0x08000000, interface fatal error
[47876.484531] ata5: SError: { RecovData Proto HostInt Handshk }
[47876.484565] ata5.00: failed command: READ FPDMA QUEUED
[47876.484599] ata5.00: cmd 60/00:b8:00:1e:f9/01:00:e0:01:00/40 tag 23 ncq 131072 in
res 40/00:b8:00:1e:f9/00:00:e0:01:00/40 Emask 0x52 (ATA bus error)
[47876.484704] ata5.00: status: { DRDY }
[47876.484733] ata5: hard resetting link
[47876.804743] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[47876.806699] ata5.00: configured for UDMA/133
[47876.806706] ata5: EH complete
[47876.820755] ata5.00: exception Emask 0x52 SAct 0x6000000 SErr 0x400c01 action 0x6 frozen
[47876.820800] ata5.00: irq_stat 0x08000000, interface fatal error
[47876.820827] ata5: SError: { RecovData Proto HostInt Handshk }
[47876.820853] ata5.00: failed command: READ FPDMA QUEUED
[47876.820879] ata5.00: cmd 60/00:c8:00:1f:f9/02:00:e0:01:00/40 tag 25 ncq 262144 in
res 40/00:c8:00:1f:f9/00:00:e0:01:00/40 Emask 0x52 (ATA bus error)
[47876.820963] ata5.00: status: { DRDY }
[47876.820984] ata5.00: failed command: READ FPDMA QUEUED
[47876.821009] ata5.00: cmd 60/00:d0:00:22:f9/01:00:e0:01:00/40 tag 26 ncq 131072 in
res 40/00:c8:00:1f:f9/00:00:e0:01:00/40 Emask 0x52 (ATA bus error)
[47876.821093] ata5.00: status: { DRDY }
[47876.821115] ata5: hard resetting link
[47877.141037] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[47877.143145] ata5.00: configured for UDMA/133
[47877.143153] ata5: EH complete
[47880.644427] ata5.00: exception Emask 0x50 SAct 0x1c SErr 0x4490801 action 0xe frozen
[47880.644472] ata5.00: irq_stat 0x04400040, connection status changed
[47880.644499] ata5: SError: { RecovData HostInt PHYRdyChg 10B8B Handshk DevExch }
[47880.644542] ata5.00: failed command: READ FPDMA QUEUED
[47880.644567] ata5.00: cmd 60/00:10:00:ff:fb/01:00:e0:01:00/40 tag 2 ncq 131072 in
res 40/00:20:00:0a:fc/00:00:e0:01:00/40 Emask 0x50 (ATA bus error)
[47880.644651] ata5.00: status: { DRDY }
[47880.644672] ata5.00: failed command: READ FPDMA QUEUED
[47880.644697] ata5.00: cmd 60/00:18:00:07:fc/02:00:e0:01:00/40 tag 3 ncq 262144 in
res 40/00:20:00:0a:fc/00:00:e0:01:00/40 Emask 0x50 (ATA bus error)
[47880.644780] ata5.00: status: { DRDY }
[47880.644802] ata5.00: failed command: READ FPDMA QUEUED
[47880.644827] ata5.00: cmd 60/00:20:00:0a:fc/01:00:e0:01:00/40 tag 4 ncq 131072 in
res 40/00:20:00:0a:fc/00:00:e0:01:00/40 Emask 0x50 (ATA bus error)
[47880.644910] ata5.00: status: { DRDY }
[47880.644933] ata5: hard resetting link
[47881.368776] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[47881.384148] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x100)
[47881.384151] ata5.00: revalidation failed (errno=-5)
[47886.373407] ata5: hard resetting link
[47888.591395] ata5: SATA link down (SStatus 1 SControl 310)
[47888.593035] ata5: hard resetting link
[47890.809184] ata5: SATA link down (SStatus 1 SControl 310)
[47890.809189] ata5.00: disabled
[47890.809201] sd 4:0:0:0: [sdd]
[47890.809202] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[47890.809203] sd 4:0:0:0: [sdd]
[47890.809204] Sense Key : Aborted Command [current] [descriptor]
[47890.809206] Descriptor sense data with sense descriptors (in hex):
[47890.809206] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 01
[47890.809211] e0 fc 0a 00
[47890.809212] sd 4:0:0:0: [sdd]
[47890.809213] Add. Sense: No additional sense information
[47890.809214] sd 4:0:0:0: [sdd] CDB:
[47890.809215] Read(16): 88 00 00 00 00 01 e0 fb ff 00 00 00 01 00 00 00
[47890.809220] end_request: I/O error, dev sdd, sector 8069578496
[47890.809258] sd 4:0:0:0: [sdd]
[47890.809259] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[47890.809260] sd 4:0:0:0: [sdd]
[47890.809260] Sense Key : Aborted Command [current] [descriptor]
[47890.809262] Descriptor sense data with sense descriptors (in hex):
[47890.809262] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 01
[47890.809266] e0 fc 0a 00
[47890.809268] sd 4:0:0:0: [sdd]
[47890.809269] Add. Sense: No additional sense information
[47890.809270] sd 4:0:0:0: [sdd] CDB:
[47890.809270] Read(16): 88 00 00 00 00 01 e0 fc 07 00 00 00 02 00 00 00
[47890.809274] end_request: I/O error, dev sdd, sector 8069580544
[47890.809303] sd 4:0:0:0: [sdd]
[47890.809304] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[47890.809305] sd 4:0:0:0: [sdd]
[47890.809305] Sense Key : Aborted Command [current] [descriptor]
[47890.809306] Descriptor sense data with sense descriptors (in hex):
[47890.809307] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 01
[47890.809311] e0 fc 0a 00
[47890.809313] sd 4:0:0:0: [sdd]
[47890.809313] Add. Sense: No additional sense information
[47890.809314] sd 4:0:0:0: [sdd] CDB:
[47890.809315] Read(16): 88 00 00 00 00
[47890.809317] sd 4:0:0:0: rejecting I/O to offline device
[47890.809324] sd 4:0:0:0: rejecting I/O to offline device
[47890.809352] sd 4:0:0:0: rejecting I/O to offline device
[47890.809377] sd 4:0:0:0: rejecting I/O to offline device
[47890.809403] sd 4:0:0:0: rejecting I/O to offline device
[47890.809429] sd 4:0:0:0: rejecting I/O to offline device
[47890.809454] sd 4:0:0:0: rejecting I/O to offline device
[47890.809484] 01 e0 fc 0a 00 00 00 01 00 00 00
[47890.809487] end_request: I/O error, dev sdd, sector 8069581312
[47890.809518] ata5: EH complete
[47890.809524] ata5.00: detaching (SCSI 4:0:0:0)
[47890.810360] end_request: I/O error, dev sdd, sector 2064
[47890.810388] md: super_written gets error=-5, uptodate=0
[47890.810390] md/raid:md0: Disk failure on sdd1, disabling device.
md/raid:md0: Operation continuing on 3 devices.
[47890.810551] sd 4:0:0:0: [sdd] Synchronizing SCSI cache
[47890.810567] sd 4:0:0:0: [sdd]
[47890.810568] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[47890.810569] sd 4:0:0:0: [sdd] Stopping disk
[47890.810572] sd 4:0:0:0: [sdd] START_STOP FAILED
[47890.810573] sd 4:0:0:0: [sdd]
[47890.810573] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[47890.820266] ata5: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen
[47890.820310] ata5: irq_stat 0x00000040, connection status changed
[47890.820336] ata5: SError: { CommWake DevExch }
[47890.820361] ata5: hard resetting link
[47891.002595] RAID conf printout:
[47891.002599] --- level:5 rd:4 wd:3
[47891.002601] disk 0, o:1, dev:sdb1
[47891.002615] disk 1, o:1, dev:sdc1
[47891.002616] disk 2, o:0, dev:sdd1
[47891.002617] disk 3, o:1, dev:sde1
[47891.017318] RAID conf printout:
[47891.017335] --- level:5 rd:4 wd:3
[47891.017337] disk 0, o:1, dev:sdb1
[47891.017338] disk 1, o:1, dev:sdc1
[47891.017339] disk 3, o:1, dev:sde1
[47893.051031] ata5: SATA link down (SStatus 1 SControl 300)
[47893.051038] ata5: EH complete
[47893.060686] ata5: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen
[47893.060729] ata5: irq_stat 0x00000040, connection status changed
[47893.060766] ata5: SError: { CommWake DevExch }
[47893.060791] ata5: limiting SATA link speed to 1.5 Gbps
[47893.060793] ata5: hard resetting link
[47895.288999] ata5: SATA link down (SStatus 1 SControl 310)
[47895.289006] ata5: EH complete
[47895.290356] ata5: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen
[47895.290418] ata5: irq_stat 0x00000040, connection status changed
[47895.290444] ata5: SError: { CommWake DevExch }
[47895.290473] ata5: limiting SATA link speed to 1.5 Gbps
[47895.290474] ata5: hard resetting link
[47897.518995] ata5: SATA link down (SStatus 1 SControl 310)
[47897.519002] ata5: EH complete
[47897.520372] ata5: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen
[47897.520418] ata5: irq_stat 0x00000040, connection status changed
[47897.520445] ata5: SError: { CommWake DevExch }
[47897.520474] ata5: limiting SATA link speed to 1.5 Gbps
[47897.520475] ata5: hard resetting link
[47899.756984] ata5: SATA link down (SStatus 1 SControl 310)
[47899.756997] ata5: EH complete
[47899.761507] ata5: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen
[47899.761563] ata5: irq_stat 0x00000040, connection status changed
[47899.761597] ata5: SError: { CommWake DevExch }
[47899.761633] ata5: limiting SATA link speed to 1.5 Gbps
[47899.761636] ata5: hard resetting link
[47901.990938] ata5: SATA link down (SStatus 1 SControl 310)
[47901.990946] ata5: EH complete
[47901.992304] ata5: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen
[47901.992363] ata5: irq_stat 0x00000040, connection status changed
[47901.992390] ata5: SError: { CommWake DevExch }
[47901.992419] ata5: limiting SATA link speed to 1.5 Gbps
[47901.992421] ata5: hard resetting link
[47904.220851] ata5: COMRESET failed (errno=-32)
[47904.220878] ata5: reset failed (errno=-32), retrying in 8 secs
[47911.999727] ata5: hard resetting link
[47913.396947] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[47913.512030] ata5.00: ATA-10: ST6000VN0021-1ZA17Z, SC61, max UDMA/133
[47913.512033] ata5.00: 11721045168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[47913.516401] ata5.00: configured for UDMA/133
[47913.516407] ata5: EH complete
[47913.516464] scsi 4:0:0:0: Direct-Access ATA ST6000VN0021-1ZA SC61 PQ: 0 ANSI: 5
[47913.516614] sd 4:0:0:0: [sdf] 11721045168 512-byte logical blocks: (6.00 TB/5.45 TiB)
[47913.516616] sd 4:0:0:0: [sdf] 4096-byte physical blocks
[47913.516686] sd 4:0:0:0: [sdf] Write Protect is off
[47913.516689] sd 4:0:0:0: [sdf] Mode Sense: 00 3a 00 00
[47913.516698] sd 4:0:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[47913.516794] sd 4:0:0:0: Attached scsi generic sg3 type 0
[47913.573790] sdf: sdf1
[47913.574088] sd 4:0:0:0: [sdf] Attached SCSI disk
[47913.637189] ata5.00: exception Emask 0x52 SAct 0x380 SErr 0x400c01 action 0x6 frozen
[47913.637233] ata5.00: irq_stat 0x0c000000, interface fatal error
[47913.637259] ata5: SError: { RecovData Proto HostInt Handshk }
[47913.637286] ata5.00: failed command: READ FPDMA QUEUED
[47913.637312] ata5.00: cmd 60/08:38:10:00:00/00:00:00:00:00/40 tag 7 ncq 4096 in
res 40/00:40:40:00:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
[47913.637395] ata5.00: status: { DRDY }
[47913.637423] ata5.00: failed command: READ FPDMA QUEUED
[47913.637448] ata5.00: cmd 60/38:40:40:00:00/00:00:00:00:00/40 tag 8 ncq 28672 in
res 40/00:40:40:00:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
[47913.637531] ata5.00: status: { DRDY }
[47913.637553] ata5.00: failed command: READ FPDMA QUEUED
[47913.637588] ata5.00: cmd 60/80:48:80:00:00/01:00:00:00:00/40 tag 9 ncq 196608 in
res 40/00:40:40:00:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
[47913.637671] ata5.00: status: { DRDY }
[47913.637694] ata5: hard resetting link
[47913.957440] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[47913.959447] ata5.00: configured for UDMA/133
[47913.959453] ata5: EH complete
[47944.148114] ata5.00: exception Emask 0x0 SAct 0x1 SErr 0x400001 action 0x6 frozen
[47944.148173] ata5: SError: { RecovData Handshk }
[47944.148197] ata5.00: failed command: READ FPDMA QUEUED
[47944.148223] ata5.00: cmd 60/08:00:90:03:00/00:00:00:00:00/40 tag 0 ncq 4096 in
res 40/00:01:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[47944.148305] ata5.00: status: { DRDY }
[47944.148328] ata5: hard resetting link
[47944.468373] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[47944.470091] ata5.00: configured for UDMA/133
[47944.470097] ata5: EH complete
[86631.258712] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[86631.258743] ata5.00: failed command: SMART
[86631.258767] ata5.00: cmd b0/d4:00:82:4f:c2/00:00:00:00:00/00 tag 16
res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[86631.258834] ata5.00: status: { DRDY }
[86631.258857] ata5: hard resetting link
[86631.579011] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[86631.581116] ata5.00: configured for UDMA/133
[86631.581125] ata5: EH complete
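A log this long is easier to read once summarized per port. The Python sketch below counts hard resets and link-down events and records the negotiated link speeds, using a handful of lines copied from the dmesg output above. The regular expressions and the `summarize` helper are assumptions written for this log format, not part of any kernel tooling.

```python
import re

SAMPLE = """\
[13275.995169] ata5: hard resetting link
[13276.315487] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[13442.361819] ata5: limiting SATA link speed to 1.5 Gbps
[13442.362097] ata5: hard resetting link
[13442.682094] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[47888.591395] ata5: SATA link down (SStatus 1 SControl 310)
"""

# Timestamp, ATA port (optionally followed by ".00"), then the message.
LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(ata\d+)(?:\.\d+)?: (.*)$")
LINK_UP = re.compile(r"SATA link up ([\d.]+) Gbps")


def summarize(text: str) -> dict:
    """Count resets and link drops per ATA port and list link-up speeds."""
    summary = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue
        port_name, message = match.group(2), match.group(3)
        port = summary.setdefault(
            port_name, {"resets": 0, "link_down": 0, "speeds": []}
        )
        if message == "hard resetting link":
            port["resets"] += 1
        elif message.startswith("SATA link down"):
            port["link_down"] += 1
        else:
            up = LINK_UP.match(message)
            if up:
                port["speeds"].append(float(up.group(1)))
    return summary


report = summarize(SAMPLE)
print(report)
```

A speed list that steps down (here from 3.0 to 1.5 Gbps) together with repeated resets on a single port matches what the full log shows for ata5.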
[^] # Re: Log
Posted by ouafnico (personal website). Rated 2.
It does look like you have a disk that is failing and restarting.
In my opinion, that is where your fault lies :/
# Solved (temporarily?)
Posted by gzgtrhe. Rated 1.
I put sdf1 back into the RAID array. It was a bit laborious because I had to stop and restart the array so that sdf1 could take the place of sdd1.
The rebuild went well and md0 is currently "clean" according to mdadm.
If it blows up again, I will seriously consider replacing the disk…
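Since the failure may come back, one way to notice it early is to watch /proc/mdstat, where a status such as `[4/3] [UU_U]` means the array expects four members but only three are active. The Python sketch below parses that status. The sample text is hypothetical (modeled on a four-disk RAID5 with one member missing; the metadata version and chunk size are invented), and `array_states` is an assumed helper name.

```python
import re

# Hypothetical /proc/mdstat excerpt for a degraded four-disk RAID5.
SAMPLE = """\
md0 : active raid5 sde1[4] sdc1[1] sdb1[0]
      17581169664 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UU_U]
"""

HEADER = re.compile(r"^(md\d+)\s*:")
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def array_states(mdstat_text: str) -> dict:
    """Return expected/active member counts for each md array found."""
    states = {}
    current = None
    for line in mdstat_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS.search(line)
        if current and status:
            expected = int(status.group(1))
            active = int(status.group(2))
            states[current] = {
                "expected": expected,
                "active": active,
                "degraded": active < expected,
                "map": status.group(3),
            }
    return states


states = array_states(SAMPLE)
print(states)
```

On the NAS itself, the same function could be fed the contents of /proc/mdstat from a periodic job instead of the sample string.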