Hello,
A while ago I had trouble with a post on another sub-forum that more or less got me called an idiot. It was about a broken system and I won't go back over it, except for one point: I was told I was irresponsible and living dangerously because one of my disks (a WD "WDC WD20EFRX-68EUZN0") had reached more than 21000 errors, even though every report also returned, among other things, the following message:
"SMART overall-health self-assessment test result: PASSED"
and a little further down:
"Warning: ATA error count 21nnn inconsistent with error log pointer 2/3"
(the 2/3 is because it is usually pointer 2, but sometimes pointer 3)
I did keep using this disk, to see what would happen, while making frequent backups (I am curious, and I don't treat myself to a 2 TB disk every day). It is now at nearly 46000 errors without a single lost file, just the occasional I/O that seems slow to complete.
I googled a bit with the keywords
"ATA error count inconsistent with error log pointer 2"
and even with such a long list of keywords there are quite a few results: the problem seems widespread.
One thing I don't understand is the word "inconsistent".
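My best guess at what is being compared, written as a small Python sketch. It rests on an assumption that nothing in the output confirms: the drive keeps only the five most recent errors in slots numbered 1 to 5, the log pointer names the slot holding the newest one, and one slot is used per counted error, so the count and the pointer should agree modulo 5. The function names are mine, purely for illustration.

```python
# Assumed rule (not confirmed by the smartctl output): the error log is a
# ring of five slots numbered 1..5 and the pointer names the newest slot,
# so the error count and the pointer must agree modulo 5.

LOG_SLOTS = 5  # "device log contains only the most recent five errors"


def pointer_is_consistent(error_count: int, log_pointer: int) -> bool:
    """True when the pointer is where `error_count` errors would leave it."""
    if not 1 <= log_pointer <= LOG_SLOTS:
        raise ValueError("log pointer must be between 1 and 5")
    return (error_count - log_pointer) % LOG_SLOTS == 0


def expected_pointer(error_count: int) -> int:
    """Slot (1..5) the newest error should occupy under the assumed rule."""
    return (error_count - 1) % LOG_SLOTS + 1


if __name__ == "__main__":
    # 13 is the count reported by the healthy drive; the pointer values
    # tried here are illustrative only.
    for pointer in (2, 3):
        print(13, pointer, pointer_is_consistent(13, pointer))
```

If that reading is right, the warning only says that the counter and the pointer have drifted apart. It does not explain why that would happen on one drive and not on the other.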
I bought 2 identical hard disks at the same time. One gives me errors (about 46000) and reports an inconsistency between 2 measurements; the other reports no "inconsistency" and shows only 13 errors, for a lower but comparable power-on time (about 620 days versus 760, going by the lifetime that smartctl reports).
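To put the two drives side by side, here is a rough calculation of errors per power-on day. I am taking the 620 and 760 figures as days, since the smartctl log reports 14975 power-on hours (about 624 days) for the healthy drive; the 760 for the faulty drive is my own figure and the helper names are made up for this sketch.

```python
HOURS_PER_DAY = 24


def hours_to_days(power_on_hours: float) -> float:
    """Convert a Power_On_Hours raw value to days."""
    return power_on_hours / HOURS_PER_DAY


def errors_per_day(error_count: int, power_on_days: float) -> float:
    """Average number of logged ATA errors per power-on day."""
    if power_on_days <= 0:
        raise ValueError("power-on time must be positive")
    return error_count / power_on_days


if __name__ == "__main__":
    # Healthy drive: 13 errors, Power_On_Hours 14975 (from the log).
    healthy = errors_per_day(13, hours_to_days(14975))
    # Faulty drive: about 46000 errors over about 760 days (assumed days).
    faulty = errors_per_day(46000, 760)
    print(f"healthy: {healthy:.3f} errors/day")
    print(f"faulty:  {faulty:.1f} errors/day")
    print(f"ratio:   {faulty / healthy:.0f}x")
```

The two drives have similar age, so the gap is entirely in the error count, not in how long each one has been running.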
Could these problems be explained by bad contacts in the SATA cables when the disk was first used, for example (I did get strange noises with some overly stiff SATA cables)? And if not, does this widespread problem have a solution?
smartctl output for the faulty disk:
a2mains@PC-markorki:/home/marc$ sudo smartctl -s on -o on -a /dev/sdb
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.0-54-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD20EFRX-68EUZN0
Serial Number: WD-WCC4M0974286
LU WWN Device Id: 5 0014ee 2b42f8804
Firmware Version: 80.00A80
User Capacity: 2000398934016 bytes [2,00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Thu Jul 25 20:37:13 2019 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.
SMART Automatic Offline Testing Enabled every four hours.
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (28200) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
smartctl output for the disk of the same type, which has no problem:
a2mains@PC-markorki:/home/marc$ sudo smartctl -s on -o on -a /dev/sdc
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.0-54-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD20EFRX-68EUZN0
Serial Number: WD-WCC4M0970110
LU WWN Device Id: 5 0014ee 209847e62
Firmware Version: 80.00A80
User Capacity: 2000398934016 bytes [2,00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Fri Jul 26 17:50:05 2019 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.
SMART Automatic Offline Testing Enabled every four hours.
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (26640) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 269) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x703d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 47
3 Spin_Up_Time 0x0027 217 168 021 Pre-fail Always - 2125
4 Start_Stop_Count 0x0032 099 099 000 Old_age Always - 1665
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 080 080 000 Old_age Always - 14975
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 1604
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 213
193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 839357
194 Temperature_Celsius 0x0022 098 092 000 Old_age Always - 49
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
SMART Error Log Version: 1
ATA Error Count: 13 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 13 occurred at disk power-on lifetime: 14967 hours (623 days + 15 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 69 65 d6 e3 Error: UNC 8 sectors at LBA = 0x03d66569 = 64382313
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 69 65 d6 e3 00 00:01:19.359 READ DMA
b0 da 00 00 4f c2 00 00 00:01:03.251 SMART RETURN STATUS
b0 d0 01 00 4f c2 00 00 00:01:03.248 SMART READ DATA
b0 d1 01 00 4f c2 00 00 00:01:03.244 SMART READ ATTRIBUTE THRESHOLDS [OBS-4]
ec 00 01 00 00 00 00 00 00:01:03.243 IDENTIFY DEVICE
Error 12 occurred at disk power-on lifetime: 14963 hours (623 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 45 00 00 00 a0 Device Fault; Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ef 03 45 00 00 00 a0 00 03:18:18.911 SET FEATURES [Set transfer mode]
ec 00 00 00 00 00 a0 00 03:18:18.850 IDENTIFY DEVICE
ec 00 00 00 00 00 a0 00 03:18:13.520 IDENTIFY DEVICE
ef 03 45 00 00 00 a0 00 03:18:13.518 SET FEATURES [Set transfer mode]
Error 11 occurred at disk power-on lifetime: 14963 hours (623 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 45 00 00 00 a0 Device Fault; Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ef 03 45 00 00 00 a0 00 03:18:13.518 SET FEATURES [Set transfer mode]
ec 00 00 00 00 00 a0 00 03:18:13.515 IDENTIFY DEVICE
ec 00 00 00 00 00 a0 00 03:17:58.242 IDENTIFY DEVICE
Error 10 occurred at disk power-on lifetime: 14963 hours (623 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
10 51 08 00 08 00 e0 Error: IDNF at LBA = 0x00000800 = 2048
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ca 00 08 00 08 00 e0 00 03:12:09.640 WRITE DMA
Error 9 occurred at disk power-on lifetime: 14963 hours (623 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
10 51 08 60 49 4f e0 Error: IDNF at LBA = 0x004f4960 = 5196128
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ca 00 08 60 49 4f e0 00 03:09:32.732 WRITE DMA
ec 00 00 00 00 00 a0 00 03:09:32.730 IDENTIFY DEVICE
ef 03 45 00 00 00 a0 00 03:09:32.729 SET FEATURES [Set transfer mode]
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
a2mains@PC-markorki:/home/marc$
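For anyone who wants to read the "ER ST SC SN CL CH DH" lines in the error log by hand, here is a minimal Python sketch of how I understand the decoding. It covers only the error bits that appear in the log above (UNC, IDNF, ABRT) and the status bit shown as "Device Fault", and it assumes 28-bit LBA addressing with the top four bits in the low nibble of DH. The names are mine, not smartctl's.

```python
from typing import NamedTuple

# Only the error-register bits that show up in the log above.
ERROR_BITS = {0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}
DEVICE_FAULT = 0x20  # status-register bit reported as "Device Fault"


class ErrorRegisters(NamedTuple):
    errors: list
    device_fault: bool
    lba: int  # only meaningful for read/write commands


def decode_registers(line: str) -> ErrorRegisters:
    """Decode the first seven hex bytes of an 'ER ST SC SN CL CH DH' line."""
    er, st, _sc, sn, cl, ch, dh = (int(tok, 16) for tok in line.split()[:7])
    errors = [name for bit, name in ERROR_BITS.items() if er & bit]
    # LBA28: low nibble of DH, then CH, CL and SN from high to low byte.
    lba = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    return ErrorRegisters(errors, bool(st & DEVICE_FAULT), lba)


if __name__ == "__main__":
    for line in (
        "40 51 08 69 65 d6 e3",  # Error 13
        "04 61 45 00 00 00 a0",  # Errors 12 and 11
        "10 51 08 00 08 00 e0",  # Error 10
        "10 51 08 60 49 4f e0",  # Error 9
    ):
        print(line, decode_registers(line))
```

For the two ABRT entries the failing command is SET FEATURES, so the registers do not hold a sector address and the decoded LBA should be ignored.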