[zfs-discuss] EDAC Errors

Durval Menezes durval.menezes at gmail.com
Thu Jan 23 18:11:51 EST 2014


Hi Cedric,

The current RAM ECC scheme (SECDED) detects and corrects all single-bit
errors, and detects (and signals to the OS, which I think then panics) all
double-bit errors... triple-bit errors can't be reliably detected. That said,
the number of ECC errors you showed in the log looks truly humongous.
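
To make the correct-one/detect-two behaviour concrete, here is a toy sketch
in Python using an extended Hamming(8,4) code. It is only an illustration:
real ECC memory typically protects 64 data bits with 8 check bits and does
this in the memory controller, and the function names (encode, decode, flip)
are made up for this example.

```python
# Toy SECDED code: extended Hamming(8,4). Illustration only, not how any
# particular memory controller implements ECC.

def encode(data_bits):
    """Encode 4 data bits into 8 bits; list index = Hamming position."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4          # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4          # parity over positions 4, 5, 6, 7
    word = [0, p1, p2, d1, p4, d2, d3, d4]
    word[0] = sum(word) % 2    # overall parity bit, makes total parity even
    return word

def decode(word):
    """Return (status, data); status is 'ok', 'corrected' or 'uncorrectable'."""
    word = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if word[pos]:
            syndrome ^= pos    # XOR of set positions is 0 for a valid word
    odd_parity = sum(word) % 2 == 1
    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity:
        # Odd number of flips: assumed to be exactly one, so flip it back
        # (syndrome 0 here means the overall parity bit itself flipped).
        word[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two bits flipped.
        return "uncorrectable", None
    return status, [word[3], word[5], word[6], word[7]]

def flip(word, *positions):
    """Return a copy of word with the given bit positions inverted."""
    out = list(word)
    for pos in positions:
        out[pos] ^= 1
    return out

data = [1, 0, 1, 1]
codeword = encode(data)
print(decode(flip(codeword, 5)))        # one flip: corrected, data intact
print(decode(flip(codeword, 2, 5)))     # two flips: detected, not corrected
print(decode(flip(codeword, 1, 2, 4)))  # three flips: "corrected" to wrong data
```

The last case is the nasty one: three flips look exactly like a single-bit
error, so the decoder reports a correction and hands back wrong data.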

I would get that fixed ASAP: the fault could be getting worse, in which case
you would not be far from the point where you get panic-generating double-bit
errors, or worse, undetected triple-bit errors.
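
To check whether the count is still climbing, you could compare two readings
of the counter. A minimal Python sketch follows, assuming the line format
matches the one you pasted; the helper names (parse_ce_line, ce_increase) are
hypothetical, and the two numbers are the ones from your message.

```python
import re

# Matches lines shaped like the one quoted in this thread, e.g.
# "mc0: csrow0: CPU_SrcID#0_Channel#3_DIMM#0: 871113 Corrected Errors"
LINE_RE = re.compile(r"^(mc\d+): (csrow\d+): (\S+): (\d+) Corrected Errors$")

def parse_ce_line(line):
    """Return ((controller, csrow, label), corrected_error_count)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognised line: %r" % line)
    mc, csrow, label, count = match.groups()
    return (mc, csrow, label), int(count)

def ce_increase(old_count, new_count):
    """Number of new corrected errors between two readings."""
    if new_count < old_count:
        # Counters restart from zero after a reboot or driver reload.
        raise ValueError("counter went backwards; was it reset?")
    return new_count - old_count

key, latest = parse_ce_line(
    "mc0: csrow0: CPU_SrcID#0_Channel#3_DIMM#0: 871113 Corrected Errors"
)
print(key, ce_increase(699153, latest))
```

Dividing that increase by the time between the two readings gives a rate you
can watch while reseating or swapping the DIMM.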

Cheers,
-- 
   Durval.



On Thu, Jan 23, 2014 at 4:15 PM, Cédric Lemarchand <
cedric.lemarchand at ixblue.com> wrote:

>
>
> On 23 Jan 2014, at 18:55, Gordan Bobic <gordan.bobic at gmail.com> wrote:
>
> You have duff RAM. Try reseating it; if that doesn't help, replace it.
> But since the errors are all corrected, your data should be free of
> corruption thus far.
>
>
> Thanks for the advice, Gordan ;-)
>
> Cheers
>
>
>
> Cédric Lemarchand <cedric.lemarchand at ixblue.com> wrote:
>
> Hi Gordan,
>
> On 16/01/2014 15:29, Gordan Bobic wrote:
>
> EDAC errors aren't always fatal. Depending on what the log says, it could
> be a bit flip detected and corrected by ECC in RAM (if you have ECC RAM) or
> in the CPU cache.
>
> OK, but it's a bit scary, since the counter has increased from
> 699153 to 871113 today:
>
> mc0: csrow0: CPU_SrcID#0_Channel#3_DIMM#0: 871113 Corrected Errors
>
> The scrub reports no errors.
> The thing is that this is the first time I have checked those errors; maybe
> it's usual for EDAC errors to happen. Do I need to worry about that, or just
> let ECC do its job? And up to what error rate?
>
> Once again: thanks, ECC...
>
>
>
> On Thu, Jan 16, 2014 at 2:19 PM, Cédric Lemarchand <
> cedric.lemarchand at ixblue.com> wrote:
>
>> Hello list,
>>
>> I am facing hardware errors reported by some EDAC counters; it seems I
>> need to check the hardware to find out what is going on (CPU/MB/RAM).
>> I know it's not ZFS related, but I am asking myself what the chances are
>> that those errors have an *invisible* impact on my data pools. A scrub is
>> running and will take ~11h to crunch through the data. The OS *seems* fine:
>> it has not frozen, there are no weird messages in the logs (except the EDAC
>> errors), and the pool status doesn't report any errors.
>>
>> Cheers
>>
>> --
>> Cédric
>>
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to zfs-discuss+unsubscribe at zfsonlinux.org.
>>
>
>
> --
>  Cédric Lemarchand
> System & Network Engineer
> iXBlue
> 52, avenue de l'Europe
> 78160 Marly le Roi
> France
> Tel. +33 1 30 08 88 88
> Mob. +33 6 37 23 40 93
> Fax +33 1 30 08 88 00
> www.ixblue.com
>
>


