Jun 7, 2008

Eight Ways To Kill Your HDD

Overclocking

Overclocking the system does not in itself pose a direct threat to the HDD. Things change, however, once the PCI frequency goes beyond 38 MHz. Some older drives, like the IBM Deskstar5 series, would simply corrupt all data, while others, like the Western Digitals, ran fine at PCI frequencies up to 42 MHz. The same was true for the older Samsung drives, but those drives had pitiful performance anyway, so we won't talk about them. Factors that influence the stability of the drive at higher IDE frequencies include the quality and, as silly as it may appear, the installation of the UATA cables used.
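To make the 38 MHz figure concrete, here is a minimal sketch that assumes the board derives the PCI clock by dividing the front-side bus clock by a fixed integer. The divider of 4 and the FSB values are illustrative assumptions, not settings taken from any particular mainboard; only the 38 MHz threshold comes from the text above.

```python
# Threshold from the article: trouble starts once the PCI clock exceeds 38 MHz.
PCI_RISK_MHZ = 38.0


def pci_clock_mhz(fsb_mhz: float, divider: int) -> float:
    """PCI clock for a board that derives it as FSB / divider (assumption)."""
    return fsb_mhz / divider


def is_risky(fsb_mhz: float, divider: int) -> bool:
    """True if the resulting PCI clock is beyond the article's 38 MHz mark."""
    return pci_clock_mhz(fsb_mhz, divider) > PCI_RISK_MHZ


# Hypothetical FSB settings with an assumed 1/4 PCI divider.
for fsb in (133.0, 150.0, 160.0):
    pci = pci_clock_mhz(fsb, 4)
    status = "risky" if is_risky(fsb, 4) else "ok"
    print(f"FSB {fsb:.0f} MHz -> PCI {pci:.2f} MHz ({status})")
```

If the board offers a larger divider at higher FSB settings, the same FSB lands on a lower PCI clock, which is why the PCI frequency rather than the FSB is the number to watch.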

Incorrect Orientation of UATA Cables

Before the days of Ultra DMA 33, IDE cables were available in just about any shape and length. Some ultra-long cables, meant for full tower cases, measured 24" or even 27", and others had the middle connector in all kinds of positions. UDMA/33 brought more stringent limits on cable length and on the position of the middle connector. The specs still left open which connector had to be used for master or slave, yet certain high-speed drives like the Western Digital WDC22500, WDC24300 or WDC26400 would either not run or suffer from severely compromised longevity when hooked up to the center connector as the single drive on that channel. This phenomenon was as puzzling (after all, there should not be any difference since the same wires are used) as it was consistent between users and chipsets. It was also counterintuitive: on an overclocked system most drives work better with a shorter cable, so why was it necessary to add trace length and move the drive to the very end of the IDE cable?

The explanation is actually very straightforward and leads directly into the definitions of the current UATA cables. All high-speed data transfer cables need termination at the end of any signal path. In the simplest case, termination is just a resistor to ground that absorbs the voltage swings that make up the signal. If such a termination is missing at the end of the cable, the signal will bounce back and interfere with the forward-moving signal at every point in between. Any drive will act as a terminator, but termination is effective only if it sits at the end of the cable and not somewhere in the middle. The proof was in the pudding: after cutting off the tail of the IDE cable, the same drive would suddenly work on the same middle connector that had not functioned before. At the time, we thought of it as black magic, but I still have my 4" home-made, single-drive IDE cables, and they are working flawlessly.
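The bounce-back described above can be put into numbers with the textbook reflection coefficient for a transmission line, (Z_load - Z_line) / (Z_load + Z_line). The sketch below is only an illustration of that formula: the 100 ohm line impedance and the load values are assumptions chosen for round numbers, not measured properties of any IDE cable or drive.

```python
import math

# Assumed characteristic impedance of the cable, for illustration only.
Z_LINE_OHMS = 100.0


def reflection_coefficient(z_load: float, z_line: float = Z_LINE_OHMS) -> float:
    """Fraction of the incident voltage that is reflected at the load."""
    if math.isinf(z_load):
        # Open cable end: nothing absorbs the signal, all of it bounces back.
        return 1.0
    return (z_load - z_line) / (z_load + z_line)


print(reflection_coefficient(100.0))     # matched termination at the cable end
print(reflection_coefficient(300.0))     # mismatched load, partial reflection
print(reflection_coefficient(math.inf))  # unterminated tail behind a middle drive
```

A matched load reflects nothing, while the dangling tail behind a drive on the middle connector behaves like the open-end case. Cutting the tail off removes that open stub, which is consistent with the experiment described above.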

Ultra ATA cables

Aside from signal reflection, another major problem of the 40-wire cables was the electrical cross-talk between data and command lines. Deterioration of signal integrity because of cross-talk increases dramatically with cable length, which is the reason for the twisted pair specifications of CAT-5 Ethernet cables, to give just one example. Electrical cross-talk can be largely suppressed by inserting shields between the individual data wires. This is exactly why Ultra ATA/66 and higher use 80 wires: 40 of these wires are tied to ground and act as shields between the signal wires.

In addition, UATA/66 cables always have one connector that, by definition, has to be connected to the mainboard, one connector on the far end for the master or single drive, and one connector in the middle of the chain for the slave device. The ratio between the cable segments is one of the factors, but the real issue is that the shield wires are connected to ground only at the blue connector, which is the mainboard end. After reading the above, there should be no question regarding the rationale behind this arrangement.

Rounded Cables

Some companies have gotten really cute with the design of what has become known as rounded cables. Depending on the manufacturer, the ribbons are sliced into single or multiple strands and then bundled tightly. In most cases there should be no problems. However, there have been numerous suspicions on many bulletin boards that rounded cables cause higher coaster rates while burning CDs. Likewise, users of rounded cables appear to suffer from abnormally high failure rates of HDDs. Again, this could be pure coincidence, but it is food for thought at least.

Removable Drive Racks

Aside from the speculations about rounded cables, there are some hard data on removable drive racks. Removable drive racks are casings that are inserted into a 5.25" drive bay and hold a HDD inside a tray that slides in and out of the drive bay and locks into a connector at the back of the device. Even though this kind of gizmo does not allow hot swapping of HDDs, it allows removal of a drive at the turn of a key, or so the manufacturers claim.

We tested several of these devices with the IBM 60GXP, and the unfortunate result was that not a single drive survived in the removable rack for more than a few hours. Symptoms usually started with a delay in reaching the Windows splash screen after the POST. This lag got progressively worse over several reboots until the drive finally was no longer recognized at all.

Interestingly, if the drive was removed from the removable racks during the initial stages of errors and hooked up to a standard UATA cable, full functionality was restored immediately. Whenever we waited until the drive was no longer recognized, switching back could not resolve the problems anymore.

Repeating the background above would be like carrying owls to Athens, but imagine an additional interface that reroutes the signals from the original cable to a snap-in connector and from there through another cable inside the "coffin" to the drive. In short, adding three additional interfaces does not appear to be a good idea. The concept itself is great; just don't use it with anything faster than a UATA/66 drive.

tRAS Violation: The Creeping Corruption of a HDD

One of the most common reasons for HDD failure is what is called tRAS violation. tRAS is the minimum bank open time of the DRAM, that is, we are talking about system memory here. Many mainboard manufacturers still include Ultra and Turbo settings in their CMOS setup options that are only workable at 100 MHz memory bus settings, a.k.a. PC1600 mode. One setting that has virtually no impact on performance is the minimum bank open time, or tRAS, yet the same setting can have catastrophic consequences for data integrity, including HDD addressing schemes, if it is set too short. In theory, tRAS can be as short as tRCD + CAS delay. In reality, the minimum bank open time is dictated by the RAS pulse width, that is, the time required to reach a voltage differential between memory bitlines and reference lines that is large enough to safely identify a logical 0 or 1.

Why tRAS violation so commonly leads to HDD corruption is not entirely clear. It may relate to the operating system translating the physical memory space into virtual memory sub-spaces and finally writing the data back to the storage media. What is certain, though, is that a tRAS value of 5T is adequate for PC1600 or 100 MHz operation. At 133 MHz or PC2100, tRAS should never undercut 6T. Likewise, at PC2700, the value should be increased to 7T where applicable. In terms of performance, tRAS settings hardly make any difference. We challenged some performance gurus at AMD on this matter and they reported a drop in Quake frame rates from 792 fps to 790 fps when increasing tRAS from 5T to 6T.
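The recommended values above are easier to compare once the cycle counts are converted to nanoseconds. The following is a small sketch of that arithmetic; the table holds the minimum values quoted in this article, the 166 MHz clock for PC2700 is the nominal figure for that memory grade, and the function names are made up for illustration.

```python
# Minimum tRAS in clock cycles ("T") per memory clock in MHz, as recommended
# in the article: PC1600 = 100 MHz, PC2100 = 133 MHz, PC2700 = 166 MHz nominal.
MIN_TRAS_CYCLES = {100: 5, 133: 6, 166: 7}


def tras_ns(cycles: int, clock_mhz: float) -> float:
    """Bank open time in nanoseconds for a given cycle count and clock."""
    return cycles * 1000.0 / clock_mhz


def violates_tras(cycles: int, clock_mhz: int) -> bool:
    """True if the setting is shorter than the article's recommended minimum."""
    return cycles < MIN_TRAS_CYCLES[clock_mhz]


for clock, cycles in MIN_TRAS_CYCLES.items():
    print(f"{clock} MHz: {cycles}T = {tras_ns(cycles, clock):.1f} ns")

# Carrying a 5T "Turbo" setting over from 100 MHz to 133 MHz shortens the
# bank open time in absolute terms and falls below the recommended minimum.
print(violates_tras(5, 133))
```

The cycle count has to grow with the clock because each cycle gets shorter, while the time the bitlines need to settle does not shrink along with it.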

Tipping Over of Cases

By far the most common damage to drives occurs through mechanical shock. The level of stress a drive can take depends on its operational state: a drive that is not powered up will withstand some 300 G over 2 msec, whereas in a drive that is up and running, 30 G suffice to cause errors and bad sectors. 30 G sounds rather high, but any case tipping over and falling on a non-carpeted floor will easily exceed this value. The typical consequence is that the next bootup will terminate with the well-known "Chrrr, chrrr, chrrr ....." where the splash screen used to be.
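A rough back-of-the-envelope check shows why a hard floor gets past 30 G so easily. For an object that falls a height h and is stopped over a distance d, the average deceleration in multiples of g is simply h / d. The sketch below uses that relation; the 20 cm drop and 2 mm stopping distance are assumed values for illustration, and the estimate ignores the pulse duration that real shock ratings specify.

```python
# Shock limits quoted in the article, in multiples of g.
OPERATING_LIMIT_G = 30.0
NON_OPERATING_LIMIT_G = 300.0


def average_deceleration_g(drop_height_m: float, stopping_distance_m: float) -> float:
    """Average deceleration in g for a free fall stopped over a short distance.

    From v^2 = 2*g*h (fall) and a = v^2 / (2*d) (stop) follows a/g = h/d.
    """
    return drop_height_m / stopping_distance_m


def exceeds_limit(decel_g: float, powered_up: bool) -> bool:
    """Compare an estimated shock against the article's limit for the drive state."""
    limit = OPERATING_LIMIT_G if powered_up else NON_OPERATING_LIMIT_G
    return decel_g > limit


# Assumed scenario: the drive drops 20 cm as the case tips over and a hard
# floor stops it within about 2 mm.
decel = average_deceleration_g(0.20, 0.002)
print(f"about {decel:.0f} G")
print("running drive at risk:", exceeds_limit(decel, powered_up=True))
print("parked drive at risk:", exceeds_limit(decel, powered_up=False))
```

A carpet that stretches the stopping distance to a centimetre or two brings the same fall down by roughly an order of magnitude, which fits the article's emphasis on non-carpeted floors.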

Vibrations, Mortal Enemies of HDDs

Less dramatic but just as common are drumming on a somewhat fragile desk, objects such as balls bouncing off computer cases, or simply hitting the case with the vacuum cleaner. All of these cause vibrations, which are the worst possible scenario for any HDD. If a drive can sustain a shock of 30-50 G, its tolerance towards vibrations is usually only on the order of 1% of the shock tolerance. Typical values are in the order of 0.5-0.7 G. It is happening every day. Transporting systems back and forth to LAN parties falls into the same category. In cases like that, I always remove the HDD and transport it separately.

Power Outages In The Midst Of Defragmentation

A relatively rare cause of HDD failure is a power outage in the middle of a defragmentation, but I have seen it happen. Even though the damage is non-permanent in most cases, it may require a low-level format, which results in complete loss of all data.

HDTach and Similar Benchmark Programs

As nice as these programs are, repetitive use of HDTach and similar utilities adds excessive stress to the drive. Even though failure may not occur immediately after benchmarking, these programs can weaken the drive, and all it will take is another straw to break the camel's back. All of these factors should be taken into account when yet another drive gets corrupted or dies. Sometimes it is not the drive but the usage pattern.
