Tape Recovery – How Tape Works
The use of tape for data storage and data recovery in the computer industry goes back many decades. Tape provided a solid and robust means of storing code and data, along with a far lower cost-of-ownership than the available hard disk options.
Today the cost of hard disk has plummeted, but tape storage is still considered to be the best available form of long-term archival storage in terms of price and resilience.
Those of us who were born long enough ago can remember treading gingerly and speaking in low tones when passing the floor of the computer building where the hard disks were housed. Disks were unreliable, low in capacity and expensive to run, whereas recovering data from tape was fast enough and likely to work.
The concept of “near-line” storage developed, and still exists in the world of mainframe, AS/400 and large scale UNIX computing. Years ago a request to recover a file would result in a message popping up on the computer operator’s screen to fetch the open reel tape labelled KV19473D and load it on drive 15. The data was recovered from the tape after only a short delay to the user.
These days the operator has been replaced by some form of robotic tape library, and the open reel tape by a tape cartridge that can be handled mechanically (for example IBM 3590 and TS1120, STK 9840, 9940 and T10000, and of course LTO Ultrium and DLT). This process developed into Hierarchical Storage Management, also named HSM, and allows for “infinite” storage (as infinite as you can afford space, tape drives and media).
With smaller systems, including some that would look pretty large today such as the MicroVAX, there was a more procedural use of tape data storage. Partly this was due to the cost of robotic equipment, but mostly it was because the rise of the mini and micro computer coincided with the arrival of lower cost, more reliable hard disks and with the concept of client/server. The daily tape backup became a source of data that was only required when a failure occurred, and so a way to avoid hard drive data recovery work.
Attempts at introducing HSM into this market, using intermediate storage such as Optical Disk and using tape for longer term archive, came and went throughout the 90’s but largely tape was used as a backup and retrieval medium.
How Tape Storage differs from Disk
Setting aside the material differences and the low-level recording technologies used, the general concepts are no different between magnetic tape and hard disk. Each uses magnetism to encode data on a suitably receptive recording medium.
The real differences are in implementation and usage, and reflect the major physical differences between the two.
The short answer to “what is the difference” is that disk is a random access medium and tape is a sequential one. To go into greater depth, disks are generally pre-formatted with a known number of recordable “sectors” whereas tape is written on-the-fly.
The sequential access nature of tape reflects its physical character: it is long and narrow, and to get to data at the far end the drive has to traverse the length of the tape. With disk, to recover any recorded sector all the drive must do is position the read head over the right track and wait for the data to spin past. So it is an access time of small fractions of a second versus anything up to a couple of minutes; you wouldn’t get far implementing random access on tape.
The issue of formatting though is far from clear. Early open reel tapes, Exabytes and quarter-inch cartridges (the older version of SLR often known as streamers) had erase mechanisms that cleaned the tape ahead of the write head so recording was always to blank tape.
The smaller quarter-inch cartridges, DC2000 and more recently Travan, ADR and Ditto, were formatted with sectors (usually during manufacture). The very first DC2000 drives ran from the diskette controller in a PC and operated like diskettes. So in theory they were random access, but given the practical access time few people would live long enough to use them in that manner for a significant volume of data.
Newer tape formats (SDLT, LTO Ultrium, 3590, 3570 and many others), whilst not pre-formatted with data sectors, do have a lot of servo data written to them during manufacture, and if they are erased they become useless. This includes servo tracking data that is used to assist in the data alignment process now that recording densities have increased and there is little or no space left unused.
One, often unwelcome, feature of tape storage is the concept of “the last thing you wrote is the last thing you can recover”. With a hard disk each sector is uniquely addressable. If data is written to sector 79 it has no impact upon sectors 78 and 80. With tape, as soon as recording finishes the drive determines that the last thing written is the new end-of-data. So if you have a tape containing 400GB and write 2MB to the start of it, there is just under 400GB sitting on the tape that cannot be accessed without recourse to a tape data recovery service.
In Data Recovery parlance this is over-writing or re-initialisation. Don’t be fooled into thinking that there is any chance of getting back data that has actually been over-written; that is the stuff of science fiction. The remaining but inaccessible data, however, can often be recovered from the tape.
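As a rough illustration of the end-of-data behaviour described above, here is a minimal Python sketch. The SimulatedTape class and its method names are invented for this example and do not correspond to any real drive or API; it simply models a tape as a list of blocks with a movable end-of-data position.

```python
class SimulatedTape:
    """Toy model of a sequential tape: a list of blocks plus an end-of-data position."""

    def __init__(self):
        self.blocks = []    # everything physically present on the tape
        self.eod = 0        # index of the end-of-data marker
        self.position = 0   # current head position, counted in blocks

    def rewind(self):
        self.position = 0

    def write(self, data):
        # Overwrite the block under the head, or append if we are at the end.
        if self.position < len(self.blocks):
            self.blocks[self.position] = data
        else:
            self.blocks.append(data)
        self.position += 1
        # The last thing written always becomes the new end-of-data.
        self.eod = self.position

    def readable_blocks(self):
        # What a normal restore can reach: everything before end-of-data.
        return self.blocks[:self.eod]

    def stranded_blocks(self):
        # Still on the tape, but beyond end-of-data and so out of normal reach.
        return self.blocks[self.eod:]


tape = SimulatedTape()
for n in range(5):
    tape.write(f"backup-{n}")

tape.rewind()
tape.write("new-data")          # a small write at the start of a full tape

print(tape.readable_blocks())   # only the new block
print(tape.stranded_blocks())   # backup-1 to backup-4, intact but inaccessible
```

Note that backup-0 is gone for good because it was physically overwritten, whereas the four blocks after the new end-of-data are untouched and merely unreachable by normal means.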
The advantage that tape gives is that each file is usually stored contiguously and there are none of the frailties of file allocation tables involved when accessing the data.
This is all generally true, but there are no rules. Some tape recording formats (Legato Networker, NetBackup and ARCserve amongst them) take data from multiple sources and intertwine it on the tape (sometimes known as multiplexing or multi-streaming). As said earlier there is nothing to stop the development of random-access tape, but the shape is wrong and it would never catch on.
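The multiplexing idea can be sketched in a few lines of Python. This is purely illustrative: the function names and the (source, block) tagging are assumptions made for the example and say nothing about how NetWorker, NetBackup or ARCserve actually lay data out on tape.

```python
def multiplex(streams):
    """Interleave blocks from several sources round-robin, tagging each block
    with the name of the source it came from."""
    tape = []
    iterators = {name: iter(blocks) for name, blocks in streams.items()}
    while iterators:
        for name in list(iterators):
            try:
                tape.append((name, next(iterators[name])))
            except StopIteration:
                del iterators[name]   # this source has nothing left to send
    return tape


def demultiplex(tape, wanted):
    """Pull one source's blocks back out of the interleaved tape, in order."""
    return [block for name, block in tape if name == wanted]


streams = {"serverA": ["a1", "a2", "a3"], "serverB": ["b1"]}
tape = multiplex(streams)
print(tape)
print(demultiplex(tape, "serverA"))
```

The point to take away is that a single file from one source is no longer contiguous on the tape, so a restore has to read past other sources' blocks to collect it.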
There are, however, compromises. IBM 3570 and STK 9840 attempt to split the difference between the two styles of recording. They use a tape cassette, so the tape is on two reels within the case, unlike DLT and Ultrium where there is a single spool and the tape is transferred to a take-up reel within the drive. The “start” of tape is actually in the middle, so at load time the tape is half way from either end, and the data is stored on multiple tracks so that the drive can position across and along the tape to locate data. So it is a nod towards random access, with a faster access time than your average tape, though the time to recover data from any single file is still generally considerably longer than with disk.
Tape Storage Concepts
We can set aside the actual recording technique and put the clock back to the 9-track ½-inch open reel tape for the concepts involved in tape data storage and tape data recovery. This type of tape was predominant during the 1980’s and to an extent the drives that followed had to imitate the methodology followed in order to replace it. This means that an Ultrium drive, a DLT drive and a DAT drive all take data and give it back exactly as the open reel drive did, even though they use radically different recording formats.
With the open reel tape data was transferred to the drive as a sequence of data buffer loads named blocks. The drive would encode each of these with its own identification and error correction data, and with a gap in between each one. This inter-block gap is why you might sometimes hear people saying that they “used a larger block size to increase capacity”. On open reel tapes the gap was of a fixed size so the smaller the block size the greater the number of blocks required to store an amount of data. The greater the number of blocks, the greater the number of gaps and so capacity was lost. Then again, with older tapes the larger the block size the more chance of hitting an unusable area of tape so the whole thing was a bit hit-and-miss.
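The effect of the inter-block gap on capacity is easy to put into numbers. The sketch below is a back-of-envelope calculation, not a model of any particular drive. The function name is made up, and the defaults (a 2400 foot reel, 6250 bytes per inch along the tape, a 0.3 inch gap) are assumptions chosen for illustration.

```python
def reel_capacity_bytes(block_size, density_bpi=6250, gap_inches=0.3, tape_feet=2400):
    """Estimate usable capacity of an open reel tape for a given block size.

    Each block occupies block_size / density_bpi inches of tape and is
    followed by one fixed-size gap. All default figures are assumptions.
    """
    tape_inches = tape_feet * 12
    block_inches = block_size / density_bpi
    blocks = int(tape_inches // (block_inches + gap_inches))
    return blocks * block_size


for size in (80, 512, 4096, 32768):
    megabytes = reel_capacity_bytes(size) / 1_000_000
    print(f"{size:>6} byte blocks: about {megabytes:.1f} MB")
```

With tiny blocks almost all of the tape is gap, and as the block size grows the usable capacity climbs towards the figure you would get with no gaps at all.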
With modern drives the data block is merely what you send to the drive, and what you get back. Internally it is a matter of encoding and has little to do with how data is actually stored.
The above had a couple of exceptions, notably the earlier Exabyte 8mm helical scan drives. These split data into 1024 byte sections when writing to tape and would not share a 1024 byte storage unit between user data blocks. The consequence of this was that if you write 1025 byte blocks to a tape then each was written as 2048 bytes and the capacity of a tape was halved. There are exceptions to all rules.
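The Exabyte arithmetic above can be checked with a short Python sketch. The function name is invented for this example; the 1024 byte storage unit is the figure quoted in the text.

```python
def bytes_used_on_tape(block_size, unit=1024):
    """Space consumed by one user block when storage units are never shared
    between blocks: round the block size up to a whole number of units."""
    units = -(-block_size // unit)   # ceiling division using integers only
    return units * unit


for size in (1024, 1025, 2048):
    used = bytes_used_on_tape(size)
    print(f"{size} byte block uses {used} bytes ({size / used:.0%} efficient)")
```

A 1024 byte block fits exactly, while a 1025 byte block spills into a second unit and uses 2048 bytes, which is where the halved capacity comes from.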
So, tape drives record to theoretically blank tape, have no pre-formatting, and if you record data at the start you have lost everything that you have overwritten and anything after the point at which you stop writing.
There is nothing either right or wrong with any of this, it is just the way they are. What tape gives you is high volume, low cost per gigabyte storage that you can drop on the floor, pick up and read afterwards. Don’t try that with a hard disk and then expect to be able to easily recover your data.
File Marks aka Tape Marks
These are a sub-divider that you won’t find on a disk. A file mark is a data pattern encoded by the drive and used to allow spacing to a particular position on a tape. You want to recover data from backup set 3, well the backup software doesn’t read through backup sets 1 and 2 first, it skips file marks and then starts to read and recover data once it has found set 3.
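Positioning by file mark can be sketched as follows. The tape here is just a Python list with a FILEMARK sentinel between backup sets, and read_backup_set is a hypothetical helper written for this example. A real drive does the spacing itself rather than having software step over each block.

```python
FILEMARK = object()   # stand-in for the file mark pattern written by the drive


def read_backup_set(tape, set_number):
    """Return the blocks of the given backup set (counting from 1) by spacing
    over the file marks that precede it, then reading up to the next mark."""
    position = 0
    marks_to_skip = set_number - 1
    while marks_to_skip > 0:
        if position >= len(tape):
            raise EOFError("ran off the end of the tape while spacing")
        if tape[position] is FILEMARK:
            marks_to_skip -= 1
        position += 1

    blocks = []
    while position < len(tape) and tape[position] is not FILEMARK:
        blocks.append(tape[position])
        position += 1
    return blocks


tape = ["set1-a", "set1-b", FILEMARK, "set2-a", FILEMARK, "set3-a", "set3-b", FILEMARK]
print(read_backup_set(tape, 3))
```

The blocks of sets 1 and 2 are never returned to the caller; only the marks matter until the wanted set is reached.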
With 4mm DAT there is an additional type of file mark named as the set mark. This allows there to be two distinct types of data marker, though only Sytos Plus, SBACKUP and a few proprietary formats ever made use of this feature.
Helical scan drives (AIT, Exabyte and DAT) encode file marks so that they can be found during high speed search operations. Normally, as with a video recorder, the tape moves slowly during reading. It would take 2 to 3 hours to position down the tape at reading speed, so they kick the drive into fast seek and can then get to the next file mark in a fraction of the time. In video terms this is a “fast-forward”, and it enables fast access when recovering data from the tape.
Don’t be fooled by the name though. They sound like small little markers when actually they can be several megabytes in size on some types of tape.
End of Recorded Media
When reading from a tape you might encounter a condition named “End of Recorded Media”, sometimes reported as “Blank Check”. On older drives, when recording completed the drive would erase a length of tape afterwards. Subsequent reading attempts would run into this length of blank tape and know they had reached the end. Modern drives encode a data pattern, similar in size to a file mark, that denotes the end of recording. Data recovery via normal means stops at this point. There is no way past, and specialist recovery methods and technology need to be employed to gain access to this lost data.
Mainframe, and some midrange, systems did not rely upon the drive reporting that the end of data had been reached but relied upon their own devices. IBM systems would encode a double file mark, HP systems used a triple file mark. These patterns denoted logical end of data.
These systems will still rely on their logical mechanism for saying “that’s it”, but the drive will still do its own thing. The moment recording stops the EOD is written and that is that without professional data recovery assistance.
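The logical end-of-data convention can be illustrated with a small Python sketch. The FILEMARK sentinel and the find_logical_end function are assumptions made for this example; the mark counts (two for IBM systems, three for HP systems) are the ones given in the text.

```python
FILEMARK = object()   # stand-in for a file mark on the simulated tape


def find_logical_end(tape, marks_required=2):
    """Return the index where a run of consecutive file marks begins, which
    the host system treats as logical end of data, or None if there is none."""
    consecutive = 0
    for index, item in enumerate(tape):
        if item is FILEMARK:
            consecutive += 1
            if consecutive == marks_required:
                return index - marks_required + 1
        else:
            consecutive = 0
    return None


tape = ["file1", FILEMARK, "file2", FILEMARK, FILEMARK, "stale data"]
print(find_logical_end(tape))                     # double file mark convention
print(find_logical_end(tape, marks_required=3))   # triple file mark convention
```

Anything after the run of marks, such as the stale data in this toy tape, is simply ignored by a system that trusts its own logical end marker.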
Variable Block Mode
Disks are typically formatted with recordable sectors each of 512 bytes. IBM for the AS/400 use 520 or 522 bytes. Tapes, of course, have to be different.
Modern tape drives can record in either fixed block or variable block mode. This is to enable them to plug into systems that have differing pedigrees.
Mainframe systems, for example IBM 370/390 and AS/400 (OK it is not a mainframe but it behaves like one), write data in chunks that are the correct size for their purpose. The label block at the start of an IBM Labelled tape was defined as being 80 bytes long, so an 80 byte block was written to the tape. Since 80 byte blocks were not a practical proposition when dealing with open reel tapes, the actual data was written in larger chunks limited in size only by available memory in either the system or the tape drive formatter.
Fixed Mode Recording
Smaller systems and cheaper drives tended to deal with data pretty much as they did with disk. It did not matter how big the data was, it would be written out in equal sized chunks, and the drives available in this market segment obliged. The early quarter inch cartridge drives would only record data in 512 byte sections. Smaller UNIX systems and PC systems have a tradition of recording in this manner and still do. The only real difference between disk and tape here is that the tape block sizes for fixed mode recording have typically extended to 64KB or higher.
Later drives have been designed to be backwards compatible with this more primitive format and with the more expensive drives that operate in variable mode, or to be plugged in as direct replacement for these drives and so can operate as either Fixed or Variable Block recording devices.
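The difference between the two modes can be shown with a toy Python sketch. Both function names are invented, and padding the final fixed block with zero bytes is an assumption made for illustration, since how a short final block is handled depends on the software doing the writing.

```python
def write_variable(records):
    """Variable block mode: each record goes to tape as one block of its own size."""
    return [bytes(record) for record in records]


def write_fixed(records, block_size=512):
    """Fixed block mode: the data is treated as one stream and cut into equal
    sized blocks. Zero padding of the last block is assumed for this sketch."""
    stream = b"".join(records)
    blocks = []
    for start in range(0, len(stream), block_size):
        chunk = stream[start:start + block_size]
        blocks.append(chunk.ljust(block_size, b"\x00"))
    return blocks


records = [b"L" * 80, b"D" * 1000]   # an 80 byte label followed by some data
print([len(block) for block in write_variable(records)])
print([len(block) for block in write_fixed(records)])
```

In variable mode the 80 byte label stays an 80 byte block, whereas in fixed mode the record boundaries disappear and everything becomes 512 byte blocks.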
Early drives relied on skipping file marks to position along tape, but later tape devices introduced the concept of block numbering. So each tape block has a unique number starting at 0.
This partly explains why the tape block sizes used have increased over time. The SCSI specification describes the block number using 3 bytes, a maximum of 16,777,215 blocks. With 512 byte blocks this would mean that the maximum capacity of tape would be in the region of 9GB, not very helpful if writing to an 800GB Ultrium 3 data cartridge.
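Taking the 3 byte block number quoted above at face value, the arithmetic can be worked through in Python. The function names are invented for this example, and 800GB is treated as 800 × 10^9 bytes.

```python
def max_addressable_bytes(block_size, address_bytes=3):
    """Largest amount of data reachable when block numbers start at 0 and
    must fit in the given number of bytes."""
    block_count = 2 ** (8 * address_bytes)   # 3 bytes give numbers 0 to 16,777,215
    return block_count * block_size


def min_block_size(capacity_bytes, address_bytes=3):
    """Smallest block size that lets every byte of the tape be addressed."""
    block_count = 2 ** (8 * address_bytes)
    return -(-capacity_bytes // block_count)   # ceiling division


print(max_addressable_bytes(512))          # 8589934592, a little under 9GB
print(min_block_size(800 * 10**9))         # 47684
print(max_addressable_bytes(64 * 1024))    # 1099511627776, comfortably over 800GB
```

So with 512 byte blocks the numbering runs out at roughly 8.6GB, while a block size of about 47KB or more, such as 64KB, is enough to cover an 800GB cartridge.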
Three fundamental tape storage formats have developed since the late 1980s: parallel, helical scan and serpentine. The ground between parallel and serpentine formats has closed more recently, with drives having elements of both formats.
½” open reel – also known as 9-track parallel
The drive records 9 tracks of data at once to the tape surface. Recording begins at the physical start of the tape (PBOT) and ends at the physical end of the tape (PEOT). This format developed from the punch card idea with the eight bit byte and a parity bit. So this is one byte at a time recording.
The capacity of these tapes is tiny by today’s standards. NRZI recording format managed a staggering 23MB at 800bpi on a 2400 foot tape. In its heyday, with a massive 6250 bits-per-inch the capacity rose to an impressive 180MB.
Helical Scan
We are all more familiar with helical scan than we might realise. It is a technology that was developed for video recording (VHS and Video8) and sound recording (DAT).
The tape is wrapped around a cylinder that contains the read and write heads. The tape moves slowly whilst the cylinder spins quickly with each rotation allowing data to be written and then read back to check (Read-after-write).
The name helical scan springs from the pattern traced by the head passing along a slowly moving tape, “describing a portion of a helix”. (It is probably a more marketable name than “diagonal data”.)
Exabyte Corporation took the Sony Video8 8mm recorder, added a SCSI interface and some additional checking and came up with a 2GB data storage format which was way ahead of its rivals, albeit briefly.
HP and Sony adapted developments in the audio market with 4mm media named Digital Audio Tape, added additional error correction and came up with DDS DAT. Sony later created AIT, an 8mm helical scan format, and even one of the STK mainframe drives used this technique.
Serpentine
The name serpentine arises from the pattern of the recording, which runs forwards and backwards along the tape for a number of tracks, apparently a bit snake-like in character (according to some imaginative marketing person).
Early drives had a pair of recording heads, one for forwards recording and one for reverse. The drive would record forwards until physical end-of-tape (PEOT), reverse until physical beginning-of-tape (PBOT), then re-position the heads and repeat the process. Early drives recorded 4 tracks; the latest record hundreds and overlap with the parallel formats by recording several tracks simultaneously.
Equally, parallel format recording drives now record along the tape forwards and then in reverse, so they have become almost serpentine.
In the data recovery context there is the issue that physical damage impacts multiple places in the recording since the drive passes across each area of tape. Of course this is only an issue if the tape snaps or becomes crumpled, and there is an argument as to how likely this is compared with helical scan devices which have a much more complex tape path. We have no intention of entering the affray between exponents of each style of recording.
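The way a single damaged spot touches several parts of a serpentine recording can be sketched in Python. This is a simplified model with made-up function names: it assumes one track per pass and the same number of blocks on every pass, which real drives do not guarantee.

```python
def serpentine_position(block_index, blocks_per_pass):
    """Map a logical block number to (pass number, direction, distance from
    the beginning of the tape measured in blocks)."""
    pass_number, offset = divmod(block_index, blocks_per_pass)
    if pass_number % 2 == 0:
        return pass_number, "forward", offset
    return pass_number, "reverse", blocks_per_pass - 1 - offset


def blocks_at_distance(distance, blocks_per_pass, passes):
    """List the logical blocks that sit at one physical distance along the
    tape, one for each pass the drive has made."""
    affected = []
    for pass_number in range(passes):
        offset = distance if pass_number % 2 == 0 else blocks_per_pass - 1 - distance
        affected.append(pass_number * blocks_per_pass + offset)
    return affected


print(serpentine_position(100, 100))    # first block of the second pass
print(blocks_at_distance(10, 100, 4))   # blocks hit by damage near the start
```

In this toy layout, damage ten blocks in from the start of the tape affects four widely separated logical blocks, one on each pass, rather than one contiguous stretch of data.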
Tape still has a major part to play in data security and the long term archival of important information. As a data recovery specialist I see both failed hard disk drives and damaged tapes, and whilst tape recovery comes with its own set of challenges that can make it a tortuous process, seldom is a tape a complete failure and the data recovery success rate is well over 95%.