Tag - sector

  1. on recent disk reading results
  2. found bug last night
  3. re-examining XOR data checksum used on amiga floppies
  4. working FPGA version of the amiga floppy project
  5. redone transfer routine
  6. feasibility of writing disks
  7. first real world tests
  8. intermittent problem found
  9. great performance increases
  10. SlackFest 2007

on recent disk reading results

(this was posted to Classic Computer mailing list, please disregard if you’re on the list.  I think this is an important topic)

The last two nights I’ve been busy archiving some of my Amiga floppy collection.  Most disks were written over 20 years ago.

Out of a sample of about 150 floppies, most were perfectly readable by my homegrown usb external amiga floppy drive controller.

I paid very close attention to the failures or ones where my controller struggled.

Without sounding too obvious here, the times between the pulses (which more or less define the data) were grossly out of spec. The DD pulses should nominally be 4us, 6us, and 8us apart before pre-write compensation. Most good disks are slightly faster, and normal times for these ranges are:

4us: 3.2-4.2us. Many around 3.75us.
6us: 5.5-6.2us.
8us: 7.5-8.2us.

(notice margins around 1-1.3us)

My original microcontroller implementation was 3.2-4.2, 5.2-6.2, and 7.2-8.2.

When my current FPGA controller would have a problem, I’d notice that there were problems right on a boundary.  So maybe pulses were coming in at 3.1us apart instead of 3.2.  Or maybe 4.3 instead of 4.2.  So I kept bumping the intervals apart, making a larger range of pulse times acceptable — the XOR sector checksums were passing, so I was likely making the right choices.  The bits were ending up in the right buckets.

But as I went through some of these disks, the difference between ranges (basically my noise margin) kept getting squeezed smaller and smaller. Some to the point where an incoming pulse time might fall darn smack in the middle of the noise margin. Which bucket does THAT one go into?

My approach has been very successful (easily 95%+), but it makes me wonder about Phil’s DiscFerret dynamic adaptive approach, where a sample of the incoming data defines the ranges.

Some disk drives and controllers might be faster or slower than others, and if you create custom ranges for each disk (each track?), perhaps you’ll have better luck.
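
For what it’s worth, the bucketing decision itself is tiny; here’s a sketch in Java of how I think about it (the names and the exact split values are mine for illustration, with the splits sitting mid-margin between the ranges above):

    // Classify a pulse interval (in microseconds) into a 4/6/8us MFM bucket.
    // The split points are parameters, so each disk (or even each track) could
    // get custom ranges, along the lines of the adaptive approach.
    static int classify(double deltaT, double fourSixSplit, double sixEightSplit) {
        if (deltaT < fourSixSplit) return 4;    // raw MFM "10"
        if (deltaT < sixEightSplit) return 6;   // raw MFM "100"
        return 8;                               // raw MFM "1000"
    }

    // 4.85us and 6.85us are the midpoints of the 4.2-5.5us and 6.2-7.5us margins:
    int bucket = classify(3.9, 4.85, 6.85);     // -> 4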

found bug last night

With this new FPGA solution, certain tracks would result in what I call a “short read.”  A short read is any received track that contains fewer than 28,125 delta T’s, aka pulse times.  Given a certain capture time, there is a minimum and a maximum number of pulses on an amiga track.

If we have a 300 rpm drive, then it’s 60s/300 = 200ms per revolution.  If the bitcells are 2us wide, then you have at most 200ms/2us = 100,000 bit cells.  The ones density is at most 50% (raw MFM ’10’), which means every other cell contains a 1: so 50,000 pulses, and 50,000 delta T’s.  The minimum ones density is 25% (raw MFM ‘1000’), so 25,000 pulses.  Now we read more than just one revolution of data, because we will very likely start reading in the middle of a sector.  So instead of 11 sectors’ worth of read time, we actually need to read 12 sectors’ worth, to ensure we read the entire sector in which we started.  This is 218.2ms of time minimum.  We could potentially re-assemble data using some type of circular buffer, but this is more trouble than it’s worth.  I currently read 225ms of data.

225ms / 2us = 112,500 bit cells, so 56,250 delta T’s maximum and 28,125 minimum.
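
In code form (plain Java arithmetic, just restating the reasoning above):

    int bitcells  = 225_000 / 2;   // 225ms capture / 2us cells = 112,500 bit cells
    int maxDeltas = bitcells / 2;  // 56,250: ones density at most 50% (raw MFM "10")
    int minDeltas = bitcells / 4;  // 28,125: ones density at least 25% (raw MFM "1000")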

I had the D2XX USB parameters for my FTDI chip (the usb<->ttl converter) setting the USB transfer size to 57000 bytes.  That is definitely over and above what was needed.  Or so I thought.

I bumped the transfer size from 57000 to 60032 (the docs specifically say 64-byte multiples), and everything started working.  I had already narrowed it down to the problem tracks being high-density ones, with lots and lots of pulses.  So I knew the size of the track was related.  I checked for FIFO overflow, and it wasn’t overflowing.

I’ve got to look when I have a free second, but I think my USB packet size is 4096 bytes.  So 56250+4096 (some amount of padding?) = 60346.   Uh-oh, I better bump that to 60,352.  I think the driver (or Windows?) maxes out at a 64k transfer size, so I still have a little wiggle room.
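
Rounding up to the next 64-byte multiple is cheap enough to just compute; a sketch (roundTo64 is my name for it, not an FTDI or D2XX call):

    // Round a transfer size up to the next multiple of 64 bytes,
    // per the 64-byte-multiple note in the D2XX docs.
    static int roundTo64(int bytes) {
        return (bytes + 63) & ~63;
    }
    // roundTo64(60346) == 60352, still comfortably under a 64k ceiling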

The long and short of it is that it appears to be working much better.  I was glad to find this bug with just a little brainstorming and better visibility into the actual pulse counts on the FPGA.

re-examining XOR data checksum used on amiga floppies

So I was wondering exactly how effective the XOR checksum that Commodore used for the amiga sectors is.  If I read a sector, perform the checksum, and the checksum matches the stored checksum, then the data is correct, right?  Not always.  I had expected it to be better: maybe not MD5 or SHA-1 level, but decent.

I had run into this paper a year ago when I was looking at my transfer checksums.  But this certainly also applies to the amiga sectors, too.

Some good excerpts:

  • the percentage of undetected two-bit
    errors becomes approximately 1/k (k being checksum size), which is rather poor performance for
    a checksum (i.e., 12.5% undetected errors for an 8-bit checksum, and 3.125% for a 32-bit checksum).
  • The XOR checksum has the highest probability
    of undetected errors for all checksum algorithms in this study…
  • There is generally no reason to continue the common practice of
    using an XOR checksum in new designs, because it has the same software computational cost as an
    addition-based checksum but is only about half as effective at detecting errors.

So I wanted to actually prove to myself that XOR is that bad, looking at it from the amiga sector perspective, so I wrote a quick program (sketched after the list) to:

  1. Create a random block of data 512 bytes long
  2. Calculate the 32-bit XOR checksum, 32-bit chunks at a time, and store it with the block (as does the amiga)
  3. Select a number of bit inversions(basically corrupt the data in a specific way) which can affect data and/or stored checksum
  4. Recalculate the checksum of the modified block
  5. If the checksums MATCH, meaning that two different sets of data yield the same checksum, then this is an undetected error.
  6. Count the number of undetected errors vs total errors.
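
A minimal sketch of that test in Java (my own names and layout, not the actual program; this version corrupts only the data block, and a real test should force the flipped bits to be distinct):

    import java.util.Random;

    // 32-bit XOR checksum, 32-bit chunks at a time, like the amiga sector checksum
    static int xorChecksum(byte[] block) {
        int sum = 0;
        for (int i = 0; i < block.length; i += 4)
            sum ^= ((block[i] & 0xff) << 24) | ((block[i + 1] & 0xff) << 16)
                 | ((block[i + 2] & 0xff) << 8) | (block[i + 3] & 0xff);
        return sum;
    }

    // One trial: returns true if the corruption goes UNdetected (steps 1-5)
    static boolean undetected(Random rnd, int bitErrors) {
        byte[] block = new byte[512];
        rnd.nextBytes(block);                        // step 1: random block
        int stored = xorChecksum(block);             // step 2: stored checksum
        for (int e = 0; e < bitErrors; e++) {        // step 3: flip random bits
            int bit = rnd.nextInt(512 * 8);
            block[bit / 8] ^= (byte) (1 << (bit % 8));
        }
        return xorChecksum(block) == stored;         // steps 4-5: false positive?
    }

Run a few million trials with bitErrors = 2 and count the true results (step 6); the undetected rate comes out around 1/32, i.e. the paper’s 3.125% figure for a 32-bit checksum.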

That paper lists the properties of an XOR checksum, and I wanted to compare results:

  • Detects all single-bit errors.  (my testing confirms)
  • Fails to detect some two-bit errors. (my testing confirms, see below)
  • If total number of bit errors is odd, XOR checksum detects it. (confirmed with 1,3,5,7 bit errors)
  • Fails to detect any even number of bit errors that occur in the same bit position of the checksum computational block. (confirmed with 2,4,6,8)

The two-bit errors are really the case that I worry about.  If two bit errors occur in the same bit position, inverted oppositely (1->0 and 0->1), then the error won’t be detected.  So how often does this happen with random data?  The paper’s author, Maxino, says 3.125% of the time.  I can definitely confirm this: my testing shows 1.8 million hits over 56 million tries, about 3.2%.   There are some differences with Amiga data, but I think the results would be the same.  I might load some example amiga sectors and try them.

Once the number of bit errors increases, the probability of them happening in the same bit positions, in the same direction, and occurring evenly in each position goes down, and therefore the overall chance of an undetected error decreases as well.

4 bit undetected errors happen 0.3% of the time.

6 bit undetected errors happen 0.04% of the time.

8 bit undetected errors happen 0.009% of the time.

I have other testing to do, including running burst error tests.

So for each sector, you really don’t want to see exactly two bit errors occur.  A single bit error, sure.  More bit errors, OK, because we can usually detect them.  You don’t want to think you have a good copy when in fact the passing checksum was a false positive.


working FPGA version of the amiga floppy project

So, I’ve been working on the FPGA version of the amiga floppy project for some time.  I just recently had a fair bit of free time, and so everything came together rather quickly!

I’m now able to read amiga floppy disks in using the same Java client software I had developed for use with the Parallax SX microcontroller board.  There were a few minor changes in the software — most notably the read data routine from the hardware.

I’ve written the code in Verilog on a Xilinx Spartan-3e evaluation board.

The various hardware parts I’ve built:

  • UART: Written from scratch, a transmitter and a receiver.   Simple to use, variable baud rates.
  • FIFO: Generated from Xilinx’s CoreGen. This connects the floppy interface to the USB interface. 32k bytes
  • FSM to always empty the FIFO to the PC.  Once something goes in the FIFO, it immediately gets sent to the PC
  • Read-floppy-FSM: Stores 225ms of Delta T’s (aka time between edges) as 8-bit integers into the FIFO.
  • Command FSM: Receives single-character commands from the java software to execute (R for read, U for upper head, L for lower head, etc)
  • Transmit test pattern routine: Sends 32k of data to the PC to test for reliable communication

A couple advantages with the FPGA solution:

  • We transmit the data to the PC as soon as it’s available.  I want to characterize the actual latency, but it should be pretty small.  This is different from my old load->FRAM, then FRAM->PC method.  The new method should be much faster, and we’re not just idling for 225ms.
  • Instead of transmitting the bit-sync’d raw MFM to the PC, I’m sending the delta T’s.  While this requires a little more processing on the PC, the PC can more easily determine why a particular sector can’t be read.  For instance, is the time between pulses too small? Too large?  On a fringe edge?  Plus, since the Java decodes these, I can now add sliders for “acceptable delta T’s” for each of the 4, 6, and 8us windows.  Before, that would have required modifying the firmware on the microcontroller.  I can also start to do statistical analysis on the pulse times.

I am currently doing about 430ms per track.  This sucks.  I was beating that by 100ms with my microcontroller.  The problem is that because a variable amount of data is going to the PC, the PC receiving code does not know exactly when to stop receiving, so there’s a wait-timer which I have to optimize.  Once I receive the minimum amount of data, I wait 100ms after the last received data and then exit.  I’ve got to put my logic analyzers in place and figure out how to optimize it.

Denis@h3q can read a disk in 43s, which is pretty darn good.  He is using tokens like I mentioned here and here and here.  I’m transferring much more data though, which gives me more information.  I like his time, and maybe that would be a nice goal to beat!  Wow. That’s only 269ms/track.  Hrrrmm…. that’s pretty good.

redone transfer routine

So my transfer routine has been a little flaky lately. I’m not sure exactly why, but I think it’s related to the number of processes I’m running. While it’s a P4 2.4, I think USB scheduling is dead last on the priority list, because it sure seems that way. My transfer protocol is pretty simple right now, or almost completely non-existent. After the PC sends an R for ReadDisk, the SX starts spewing data. There is no handshake and no acknowledgement of data, but there is a checksum. And that checksum has been failing. While I do initiate auto-retransmit, it’s slow and clumsy.

So tonight, I implemented an XMODEM-like protocol. Sender and receiver synchronize the start, each block is acknowledged, a NAK causes a retransmit of a single block instead of the whole track, and so on. Overall it works pretty well, except for one thing. IT’S SLOW. How slow? About 2.1s per track. Yuck. At my high point with the other transfer, I was around 313-328ms per track. So the old way was roughly six times faster.

That’s way too slow. There’s a lot of back and forth with this protocol, with forced acknowledgement before the next block is transferred. The Wikipedia page on XModem says that was one of the reasons for its replacement by protocols like YModem and ZModem.
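
The problem is structural: the sender’s inner loop is stop-and-wait. Schematically (hypothetical Java; sendBlock and readReply stand in for the real serial calls):

    final int BLOCK = 32;                          // all my SX memory allows
    for (int b = 0; b < totalBlocks; b++) {        // 374 blocks per track
        do {
            sendBlock(b, data, b * BLOCK, BLOCK);  // header + payload + checksum
        } while (readReply() == NAK);              // full round trip before block b+1
    }

Every block pays a complete line turnaround before the next one can start, which is exactly the weakness the streaming protocols fixed: keep sending, and only back up on an error.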

Incidentally, I grew up on these serial-based protocols, and used everything from XModem to Kermit to YModem, CIS B+, etc. on BBS’s. ZModem with its resume feature was really tits on a stick back in the day.

Part of my problem is block size. I’m actually using 32-byte blocks because I don’t have enough memory on my SX. So that’s 374 blocks per track: 374 headers, 374 ACKs.

That’s 34 blocks per sector, about 172ms per sector. Just way, way too slow.

Before, when I read from FRAM directly to USB, there was effectively no easy way to retransmit, because you can’t just back up with serial memory. And I don’t actually track, use, or seek any FRAM byte locations. I don’t need to: I write one bit at a time, and I read back one byte at a time, always in order, always from start to end. So the way I retransmit now is to re-read the track into memory and start the whole process over. In the past this worked the few times I needed it, but for whatever reason (maybe new FTDI drivers?) I’m dropping data much more regularly now.

This XModem method isn’t really 100% polished yet, but given these times, I think it’s unlikely I’m going to finish it. Gotta come up with a better method. Some in-between. Maybe some FRAM re-positioning routines?

Dunno.

I’d love to hear what YOU think.

Thanks

feasibility of writing disks

While there are some other bug fixes and documentation to be done on what’s already been implemented, I started thinking about writing ADF’s to disk over the last few days.

While the hardware part of it is already in place, there are some things that would need to be done:

  • The interrupt service routine would need to be modified so that it doesn’t just read data by reacting to edges, but also extracts a bit from memory and puts an appropriate pulse on the writedata pin. Floppy drive specs say the pulse should be 0.1us to 1.1us wide.
  • Write an SX routine that would receive data from the PC and store it in the fram. This would need to checksum the received data and either error out or request a retransmit.
  • Write PC routines that would create the full MFM track: I’d need to create a header (easy enough: use the sector number, the 11-minus-sector-number count, the track number, and then checksum everything), then MFM-encode the data (see the sketch after this list). I’m already doing much of this in the decode portion, so I can basically do the opposite for encoding.
  • Of course there’ll need to be a “controlling” pc routine, much like my current readtrack() routine.
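
The MFM encoding step itself is mechanical: each data bit is preceded by a clock bit, and the clock bit is 1 only when the data bits on either side of it are both 0. A sketch in Java (note this ignores the amiga’s odd/even bit split of each longword, which is part of the real encode):

    // Encode data bits into raw MFM: 2 raw bits (clock, data) per data bit.
    static byte[] mfmEncode(byte[] data, boolean prevDataBit) {
        byte[] out = new byte[data.length * 2];
        int outBit = 0;
        for (byte b : data) {
            for (int i = 7; i >= 0; i--) {
                boolean bit = ((b >> i) & 1) != 0;
                if (!prevDataBit && !bit)           // clock = 1 only between two 0s
                    out[outBit / 8] |= (byte) (0x80 >> (outBit % 8));
                outBit++;
                if (bit)                            // then the data bit itself
                    out[outBit / 8] |= (byte) (0x80 >> (outBit % 8));
                outBit++;
                prevDataBit = bit;
            }
        }
        return out;
    }

This rule is what produces the structure relied on during decode: no two 1’s back to back, and never more than three consecutive 0’s.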

So the whole process of writing a disk would go something like this (a rough code sketch follows the list):

  1. Have user browse for the filename, select the file, load it into a “floppydisk” structure/object that currently exists.
  2. Rewind and make sure I’m at track 0.
  3. Create the first track’s data.
  4. Send a command to the SX to tell it to receive a track’s worth of data.
  5. Send the data making sure it’s been received correctly.
  6. Tell the SX to write the track.
  7. SX enables the write gate by lowering it, and starts pulsing out data, pausing the appropriate times for 0-bits.
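
Pulled together, the controlling PC routine might look roughly like this (hypothetical Java modeled on my readtrack() flow; the ‘W’ and ‘G’ commands and helper names are placeholders, not my actual command set):

    void writeDisk(FloppyDisk disk) throws java.io.IOException {
        rewindToTrackZero();                         // step 2
        for (int track = 0; track < 160; track++) {  // 80 cylinders x 2 heads
            byte[] mfm = buildMfmTrack(disk, track); // step 3: header + MFM data
            sendCommand('W');                        // step 4: SX receives a track
            sendWithChecksum(mfm);                   // step 5: verify, retransmit on error
            sendCommand('G');                        // steps 6-7: SX lowers the write
            stepHead();                              //   gate and pulses the data out,
        }                                            //   then honor the 18ms step time
    }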

I don’t see any major hangups, although there are a few timing-related things to get right. I’ve got to make sure 18ms has passed after each step pulse. And I’ve got to make sure 100us has passed since I’ve selected the drive (this is mainly during setup for the first track). For the last track, I need to make sure I pause for 650us after it’s written. I also have to make sure that the time from the write gate dropping to the first pulse is 8us or less. Same with the final bit: I have to raise the write gate within 8us after the last pulse.

I’ve got to look into creating a gap, sending SYNC words, learning wtf pre-write compensation is, etc.

first real world tests

Well, I tried to read about 70 disks in via my project tonight. I had mixed results. First, the Teac loves HD media but doesn’t fare as well with DD media. The Samsung hates HD media, but does very well with DD media. There are huge differences in the results depending on the drive. I think users who find that their disks aren’t readable should try switching floppy drive manufacturers/models. I have several different drives I still want to test.

The good news is that the program didn’t blow up or crash, the hardware didn’t die, and nothing locked up, even when faced with error conditions. Everything worked as expected.

I don’t have exact results YET, but I was able to retrieve roughly 50 images out of the 70. Some were actually not attempted because I was tired and skipped them. Some were attempted with the Teac when I should have used the Samsung, and vice versa. Some I tried both drives on. I’ve got some more tests to run on this batch.

I can tell you there is a big difference between quality brand-name media and the cheap stuff. The cheaper-looking/feeling the disk was, the less chance my project had of reading it.

NOT ONE FLOPPY MADE BY A BRAND NAME (SONY, MAXELL, OR FUJI) FAILED TO READ.

This is really a testament to quality, and they should be proud. My poorness/cheapness of youth has come full circle to bite me in the ass.

I’m interested in seeing what the real amiga and amiga drive can do with these disks. My bet is that, since the Amiga wrote those disks on the exact same floppy drive, it will also be able to read the shaky disks.

Also, I noticed a lot of single-sector errors on the tracks. My project repeatedly tries to read a track until it gets a good copy; it continuously retries until stopped. What this means is that as soon as I hit a real solid error, my software stopped. BUT, for the ones I watched, it looked like just a single sector was the problem. And my bet, even further, is that there were only a couple of bad bits, or even just one, in that sector. I’m working on improving how my software reads disks to make it more robust.

50/70 isn’t horrible. That’s roughly 70% on disks that are 10-20 years old.

more on results over the next couple days…..

EDIT in JUNE 2010: You really need to see the post here, I’m able to read the majority of the disks that were previously unreadable in my newer FPGA solution.

intermittent problem found

I was just putting some finishing touches on this before I started to actually use and archive some of my stuff.  To put it to an actual real world test.

Anyways, I tested an ADF generated by my project against an amiga transdisk ADF, and it failed.  They turned out to differ by exactly one sector, which was all zeros in MY file versus good data in the other.

I tracked it down to a crappily written readtrack() routine: if I get a bad sector but then at least one good sector after it, the thing never retries.  As a crappy patch, I now check all sectors at the end of a track read and make sure there is good data in them.  But this sucks.  Even though I want and will keep that same check in place, my read routine should be much, much cleaner.  I’m going to rewrite it.
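
The rewrite wants per-sector bookkeeping instead of a single pass/fail on the track; something like this (illustrative Java with hypothetical types, not the actual rewrite):

    boolean[] good = new boolean[11];        // one flag per sector on the track
    while (!allGood(good)) {                 // a real version needs a retry cap
        Track t = readRawTrack();            // grab another revolution's worth
        for (Sector s : t.decodeSectors()) {
            if (!good[s.number] && s.checksumOk()) {
                store(s);                    // keep the first good copy of each sector
                good[s.number] = true;
            }
        }
    }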

Originally, I thought it was a data-size problem cropping up again, but I was wrong…..

great performance increases

Today I’ve worked on optimizing the performance of the AFP reader and have made great strides.

My original track times were nearly 475ms: 300ms reading the actual disk, 125ms transferring, and processing time of around 50ms. This put the total disk time, not including the overhead of saving the ADF (which adds a half-dozen seconds), at about 76s. That’s 1:16. Not horrible, but not great.

I’ve done a couple things:

  • Reduced the number of bytes read off the disk from 15232 (14 sectors’ worth) to 14055: 11 sectors (11968 bytes), plus 1087 bytes (one byte short of a full sector, the worst case), plus a 1000-byte gap allowance (slightly bigger than I need). The arithmetic is sketched after this list.
  • This in turn reduced the time needed to actually read the disk from 300ms to 220ms of actual disk time. The 300ms was a “forced wait”: my code used to just do a Thread.sleep() pause.
  • Less data read means less data transferred, and this moved my transferred bytes from ~15,238 to ~11,977. The smaller transfer now takes between 94ms and 110ms depending on USB scheduling.
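
The arithmetic behind those byte counts, for the record (1088 raw MFM bytes per amiga sector):

    int sectorBytes = 1088;          // raw MFM bytes in one amiga sector
    int oldRead = 14 * sectorBytes;  // 15232: fourteen sectors' worth
    int newRead = 11 * sectorBytes   // 11968: one full track of sectors
                + (sectorBytes - 1)  //  1087: worst case, started a byte into a sector
                + 1000;              //  1000: gap allowance, a bit bigger than needed
    // newRead == 14055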

SO this means that I’m now running actual times of about 370ms total: 220ms (disk read) + 102ms (average usb transfer time) + ~50ms (PC processing time). A 370ms track read translates to a 59.2 second full disk read!!

With the overhead of the floppy disk drive warming up, and saving the final ADF to the HD, this translates to an actual time of approximately 65 seconds.

My target speed for this has always been 1 minute per disk (Jan 07 post).

While there are at least a few other ways to optimize, most of them either make the code too complex, or do things in such a non-standard way that implementing them would just make maintaining and growing this code harder in the future.

The bulk of the time right now is the untouchable base of 220ms. That time will never be reduced, and it makes up about 60% of the time I’m spending. On certain fast reads, it makes up nearly 67% of the time I’m seeing. So between 3/5 and 2/3 of the time can’t be touched.

I’m happy with this performance and won’t be addressing it again for some time. I have to move on to improving the user interface, fixing the terminal mode so it works again (after the jd2xx change), reporting and dealing with errors within the user interface (instead of standard output, where everything goes right now), and adding some odds and ends.

SlackFest 2007

Ok, I’ll admit it.  I’ve been slacking like the best of them.

I’m not horribly unsatisfied with my progress, especially this year.  In the first four months of this year I achieved the following:

  • Whole first ADF written: January 29th
  • Much cleaner hardware, single circuit board: February 14th
  • Data now byte-sync’d, sectors transferred in order: February 17th
  • Working Java client: March 26th

As far as next steps go, I’ve got to:

  1. get error correction and retries working in the java client
  2. clean up the gui and set up a “save config file” option, selecting serial ports, etc.
  3. clean up the java code
  4. start testing it under Ubuntu

I’m very interested in error recovery, and I have been putting a decent amount of thought into it.  I’m really fascinated by the fact that Amiga floppy disks have tons and tons of structure.  First, the MFM is output in a precise and predictable way: we know that no two 1’s are found back to back, that there are no more than three consecutive zeros, and so on.  We also know the way clock bits are inserted.  Because of this, only certain bytes are possible when built from these bit patterns; in fact, only 32 are possible.  With the help of a checksum, I think it would be possible to brute-force a certain number of bad bytes.  Now, I do think the checksum (XOR, I believe) is particularly unhelpful for this; something like MD5 would be da bomb, with practically no chance of two different sectors yielding the same checksum.  I don’t fully understand the mathematical ramifications of using XOR, though.  It’s certainly used everyplace within encryption, so it might be better than I think.
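
To make the brute-force idea concrete: with a single suspect raw byte at a known position, there are only 32 legal values to try, and the checksum arbitrates. Illustratively (a Java fragment; legalMfmBytes, xorChecksum, and friends are stand-ins, not code I have):

    // Try every legal MFM byte value in the suspect position and keep the
    // candidates whose sector checksum passes. With one bad byte this usually
    // pins it down; with several bad bytes, XOR's weakness means multiple
    // combinations can pass, which is where a stronger hash would shine.
    java.util.List<byte[]> candidates = new java.util.ArrayList<>();
    for (byte legal : legalMfmBytes) {
        byte[] trial = rawSector.clone();
        trial[badIndex] = legal;
        if (xorChecksum(trial) == storedChecksum)
            candidates.add(trial);
    }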

My read routines are no longer raw, although I’m probably going to go back and add a raw read function.

I’ve tossed around the idea of simplifying this project even further by eliminating the FRAM memory and adding the real-time functionality which I know the emulator crowd wants.  This is further down the line, and honestly, I don’t even know if it’s possible (or at least, I’m not sure how I would do it).

I’ve still been managing between 12,000 and 20,000 hits per month (about 10,000 unique visitors), and so I really appreciate the continued interest people have shown.  This is really a fun project, and I’m looking forward to expanding it and making it better and more useful.

Thanks