error correction using hamming distances

I’ve been thinking for quite some time about error correction.  About using the structure of the MFM to our advantage.  I’ve come up with something although I haven’t implemented the idea yet.

As you know from my many references to it, , this post shows the 32 possible byte combinations from legal raw MFM.

I’ve run some numbers today.  First, there are illegal values, and then you have illegal PAIRS of MFM bytes.  For instance, while 0xAA is valid by itself, and 0x11 is valid by itself, 0xAA11 is illegal, because the combination of those would produce 4 consecutive zeros, and as we know, this is illegal.  Just like 0x55 is legal and 0x88 is legal, but 0x5588 is illegal because that would produce back-to-back 1’s, also illegal.

So as far as PAIRS go, roughly 1/3 of the total possible pairs (32*32=1024), are illegal.  That works out to be 341 pairs or so.  I was always interested in the percentage of bad pairs so we could use that info to detect bad bits/bytes.

So here’s my logic:

We need to know the data is bad, and that we fixed the data once we’ve attempted some error correction stuff.  The checksum works fine here.  The checksum uses XOR, and while its not perfect, it provides us with a decent check.

Next, we need to know which bytes are bad so we know which ones to fix.  This is also easy.  Because we know which raw values are ok, we can simply check the bytes against a table (when the checksum fails), and then we know roughly which ones are bad.  The byte before or after might be screwy, but we have the general location.

So now that we know which bytes need fixed, we need to figure out what the correct values for the bytes would be.  Enter hamming distance.  If you don’t know what that is, look here.  Long and short of it is that it provides a measurement of how many bits need flipped/changed to go from one byte to another.  In our case, it tells us that if we have a single-bit error (and this we have to guess on), and we give it the bad byte, it will tell us which LEGAL mfm bytes the byte could have been.  I wrote a small c program today that calculates the hamming distance between BAD BYTES, and legal mfm bytes.  Note this produced ~7200 lines of results.

So then what, we now have a small subset of guesses to make at the byte value.  Now, if there is just one bad byte, then its easy enough to swap in some choices, and then re-checksum the track to see if that fixes it.  If there are multiple values, this becomes much harder for me to wrap my head around programmatically wise.  I think the problem wreaks of recursion, which I’ve never been very good at.  I need to test all the possible (or probable) values in each of the appropriate locations.  This is not simple iteration, as far as I know.

So here’s some examples: (these are in decimal,not my normal hex)

Let’s say our bad byte is DEC 83.  Once again, this is illegal because of the “3” ended in binary 11.

So, the output from my program tells me that a single-bit error could have cause an 81 to look like an 83, that means bit 2 in that byte got mangled from a 0 to a 1.

But it could been an 82 as well.  So bit 1 might have been set, when it should have been cleared.

But that’s it.  Assuming that we just had a single-bit error, then those are the only two possibilities.  I think single-bit errors are the most common, but I’ll need to work on this.

OK, so now let’s look at 2-bit errors in the same byte.  If bad byte is still 83, then 17, 18, and 85 are possible choices.

What I’m really doing is intelligently narrowing the search space to valid possible choices.  And then, I could apply the PAIR logic from above, and potentially eliminate more of the choices.

I think it might make sense to check upwards of 3-bad bits of error…… After that it seems like maybe brute force might just be better/simpler…..

While this stuff is nice, if we have a byte get mangled in such a way that it produces another legal different raw mfm byte, then it might be impossible to check.  I can still apply my pair-logic to it which might help identify which byte is the problem byte.

Definitely different and excited stuff.  I’ve never seen this idea used before, at least not in the amiga floppy world.


Amateur Electronics Design Engineer and Hacker


  • Ok, so the Sun Java programming forum made light work of my task. Got two solutions, in way less than 24 hours that both work. Those guys are bright over there. Minus a couple irritating forum-twats in another thread, it was an enjoyable experience.

    One person provided a recursive solution and another an iterative solution. I don’t understand either of them completely, but I’m looking into the code.

    You know, short powerful code always scares me. I mean, when a lot is happening in the same line of code, I think it’s harder to read and understand. This doesn’t mean it’s bad — but I tend to write code that is lengthy with separate steps, temporary variables, etc that make it easy to read. I seem to make mistakes when writing smallish code. I’ve worked in a bunch of different languages, and I’m not really an expert in any of them. But most of my code, with comments, should be easy enough to be followed by someone with a basic understanding of languages — even someone who doesn’t know Java can generally understand my code.

    There’s no major obstacles preventing me from implementing new error correction into the codebase. I’ll have to find some time to do that.

  • This is a really good project, especially for those of use who have thousands of Amiga floppies.

    I have found that the Catweasel (with an ordinary PC drive) does a better job of reading Amiga floppies than a genuine Amiga. They may be implementing some tricks similar to yours.

  • Hi there,

    I just dragged my old PC 3.5 floppy disk reader project out of the garage, blew all the dust off it and got it working last night. The project is pretty crude, but I managed to read the ascii words embedded in the DOS boot sector, so it still works! When I last worked on it, I tried but failed to get it to read a 5.25 inch floppy, so maybe this time I’ll get more success. I’d also really like to get writes to work, but haven’t even tried that yet.

    I’m using a Parallax propeller to operate and read the floppy. I posted my floppy-reading code on the Parallax forums a few months ago but didn’t get any interest from anyone.

    Anyway, it’s nice to see that I’m not the only one who likes to figure out how this old equipment works! You and I emailed about a year ago, but I thought I’d check out your site again to see what progress you’ve made.

    David B

  • Hi David,

    Yes, I definitely remember trading comments with you. Welcome back.

    Glad to hear your project still works. I’ve walked away from this one for a month or so before to come back with stuff not working like it used to.

    Oh a propeller. Very neat. I have not messed with one of those, but the hub concept is very cool. I’ve got so many things I want to play with…… but not enough time (or money)

    I watch the Parallax forums every now and again — I would have definitely responded to your post if I saw it!! Link to it so I can check it out!

    Yeah, I really enjoy messing around with this stuff. I grew up with this amiga gear, and its really nice to say, “OOOOOOOOOHHHH so that’s how that works… Or so that’s why that did that….’ etc.

    As far as progress, I’m now reading floppies reliably, Java software works. Here’s an update through May 2007

    A fair bit has been changed and reworked — for the better. I now have a stable automatic-retry java client, flow control which slows the SX down if necessary, I talk with the USB driver using DLL thru a Java JNI(instead of virtual comport) , tested RX uart, and so on. Basically made everything more stable and reliable.

    I made a post/comment activity graph

    The next major steps will be adding error correction (this post), and adding writing capability. Writing will really require a commitment in time that I’m not sure if I want to make yet. Time will tell!

  • Hi,

    Here’s the link to my Parallax forum comment, which includes a picture of my homemade propeller board attached to the floppy drive, and the propeller floppy-reading code is also included.

    I’d heartily recommend the propeller. It doesn’t cost that much, and has lots of great features, not the least of which is that you can test code without having to write to EEPROM. Downloading and running modified code is so fast and painless that development goes way faster than on the SX chips

    Over the weekend I detected the MFM pulses from a 5.25 floppy, hacked around with it a bit, only to find that whatever I did broke the 3.5 floppy reading. Grrrrr…..

  • Say the XOR checksum fails and tells you something is wrong with bit 2.

    Say the “illegal values” check fails and tells you something is wrong with the 85th byte.

    Couldn’t you flip bit 2 of the 85th byte, and if the resulting byte is a legal byte value, call it good?
    I’m not seeing any “recursion” or “iteration” involved in this at all.

    (At least in the simplest case of 0 or 1 bit errors per checksummed block).

    So what did the Sun Java programming forum suggest?
    Could you give a direct link?

    — David

  • David C,

    Hello again!

    A wrong checksum doesn’t give me any information as to the location of the error, either the bit-position, or which byte is screwed up. All I know is that the data portion of the sector is not correct. Unless you know some checksum magic? All the bytes get XOR’d together, so I don’t know how you’d know which bit was screwed up…. Unless you mean that you could compare wrong checksum with stored checksum — but I think this would only work in a case where there is 1 single-bit error per sector. Is this what you mean?

    The recursion or iteration comes into play when there are multiple possible values for multiple bad bytes. You need to loop through all the possible replacements for each bad byte.