techtravels.org

losing data

I’ve been thinking more and more about how exactly I’ve been losing data.

It’s a lot of data. The worst case I’ve seen is about 140 bytes. That’s 1120 bits, or 2240us. This is a LONG time for my SX to really be doing nothing, especially during an active transfer: 2.24 milliseconds!

This means my clock is rolling over 1120 times with no activity.

The only thing I can think this means is that I’m not seeing any edges within that timeframe. But why not? What’s different about this dead period compared to before and after?

And why does this almost always happen DURING the data portion of the sector, and not usually during the sector header? The sector headers almost always checksum correctly.

And in case this isn’t obvious, the sector headers are almost always the perfect length. So we don’t drop bits during a sector header. Not all sector headers are perfect, but 95% of the time they are. So the length is perfect, and the checksum is perfect.

It sounds like something is losing sync, but the sync occurs with every edge, not just the ones at the beginning of the sector.

The header is 32 bytes, or 64 RAW MFM bytes. So that’s 512 MFM bits without error: no dropped bits, no added bits, no lost data.

This is the question of the week.

keith

Amateur Electronics Design Engineer and Hacker

7 comments

  • Header is 64 bytes, and data is another 1024 bytes. I’d say the chances that you miss something in the data zone are much higher, like 16:1 or something (I am not good at maths), as opposed to the header zone.

    You can’t say for sure that you miss that big piece of data all at once, continuously in one sector, can you?

    Could it also be that your SX turns deaf to edges? Like some global interrupt-disable flag or instruction being in effect.

  • Right on the sizes.

    Unencoded: 32 sector header + 512 data = 544 total bytes
    MFM: 544 * 2 = 1088 raw bytes between sectors

    I have no idea if I’m missing just one big chunk out of each sector, or if I’m dropping one bit here, one bit there type of thing. I guess the latter is more realistic….

    I do know which sector is missing how much data, as I measure the offset of each sector header. Some of them are coming close, like 1087 bytes or 1084 bytes…. I’m not terribly worried (yet) about being plus or minus 3 or 4 bytes. Those are close enough that some minor tweaks can probably pull them in. It’s when I start getting 30 or 40 or 140 bytes out of whack that I know something’s really wrong.

    I was thinking about something global like that too. Like perhaps the SX resets, or something happens to the interrupts, but I haven’t seen anything like that happen. I use the debugger all the time now, and the debugger would go idle (and not run) if it reset.

    I don’t disable the interrupts at any point, at least not manually. So unless some other instruction has a side effect I don’t know about…

    You know, I was thinking that perhaps the shorter-sized groups are throwing a wrench into the works. Like perhaps I’m not dealing with them properly. Some of them were really short, like 7.5us instead of 8, etc. Maybe my timing breaks down with these shorter/faster groups?

  • Maybe your timing indeed breaks on those shorter groups. Remember the spindle speed is not stable, and can deviate by some percent. Say it’s +/-5%: then 8us turns into 7.6us if the spindle goes faster, or into 8.4us if it slows down a bit. Certainly the deviation can be much smaller, or much bigger, than that 5%.

    How will your code behave if it sees 7.6us or 8.4us?

  • That’s a very good question. Since I’m now “sampling” so early in the bitcell, I don’t think it’s much of an issue for short cells anyway.

    Currently my timing for a “1000”, i.e. an 8us grouping:

    real edge happens
    (60ns later) edge detected
    (2.26us) first rollover for first ‘0’
    (4.26us) second rollover for second ‘0’
    (6.26us) third rollover for third ‘0’

    Although I haven’t measured it accurately, I think my worst-case ISRs are now in the 800ns range, and if that’s the case, I might be finishing near 7us or so. Well below the minimum 7.5us I’ve seen.
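
    In case the scheme isn’t clear from the numbers, here’s roughly what the two interrupts are doing, written out as a C sketch. The real code is SX assembly, and all of the names here are made up for illustration:

    /* Rough sketch of the current two-interrupt scheme, for illustration only.
       The real code is SX assembly; all of these names are made up. */

    volatile unsigned char shiftreg;   /* MFM bits get assembled in here      */
    volatile unsigned char bitcount;   /* how many bits we have toward a byte */

    void rearm_cell_timer(void)        /* stand-in for re-arming the ~2us rollover */
    {
        /* on the real SX this would reset the RTCC countdown */
    }

    static void shift_in_bit(unsigned char b)
    {
        shiftreg = (unsigned char)((shiftreg << 1) | (b & 1u));
        if (++bitcount == 8) {
            /* full byte ready -- hand it off to main() here */
            bitcount = 0;
        }
    }

    /* Edge interrupt: a flux transition is an MFM '1'.
       Restarting the cell timer puts the next rollover about 2.26us out. */
    void isr_edge(void)
    {
        shift_in_bit(1);
        rearm_cell_timer();
    }

    /* Rollover interrupt: no edge arrived in this 2us cell, so record a '0'.
       For an 8us "1000" group this fires at roughly 2.26, 4.26 and 6.26us. */
    void isr_rollover(void)
    {
        shift_in_bit(0);
    }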

    Note I haven’t seen even a single group with a cycle longer than 4us, 6us, or 8us. They are always shorter. If they were longer, they might indeed cause a problem. Say it’s a 6us grouping: if it turned out to be 6.30us, then I’d record an additional zero, because my rollover would occur again before the next edge. I’d also miss the next edge, because I’d already be in the ISR when the edge-trigger happened. And missing this edge would cause the following zeros to be ignored until another new edge shows up.

    However, I’ve never actually SEEN a long group before. And I’ve seen a LOT of cells…… but 5% (.3 on 6us) really isn’t that much. And it’s a spinning motor. I bet that Commodore specified max tolerances on it, and I think that’s in the books I have. In any case, I think I have to deal with this tolerance issue. If I could handle something like 10-15% in both directions, that would be ideal.

    I think I’m fine on the short side. As long as they are at least:

    2.26 + .8 = 3.06 for a 4us
    4.26 + .8 = 5.06 for a 6us
    6.26 + .8 = 7.06 for a 8us

    That’s pretty good. Currently the longest I can handle is approximately
    (i.e. the next edge MUST FIRE before the next rollover):
    4.25 for a 4us
    6.25 for a 6us
    8.25 for a 8us

    I guess I could trade off and move the sample point. This would yield

    2.60 + .8 = 3.4 (15%)
    4.60 + .8 = 5.4 (10%)
    6.60 + .8 = 7.4 (7.5%)

    for the shortest and the longest would be

    4.59 for 4us (15%)
    6.59 for 6us (10%)
    8.59 for 8us (7.5%)

    In the worst case, this gives main no time at all to process in the same cell, but with the double-buffering in place in the ISR, hopefully it can get to it within the next 8 bits….
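
    If I wanted to double-check those margins with something other than pencil and paper, a few lines of C like this would do it. This is just the arithmetic above, not SX code, and the 0.8us ISR figure is still only my rough estimate:

    /* Quick sanity check of the tolerance windows above. sample[] is where the
       proposed last rollover lands for each group, and 0.8us is still just my
       rough worst-case ISR estimate. */
    #include <stdio.h>

    int main(void)
    {
        double nominal[3] = { 4.0, 6.0, 8.0 };    /* us per MFM group           */
        double sample[3]  = { 2.60, 4.60, 6.60 }; /* proposed sample points, us */
        double isr_worst  = 0.8;                  /* us, worst-case ISR guess   */
        int i;

        for (i = 0; i < 3; i++) {
            double shortest = sample[i] + isr_worst; /* earliest next edge I can still handle  */
            double longest  = sample[i] + 2.0;       /* next edge must beat this next rollover */
            printf("%.0fus group: ok from %.2f to just under %.2f us (-%.1f%% / +%.1f%%)\n",
                   nominal[i], shortest, longest,
                   100.0 * (nominal[i] - shortest) / nominal[i],
                   100.0 * (longest - nominal[i]) / nominal[i]);
        }
        return 0;
    }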

    Just thinking out loud. Sorry for my babbling.

  • I made the change today with no noticeable effects. I really wish I had a way of automatically comparing the data between DiskMonTools and the output of my SX. I was comparing byte for byte by hand, but that’s really impossible with 1088 bytes.

    One sector in particular was 1085 bytes, very close. I wanted to see which bytes were skipped, and perhaps figure out why.

    Most Amiga programs won’t output MFM though, either on the screen or in a file. I tried taking a picture and OCR’ing it, but that didn’t work for crap.

    I really need to be able to read a raw sector in MFM on the Amiga, and save the contents in a file.
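
    If I could get both dumps into plain binary files, even a trivial compare program would beat checking by hand. Something like this rough sketch would do (the file names here are made up):

    /* Rough sketch of a byte-for-byte compare, assuming I can get both the
       DiskMonTools dump and the SX capture into plain binary files. */
    #include <stdio.h>

    int main(void)
    {
        FILE *a = fopen("diskmon.raw", "rb");    /* reference MFM from the Amiga */
        FILE *b = fopen("sxcapture.raw", "rb");  /* bytes streamed from the SX   */
        long  offset = 0;
        int   ca, cb;

        if (!a || !b) {
            fprintf(stderr, "couldn't open the input files\n");
            return 1;
        }

        for (;;) {
            ca = fgetc(a);
            cb = fgetc(b);
            if (ca == EOF || cb == EOF)
                break;
            if (ca != cb)
                printf("offset %ld: reference %02X, capture %02X\n", offset, ca, cb);
            offset++;
        }
        printf("compared %ld byte pairs\n", offset);
        fclose(a);
        fclose(b);
        return 0;
    }

    Of course a straight positional compare stops being useful the moment a byte actually gets dropped, since everything after it shifts; I’d probably have to resync on the sync words, or just look at where the first mismatch shows up.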

  • Random thoughts. Maybe you will be lucky and one of them will hit something.

    * “… just one big chunk out of each sector, or … one bit here, one bit there type of thing.” If you had a free-running timer, maybe you could hack the main() routine to send a time byte *instead* of (or perhaps in addition to) the data byte. I’d expect all the bytes of a sector to be fairly evenly spaced. Then you could tell the difference between many time bytes that look suspiciously like 1 MFM bit too long, vs. a couple of really huge time bytes.

    * I had one project that was squirting short blips of data to a desktop computer at the fastest possible BPS rate. It worked fine as long as I kept each blip less than 16 bytes (the size of the serial hardware buffer); any longer and *sometimes* it would work, other times the OS would be “busy” doing something else and lose bytes. Is there a quick way to make sure you aren’t seeing this ? (I’m pretty sure the serial hardware on the PC has an “overflow” flag, but perhaps there’s a simpler way to test this). If this is happening, you’re going to have to buffer up at least a sector (perhaps a track would be simpler, if you have the RAM) and spoon it to the PC with some sort of “ack”/”send again”. (Again, double-buffering — you can spoon one sector/track to the PC, while reading another sector/track into the other buffer).

    * Why in the world do you have *2* interrupt sources — one on edges, and one on the timer ? Why don’t you just have *1* interrupt source — on the edges ? (Just let the timer run freely, not enabling its interrupt). It looks to me like you don’t even need *both* edges — just trigger on whichever edge looks crisper (the falling edge, from your photographs). (Perhaps your debug code could give 2, 3, or 4 short blinks after each edge to indicate what it found just *before* that edge … not quite as pretty as the way you’re doing it now, though).

    At the beginning of the interrupt-on-edge, you would snag the current timer value and store it somewhere — say, NEW.
    Then you calculate NEW - OLD to find out how wide the last gap was, and somehow translate that number into MFM bits.
    Near the end of the interrupt, you move the value from NEW into OLD to prepare for the next interrupt.

    Note: never reset the timer. Use RETI, not RETIW. If the timer has been clocked 20 times between OLD and NEW, then NEW - OLD is *always* 20, even if OLD was 0xFF.

    I see that using *only* the timer interrupt has already been discussed
    ( https://techtravels.org/?p=39 ),
    but I’m surprised no one has suggested *only* the edge interrupt.

    * Sometime in the distant future you might consider “auto-tuning” — keep track of the widths of the MFM blips, and dynamically set the 2 decision values halfway between the neighboring types. (There are 3 widths of MFM blip, right ?). Then as long as your disk is spinning relatively constantly during a sector, it makes no difference exactly what that speed is — it could be *double* or *half* the normal speed. (The Apple II floppy disks had “sync bytes” at the beginning of every sector, for this reason).

    * ”… [sometimes] I go ahead and xfer it to the PC in the ISR.” Have you considered *always* sending the byte in the ISR, and making main() into a do-nothing loop ? Just because you *have* a background task doesn’t mean it has to *do* anything.

    https://www.piclist.com/techref/scenix/sxints.htm

  • All good ideas…..

    I think the PC is fast enough. If you go back to some older posts like (https://techtravels.org/?p=14), then you’ll see where I used to have a minimal ack-required protocol on the PC side. I got rid of that for a simple read-on-rising-edge, with no ack from the PC. The main reason is that the extra logic required too much time on the PC side; really, the PC is SLOW in comparison to the SX at detecting pin changes, etc. I was able to really reduce the amount of code, and make the PC much more efficient. I tested this with a small test program on the SX which just streamed 00-FF from the SX to the PC, and the PC didn’t drop bytes until I got to over twice the speed, easily in excess of 1 Mbps.

    I’m using the parallel port here, not the serial port, so I’m not sure what buffers etc. would apply here. I’ve done extensive testing with the PC, and I’m really confident I’m not dropping bytes. I make sure the byteready lead is low when I have 4 bits stored (at the 8us mark), and then raise it immediately upon getting 8 bits (at the 16us mark), so it’s easy to find the edge.

    I also scoped out the byteready lead at, I dunno, 10 or 20us per division, and I’m seeing very consistent 4-bit-wide highs and lows. I can get a large number of them onscreen, held on the scope, and they look good.

    The current program is based on your double-buffer idea. When the ISR gets to 8 bits, no matter whether due to an edge or a rollover, the ISR stores the shift-reg byte into a temp variable, and notifies main by way of another single-bit variable which main monitors in a super tight loop. This way main has plenty of time to get the byte out. Double-buffering at its finest! 🙂 I did something similar earlier, and just put it back. I’ve checked this to make sure I’m not overflowing, and the condition I saw earlier was gone. I really wanted to figure out WHY I was seeing that, but forward progress is more important.
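
    In C-ish pseudocode, my hand-off looks about like this. The real thing is SX assembly, these names are invented, and send_byte_to_pc() just stands in for the parallel-port transfer plus the byteready handshake:

    /* Sketch of the ISR-to-main hand-off described above. */

    volatile unsigned char shiftreg;    /* bits accumulate here in the ISR         */
    volatile unsigned char outbyte;     /* the "temp variable" (the second buffer) */
    volatile unsigned char byte_ready;  /* the single-bit flag that main watches   */

    void send_byte_to_pc(unsigned char b)
    {
        (void)b;   /* placeholder for the actual parallel-port transfer */
    }

    /* called from the ISR whenever the 8th bit of a byte lands */
    void byte_complete(void)
    {
        outbyte    = shiftreg;   /* snapshot the finished byte...     */
        byte_ready = 1;          /* ...and tell main it's there       */
        /* the ISR keeps shifting new bits into shiftreg, so main has up
           to about 8 more bit times (16us) to come pick outbyte up */
    }

    /* main is just a super tight loop watching the flag */
    void main_loop(void)
    {
        for (;;) {
            if (byte_ready) {
                byte_ready = 0;
                send_byte_to_pc(outbyte);
            }
        }
    }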

    I do like your edge-interrupt-only idea, and I’m going to put it into action shortly. I have to play around with the RTCC without its interrupt, and probably use the prescaler, but I’m not 100% sure what I’m doing yet. I get the general idea, though, and like it.
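
    Before I actually sit down with the RTCC and the prescaler, my rough reading of how that would look is something like this. Treat it as a sketch only: all the names are made up, and TICKS_PER_US is a placeholder that depends on how I end up setting the prescaler:

    /* Sketch of the edge-interrupt-only idea: a free-running 8-bit timer that is
       never reset, with every edge measuring the gap back to the previous edge. */

    #define TICKS_PER_US 5                  /* placeholder timer ticks per microsecond */
    #define TICKS_5US    (5 * TICKS_PER_US) /* halfway between a 4us and a 6us gap     */
    #define TICKS_7US    (7 * TICKS_PER_US) /* halfway between a 6us and an 8us gap    */

    static unsigned char old_time;

    unsigned char read_free_running_timer(void) { return 0; }      /* stand-in for reading the RTCC */
    void emit_one_plus_zeros(unsigned char zeros) { (void)zeros; } /* shift out a "1" then N "0"s   */

    void isr_edge_only(void)
    {
        unsigned char now = read_free_running_timer();
        unsigned char gap = (unsigned char)(now - old_time); /* wraps correctly even if
                                                                old_time was 0xFF */
        old_time = now;

        /* gaps between edges are nominally 4, 6 or 8us; split the difference */
        if (gap < TICKS_5US)
            emit_one_plus_zeros(1);   /* ~4us -> "10"   */
        else if (gap < TICKS_7US)
            emit_one_plus_zeros(2);   /* ~6us -> "100"  */
        else
            emit_one_plus_zeros(3);   /* ~8us -> "1000" */
    }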

    Auto-tuning is a great idea, but I’ll leave that for further optimization once I get the basic thing working! 🙂
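
    And for whenever I do get around to the auto-tuning, my mental sketch is roughly this, using the same placeholder tick numbers as above (the averaging is deliberately crude):

    /* Sketch of the auto-tuning idea: track the average width of each of the
       three MFM gap types and keep the two decision thresholds halfway between
       the neighbouring averages. All values are placeholders. */

    static int avg_gap[3] = { 20, 30, 40 };  /* starting guesses: 4, 6, 8us in ticks */
    static int thresh_lo  = 25;              /* between the "10" and "100" gaps      */
    static int thresh_hi  = 35;              /* between the "100" and "1000" gaps    */

    /* classify a measured gap, then nudge that type's running average toward it */
    int classify_and_tune(int gap)
    {
        int type;

        if (gap < thresh_lo)
            type = 0;               /* "10"   */
        else if (gap < thresh_hi)
            type = 1;               /* "100"  */
        else
            type = 2;               /* "1000" */

        /* crude exponential average; integer truncation means it only reacts
           to drift of several ticks, which is probably fine for a sketch */
        avg_gap[type] += (gap - avg_gap[type]) / 8;

        /* re-centre the thresholds halfway between neighbouring averages */
        thresh_lo = (avg_gap[0] + avg_gap[1]) / 2;
        thresh_hi = (avg_gap[1] + avg_gap[2]) / 2;

        return type;
    }

    As long as the spindle only drifts slowly compared to the length of a sector, the thresholds should just follow it, whatever the absolute speed is.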