Tag - pulse

status updates

I’ve been more active on the project than the posts suggest. I’ve been working on a few different things:

I’ve been trying different PLL solutions, examining how Paula does it, and analyzing my read data in Matlab to get a graphical view of the “problem.” Most of my solutions technically work (that is, they have the desired behavior), but they don’t actually solve the problem at hand. I need to get a better handle on what “bad data” looks like, and then come up with solutions that address the specific problems I’ve seen. I’m also using Paula as one metric of “goodness.” In my mind, my controller should be better: it should read disks that the Amiga is not able to.

I don’t think I fully understand the problem. I know what the SYMPTOMS are: flux transitions are sometimes read as being “in the middle,” for example 5us between pulses, where 4us = ‘10’ and 6us = ‘100’. What the heck does 5us mean? How do we bias it in the right direction? Many of the controllers use some sort of PLL, but I see values one after another that span the acceptable range of values. You can’t track something that doesn’t trend!

I also want to get a better handle on how Paula works. I’ve got it hooked back up to the logic analyzer, and I’ve been trading messages on English Amiga Board about disk DMA and the like. I’d like to do automatic comparisons between the output of Paula and my project, and see when Paula gets it right but I get it wrong!

on floppy adpll, this time, my solution

This isn’t finished by any stretch, but it does do what I expected it to do.

It responds both to differences in phase (the counter is reset to 0) and to differences in frequency (the period is adjusted in the direction of the frequency error).

I did this at 3am last night, so there could be a couple bugs in there.

With all this being said, I’m not entirely sure that PLLs are actually required for good reading of Amiga floppy disks. My regular “static interval” solution works about 95%-98% of the time. I’m going to come up with a list of problem disks and see whether this solution works better, worse, or otherwise.

I’ve used a read data window of 1us, which starts out centered on 2us and is automatically adjusted as data comes in. This produces windows around 2us, 4us, 6us, and 8us. As each pulse is received, I output the overall error, which is the deviation from the center of the window. I’d like to graph this error, but it doesn’t look like Xilinx’s iSim will export a particular output as CSV or anything similar.

module floppy_pll(
    input clk,                 // assumed 50 MHz: 100 ticks = 2us, 25 ticks = 0.5us
    input floppy_data,         // read data direct from the floppy
    input reset,
    output reg window_open,
    output reg [7:0] data,     // not driven yet
    output reg strobe,         // not driven yet
    output reg [7:0] error     // 128 = pulse exactly on counter rollover
);

// window is 1us wide
// starts at .5us before counter rollover
// ends .5us after counter rollover
// ideally, edges should be arriving right when the counter rolls over

reg [7:0] period = 100;
reg [10:0] counter = 0;

reg IN_D1, IN_D2;
wire out_negedge = ~IN_D1 & IN_D2;   // one-clock pulse on a falling edge of floppy_data

// combinational window test, so the edge logic sees the current counter value
wire in_window = (counter > (period - 25)) || (counter < 25);

always @(posedge clk) begin
    // edge detection flops for data in direct from floppy
    IN_D1 <= floppy_data;
    IN_D2 <= IN_D1;

    counter <= counter + 1;
    if (counter == period) counter <= 0;
    window_open <= in_window;

    // if counter == 0 when out_negedge is seen, we are perfectly aligned
    // so we don't need to adjust anything
    if (out_negedge && counter != 0 && in_window) begin
        counter <= 0; // align counter to the phase of the incoming signal
        if (counter < 25) begin
            // we rolled over before the pulse was seen, so make period larger
            // error values will be over 128
            period <= period + 1;
            error <= 128 + counter;
        end else begin
            // we haven't rolled over when we saw the pulse, so make period shorter
            // error values will be less than 128
            period <= period - 1;
            error <= 128 - (period - counter);
        end
    end

    if (reset) period <= 100;   // placed last so reset wins over the updates above
end

endmodule
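
Since iSim won’t export the error as CSV, one workaround is to model the same window logic in software and feed it pulse spacings directly. A minimal sketch, assuming a 50 MHz clock (100 ticks per 2us cell) and ignoring the one-clock register delays of the Verilog; `pll_state`, `pll_pulse`, and `pll_dump_csv` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Software model of the floppy_pll window logic (hypothetical helper).
 * Assumes a 50 MHz clock: 100 ticks = 2us, 25 ticks = 0.5us. */
typedef struct {
    int period;   /* counter reload value, starts at 100 */
    int counter;  /* cell counter, wraps to 0 after reaching period */
} pll_state;

/* Let 'ticks' clock cycles pass, then apply one falling edge.
 * Returns the error byte the module would report (128 = centered),
 * or -1 when the module would leave 'error' unchanged
 * (perfect alignment, or pulse outside the window). */
int pll_pulse(pll_state *s, int ticks)
{
    int err = -1;
    assert(ticks > 0);
    for (int i = 0; i < ticks; i++)          /* free-running counter */
        s->counter = (s->counter == s->period) ? 0 : s->counter + 1;

    int in_window = s->counter > s->period - 25 || s->counter < 25;
    if (s->counter != 0 && in_window) {
        if (s->counter < 25) {               /* late pulse: lengthen period */
            err = 128 + s->counter;
            s->period++;
        } else {                             /* early pulse: shorten period */
            err = 128 - (s->period - s->counter);
            s->period--;
        }
        s->counter = 0;                      /* phase-align to the pulse */
    }
    return err;
}

/* Emit one CSV row per pulse so the error can be graphed (e.g. in Matlab). */
void pll_dump_csv(const int *intervals, int n, FILE *out)
{
    pll_state s = { 100, 0 };
    fprintf(out, "pulse,error,period\n");
    for (int i = 0; i < n; i++) {
        int err = pll_pulse(&s, intervals[i]);
        fprintf(out, "%d,%d,%d\n", i, err, s.period);
    }
}
```

Feeding the model a stream of 200-tick intervals (back-to-back 4us ‘10’ pulses) shows the period settling at 99 rather than 100, because the counter runs from 0 through `period` inclusive, so one cycle is `period + 1` ticks.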


motor speed variation tests

I collected roughly 1,000 index pulses from the Sony MPF920-E with one common floppy inserted.

The motor speed variation was very, very small. I’m impressed by the accuracy.

Across 967 index pulses, all were within 56 microseconds of each other.  The average was 200.487ms.  Most of the group were within 20 microseconds of the average.

The standard deviation is 9.46 microseconds.
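
The figures above (spread, mean, standard deviation) can be reproduced from a list of index-to-index periods. A minimal sketch; `index_period_stats` is a hypothetical helper, the periods are assumed to be in microseconds, and it reports the population standard deviation (the post doesn’t say which form was used).

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double mean_us;    /* average revolution time */
    double stddev_us;  /* population standard deviation */
    double spread_us;  /* slowest minus fastest revolution */
    double rpm;        /* spindle speed implied by the mean */
} index_stats;

/* Newton's method square root, so the sketch doesn't need libm. */
static double sqrt_newton(double x)
{
    if (x <= 0.0) return 0.0;
    double r = x > 1.0 ? x : 1.0;
    for (int i = 0; i < 60; i++) r = 0.5 * (r + x / r);
    return r;
}

index_stats index_period_stats(const double *periods_us, size_t n)
{
    index_stats st;
    double sum = 0.0, sq = 0.0, lo, hi;
    assert(n > 0);
    lo = hi = periods_us[0];
    for (size_t i = 0; i < n; i++) {
        sum += periods_us[i];
        if (periods_us[i] < lo) lo = periods_us[i];
        if (periods_us[i] > hi) hi = periods_us[i];
    }
    st.mean_us = sum / (double)n;
    for (size_t i = 0; i < n; i++) {
        double d = periods_us[i] - st.mean_us;
        sq += d * d;
    }
    st.stddev_us = sqrt_newton(sq / (double)n);
    st.spread_us = hi - lo;
    st.rpm = 60e6 / st.mean_us;    /* 60 s = 60,000,000 us */
    return st;
}
```

Plugging in the reported 200.487ms mean gives about 299.27 RPM, roughly 0.24% under the nominal 300.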

on recent disk reading results

(this was posted to Classic Computer mailing list, please disregard if you’re on the list.  I think this is an important topic)

The last two nights I’ve been busy archiving some of my Amiga floppy collection.  Most disks were written over 20 years ago.

On a sample size of about 150 floppies, most were perfectly readable by my homegrown USB external Amiga floppy drive controller.

I paid very close attention to the failures or ones where my controller struggled.

At the risk of sounding obvious, the times between the pulses (which more or less define the data) were grossly out of spec. DD pulses should nominally be 4us, 6us, and 8us apart before write precompensation. Most good disks are slightly faster, and normal times for these ranges are:

4us: 3.2-4.2us.  Many around 3.75us
6us: 5.5-6.2us.
8us: 7.5-8.2us

(notice margins around 1-1.3us)

My original microcontroller implementation was 3.2-4.2, 5.2-6.2, and 7.2-8.2.
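
The “static interval” approach amounts to a fixed lookup from pulse spacing to MFM bit pattern. A minimal sketch using the microcontroller ranges above, with times in nanoseconds; `classify_delta` and `uc_windows` are hypothetical names. A pulse that lands between windows, like the 5us case, simply comes back unclassified.

```c
#include <assert.h>
#include <stddef.h>

/* One acceptance window, in nanoseconds. */
typedef struct { int lo_ns, hi_ns; } window;

/* Windows from the original microcontroller implementation. */
static const window uc_windows[3] = {
    { 3200, 4200 },   /* '10'   */
    { 5200, 6200 },   /* '100'  */
    { 7200, 8200 },   /* '1000' */
};

/* Returns how many MFM bit cells the pulse spacing represents (2, 3 or 4),
 * or 0 if it falls outside every window. */
int classify_delta(int delta_ns, const window w[3])
{
    assert(w != NULL);
    for (int i = 0; i < 3; i++)
        if (delta_ns >= w[i].lo_ns && delta_ns <= w[i].hi_ns)
            return i + 2;
    return 0;
}
```

Widening a window is just a matter of editing the table, which is essentially what a boundary slider in the client would do.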

When my current FPGA controller would have a problem, I’d notice that there were problems right on a boundary.  So maybe pulses were coming in at 3.1us apart instead of 3.2.  Or maybe 4.3 instead of 4.2.  So I kept bumping the intervals apart, making a larger range of pulse times acceptable — the XOR sector checksums were passing, so I was likely making the right choices.  The bits were ending up in the right buckets.

But as I went through some of these disks, the gap between ranges (basically my noise margin) kept shrinking. Some got to the point where an incoming pulse time might fall darn smack in the middle of the noise margin. Which bucket does THAT one go into?

My approach has been very successful (easily 95%+), but it makes me wonder about Phil’s DiscFerret dynamic adaptive approach, where a sample of the incoming data defines the ranges.

Some disk drives and controllers might be faster or slower than others, and if you create custom ranges for each disk (each track?), perhaps you’ll have better luck.
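
One way to sketch the adaptive idea: histogram a track’s delta T’s, find the 4, 6, and 8us peaks, and put the decision boundaries halfway between them. This illustrates the general technique, not DiscFerret’s actual algorithm; the 0.1us bin size, the search regions, and all function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define BIN_NS   100   /* histogram resolution: 0.1us per bin (assumption) */
#define NUM_BINS 100   /* covers 0 to 10us */

/* Index of the most populated bin in [lo, hi]. */
static int peak_bin(const unsigned *hist, int lo, int hi)
{
    int best = lo;
    for (int b = lo; b <= hi; b++)
        if (hist[b] > hist[best]) best = b;
    return best;
}

/* Derive per-track thresholds from the track's own pulse timings.
 * t_ns[0] splits '10' from '100'; t_ns[1] splits '100' from '1000'. */
void adaptive_thresholds(const int *delta_ns, size_t n, int t_ns[2])
{
    unsigned hist[NUM_BINS] = { 0 };
    for (size_t i = 0; i < n; i++) {
        int b = delta_ns[i] / BIN_NS;
        if (b >= 0 && b < NUM_BINS) hist[b]++;
    }
    /* search regions around the nominal 4, 6 and 8us peaks (assumption) */
    int p4 = peak_bin(hist, 30, 49);
    int p6 = peak_bin(hist, 50, 69);
    int p8 = peak_bin(hist, 70, 89);
    /* boundary = midpoint between the two bin centres */
    t_ns[0] = (p4 + p6) * BIN_NS / 2 + BIN_NS / 2;
    t_ns[1] = (p6 + p8) * BIN_NS / 2 + BIN_NS / 2;
}

/* Classify a spacing into 2, 3 or 4 bit cells using the derived thresholds. */
int classify_adaptive(int delta_ns, const int t_ns[2])
{
    assert(t_ns[0] < t_ns[1]);
    return delta_ns < t_ns[0] ? 2 : (delta_ns < t_ns[1] ? 3 : 4);
}
```

Unlike fixed windows, every pulse lands in some bucket here; “which bucket” is decided by the track’s own statistics rather than a hard-coded table.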

more real world tests

So tonight, I put my new FPGA implementation of the Amiga floppy project to good use. I read some more of my collection of Amiga floppies.


It’s working like a champ. As a matter of fact, I selected disks which I could not previously read with my microcontroller solution, and I could read 90% of them. Most of the unsolvable problems were related to HD disks (which, based on my earlier posts, some drives handle better than others). Note this is just temporary until I try other drives to read the disks, and try covering the HD hole.

I have better visibility on the PC into the “problem” delta T’s. Take pulses that are slightly too far apart, just on the boundary of what I consider valid: if I adjust my boundary accordingly so they’re now considered valid, everything is peachy-keen. I want to add a real-time option to my Java client to allow this to be adjusted on the fly. See problems? Adjust the slider, and the problems go away. Pretty neat.

I didn’t have this visibility when the microcontroller was interpreting the delta T’s.  The microcontroller had no easy feedback method to tell me what was happening.  Having high-level debugging of the information on the PC makes this all possible.

Nice to see the software purring. There are still plenty of improvements to make, usability to be improved, etc. But it’s working very nicely.

found bug last night

With this new FPGA solution, certain tracks would result in what I call a “short read.” A short read is any received track that contains fewer than 28,125 delta T’s, aka pulse times. Given a certain capture time, there is a minimum and a maximum number of pulses on an Amiga track.

If we have a 300 rpm drive, then it’s 60s/300 = 200ms per revolution. If the bit cells are 2us wide, then you have at most 200ms/2us = 100,000 bit cells. The maximum ones density is 50% (raw MFM ‘10’), meaning every other cell contains a 1, so 50,000 pulses and therefore 50,000 delta T’s. The minimum ones density is 25% (raw MFM ‘1000’), so 25,000 pulses. We actually read more than one revolution of data, because we will very likely start reading in the middle of a sector. So instead of 11 sectors’ worth of read time, we need to read 12 sectors’ worth to ensure we read the entire sector in which we started. That is 200ms * 12/11, or about 218.2ms minimum. We could potentially reassemble the data using some type of circular buffer, but this is more trouble than it’s worth. I currently read 225ms of data.

225ms / 2us = 56,250 maximum, 28,125 minimum.
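
The bounds above follow from simple integer arithmetic. A minimal sketch; `track_pulse_bounds` and `min_capture_us` are hypothetical helpers, with times in microseconds.

```c
#include <assert.h>

typedef struct { long min_pulses, max_pulses; } pulse_bounds;

/* Densest MFM ('10') has a pulse every 2 cells, sparsest ('1000') every 4. */
pulse_bounds track_pulse_bounds(long capture_us, long cell_us)
{
    long cells = capture_us / cell_us;
    pulse_bounds b = { cells / 4, cells / 2 };
    return b;
}

/* Capture time that guarantees one complete sector even when reading
 * starts mid-sector: one revolution plus one extra sector. */
long min_capture_us(long rev_us, int sectors)
{
    assert(sectors > 0);
    return rev_us * (sectors + 1) / sectors;
}
```

With a 200,000us revolution and 11 sectors, `min_capture_us` gives 218,181us; a 225,000us capture of 2us cells gives the 28,125 to 56,250 range.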

I had set the D2XX USB parameters for my FTDI chip (the USB<->TTL converter) to a USB transfer size of 57000 bytes. This is definitely over and above what was needed. Or so I thought.

I bumped the transfer size from 57000 to 60032 (the docs specifically call for multiples of 64 bytes, which 57000 is not), and everything started working. I had already narrowed it down to the problem tracks being ones with a high density, where there were lots and lots of pulses, so I knew the size of the track was related. I checked for FIFO overflow, and it wasn’t overflowing.

I’ve got to check when I have a free second, but I think my USB packet size is 4096 bytes. So 56250 + 4096 (some amount of padding?) = 60346. Uh-oh, I’d better bump that to 60,352. I think the driver (or Windows?) maxes out at a 64k transfer size, so I still have a little wiggle room.
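
Rounding a requested transfer size up to the 64-byte multiple the D2XX docs ask for is a one-liner. A minimal sketch; `round_up_64` is a hypothetical helper, and the result would then be passed as the IN transfer size (for example via D2XX’s `FT_SetUSBParameters`).

```c
#include <assert.h>

/* Round a USB IN transfer size up to the next multiple of 64 bytes. */
unsigned round_up_64(unsigned bytes)
{
    assert(bytes <= 65536u);   /* transfer size tops out at 64k */
    return (bytes + 63u) / 64u * 64u;
}
```

The original 57000 rounds up to 57024, and the estimated 60346 rounds up to the 60,352 mentioned above.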

Long and short is that it appears to be working much better. I was glad to find this bug with just a little brainstorming and better visibility into the actual pulse count on the FPGA.

characterizing speed performance of floppy drive controller

So I’ve got things working rather swimmingly right now.  Switched drives from Samsung to Sony, and it’s made a huge difference.  The Sony just seems to work better.

I’m averaging about 355ms per track, yielding about 57s per disk. The 355ms is made up of 313ms of transfer time at an effective serial throughput of around 1.175Mbps. That’s at a 1.5Mbaud rate, whose theoretical maximum is 1.2Mbps once the start and stop bits are counted. This isn’t horrible performance, but I really want to get back to 2Mbps. I haven’t been using 2Mbps because I get massive errors, but I think there is some round-off happening in my UART that prevents it from working correctly. I need to revisit my UART code and find out exactly why 2Mbps doesn’t work. I’ve run this USB->TTL converter at 2Mbps with my uC, so it really should work fine.
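
One common source of UART round-off is the integer baud divider. A minimal sketch, assuming the UART is clocked from a 50 MHz oscillator (an assumption; the post doesn’t say) and counts either one tick per bit or 16 ticks per bit; `uart_divisor` and `baud_setting` are hypothetical names.

```c
#include <assert.h>

typedef struct {
    unsigned divisor;     /* clock ticks per oversample period */
    double actual_baud;   /* rate the divider really produces */
    double error_pct;     /* deviation from the requested rate */
} baud_setting;

/* Integer divider for a UART fed by clk_hz with 'oversample' ticks per bit
 * (1 for a plain bit-period counter, 16 for a classic 16x receiver). */
baud_setting uart_divisor(double clk_hz, double baud, unsigned oversample)
{
    baud_setting s;
    assert(oversample > 0);
    double exact = clk_hz / (baud * oversample);
    s.divisor = (unsigned)(exact + 0.5);          /* round to nearest */
    if (s.divisor == 0) s.divisor = 1;
    s.actual_baud = clk_hz / ((double)s.divisor * oversample);
    s.error_pct = (s.actual_baud - baud) / baud * 100.0;
    return s;
}
```

Under these assumptions, 2Mbaud divides 50 MHz exactly with a plain bit counter (divisor 25), but a 16x oversampling design would need a divisor of 1.5625, which rounds to 2 and runs about 22% slow. A mismatch of more than a few percent typically corrupts 8N1 frames.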

If I go to 2Mbps, I’ll EASILY chop the 88ms off the 313ms, and I’ll be transferring the track to the PC in REAL TIME. Basically, as fast as I receive it, I’ll be sending it to the PC. Remember that because I transmit the pulse times and not the data, fast transfer rates really are required: every pulse costs a full byte, so the bandwidth needed depends on how many pulses are on the track. This is a little more complicated than just saying the raw MFM rate is 500kbps, so you need 500kbps of bandwidth to the PC.
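
Because every pulse is sent as one byte, the link budget can be worked out per pulse. A minimal sketch, assuming 8N1 framing (10 bit times per byte, which matches 1.5Mbaud giving 1.2Mbps); `transfer_ms` and `realtime_baud` are hypothetical helpers.

```c
#include <assert.h>

/* Time to send 'bytes' delta T's over an 8N1 link:
 * start bit + 8 data bits + stop bit = 10 bit times per byte. */
double transfer_ms(long bytes, double baud)
{
    assert(baud > 0.0);
    return (double)bytes * 10.0 / baud * 1000.0;
}

/* Baud rate needed to keep pace with pulses arriving every
 * 'spacing_us' microseconds, one byte per pulse. */
double realtime_baud(double spacing_us)
{
    assert(spacing_us > 0.0);
    return 10.0 / (spacing_us * 1e-6);
}
```

By this estimate, the densest possible track (56,250 pulses in 225ms) would take about 281ms at 2Mbaud, and keeping up with back-to-back 4us pulses in real time would take about 2.5Mbaud.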

There are several optimizations I can do, and I’ll post more later.

working FPGA version of the amiga floppy project

So, I’ve been working on the FPGA version of the amiga floppy project for some time.  I just recently had a fair bit of free time, and so everything came together rather quickly!

I’m now able to read Amiga floppy disks using the same Java client software I had developed for use with the Parallax SX microcontroller board. There were a few minor changes in the software, most notably the routine that reads data from the hardware.

I’ve written the code in Verilog for a Xilinx Spartan-3E evaluation board.

The various hardware parts I’ve described:

  • UART: Written from scratch, a transmitter and a receiver.   Simple to use, variable baud rates.
  • FIFO: Generated from Xilinx’s CoreGen. This connects the floppy interface to the USB interface. 32k bytes
  • FSM to always empty the FIFO to the PC.  Once something goes in the FIFO, it immediately gets sent to the PC
  • Read-floppy-FSM: Stores 225ms of Delta T’s (aka time between edges) as 8-bit integers into the FIFO.
  • Command FSM: Receives single-character commands from the Java software to execute (R for read, U for upper head, L for lower head, etc.)
  • Transmit test pattern routine: Sends 32k of data to the PC to test for reliable communication

A couple advantages with the FPGA solution:

  • We transmit the data to the PC as soon as it’s available. I want to characterize the actual latency, but it should be pretty small. This is different from my load->FRAM, then FRAM->PC method. This method should be much faster, since we’re no longer idling for 225ms.
  • Instead of transmitting the bit-sync’d raw MFM to the PC, I’m sending the delta T’s. While this requires a little more processing on the PC, the PC can more easily determine why a particular sector can’t be read. For instance, is the time between pulses too small? Too large? On a fringe edge? Plus, since the Java client decodes these, I can now add sliders for the “acceptable delta T’s” of each of the 4, 6, and 8us windows. Before, that would have required modifying the firmware on the microcontroller. I can also start doing statistical analysis on the pulse times.

I am currently doing about 430ms per track.  This sucks.  I was beating that by 100ms with my microcontroller.  However, the problem is that because a variable amount of data is going to the PC, the PC receiving code does not know when exactly to stop receiving, so there’s a wait-timer which I have to optimize.  Once I receive the minimum amount of data, I wait 100ms since the last received data, and then exit.  I’ve got to put my logic analyzers in place and figure out how to optimize it.
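
The receive-side stop rule described above boils down to a single predicate. A minimal sketch; `should_stop_receiving` is a hypothetical name, and the values in the example (the 28,125-pulse minimum from the short-read discussion and the 100ms idle timeout) come from the posts.

```c
#include <assert.h>
#include <stdbool.h>

/* Stop only once the minimum track size has arrived AND the line has
 * been quiet for at least idle_timeout_ms. */
bool should_stop_receiving(long bytes_received, long min_bytes,
                           long ms_since_last_byte, long idle_timeout_ms)
{
    assert(min_bytes >= 0 && idle_timeout_ms >= 0);
    return bytes_received >= min_bytes &&
           ms_since_last_byte >= idle_timeout_ms;
}
```

Whatever idle timeout is chosen gets added to every single track, so it feeds straight into the per-track time.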

Denis@h3q can read a disk in 43s, which is pretty darn good. He is using tokens like the ones I mentioned in earlier posts. I’m transferring much more data, though, which gives me more information. I like his time, and maybe that would be a nice goal to beat! Wow. That’s only 269ms/track (43s / 160 tracks). Hrrrmm... that’s pretty good.

Finally had some luck with FPGA board

So I bought an FPGA eval board and a book on Verilog a while back. I made some progress learning things, and had some simple things like UARTs, writing to the LCD, and even some small VGA software (hrrrm, maybe I should call it hardware…) working. Some of it I wrote from scratch, some I borrowed heavily from existing sources. But then I got stuck. Stuck on being able to access the onboard DDR. The included memory controller, produced by Xilinx’s MIG and the CoreGen app, was hard to use, and I didn’t (and still don’t) understand Verilog well enough to simply run with that controller. Now don’t get me wrong. Xilinx has pretty decent documentation for some of this stuff, and they describe pretty well the steps needed to initialize the controller and perform reads and writes. But you have to remember that their controller is 7300 lines of code broken up across about 40 source files. Now if this were C/C++ or Java (or even assembly language), and it was commented properly, I could probably follow what is being done. Their code is poorly commented IMHO.

I digress, right, but I really am an anal commenter. I comment A LOT. But there really should never be a time where I’ve got to explain what’s being done in a particular code block. If something goofy is being done, or if I got sloppy, I explain it in-line with the code, right on the same line. The comments help me when I’m reading the code. Plus, they sometimes reveal bugs where what the comments say I’m doing doesn’t match what’s happening in the code. < digression mode off >

So I’ve been looking for other ways to skin the cat. I’ve asked guys at work. I’ve checked out literally every memory controller on opencores.org. The problem is, basically, that DDR sucks. It first sucks because high frequencies are required, so pathways through the FPGA are restricted. I don’t know enough about FPGAs to tell you which path (from which pin to which pin) meets the timing requirements for DDR. The first D in DDR is double, meaning that both the rising and falling edges of the clock are used to read or write data. Even if my actual application doesn’t require 100+MHz data rates, I’m forced to read/write the memory at a fast rate. Oh, and the memory is also dynamic (the D in SDRAM). And this sucks too. Why? Because dynamic memory has to be refreshed constantly, and on time. That’s another thing to worry about. And to think I actually contemplated writing my own controller. Sheeeesh.

The Spartan-3 eval board, as opposed to the 3E (which is what I have), contains 1MB of SRAM. Now the S here is STATIC, the opposite of dynamic: no refresh required. It’s fast memory, around 10ns, but it doesn’t do any crap double-data-rate junk. I’ve seen example controllers for this board and this memory, and they’re like one page of code. The difference in complexity is absurd. However, I’ve got 64MB, and this is 1MB.

Suffice to say, I don’t own the S3 board. While it’s nice, it has very limited connectors, LED displays instead of an LCD, etc.


So the Embedded Development Kit (EDK), available from Xilinx, contains MicroBlaze, a 32-bit softcore processor that has BUILT-IN controllers for things like memory, Ethernet (cool), serial (UARTs), etc. And guess what: you program it in C. You first download the FPGA with the HDL for the softcore, and then you download the .elf executable that is built from your code. The Base System Builder (BSB) will build some initial framework for you, and then you can expand from that. Pretty neat, because I can select the exact rev of my starter kit, and it handles many things like making sure the various chips and onboard peripherals are set up properly and interface to the right pins. So the RS232 port is wired in right, and the Ethernet is attached properly. And they’ve tested the controllers. To make a long story short, I’ve finally got some code running that accesses the DDR and then spits out some messages via the RS232 port. Now this isn’t my code, but you’ve got to start someplace.
-- Entering main() --
Starting MemoryTest for DDR_SDRAM:
Running 32-bit test...PASSED!
Running 16-bit test...PASSED!
Running 8-bit test...PASSED!
-- Exiting main() --

So what happens is that the different peripherals are mapped into memory at different locations. On my board, my 64MB of RAM is mapped at $8C00 0000-$8FFF FFFF. And how easy is it to read and write to the memory? Check this out:

Defined in an automatically generated header: #define XPAR_DDR_SDRAM_MPMC_BASEADDR 0x8C000000

Xuint32 *myram = (Xuint32 *) XPAR_DDR_SDRAM_MPMC_BASEADDR; // pointer to the start of the memory block (cast needed: the base address is a plain integer constant)

myram[0] = 0xAAAA4489;

DONE. That’s it.  How much easier can it get!@#
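
Once the memory is just a pointer, a write/readback check over a block takes only a few lines. A minimal sketch; `mem_test32` is a hypothetical helper, not the Xilinx MemoryTest code. On the board it would be called with `(volatile uint32_t *) XPAR_DDR_SDRAM_MPMC_BASEADDR`; on a PC any buffer works.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write an address-dependent pattern, read it back, count mismatches.
 * XORing in the index catches address lines that alias each other. */
size_t mem_test32(volatile uint32_t *mem, size_t words)
{
    size_t errors = 0;
    assert(mem != NULL);
    for (size_t i = 0; i < words; i++)
        mem[i] = 0xAAAA4489u ^ (uint32_t)i;
    for (size_t i = 0; i < words; i++)
        if (mem[i] != (0xAAAA4489u ^ (uint32_t)i))
            errors++;
    return errors;
}
```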

I’m hoping that on Sunday I get some time to spend extending their provided examples.  Maybe set up something where it can read in stuff via the serial port, store it in ram, and then spit it back out…

sleep inversely proportional to the AFP and no more tokens

So I have figured one thing out.

The more I work on the AFP, the less I sleep.  At least that’s how it’s been over the last two nights.  Two nights ago it was 3am, last night it was after 2am.

Ideas come and go with me.

I’ve completely scrapped the token idea. After a couple of dry runs with some Amiga floppies, it seems like ‘10’s are much more common than ‘100’s or ‘1000’s. I guess I could have told you that from looking at a scope. But anyway, I had a lot of trouble getting the idea to work, and suffice to say, I quickly abandoned it. The point is that the tokens were saving me about 400 bytes or so on average (about 3%), not the grand 50% or whatever I had imagined.

This also avoids unnecessary complexity.

So the current read routine times the difference between the pulses and then writes a ‘10’, ‘100’, or ‘1000’ to the FRAM. I manually write out the bits for each grouping, with no loops or anything. I also take advantage of the fact that the output bit, once changed to 0, doesn’t need to change again. So for a ‘1000’ I basically write a 1, clock, write a 0, clock, clock, clock.
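
The “write the 1, then just clock out zeros” trick maps naturally onto a packed bit buffer that starts out zero-filled: only the 1 has to be written, and the zeros come for free by advancing the bit position. A minimal sketch in C rather than SX assembly; `bit_writer` and `put_pulse` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packed MFM output, MSB first. The buffer must start zero-filled. */
typedef struct {
    uint8_t *buf;
    size_t bitpos;   /* next bit to write */
} bit_writer;

/* Append one pulse: a '1' followed by (cells - 1) zeros,
 * e.g. cells = 3 appends '100'. */
void put_pulse(bit_writer *w, int cells)
{
    assert(cells >= 2 && cells <= 4);
    w->buf[w->bitpos / 8] |= (uint8_t)(0x80u >> (w->bitpos % 8));  /* the 1 */
    w->bitpos += (size_t)cells;   /* zeros are already in the buffer */
}
```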

I successfully read a floppy last night using this method, so at least my concept is stable.  This still eliminates interrupts completely (edge and timed) and eliminates the ISR, etc.  Definitely getting a little more robust.  Now, will it actually help me read more floppies?  I don’t know.  I think the next step is to start documenting HOW these floppies are failing if I can tell.

I was having a few goofy problems last night, which I think were mostly related to flow control. I was using PuTTY without flow control, and my SX appeared to lock up: it wouldn’t echo commands or the actual track. I turned off flow control completely, and it started working. I don’t know if PuTTY handles DTR/DSR flow control properly, or should I say, I’m not sure if the FTDI VCP drivers support DTR/DSR flow control. I think they do, because I think I tested it in Windows under C++ earlier.

The other goofy problem I was having was my code spontaneously restarting. I think it was related to using JMPs without the @ before the address. The @ causes a PAGE instruction to be inserted before the jump. I don’t know whether this also applies to the CJxx instructions.

I also saw a bunch of transfer checksum errors, but I think those are related to the number of bytes being received. I think I need to make proper use of the FF FF start delimiter and stop using an absolute offset to find the start of the frame.
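
Scanning for the FF FF delimiter instead of trusting a fixed offset could look like this. A minimal sketch; `find_frame_start` is a hypothetical helper. Raw MFM never contains two adjacent 1 bits, so 0xFF cannot occur in the MFM payload itself, but other fields (a checksum, for instance) still could, so verifying the checksum after syncing remains worthwhile.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first byte after an FF FF start delimiter,
 * or -1 if no delimiter is present in rx[0..len). */
long find_frame_start(const uint8_t *rx, size_t len)
{
    assert(rx != NULL || len == 0);
    for (size_t i = 0; i + 1 < len; i++)
        if (rx[i] == 0xFF && rx[i + 1] == 0xFF)
            return (long)(i + 2);
    return -1;
}
```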

Ahh well.