Tag - fifo

1. found bug last night
2. working FPGA version of the amiga floppy project
3. (untitled)
4. PC might be the problem!
5. inpout32.dll vs WinIo
6. Finals week at pitt.edu
7. too many variables
8. updates
9. PC to SX transfer images

found bug last night

With this new FPGA solution, certain tracks would result in what I call a “short read.”  A short read is any received track that contains fewer than 28,125 delta T’s, aka pulse times.  For a given capture time, there is a minimum and a maximum number of pulses on an Amiga track.

If we have a 300 rpm drive, then it’s 60s/300 = 200ms per revolution.  If the bitcells are 2us wide, then you have at most 200ms/2us = 100,000 bit cells.  The ones density is at most 50% (raw MFM ‘10’), so every other cell would contain a 1: 50,000 pulses, so 50,000 delta T’s.  The minimum ones density is 25% (raw MFM ‘1000’), so 25,000 pulses.  Now we read more than just one revolution of data, because we will very likely start reading in the middle of a sector.  So instead of 11 sectors’ worth of read time, we actually need to read 12 sectors’ worth, to ensure we read the entire sector in which we started.  This is 200ms × 12/11 = 218.2ms of time minimum.  We could potentially re-assemble data, using some type of circular buffer, but this is more trouble than it’s worth.  I currently read 225ms of data.

225ms / 2us = 112,500 bit cells, so 56,250 pulses maximum (50% density) and 28,125 minimum (25% density).
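The arithmetic above can be written down as a small Python sketch. The function names and default arguments are mine, chosen for illustration; the numbers (300 rpm, 11 sectors, 2us bit cells, 25% and 50% ones density) come straight from the post.

```python
def min_capture_ms(rpm=300, sectors=11):
    """Time for one revolution plus one extra sector, in milliseconds."""
    revolution_ms = 60_000 / rpm
    return revolution_ms * (sectors + 1) / sectors


def pulse_bounds(capture_ms, bitcell_us=2):
    """Return (min_pulses, max_pulses) for a capture window.

    Raw MFM has at most one pulse per 2 cells ('10')
    and at least one pulse per 4 cells ('1000').
    """
    cells = round(capture_ms * 1000 / bitcell_us)
    return cells // 4, cells // 2


if __name__ == "__main__":
    print(round(min_capture_ms(), 1))  # shortest useful capture window
    print(pulse_bounds(225))           # bounds for the 225ms actually read
```

Anything below the lower bound for the chosen capture time is, by the definition above, a short read.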

I had the D2XX USB parameters for my FTDI chip (the usb<->ttl converter) set so that the USB transfer size was 57000 bytes.  This is definitely over and above what was needed.  Or so I thought.

I bumped the transfer size from 57000 to 60032 (docs said specifically 64 byte multiples), and everything started working.  I had already narrowed it down that the problem tracks were ones that had a high density, where there were lots and lots of pulses.  So I knew the size of the track was related.  I checked for FIFO overflow, and it wasn’t overflowing.

I’ve got to look when I have a free second, but I think my USB packet size is 4096 bytes.  So 56250+4096 (some amount of padding?) = 60346.   Uh-oh, I better bump that to 60,352, the next multiple of 64.  I think the driver (or Windows?) maxes out at a 64k transfer size, so I still have a little wiggle room.
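Here is a minimal sketch of that sizing rule in Python. The helper name `round_up` and the constants are illustrative; the 4096-byte padding is the guess from the paragraph above, not a documented requirement, and the 64k ceiling is likewise only what I believe the limit to be.

```python
def round_up(n, multiple=64):
    """Round n up to the next multiple (integer ceiling division)."""
    return -(-n // multiple) * multiple


MAX_PULSES = 56_250      # worst-case delta T count for a 225ms capture
GUESSED_PADDING = 4_096  # assumption: one USB packet of headroom
LIMIT = 64 * 1024        # assumption: believed driver/OS ceiling

if __name__ == "__main__":
    size = round_up(MAX_PULSES + GUESSED_PADDING)
    print(size, size <= LIMIT)
```

As a side note, `round_up(57000)` gives 57024, so the original 57000 setting was not itself a multiple of 64, whereas 60032 is.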

Long and short is that it appears to be working much better.  I was glad to find this bug with just a little brainstorming, and by getting better visibility into my actual pulse count on the FPGA.

working FPGA version of the amiga floppy project

So, I’ve been working on the FPGA version of the amiga floppy project for some time.  I just recently had a fair bit of free time, and so everything came together rather quickly!

I’m now able to read amiga floppy disks in using the same Java client software I had developed for use with the Parallax SX microcontroller board.  There were a few minor changes in the software — most notably the read data routine from the hardware.

I’ve written the code in Verilog on a Xilinx Spartan-3E evaluation board.

The various hardware parts I’ve described:

  • UART: Written from scratch, a transmitter and a receiver.   Simple to use, variable baud rates.
  • FIFO: Generated from Xilinx’s CoreGen. This connects the floppy interface to the USB interface. 32k bytes
  • FSM to always empty the FIFO to the PC.  Once something goes in the FIFO, it immediately gets sent to the PC
  • Read-floppy-FSM: Stores 225ms of Delta T’s (aka time between edges) as 8-bit integers into the FIFO.
  • Command FSM: Receives single-character commands from the java software to execute (R for read, U for upper head, L for lower head, etc)
  • Transmit test pattern routine: Sends 32k of data to the PC to test for reliable communication

A couple advantages with the FPGA solution:

  • We transmit the data to the PC as soon as it’s available.  I want to characterize the actual latency, but it should be pretty small.  This is different from my load->FRAM, and then FRAM-> PC method.  This method should be much faster and we’re just not idling for 225ms.
  • Instead of transmitting the bit-sync’d raw MFM to the PC, I’m sending the delta T’s.  While this requires a little more processing on the PC, the PC can more easily determine why a particular sector can’t be read.  For instance, is the time between pulses too small? Too large?  On a fringe edge?  Plus, since the Java client decodes these, I can now add sliders for “acceptable delta T’s” for each of the 4, 6, and 8us windows.  Before, that would have required modifying the firmware on the microcontroller.  I can also start to do statistical analysis on the pulse times.
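To make the "acceptable delta T" idea concrete, here is a rough Python sketch of PC-side classification. It is not the Java client's code: the names `classify` and `decode`, the single shared `tolerance_us` knob, and the assumption that delta T's have already been converted to microseconds are all mine. The mapping of 4, 6 and 8us to raw MFM "10", "100" and "1000" is the one used elsewhere on this page.

```python
# Nominal pulse spacings and the raw MFM bits each one represents.
WINDOWS_US = {4.0: "10", 6.0: "100", 8.0: "1000"}


def classify(delta_us, tolerance_us=1.0):
    """Return the MFM bits for one delta T, or None if it fits no window."""
    for centre, bits in WINDOWS_US.items():
        if abs(delta_us - centre) < tolerance_us:
            return bits
    return None


def decode(deltas_us, tolerance_us=1.0):
    """Decode a list of delta T's; also report the ones that were rejected."""
    bits, rejects = [], []
    for index, delta in enumerate(deltas_us):
        symbol = classify(delta, tolerance_us)
        if symbol is None:
            rejects.append((index, delta))  # too small, too large, or on a fringe
        else:
            bits.append(symbol)
    return "".join(bits), rejects


if __name__ == "__main__":
    print(decode([4.0, 5.56, 7.63, 5.0, 2.0]))
```

The `rejects` list is the useful part: it says which pulse was out of range and by how much, which a bit-sync'd stream cannot tell you. A slider in the UI would simply feed `tolerance_us`.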

I am currently doing about 430ms per track.  This sucks.  I was beating that by 100ms with my microcontroller.  However, the problem is that because a variable amount of data is going to the PC, the PC receiving code does not know when exactly to stop receiving, so there’s a wait-timer which I have to optimize.  Once I receive the minimum amount of data, I wait 100ms since the last received data, and then exit.  I’ve got to put my logic analyzers in place and figure out how to optimize it.
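The stop condition described above (minimum byte count reached, then 100ms of silence) looks roughly like this in Python. This is a sketch, not the Java client: `read_chunk` is a hypothetical stand-in for whatever non-blocking read the real client uses, and the `max_wait` safety deadline is my addition so a short read cannot hang the loop.

```python
import time


def receive_track(read_chunk, min_bytes=28125, idle_timeout=0.100,
                  max_wait=2.0, clock=time.monotonic):
    """Collect one track's worth of delta T bytes.

    read_chunk() must return bytes, or b"" when nothing is waiting.
    Returns once min_bytes have arrived and the line has been quiet
    for idle_timeout seconds; raises TimeoutError after max_wait.
    """
    data = bytearray()
    start = last_rx = clock()
    while True:
        chunk = read_chunk()
        now = clock()
        if chunk:
            data.extend(chunk)
            last_rx = now  # restart the idle timer on every arrival
        elif len(data) >= min_bytes and now - last_rx >= idle_timeout:
            return bytes(data)
        if now - start > max_wait:
            raise TimeoutError(f"only {len(data)} bytes after {max_wait}s")
```

Written this way, the cost of the scheme is visible: every track pays the full `idle_timeout` after its last byte, so shrinking that value (or having the hardware send an explicit end marker) is where the time would come back.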

Denis@h3q can read a disk in 43s, which is pretty darn good.  He is using tokens like I mentioned here and here and here.  I’m transferring much more data though, which gives me more information.  I like his time, and maybe that would be a nice goal to beat!  Wow. That’s only 269ms/track.  Hrrrmm…. that’s pretty good.

I’ve played around a considerable bit from yesterday. There’s something very strange about the speed the PC can react in certain circumstances, and I can’t quite nail it down.

Now I haven’t changed one thing on the PC. The logic is just too simple, and doesn’t need changed.

On the SX side, I wrote (again, I’ve done this before) a real small routine here:

repeat:

 if pcack <> 0 then repeat

 inc loopz
 RC = loopz
 byteready = 1

waitforpin:

 if pcack <> 1 then waitforpin
 byteready = 0
 goto repeat

This basically sends 0x00 – 0xFF to the PC, one byte at a time. It waits on the PC, but sends to the PC as fast as the PC will take it. I hooked up the byteready and pcack leads to my scope, and then measured the time it took for each cycle. The complete cycle took only about 4.25us. This is really fast, and I’m happy the PC isn’t choking. Man, the SX is so much faster than the PC at responding to signals, raising and lowering leads, etc.
I also checked to see if the data is received correctly, and it’s 100% correct…
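A check like that can be as simple as the Python sketch below. The function name is made up, and it assumes the capture is available as a `bytes` object. Because the SX routine increments the counter before sending, the sketch only checks that each byte is one more than the previous (mod 256) rather than assuming a particular starting value.

```python
def first_mismatch(received):
    """Return the index of the first byte that breaks the +1 sequence, or None."""
    for i in range(1, len(received)):
        if received[i] != (received[i - 1] + 1) & 0xFF:
            return i
    return None


if __name__ == "__main__":
    good = bytes(range(1, 256)) + bytes(range(0, 10))  # wraps 0xFF -> 0x00
    dropped = bytes([1, 2, 4, 5])                      # 3 went missing
    print(first_mismatch(good), first_mismatch(dropped))
```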

I implemented a 16-byte software fifo (circular buffer) into my actual software and it looks like it works as expected. Within only 50 bytes (sometimes a little longer, like 100+), the circular buffer overflows. My data is coming out at a rate of 8 bits every 16us (i.e. 1 bit every 2us, or 500kbps), so that means that the PC has 16us PER BYTE. And my earlier tests showed that the PC could take a byte every 4.25us, roughly a quarter of that time. But the PC chokes in this environment.

This fifo gives my pc 16 times longer to reply — but I don’t see how it needs it — and this *still* isn’t enough. So I went to my earlier code without the fifo, and added “wait” statements which cause the main program to wait for the PC to become ready, and wait for the pc to ACK the byte. I haven’t put a scope on yet to see how much I’m being penalized. I’m betting 4us. Which is a long time, especially because main has other tasks….

My early conclusions point to the fact that PC seems to process/task-switch to other running applications whenever there is no activity for x number of microseconds. So if you are pounding it, it remains active, and your response times are low. Wait 5 or 10 microseconds, and it switches to do something else, and then takes a little bit to come back to service your new request. Just a guess at this point.

PC might be the problem!

I think I’ve found at least one problem here.

My PC is at least one problem. Even though I’ve done extensive tests with the PC in the past, including measuring a 9us total cycle time, it appears the PC must slow down at some point, which causes data to be lost. It’s weird, because it only keeps up some of the time: it does just fine, then starts dropping bytes, picks up again, etc.

I wrote a little program that just transmitted a bunch of integers from the SX to the PC, and the SX waited on the PC to transmit the next one — it was really fast. I wasn’t storing the integers in memory, like I do the normal data, and perhaps that’s the difference.

Suffice to say that when I put an “if byteready = 1 then BREAK” line right before the code that puts a byte on the port, I get a break within 50 bytes. Byteready *should be* zero when I go into that routine, because that would have indicated that the PC ack’d the previous byte. If byteready = 1, then that means the SX has a byte on the port, and it hasn’t been ack’d. If I reach code that is putting another byte on the port, and the previous one hasn’t been ack’d, then the PC isn’t replying fast enough.

The semi-good news is that this at least partially explains why I’m getting some good data, garbage, followed by some good data. I’ll bet my SX code is fine…..

I thought this was a problem earlier so I wrote a small software FIFO buffer on the SX. It buffers up to 16 bytes worth of data. It’s not tested real well yet, but the base code is there. I should turn this into a SUB and test it extensively so I can use it in other applications.

I’m going to put the FIFO code back in.

inpout32.dll vs WinIo

In a previous post, I mentioned WinIo as a nice way of accessing the parallel port from win2k. However, it was slow. Whenever I asked about performance, the authors said I “should be using MapPhysToLin() anyways.” Hell if I can figure out htf I’m supposed to use it; it’s not clear to me.

So, instead of banging my head against the wall, I’m going to switch to another solution, one that appears to be pretty popular as well.

inpout32.dll is a simple DLL that contains two functions for reading and writing the parallel port. You can download it and find docs here

One note: under Visual Studio .net, you’ll need to remove a portion of two lines, basically deleting “_stdcall” — otherwise you’ll get an error, and it won’t compile.

The correct lines are :

typedef short (*inpfuncPtr)(short portaddr);
typedef void (*oupfuncPtr)(short portaddr, short datum);

(before they read typedef short _stdcall, and typedef void _stdcall) — Basically yank out _stdcall and you’ll be fine.

You’ll find this at the top of test.c in the distribution zip.

In previous posts, I mentioned that to transfer a full byte of data takes 12us, where it takes normally 16us to actually obtain the data from the drive. This is too close for comfort, and so you can also read back where I added a FIFO buffer to the SX code. In any event, 8.25us of the 12us, some 70% of the time is:

8.25us for the PC to grab the byte(store it to the harddrive, right now anyways) and raise pin 1

I think this is entirely too long. Now, new versions of my code put the received data into RAM just in case the HD takes a long time. Remember, my HD’s have cache, and they are cached by the OS too — so writing to HD isn’t as bad as it sounds.

Anyways, if I get the entire full cycle down to 8us or less, I’d be thrilled. I really hope the performance of inpout32.dll will help here.

Finals week at pitt.edu

Welp, this is finals week, so don’t expect anything new to show up until after April 28th, 2005.

But, this doesn’t mean I’ve forgotten about this project! I am very much interested in moving this forward.

I’m not really sure what my next step is, but I have to start approaching this using the “scientific method”, and start to isolate my issues. So far, the testing and everything is haphazard, and I find myself testing stuff that I’ve done before because I’ve partially forgotten what the results of earlier tests were. This is bad, wastes time, etc.

Part of my problem is that the signal has multiple lengths: sometimes it’s a perfect 4us, 6us, or 8us grouping, and sometimes it’s 5.56us or 7.63us. You get the picture. The code has to be able to handle these different situations the same way. Also, I think I’m in my ISR simply way too long. I’m missing edges, and this is where I’m screwing up.

If anyone is interested, I’ve included my SX/B code below. You can also find it here for downloading.

DEVICE SX28, OSCHS3, TURBO, STACKX, OPTIONX
IRC_CAL IRC_SLOW
FREQ 50000000

' -------------------------------------------------------------------------
' Variables
' -------------------------------------------------------------------------

inputpin var RB.0       'data from drive
outputbyte var RC       'data to pc

byteready var RB.2      'notify pc byte is ready
pcack var RB.3          'pc saying it got the byte

storevalue var bit      'bit to store in the shift reg
samplepin var bit       'actual value of pin at interrupt

numstoredbits var byte      '# of stored bits in shift reg
storedata var byte      'actual stored data
seenedge var bit        'seen an edge or are we idly?
pending var byte        'swapped with wkpnd_b, RTCC or edge?
validhigh var byte      'how many high bits have we seen in a row

inputready var bit      'used for isr to tell main value is ready
                'to be stored

currpos var byte        'current STORE position in FIFO
xferpos var byte        'current XFER position in FIFO
myarray var byte(16)        'FIFO storage


' -------------------------------------------------------------------------
  INTERRUPT
' -------------------------------------------------------------------------

ISR_Start:

    'RTCC = 0
    RB.1 = 1        'for debugging

    samplepin = inputpin    'grab the pin

    pending = 0
    wkpnd_b = pending   'is this an edge or a rollover

    if pending.0 = 1 then processedge

    if seenedge = 0 then goback 'are we idle?

    if samplepin = 0 then goback    'we shouldn't be at an edge here

    inc validhigh

    if validhigh > 3 then goidle    'more than 3 highs? go idle

    storevalue = 0
    inputready = 1          'tell main the value is ready to be stored

    goto goback

goidle:
    seenedge = 0            'erase the edge we saw before
    validhigh = 0           'reset and start over
    goto goback

processedge:

    seenedge = 1            'we've now seen an edge!

    validhigh = 0           'this should already be zero

    storevalue = 1          'store a 1 for an edge
    inputready = 1          'tell main its ready

    RTCC=0

goback:

    wkpnd_b = 0

    RB.1 = 0            'for debugging

    returnint 88


PROGRAM start_point

' Program Code
' -------------------------------------------------------------------------
start_point:

    'setup port b

    TRIS_B=%11111001        'direction bits rb.0 input, rb.1 & rb.2 output
    ST_B = %11111110        'schmitt trigger for our drive input

    WKPND_B = 0         'clear pending interrupts
    WKED_B = %11111111      'set falling edge trigger
    WKEN_B = %11111110      'enable drive input interrupts
    PLP_B =  %11111110      'enable pullups

    'setup port c

    TRIS_C=0            'enable port c for output
    PLP_C = %00000000

    numstoredbits = 0
    byteready = 0
    seenedge = 0
    pending = 0
    validhigh = 0
    inputready = 0

    currpos = 15
    xferpos = 15

    ' turn on interrupts
    OPTION = $88

repeat:

    if byteready = 1 then skipthis      'we are already xferring something

    if currpos = xferpos then skipcheck 'we have nothing to send and everything has been ack'd

    outputbyte = myarray(xferpos)

    byteready = 1               'put the byte on the port and tell PC

    goto skipcheck              'we don't need to check an ack this soon after we raised it

skipthis:

    if pcack = 0 then skipcheck

    byteready = 0

    if xferpos > 0 then skipassign

    xferpos = 16

skipassign:

    dec xferpos

skipcheck:

    if inputready = 0 then repeat

    inputready = 0

    if numstoredbits > 0 then notzerobits

    storedata = storevalue
    numstoredbits = 1
    goto repeat

notzerobits:

    storedata = storedata << 1  ' shift left to make room for next bit

    if storevalue = 1 then storeaone

    storedata = storedata | 0   ' store a zero

    goto nextstep

storeaone:

    storedata = storedata | 1   'store a one

nextstep:

    inc numstoredbits

    if numstoredbits < 8 then repeat

    'else output data to FIFO, eventually to the PC

    myarray(currpos) = storedata

    if currpos > 0 then skipcurrreset

    currpos = 16

skipcurrreset:

    dec currpos

    numstoredbits = 0

    storedata = 0

    goto repeat
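For anyone who wants to poke at the FIFO logic off the chip, here is a small Python model of the same idea: 16 slots, two indices that start at 15 and count down with wrap-around, and "empty" meaning the two indices are equal. It is a sketch, not a translation. The class and method names are mine, `fetch` advances immediately instead of waiting for the PC's ack, and the overflow check is an addition that the SX/B code above does not have.

```python
class SoftFifo:
    """Python model of a 16-slot circular buffer with down-counting indices."""

    SIZE = 16

    def __init__(self):
        self.slots = [0] * self.SIZE
        self.currpos = self.SIZE - 1  # next slot to store into
        self.xferpos = self.SIZE - 1  # next slot to transfer out
        self.overflowed = False

    @classmethod
    def _step(cls, pos):
        """Decrement with wrap: 0 goes back to SIZE - 1."""
        return pos - 1 if pos > 0 else cls.SIZE - 1

    def store(self, value):
        """Add a byte; refuse (and flag it) if that would lap the reader."""
        nxt = self._step(self.currpos)
        if nxt == self.xferpos:
            self.overflowed = True  # extra check, not in the SX/B listing
            return False
        self.slots[self.currpos] = value
        self.currpos = nxt
        return True

    def fetch(self):
        """Remove and return the oldest byte, or None when empty."""
        if self.currpos == self.xferpos:
            return None
        value = self.slots[self.xferpos]
        self.xferpos = self._step(self.xferpos)
        return value
```

One thing the model makes obvious: with "indices equal" meaning empty, only 15 of the 16 slots are usable. Without the overflow check, a 16th unread byte would make the indices equal again and the buffer would look empty, silently discarding everything in it.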

Stick with me and feel free to comment. I need the help!

too many variables

I think that’s really the problem I’ve been facing with this floppy project. Trials and tribulations indeed. Or at least trial and error, with stress on the error part. There are probably fewer than half-a-dozen people worldwide who have the very specific knowledge that I need to get this off the ground. These would be people from CAPS, one person who sells a floppy controller card for Amiga (as well as other) floppy drives, and that’s it.

I’ve talked to some really nice ex-Commodore employees who have been friendly, and helpful.

The problem, of course, is that these machines were created a long time ago. The programmers of the original OS, freeware/shareware software, hardware, have all long forgotten the details. And you know what they say, “the devil’s in the details.”

To give you an idea of the variables with which I’m working:

1. Am I reading the hardware signal right? Are there *really* only three possibilities coming out of the drive, a “10”, “100”, and a “1000”? This is all I’ve been able to observe…

2. Is the software I’ve written on the SX microcontroller properly sampling this data at the right time? I have two interrupts occurring, one for the edge detection and one for the 2us timeout which clocks a zero in (up to 3 of them in a row before going idle). So far, sometimes this works, and sometimes it doesn’t. Why and how is this getting out of sync?

3. Is the SX transmitting faster than the PC can handle this? So far, my observations say no, and I implemented a software FIFO to help out.

4. Is my software FIFO working?

5. Is my PC software, that’s designed to receive all this, working properly? I’m now storing the transmitted bytes in a memory array, and later storing to disk, to prevent any speed issues associated with accessing the hard drive.

The REAL problem here is that I simply don’t know what is actually leaving the drive in terms of data. The only thing I’ve been able to figure out is that MORE data is leaving the drive than is showing up on the PC. This is a bad sign. Where’s the data being lost? My guess: the sampling isn’t working properly. Something is slipping. But how the heck can it slip when a transition resets it, and a transition is occurring on a very regular basis (minimum every 4us, maximum every 8us)? The thing only has to run freely for a few cycles.

Add the fact that the only real pattern that I can VISUALLY look for is the 0xAAAA 0xAAAA 0x4489 0x4489 sync pattern in raw MFM. Any other words or real data have to be decoded first, properly aligned on a byte boundary, etc. It’s a PAIN IN THE BUTT to figure out if everything is actually working. Most of my programming projects, whatever they might be, are straightforward. Run it, see if it works, and then go back and fix it if it doesn’t. Here, even the indication that this thing is working is obscured.

There are so many variables, and I’m constantly changing all of them, that who knows when one part is working? Of course, you say, only change one thing at a time. OK, makes sense, but this just isn’t practical. If I change how the SX xfers to the PC, then I have to modify the PC software accordingly…. So if I make a mistake in coding on one side or the other, who knows?

Maybe I’m lacking the necessary basic project management, coding, microcontroller, hardware experience required to get this off and running?

BTW: my title of this entry reminds me of “Too many secrets” or “Cootys Rat Semen.” from Sneakers. If you don’t know what I’m talking about, please disregard! 🙂

updates

I originally thought that the SX might be sending the data too fast for the PC to handle. As my last post mentioned, one full cycle is 12us, but who knows if this is during a disk-write, going into the cache, etc. I took several samples and they were all around 12us but I wanted to be sure that the PC wasn’t dropping data on the floor. The SX, up ’til now, didn’t have support for handling this situation.

I made a couple improvements on the SX side and the PC side. On the SX, I added a 16-byte software FIFO, so if the PC hasn’t ack’d the last byte yet when a new byte is ready, it simply gets thrown into an array, and sent to the PC as soon as it’s ready. It takes 16us (2us * 8 bits) minimum for the SX to get one byte, and so a 16-byte FIFO gives me 256us of delay before this would be a serious issue. This is still only about 1/4000th of a second, so the PC can’t really dilly-dally, but it gives me some breather space.
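The slack calculation is simple enough to keep as a helper, which makes it easy to see what a deeper FIFO would buy. This is a Python sketch with hypothetical names, and it treats all 16 slots as usable, as the paragraph above does.

```python
def fifo_slack_us(depth_bytes=16, bit_time_us=2, bits_per_byte=8):
    """How long the PC can stall before a FIFO of this depth fills up."""
    return depth_bytes * bits_per_byte * bit_time_us


def survives_stall(stall_us, depth_bytes=16):
    """True if a PC stall of stall_us fits inside the FIFO's slack."""
    return stall_us < fifo_slack_us(depth_bytes)


if __name__ == "__main__":
    print(fifo_slack_us())           # slack for the 16-byte FIFO, in us
    print(1_000_000 / fifo_slack_us())  # stalls of that length per second
    print(survives_stall(100), survives_stall(300))
```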

On the PC side, instead of getting the byte from the parallel port and writing it directly to disk, I’m shoving it into an array first, and then writing at the end of the transfer. All of this software is of the barely-working variety….

I have yet to see anything intelligible come from the drive, which means my whole solution doesn’t function yet. There are so many variables, which makes this a tough nut to crack. Because the data coming from the drive is MFM encoded, simply looking at it doesn’t tell you whether or not the raw data is correct. And the goofy MFM encoding that the Amiga uses makes it even tougher. BUT I’m still holding out hope that I should be able to see the SYNC words. Of course, the problem there is that the data can be shifted, and not on even byte boundaries, so you have to try each of the 8 possible bit alignments in order for that to be readable.
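Rather than shifting the whole capture eight times, one way to hunt for the sync words is to slide a 32-bit window over the data one bit at a time. The Python sketch below does that. The function name and the choice to search for the doubled sync word 0x4489 0x4489 are my assumptions, and it expects the capture as a `bytes` object with the most significant bit of each byte first.

```python
def find_sync(data, pattern=0x44894489, width=32):
    """Return the bit offsets at which `pattern` starts in `data`."""
    mask = (1 << width) - 1
    window = 0
    hits = []
    consumed = 0  # number of bits shifted into the window so far
    for byte in data:
        for bit in range(7, -1, -1):  # most significant bit first
            window = ((window << 1) | ((byte >> bit) & 1)) & mask
            consumed += 1
            if consumed >= width and window == pattern:
                hits.append(consumed - width)
    return hits


if __name__ == "__main__":
    aligned = bytes.fromhex("aaaaaaaa44894489")
    # Same 64 bits, but pushed 3 bits to the right with 5 bits of padding.
    shifted = (0xAAAAAAAA44894489 << 5).to_bytes(9, "big")
    print(find_sync(aligned), find_sync(shifted))
```

A hit whose offset is not a multiple of 8 says directly how far the stream is off from byte alignment (`offset % 8`).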

My next step is to run DiskMonTools, an amiga program, which will display the raw MFM from a floppy on screen. I’m hoping I can sort of learn what some common encoded patterns are, and look for them in the output from my program. Things like headers etc have lots of static data, so they are easier to identify.

I’m hoping to see some progress one of these days……

PC to SX transfer images

I haven’t spent much time within the last few days on the project. Looking at the data on the PC, I think that there are still some problems. I think the problems lie in the fact that the interface to the parallel port that I’m using is too slow. Or I should say that the WAY in which I’m reading the port is slow. I need to change some things around. This timing diagram shows what it looks like when it IS working. Top line is the acknowledgement pin from the PC which tells the SX it has received the byte. Here’s the data flow and how it relates to what you see:

When the SX receives 8 bits worth of data, it raises pin 10 (bottom trace). The PC is waiting on pin 10, when it sees it going high, it reads the byte and then raises pin 1(top trace). SX waits for pin 1 to go high, and drops pin 10 whenever that happens. The PC then drops pin 1, and waits on pin 10 again.

The total cycle takes about 11-12us. The breakdown is:

8.25us for the PC to grab the byte (store it to the hard drive, right now anyways) and raise pin 1
2.10us for the SX to realize pin 1 is raised and drop pin 10
1us for the PC to realize pin 10 has dropped and to lower pin 1

The SX is fairly unrelenting with the speed — and there is no advanced logic. If the PC takes longer than 15-16us in any one case, the data just gets plain corrupted. I think that in worst case scenarios this is happening. I see these diagrams get goofy — and I’m not sure exactly why yet.
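Putting the measured steps next to the per-byte deadline shows how little headroom there is. A quick Python sketch follows; the dictionary labels and function name are mine, the three timings are the scope measurements above, and the 16us period is 8 bits at 2us each.

```python
# Scope measurements for one handshake cycle, in microseconds.
STEPS_US = {
    "PC reads byte, raises pin 1": 8.25,
    "SX sees pin 1, drops pin 10": 2.10,
    "PC sees pin 10 low, drops pin 1": 1.00,
}

BYTE_PERIOD_US = 8 * 2.0  # a new byte is ready every 8 bit cells


def cycle_budget(steps=STEPS_US, period_us=BYTE_PERIOD_US):
    """Return (total cycle time, headroom before the next byte arrives)."""
    total = sum(steps.values())
    return total, period_us - total


if __name__ == "__main__":
    total, headroom = cycle_budget()
    print(f"cycle {total:.2f}us, headroom {headroom:.2f}us")
```

With under 5us to spare, a single delayed PC response in the first step is enough to miss a byte, which fits the corruption described above.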

I didn’t want to get into complicated flow-control issues, but I might not have a choice. I think I’m going to set up a simple software FIFO on the SX, perhaps 16 bytes. This should be plenty and should handle situations when the PC isn’t ready to receive. This will be a mandatory ACK situation before the next byte comes, which is different from how it is now. Right now I didn’t want the SX to block waiting on an ACK……

PC to SX