Tag - uart


characterizing speed performance of floppy drive controller

So I've got things working rather swimmingly right now.  I switched drives from a Samsung to a Sony, and it's made a huge difference.  The Sony just seems to work better.

I'm averaging about 355ms per track, yielding 57s total disk times.  The 355ms includes 313ms of transfer time, an effective serial throughput of around 1.175mbps.  That's on a 1.5mbps baud rate, whose theoretical payload maximum is 1.2mbps (10 bits on the wire per 8-bit byte).  This isn't horrible performance, but I really want to get back to 2mbps.  I haven't been using 2mbps because I get massive errors, but I think there is some round-off happening in my UART's timing that prevents it from working correctly.  I need to revisit my UART code and find out exactly why 2mbps doesn't work.  I've run this usb->ttl converter at 2mbps with my uC, so it really should work fine.

If I go to 2mbps, I'll EASILY chop the 88ms off the 313ms, and I'll be transferring the track to the PC in REAL TIME.  Basically, as fast as I receive it, I'll be sending it to the PC.  Remember that because I transmit the pulse times, and not the decoded data, fast transfer rates are really required.  This is a little more complicated than just saying the raw MFM rate is 500kbps, so you need 500kbps of bandwidth to the PC.

There are several optimizations I can do, and I’ll post more later.

working FPGA version of the amiga floppy project

So, I’ve been working on the FPGA version of the amiga floppy project for some time.  I just recently had a fair bit of free time, and so everything came together rather quickly!

I’m now able to read amiga floppy disks in using the same Java client software I had developed for use with the Parallax SX microcontroller board.  There were a few minor changes in the software — most notably the read data routine from the hardware.

I’ve written the code in Verilog on a Xilinx Spartan-3e evaluation board.

The various hardware parts I’ve described:

  • UART: written from scratch, both a transmitter and a receiver.  Simple to use, with variable baud rates.
  • FIFO: generated from Xilinx’s CoreGen; 32k bytes.  This connects the floppy interface to the USB interface.
  • FSM that always empties the FIFO to the PC: once something goes into the FIFO, it immediately gets sent to the PC.
  • Read-floppy FSM: stores 225ms of delta T’s (aka the time between edges) as 8-bit integers into the FIFO.
  • Command FSM: receives single-character commands from the Java software to execute (R for read, U for upper head, L for lower head, etc.)
  • Transmit-test-pattern routine: sends 32k of data to the PC to test for reliable communication.

A couple advantages with the FPGA solution:

  • We transmit the data to the PC as soon as it’s available.  I want to characterize the actual latency, but it should be pretty small.  This is different from my old load->FRAM, then FRAM->PC method.  The new method should be much faster because we’re not idling for 225ms while the track is captured.
  • Instead of transmitting the bit-sync’d raw MFM to the PC, I’m sending the delta T’s.  While this requires a little more processing on the PC, the PC can more easily determine why a particular sector can’t be read.  For instance, is the time between pulses too small? Too large?  On a fringe edge?  Plus, since the Java decodes these, I can now add sliders for “acceptable delta T’s” for the 4, 6, and 8us windows.  Before, that would have required modifying the firmware on the microcontroller.  I can also start to do statistical analysis on the pulse times.

I am currently doing about 430ms per track.  This sucks.  I was beating that by 100ms with my microcontroller.  The problem is that because a variable amount of data goes to the PC, the PC receiving code doesn’t know exactly when to stop receiving, so there’s a wait-timer which I have to optimize.  Once I receive the minimum amount of data, I wait 100ms past the last received data, and then exit.  I’ve got to put my logic analyzers in place and figure out how to optimize it.

Denis@h3q can read a disk in 43s, which is pretty darn good.  He is using tokens like I mentioned here and here and here.  I’m transferring much more data though, which gives me more information.  I like his time, and maybe that would be a nice goal to beat!  Wow. That’s only 269ms/track.  Hrrrmm…. that’s pretty good.

Finally had some luck with FPGA board

So I bought an FPGA eval board and a book on Verilog a while back.  I made some progress learning things, and had some simple projects working: UARTs, writing to the LCD, and even some small VGA software (hrrrm, maybe I should call it hardware…).  Some of it from scratch, some heavily borrowed from existing sources.  But then I got stuck.  Stuck on being able to access the DDR on the board.  The included memory controller, produced by Xilinx’s MIG tool in the CoreGen app, was hard to use, and I didn’t (and still don’t) understand Verilog well enough to simply run with that controller.  Now don’t get me wrong.  Xilinx has pretty decent documentation for some of this stuff, and they describe pretty well the steps needed to initialize the controller and perform reads and writes.  But you have to remember that their controller is 7,300 lines of code broken up across about 40 source files.  If this were C/C++ or Java (or even assembly language), and it were commented properly, I could probably follow what is being done.  Their code is poorly commented, IMHO.

I digress, right, but I am really an anal commenter.  I comment A LOT.  There really should never be a time where I’ve got to puzzle out what’s being done in a particular code block.  If something goofy is being done, or if I got sloppy, I explain it in-line with the code.  Right on the same line.  The comments help me when I’m reading the code.  Plus, they sometimes reveal bugs in my code, where what I say I’m doing in the comments doesn’t match what’s happening in the code.  < digression mode off >

So I’ve been looking for other ways to skin the cat.  I’ve asked guys at work. I’ve checked out literally every memory controller on opencores.org.  The problem is, basically, that DDR sucks.  It first sucks because high frequencies are required, so pathways through the FPGA are restricted.  I don’t know enough about FPGAs to tell you which path (from which pin to which pin) meets the timing requirements for DDR.  The first D in DDR is Double, which means that both the rising edge and the falling edge of the clock are used to read or write data.  Even if my actual application doesn’t require 100+mhz data rates, I’m forced to read/write the memory at a fast rate.  Oh, and it’s Dynamic too (that’s the D in SDRAM), and that sucks as well. Why? Because dynamic memory has to be refreshed constantly, and on time.  That’s another thing to worry about.  And to think I actually contemplated writing my own controller.  Sheeeesh.

The Spartan-3 eval board, as opposed to the 3E (which is what I have), contains 1MB of SRAM.  The S here is STATIC, aka the opposite of Dynamic: no refresh required.  It’s fast memory, around 10ns, but it doesn’t do any of that double-data-rate junk.  I’ve seen example controllers for this board, and for this memory, and it’s like one page of code. The difference in complexity is absurd.  However, I’ve got 64MB of DDR, and this is 1MB.

Suffice it to say, I don’t own the S3 board.  While it’s nice, it has very limited connectors, LED displays instead of an LCD, etc.


So the Embedded Development Kit, available from Xilinx, contains MicroBlaze, a 32-bit softcore processor that has BUILT-IN controllers for things like memory, Ethernet (cool), and serial (UARTs).  And guess what: you program it in C.  So you first download the FPGA bitstream containing the HDL for the softcore, and then you download the .elf executable that is make’d from your code.  The Base System Builder (called the BSB) will build some initial framework for you, and then you can expand from that.  Pretty neat, because I can select the exact rev of my Starter Kit, and it handles many things, like making sure that the various chips and onboard peripherals are set up properly and interfaced to the right pins.  So the RS232 port is wired in right, and the Ethernet is attached properly.  And they’ve tested the controllers.  To make a long story short, I’ve finally got some code running that accesses the DDR and then spits out some messages via the RS232 port.  Now this isn’t my code, but you’ve got to start someplace.
— Entering main() —
Starting MemoryTest for DDR_SDRAM:
Running 32-bit test…PASSED!
Running 16-bit test…PASSED!
Running 8-bit test…PASSED!
— Exiting main() —

So what happens is that the different peripherals are mapped into memory at different locations.  On my board, my 64MB of RAM is mapped at $8C00 0000-$8FFF FFFF.  And how easy is it to read and write to the memory?  Check this out:

Defined in an automatically generated header: #define XPAR_DDR_SDRAM_MPMC_BASEADDR 0x8C000000

Xuint32* myram = (Xuint32*) XPAR_DDR_SDRAM_MPMC_BASEADDR; // declare a pointer to the start of the memory block

myram[0] = 0xAAAA4489;

DONE. That’s it.  How much easier can it get!@#

I’m hoping that on Sunday I get some time to spend extending their provided examples.  Maybe set up something where it can read in stuff via the serial port, store it in ram, and then spit it back out…

got build environment working again, fixed uart code

So a while ago I bought a quad-core machine which runs Vista 64.  Once I had the new machine, I tried getting my build environment for the AFP working again.  NetBeans, the Java IDE I use, has 64-bit support, but there were a host of issues regarding FTDI drivers, jd2xx, etc., which I fought with and eventually gave up on.  I was getting error messages like “Can’t load IA 32-bit .dll on a AMD 64-bit platform”, and there were serious conflicts between the 32-bit and 64-bit JVM, JDK, DLLs, etc.  Pain in the butt.

I’ve had some time to work on stuff the last couple of days, so I sat down and re-attacked the problem.  I did manage to solve it, by uninstalling anything Java that was 64-bit. 🙂  I believe it was just the JDK and JVM.  I also had to reinstall NetBeans, because it links itself to a specific JDK, and once that JDK was uninstalled, NetBeans would literally not boot.  I looked all over NetBeans for something that defines a “target.”  You know, something where I can say “I want to build applications for 64-bit” or “32-bit” or whatever.  I couldn’t find it.  I uninstalled NetBeans, reinstalled it (this time it detected and recognized the 32-bit JDK), and voila, my Java client now loads, builds, and runs correctly!@#

I hooked up my AFP again and attempted to image a disk, and there were major problems.  Do you remember this post? This time it actually wasn’t that bad.  Another time, somehow, one of my SX28 pins had been fried.

I’ve always wanted to do an extended UART data-transfer test.  I’ve never really done one, and I think the UART has been a big source of problems from the beginning.  Even though I checked the UART carefully for cycle counts (and did this 239408 times), put it on the logic analyzer, and even had someone else review it, there must have been a mistake.  I was corrupting about 3-5 bytes for every 100,000.  Not tons, but enough to show up during a disk transfer.

I started out really looking into my UART.  When bytes were corrupted, they were corrupted in exactly the same way:

The first data bit that was a 1-bit was ignored, and the data only started being received after the next one bit.  Let me give an example:

Let’s say the correct byte was dec 138, or binary 1000 1010.  It would be received as dec 10, or binary 0000 1010.

The correct byte might be dec 39, or binary 0010 0111.  It would be received as dec 7, or binary 0000 0111.

The correct byte might be dec 166, or binary 1010 0110.  It might be rx’d as dec 38, or binary 0010 0110.

Remember, this only happened as an exception.

I eventually tweaked my UART by shortening the delay between the start bit and the first data bit, and also the time between bits, by 20ns.  I’m honestly not sure why that worked; it was mostly found by trial and error. But it had a profound and instant effect.  I was running trials and counting the number of bad bytes per 655k of transfer.  I was seeing anywhere between 33-42 bad bytes per 655k.  When I made the change, it dropped to 0 !!

As a matter of fact, I just finished sending 284 megabytes (2.84 gigabits on the wire, counting start and stop bits) of traffic through it without a single bit error!  I think that’s pretty decent.  The funny thing: I fired up “calc” to do some quick math, and I think the CPU interruption, or disk access, or something, caused me to lose some data.  In the actual real client, it would have automatically retransmitted the track, so it’s not the end of the world.

Once I fixed the UART, it started reading disks correctly again.

I’m pretty happy to see this thing working again.  Maybe I’ll read in some more amiga floppies tonight!

first day back

You know, I always hate the “first day back” working on a project.  Things always seem to get screwed up between the time I last worked on a project, and that day.  Even if nothing has changed. Or so I say.

I came back and started getting a lot of “SYNC errors.”  A sync error occurs whenever the software thinks it’s reading track #20, but for whatever reason the drive is giving data for track #21, or some other track.

That was this time’s problem.  I really don’t know what changed or what affected it, but I decided to slow the thing down a little bit, and that seemed to solve the problem.  Right now a 50ms delay seems to fix it, but I’ll lower that, and more importantly dig a little deeper to see if I can figure out why the heck I need it at all.

50ms doesn’t seem like a long time, but I really don’t like it after I spent so much effort optimizing this thing to be as fast as possible.  I tried going back in time with the software, to see if maybe something like my updated receive UART code was screwing things up, but it didn’t help much…

a little code review today

While my send UART code has been pretty solid for some time, I’ve always thought my receive UART code needed to be double-checked and worked on.

I spent some time today just going through the code, graphing out some timing diagrams, checking my cycle counts, and so forth.  The RX UART code is actually pretty good.  There is some minor jitter in detecting when the start bit actually falls, but it is on the order of a couple of cycles: depending on exactly when the start bit happens, detection can take either 2 cycles or 4 cycles.  That’s only a difference of 40ns, and I’m not horribly worried about it.

I still want to do some type of stress test on it because I don’t want this to be a source of problems if I ever decide to implement writing as a feature.

AFP 0.2 WIP SX code

I’ve put some work into cleaning up the SX code tonight. I’ve removed some code sections that are no longer needed, and added documentation where it was lacking. The code should be commented pretty well now. I’ve also added a “TO-DO” section at the top of the code listing the things I have to work on.

This is an SX/B code file, which is simply plain text, and the framework of the program is SX/Basic. All the critical routines, namely the actual data reading/storage, the FRAM communication, and the custom UART code, are in pure assembly. I use SX/B where high-level abstraction is convenient and easy to read, and assembly where speed/preciseness is required.

You can get the code here.

I welcome comments, corrections, suggestions, etc.

The to-do list from the top of the file follows


‘1> Deal with motor being on all the time. do we want to have a turn-off-motor command?
‘2> Optimize forward and backward code and figure out why this code actually works AS IS
‘3> double check receive uart code & comments
‘4> Deal with a non-amiga/really bad disk locking up findsync()
‘5> All code sections, including subroutines, need pre-state information, variables used
‘ and post-state to identify what needs to be in place for them to operate correctly
‘ This will make them more modular and will allow MAC’s better
‘6> More generally, handle error / no-data conditions more elegantly – display error and exit

reworking my uart

Well, I rewrote my recv and send UARTs, and with the help of my logic analyzer, I’ve tuned them up pretty well.

I’m getting some transfer errors, somehow, and I think it’s related to buffer size.

I read a disk in 2:26, which is my fastest time to date.  If I can get rid of these errors, it will be much much faster.

On speed optimization

The theoretical limit, assuming a 300rpm drive, is 200ms per track. Or 400ms per cylinder, or 32 seconds per disk. Most attempts read slightly more than one track; I read 16,384 bytes, or 262ms worth.
Right now I’m using the SX/B UART code, which maxes out at 230,400 bps. That puts my PC transfer at 711 ms per track.

So that’s 262ms + 711ms = 973ms, or about 1 second per track. 160 tracks is about 2:40 per disk. Not bad, but I’d much rather be closer to 1 minute.

I also have overhead in the PC->SX communication, and I’m currently turning the motor ON and then OFF for each track. I’ve gotta fix that. Plus, I have to wait 500ms for the drive to spin up on each attempt. Leaving the motor on all the time raises some issues, but I’ll get that in time.

More on this later

mid-january status

So here’s where I’m at:

I know my memory read and write routines are good.  I calculate a checksum as I write the data, and when that data is output to the PC, I calculate the checksum again.  They match; no problems there.

I know my USB-to-PC routines are good.  I calculate a different, byte-based (8-bit) checksum from the data I get from the FRAM, and then have the PC software calculate it as well.  They always match.  I’m using Parallax’s UART for the time being, mainly for reliability.

I’m using a new basic ISR routine, which I posted a post or two back.  It’s simple, and it doesn’t force any particular type of encoding: what comes off the drive goes into the memory.  There are some drawbacks; for instance, I don’t support any type of idling.  For the initial data, I wait to see a transition, and then I turn on interrupts and start recording.  I don’t check for double 1’s now, and I don’t check for more than (3) 0’s.  The data SHOULD be coming out of the drive correct, and force-fitting it into correct form just doesn’t work; while it does fix SOME situations, I *REALLY* have to get to the bottom of why this happens.

My MFMSanityCheck software is telling me 0.3% of the data is bad.  I know 99.7% still isn’t anything to complain about, but I really have to find the source of the problem.

All of the 0.3% in at least one sample file is a double-1’s situation.  And I’ve seen this before.  It’s NEVER triple 1’s, and it’s NEVER too many zeros.  Just double 1’s.  Two 1’s back to back.

So now that I’ve tested my memory, the problem is either that the drive is REALLY spitting out two 1’s (and I have no clue how to fix that problem), or that my ISR is triggering twice on the same edge.

I’m leaning towards the second choice but I really have to figure out how that is happening.