Tag - SX

working FPGA version of the amiga floppy project
bought a Saleae Logic. Another tool for my toolbox
got build environment working again, fixed uart code
my checksum algorithm
redone transfer routine
sleep inversely proportional to the AFP and no more tokens
new read routine implementation started
further development on the AFP
feasability of writing disks

working FPGA version of the amiga floppy project

So, I’ve been working on the FPGA version of the amiga floppy project for some time.  I just recently had a fair bit of free time, and so everything came together rather quickly!

I’m now able to read amiga floppy disks in using the same Java client software I had developed for use with the Parallax SX microcontroller board.  There were a few minor changes in the software — most notably the read data routine from the hardware.

I’ve written the code in Verilog on a Xilinx Spartan-3e evaluation board.

The various hardware parts I’ve described:

  • UART: Written from scratch, a transmitter and a receiver.   Simple to use, variable baud rates.
  • FIFO: Generated from Xilinx’s CoreGen, 32k bytes deep. This connects the floppy interface to the USB interface.
  • FSM to always empty the FIFO to the PC.  Once something goes in the FIFO, it immediately gets sent to the PC
  • Read-floppy-FSM: Stores 225ms of Delta T’s (aka time between edges) as 8-bit integers into the FIFO.
  • Command FSM: Receives single-character commands from the java software to execute (R for read, U for upper head, L for lower head, etc)
  • Transmit test pattern routine: Sends 32k of data to the PC to test for reliable communication

A couple advantages with the FPGA solution:

  • We transmit the data to the PC as soon as it’s available.  I want to characterize the actual latency, but it should be pretty small.  This is different from my load->FRAM, and then FRAM-> PC method.  This method should be much faster and we’re just not idling for 225ms.
  • Instead of transmitting the bit-sync’d raw MFM to the PC, I’m sending the delta T’s.  While this requires a little more processing on the PC, the PC can more easily determine why a particular sector can’t be read.  For instance, is the time between pulses too small? Too large?  On a fringe edge?  Plus, since the Java decodes these, I can now add sliders for “acceptable delta T’s” for each of the 4, 6, and 8us windows.  Before, that would have required modifying the firmware on the microcontroller.  I can also start to do statistical analysis on the pulse times.
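As a rough illustration of the delta T idea, here is a minimal Python sketch (not the actual Java client). It assumes the delta T's have already been converted to microseconds, and the window edges and the names `DEFAULT_WINDOWS` and `decode_deltas` are made up for the example. The windows are the part a slider would adjust.

```python
# Hypothetical sketch: window edges and names are assumptions, not the
# values used by the real client.
DEFAULT_WINDOWS = (
    (3.0, 5.0, "10"),    # nominal 4us between pulses
    (5.0, 7.0, "100"),   # nominal 6us between pulses
    (7.0, 9.0, "1000"),  # nominal 8us between pulses
)


def decode_deltas(deltas_us, windows=DEFAULT_WINDOWS):
    """Turn times-between-edges into an MFM bit string.

    Returns (bits, rejects), where rejects lists (index, delta) for every
    delta T that fell outside all of the windows.
    """
    bits, rejects = [], []
    for index, dt in enumerate(deltas_us):
        for low, high, group in windows:
            if low <= dt < high:
                bits.append(group)
                break
        else:
            # Too short or too long: keep it so the failure can be studied.
            rejects.append((index, dt))
    return "".join(bits), rejects
```

The reject list is what makes it possible to say why a sector failed: a pile of 2us deltas looks very different from a pile of deltas sitting right on a window edge.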

I am currently doing about 430ms per track.  This sucks.  I was beating that by 100ms with my microcontroller.  However, the problem is that because a variable amount of data is going to the PC, the PC receiving code does not know when exactly to stop receiving, so there’s a wait-timer which I have to optimize.  Once I receive the minimum amount of data, I wait 100ms since the last received data, and then exit.  I’ve got to put my logic analyzers in place and figure out how to optimize it.
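The wait-timer logic described above can be sketched like this in Python. It is only a sketch: `read_available` is a hypothetical non-blocking read that returns an empty bytes object when nothing is waiting, and the clock is passed in so the timing can be tested without real hardware.

```python
import time


def receive_track(read_available, min_bytes, idle_timeout_s=0.100,
                  clock=time.monotonic):
    """Collect bytes until at least min_bytes have arrived AND the link
    has then been quiet for idle_timeout_s.

    Note: there is no overall deadline here, so a transfer that never
    reaches min_bytes would spin forever. A real client would want one.
    """
    data = bytearray()
    last_rx = clock()
    while True:
        chunk = read_available()
        if chunk:
            data.extend(chunk)
            last_rx = clock()  # restart the idle timer on every byte
        elif len(data) >= min_bytes and clock() - last_rx >= idle_timeout_s:
            return bytes(data)
```

Every track pays the full idle timeout at the end, so with a 100ms timeout that is 100ms of the 430ms spent just waiting to be sure the data has stopped.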

Denis@h3q can read a disk in 43s, which is pretty darn good.  He is using tokens like I mentioned here and here and here.  I’m transferring much more data though, which gives me more information.  I like his time, and maybe that would be a nice goal to beat!  Wow. That’s only 269ms/track.  Hrrrmm…. that’s pretty good.


I hate when this happens. 🙂

The Parallax SX microcontroller line has reached production EOL. The owner of the SX design
(www.ubicom.com) has given Parallax final notice that we are to place a lifetime buy of wafers. We
recognize this announcement will be difficult for customers who have designed the SX into their products.
We share your disappointment.


bought a Saleae Logic. Another tool for my toolbox


I bought a Saleae Logic, which is an inexpensive logic analyzer.  See link here.

It isn’t nearly as fast (only samples at 24 MHz max), and it doesn’t have as advanced triggering capabilities, but it does do millions->billions of samples.

So, of course, I put it to the test!  I recorded 5 million samples at 24 MHz, which works out to about 208ms, just slightly over a floppy track time of 203ms.  I sampled an entire track, which is FANTASTIC if you know anything about logic analyzers.  They just don’t usually store much.

Then I wrote a small C program which converts the exported binary data to RAW AMIGA MFM.  I searched for binary patterns of the sync code 0x94489, and exactly 11 of them came up.  Which means that my little code is working, the logic analyzer is correctly reading the data.  I still have to try to decode this track and see if it decodes properly, but this is pretty neat.  It’s like third party verification of what I’m doing.
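The C program isn't shown here, but the sync search part can be sketched in a few lines of Python. This assumes the MFM stream is already a string of '0' and '1' characters, and it treats 0x94489 as a 20-bit pattern, which is my reading of the number above rather than something stated outright. `count_pattern` is a made-up name.

```python
def count_pattern(bits, pattern, width):
    """Count occurrences (overlapping ones included) of a width-bit
    pattern inside a string of '0'/'1' characters."""
    needle = format(pattern, "0{}b".format(width))
    count = 0
    start = bits.find(needle)
    while start != -1:
        count += 1
        # Step forward by one bit so overlapping matches are not skipped.
        start = bits.find(needle, start + 1)
    return count
```

On a good track, `count_pattern(track_bits, 0x94489, 20)` should come back as 11, one per sector.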

I have these odd exception cases where sometimes a track refuses to read although the amiga reads it perfectly.  I’m going to get to the bottom of those cases.

I hate to say this, but everything just worked tonight.  No problems. Brought back up the java client, programmed the SX, and off I went.  Pretty neat.

I’ll have more to say on this logic analyzer, but the software is really nice, clean, simple.  It does its job.

I can’t tell you for how long I’ve wanted to sample a whole track w/ some test equipment.  You can buy $$$ logic analyzers and not get this type of buffer space…. It achieves it by doing real-time samples straight to the PC’s ram……

got build environment working again, fixed uart code

So a while ago I bought a quad core machine which runs Vista 64.  Once I had the new machine, I tried getting my build environment for the AFP working again.  NetBeans, the java IDE I use, has 64-bit support but there were a host of issues regarding FTDI drivers, jd2xx, etc which I fought with and eventually gave up on.  I was getting error messages like “Can’t load IA 32-bit .dll on a AMD 64-bit platform” and there was a serious conflict between 32 bit and 64 bit JVM, JDK, DLL’s etc etc.  Pain in the butt.

I’ve had some time to work on stuff the last couple days and sit down and re-attack the problem.  I did manage to solve it by uninstalling anything Java that is 64-bit. 🙂  I believe it was just the JDK and JVM.  I also had to reinstall NetBeans because it links itself to a JDK, and once that JDK was uninstalled, NetBeans would literally not start.  I looked all over NetBeans for something that defines a “target.”  You know, something where I can say “I want to build applications for 64-bit” or “32-bit” or whatever.  I couldn’t find it.  I uninstalled NetBeans, reinstalled it (this time it detected and recognized the 32-bit JDK), and voila, my java client now loads, builds, and runs correctly!@#

I hooked up my AFP again, and attempted to image a disk, and there were major problems.  Do you remember this post? This time it actually wasn’t that bad.  Another time somehow one of my SX28 pins was fried.

I’ve always wanted to do an extended UART data transfer test.  I’ve never really done this and I think it has been a big source of problems from the beginning.  Even though I checked the UART carefully for cycle counts (and did this 239408 times), put it on the logic analyzer, and even had someone else review it, there must have been a mistake.  I was corrupting about 3-5 bytes for every 100,000.  Not tons, but enough to show up during a disk transfer.

I started out really looking into my UART.  When bytes were corrupted, they were corrupted in exactly the same way:

The first data bit that was a 1-bit was ignored, and the data only started being received after the next one bit.  Let me give an example:

Let’s say the correct byte was : dec 138, or binary 1000 1010.  It would be received as dec 10 or 0000 1010.

correct byte might be : dec 39 or binary 0010 0111. It would be received as dec 7, or 0000 0111.

correct byte might be: dec 166 or binary 1010 0110. It might be rx’d as dec 38, or 0010 0110.

Remember, this only happened as an exception.
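Written out MSB-first like the examples above, the pattern in all three cases is that the leftmost 1-bit goes missing. A tiny Python model of that, useful for checking whether a bad byte from a test run fits the same pattern (the function names are made up for this sketch, and it only models the three examples, not the UART itself):

```python
def drop_leading_one(byte):
    """Clear the most significant set bit of an 8-bit value."""
    if byte == 0:
        return 0
    return byte & ~(1 << (byte.bit_length() - 1))


def fits_dropped_bit_pattern(sent, received):
    """True if received is what sent would look like with its leading
    1-bit ignored."""
    return sent != received and drop_leading_one(sent) == received
```

For instance, `fits_dropped_bit_pattern(138, 10)` is true, while a byte corrupted in some other way would return false and point at a different fault.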

I eventually tweaked my UART by shortening the delay between the start bit and the first data bit, and also the time between bits by 20ns.  I’m honestly not sure why that worked, and it was mostly found by trial and error. But it had a profound and instant effect.  I was running trials and counting the number of bad bytes per 655k of transfer.  I was anywhere between 33-42 bad bytes per 655k.  When I made the change, it jumped to 0 !!

As a matter of fact, I just finished sending 284 megabytes (or 2.84 gigabits, counting the start and stop bits around each byte) of traffic through it without a single bit error!  I think that’s pretty decent.  The funny thing is that afterwards I fired up “calc” to do some quick math, and I think the cpu interruption, or disk access, or something, caused me to lose some data.  In the actual real client, it would have automatically retransmitted the track, so it’s not the end of the world.

Once I fixed the uart, it started reading disks correctly again.

I’m pretty happy to see this thing working again.  Maybe I’ll read in some more amiga floppies tonight!

my checksum algorithm

So I’m using an XOR checksum to detect errors in the transfer between the SX and the PC.  I always thought it was a reasonably strong checksum, even though it’s still an 8 bit checksum.

I found a neat article today here.

“The XOR checksum has the highest probability of undetected errors for all checksum algorithms in this study, and is not as effective as addition-based checksums for general purpose error detection uses.”

There were around 6 or 7 different checksums presented.
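To see the weakness the article is talking about, here is a small Python comparison of an 8-bit XOR checksum and an 8-bit additive checksum. The data bytes are invented for the example. Flipping the same bit position in two different bytes cancels out under XOR, while the additive sum still changes in this case.

```python
def xor_checksum(data):
    """8-bit XOR of all bytes."""
    result = 0
    for b in data:
        result ^= b
    return result


def add_checksum(data):
    """8-bit sum of all bytes, carries out of bit 7 discarded."""
    return sum(data) & 0xFF


good = bytes([0x01, 0x01, 0x10])
bad = bytes([0x03, 0x03, 0x10])  # bit 1 flipped in the first two bytes

xor_misses_it = xor_checksum(good) == xor_checksum(bad)
add_catches_it = add_checksum(good) != add_checksum(bad)
```

The additive checksum is not bulletproof either: if the two flips go in opposite directions (one 0 to 1, one 1 to 0) the sum stays the same, so this only shows that XOR has more blind spots, not that addition has none.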

More later.

redone transfer routine

So my transfer routine has been a little flaky lately. I’m not sure exactly why but I think it’s related to the number of processes I’m running. While it’s a P4 2.4, I think USB scheduling is dead last on the priority list, because it sure seems that way. My transfer protocol is pretty simple right now, or almost completely non-existent. After the PC sends an R for ReadDisk, the SX starts spewing data. There is no handshake and no acknowledgement of data, but there is a checksum. And that checksum has been failing. While I do initiate auto-retransmit, it’s slow and clumsy.

So tonight, I implemented an XMODEM-like protocol. Sender and receiver synchronize the start, each block is acknowledged, a NAK causes a retransmit of a block instead of the whole track, and so on. Overall it works pretty well except for one thing. IT’S SLOW. How slow? About 2.1s per track. Yuck. At my high point with the other transfer, I was around 313-328ms per track. So the old way was like 6 times faster.

That’s way too slow. There’s a lot of back and forth with this protocol, with forced acknowledgment before the next block is transferred. The wikipedia page on Xmodem said that was one of the reasons for its replacement with other protocols like Ymodem and Zmodem.
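The back-and-forth is easy to see in a stripped-down stop-and-wait sender. This Python sketch is not the SX firmware: `send` and `recv_reply` are hypothetical callables standing in for the serial link, the checksum is a plain XOR, and real XMODEM also carries a start byte and block numbers, which are left out here. The ACK and NAK values are the standard ASCII control codes.

```python
ACK = b"\x06"
NAK = b"\x15"


def send_blocks(payload, block_size, send, recv_reply, max_retries=10):
    """Send payload one block at a time, waiting for a reply after each.

    Anything other than ACK is treated as a NAK and the same block is
    sent again. Returns the number of round trips that were needed.
    """
    round_trips = 0
    for offset in range(0, len(payload), block_size):
        block = payload[offset:offset + block_size]
        checksum = 0
        for b in block:
            checksum ^= b
        for _ in range(max_retries):
            send(block + bytes([checksum]))
            round_trips += 1
            if recv_reply() == ACK:
                break
        else:
            raise IOError("block at offset %d was never acknowledged" % offset)
    return round_trips
```

The sender can never have more than one block in flight, so every block costs a full round trip no matter how fast the link is.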

Incidentally, I grew up on these serial based protocols, and used everything from Xmodem to Kermit to Ymodem, CIS B+, etc on BBS’s. ZModem with its resume feature was really tits on a stick back in the day.

Part of my problem is block size. I’m actually using 32-byte blocks because I don’t have enough memory on my SX. So that’s 374 blocks per track. 374 headers, 374 ACK’s.

34 blocks per sector, 172ms per sector. Just way way way too slow.
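The numbers above, worked through in Python. Only the arithmetic is new; the inputs are the figures quoted in this post.

```python
BLOCK_SIZE = 32            # bytes per block
BLOCKS_PER_TRACK = 374
SECTORS_PER_TRACK = 11
XMODEM_TRACK_MS = 2100     # about 2.1s per track with block ACKs
OLD_TRACK_MS = (313, 328)  # best times with the old transfer

track_bytes = BLOCK_SIZE * BLOCKS_PER_TRACK                # 11968
blocks_per_sector = BLOCKS_PER_TRACK // SECTORS_PER_TRACK  # 34
ms_per_block = XMODEM_TRACK_MS / BLOCKS_PER_TRACK          # about 5.6
slowdown = tuple(XMODEM_TRACK_MS / t for t in OLD_TRACK_MS)  # about 6.7 and 6.4
```

So each 32-byte block, header and ACK included, is costing something like 5.6ms, and nearly all of that is waiting rather than moving data.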

Since before I read from FRAM directly to USB, there was effectively no easy way to retransmit, because you can’t just back up using serial memory. And I don’t actually keep track of, use, or seek to any FRAM byte locations. I don’t need to: I write one bit at a time, and I read back one byte at a time, but always in order, and always from start to end. The way I retransmit now is to re-read the track into memory again, and start the process all over again. In the past this worked for the few times I needed it, but it seems that for whatever reason (maybe new FTDI drivers?) I’m dropping much more regularly now.

This xmodem method isn’t really 100% polished yet, but given these times I think it’s unlikely I’m going to polish it. Gotta come up with a better method. Some in-between. Maybe some FRAM re-positioning routines?


I’d love to hear what YOU think.


sleep inversely proportional to the AFP and no more tokens

So I have figured one thing out.

The more I work on the AFP, the less I sleep.  At least that’s how it’s been over the last two nights.  Two nights ago it was 3am, last night it was after 2am.

Ideas come and go with me.

I’ve completely scrapped the token idea.  After a couple dry runs with some amiga floppies, it seems like “10”s are much more common than “100”s or “1000”s.  I guess I could have told you that looking at a scope.  But anyways, I had a lot of trouble getting the idea to work, and suffice to say, I quickly abandoned it.  The point is that the tokens were saving me about 400 bytes or so on average (about 3%), not the grand 50% or whatever I had imagined.

This also avoids unnecessary complexity.

So the current read routine times the difference between the pulses and then writes a ‘10’, ‘100’, or ‘1000’ to the FRAM.  I manually write out the bits for each grouping…… no loops or anything.  And plus I take advantage of the fact that the output bit, once changed to 0, really doesn’t need to change.  So basically write a 1, clock, write a 0, clock, clock, clock for the 1000.

I successfully read a floppy last night using this method, so at least my concept is stable.  This still eliminates interrupts completely (edge and timed) and eliminates the ISR, etc.  Definitely getting a little more robust.  Now, will it actually help me read more floppies?  I don’t know.  I think the next step is to start documenting HOW these floppies are failing if I can tell.

I was having a few goofy problems last night, which I think were mostly related to flow-control.  I was using putty with f.c. and my SX appeared to lock up: it wouldn’t echo commands or the actual track.  Turned off flow control completely, and it started working.  I don’t know if putty handles the DTR/DSR flow control properly.  Or should I say, I’m not sure if the FTDI VCP drivers support DTR/DSR flow control.  I think they do because I think I tested it in Windows under C++ earlier.

The other goofy problem I was having was related to my code spontaneously restarting.  I think it was related to using JMP’s and not using the @ before the address.  The @ causes a PAGE to be inserted before the jump.  Now I don’t know if this is true for other CJxx commands etc.

I also saw a bunch of transfer checksum errors, but I think that’s related to the number of bytes being received.  I think I have to use the beginning delimiter FF FF effectively, and stop using an absolute offset for start of frame etc.

Ahh well.

new read routine implementation started

Last night, until 3am, I spent redoing my read routine to incorporate the method discussed in the comment of the last post.

Basically, I use the time between the pulses as an indication of the data.  I store 2-bit tokens in memory to indicate the data.  I store (2) 3-bit tokens if the time between pulses was too short, or too long.

Using tokens saves me memory, memory write time, and transfer time.  In the best case, I save 50%.  In the worst case, I’m even-steven.  I don’t lose anything.
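Here is a quick Python sketch of where the 50% and the even-steven come from. The token values in `TOKENS` are invented for the example (the firmware's actual assignment isn't given here), and the 3-bit tokens for too-short and too-long pulses are left out.

```python
# Hypothetical 2-bit token assignment for the three legal pulse spacings.
TOKENS = {"10": "00", "100": "01", "1000": "10"}


def tokenize(groups):
    """Replace each MFM bit group with its 2-bit token."""
    return "".join(TOKENS[g] for g in groups)


def savings(groups):
    """Fraction of bits saved by storing tokens instead of raw groups."""
    raw_bits = sum(len(g) for g in groups)
    token_bits = 2 * len(groups)
    return 1 - token_bits / raw_bits
```

A track of nothing but ‘1000’ groups gives `savings` of 0.5, and a track of nothing but ‘10’ groups gives 0.0, so the real number depends entirely on how the groups are distributed on an actual disk.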

With some ideas from people at work, I’ve come up with a routine that de-tokenizes the data on the PC.  The PC now is going to be handling the byte-sync again.  I want the SX to do it, but I’ll have to compute the tokens, and then sync to the tokens.  This is down the road.  For the time being, the PC can handle it.

The routine more or less works right now — but there are some bugs in my code.  It is properly counting the time between pulses, and storing the right tokens, and increasing the bit-counter.  But it’s in a neverending loop, so something is not right.

I originally started using the edge detection register, but it was occasionally double-detecting the pulses.  So I changed it to wait for a high, then wait for a low.  This ensures that there is a complete return to high before another negative pulse is detected.  I looked at this on a scope, and it’s much better than the previous method.  The one benefit that I lose, however, is that the edge detection register will detect pulses even when I’m off busy doing other things.  I’m not worried about this right now, but I may revisit in the future and add some type of low-pass filter and start using the edge detection register again.
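The wait-for-high, then wait-for-low idea looks like this when applied to a list of sampled line levels. This is a Python model for playing with captured data, not the SX assembly, and `count_pulses` is a made-up name.

```python
def count_pulses(samples):
    """Count active-low pulses in a sequence of 0/1 line levels.

    A falling edge only counts once the line has been seen high since
    the previous pulse, so a line that stays low is counted once.
    """
    pulses = 0
    armed = False  # True once the line has returned high
    for level in samples:
        if level:
            armed = True
        elif armed:
            pulses += 1
            armed = False
    return pulses
```

This only guards against counting one long low twice. A glitch that really does bounce back high for a sample would still count as a new pulse, which is where a low-pass filter would come in.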

This is the first time in a while that I hooked everything back up again, and played with the actual hardware.  Man, it was a lot of fun!  I miss this stuff.  I actually have spent a lot of time thinking about the problem, and I have recently spent some time writing some Java code, so I haven’t been completely removed or completely stagnant on everything.  But getting the hardware out is really cool.

further development on the AFP

I’m about to bring a new member of my family into this world.  As a new proud parent, I’m completely unsure about what this means in terms of time, dedication, and so on.  I’m sure I’ll have less free time to spend on projects like this.   As a result, updates and improvements are unlikely to happen in the short term.

Something to remember is that I’ve already achieved my goal, which was loosely, “to create a device that sits in between a PC and a floppy drive that can read amiga floppies and create emulator .ADFs in a reasonable period of time.”  You know, crossing the finish line was rather anti-climactic, mainly because (as you’ve seen) there are so many smaller, but no less important, steps that all add up to a working product.

I’ve learned a fair bit on this project, and would like to go back and identify the specific problems I faced, how I eventually solved them, and why they presented such a challenge.  There were some really fundamental mistakes I made which frustrated things, and made this task harder than it should have been.  I think other beginners can learn from my mistakes, and I’d like to make a post (maybe a paper?) on some basic do’s and don’ts.

In any event, there are a few things I want to touch up, but then I’ll be posting a copy of the java source, compiled java with instructions on how to execute, and the SX firmware.

Note that I’m not going incommunicado — I’ll be monitoring the blog and email as usual.

Last but not least, there are a number of things I really want to do with this thing, namely build a custom circuit board.  I’d like to add the writing capability, but that is a fairly large task.  There are other minor to-do’s that eventually need done.  So I’m not finished yet here.

You know, I could never have gotten as far as I have without the help of people here, especially Tim and David, and people over at the Parallax forums (Bean, pjv, Michael C, Guenther), who I’m pretty sure are tired of my posts and my frustration.

feasability of writing disks

While there are some other bug fixes and documentation to be done on what’s already been implemented, I started thinking about writing ADF’s to disk over the last few days.

While the hardware part of it is already in place, there are some things that would need done:

  • The interrupt service routine would need modified to not just read data by reacting to edges, but now extract a bit from memory and put an appropriate pulse on the writedata pin. Floppy drive specs say the pulse should be 0.1us to 1.1us wide.
  • Write an SX routine that would receive data from the PC and store it in the fram. This would need to checksum the received data and either error out or request a retransmit.
  • Write PC routines that would create the full MFM track: I’d need to create a header (easy enough, use sector numbers, 11-sector number, track number, and then checksum everything), then MFM encode the data. I’m already doing much of this in the decode portion, so I can basically do the opposite for encoding.
  • Of course there’ll need to be a “controlling” pc routine, much like my current readtrack() routine.
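For the MFM encode step, the basic rule is that every data bit gets a clock bit in front of it, and the clock bit is 1 only when the data bits on both sides of it are 0. A minimal Python sketch of just that rule follows. It does not do the Amiga's odd/even bit splitting or the header layout, and sync words such as 0x4489 deliberately break the rule, so they would have to be written as raw bits rather than run through this.

```python
def mfm_encode(data, prev_bit=0):
    """Return the MFM bit string for data (bytes), MSB first.

    prev_bit is the last data bit written before this data, because the
    first clock bit depends on it.
    """
    out = []
    for byte in data:
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            clock = 1 if (prev_bit == 0 and bit == 0) else 0
            out.append(str(clock))
            out.append(str(bit))
            prev_bit = bit
    return "".join(out)
```

A byte of 0x00 comes out as the familiar 0xAAAA pattern, and 0xFF comes out as 0x5555, which is a handy sanity check against the existing decode code.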

So the whole process of writing a disk would go something like this:

  1. Have user browse for the filename, select the file, load it into a “floppydisk” structure/object that currently exists.
  2. Rewind and make sure I’m at track 0.
  3. Create the first track’s data.
  4. Send a command to the SX to tell it to receive a track’s worth of data.
  5. Send the data making sure it’s been received correctly.
  6. Tell the SX to write the track.
  7. SX enables the write gate by lowering it, and starts pulsing out data, pausing the appropriate times for 0-bits.

I don’t see any major hangups, although there are a few timing related things to get right. I’ve got to make sure 18ms has passed after each step pulse. And I’ve got to make sure 100us has passed since I’ve selected the drive (this is mainly during setup for the first track). For the last track, I need to make sure I pause for 650us after the last track is written. I also have to make sure that the time from the write gate dropping to the first bit is 8us or less. Same with the final bit: I have to raise that write gate within 8us after the last pulse.
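To keep the timing numbers in one place, something like the following Python table and checker could be used on the PC side while planning a write. The numbers are the ones quoted in this post; the names `WRITE_TIMING` and `check_write_plan` and the shape of the function are invented for the sketch.

```python
# All times in seconds.
WRITE_TIMING = {
    "step_settle_s": 18e-3,           # wait after each step pulse
    "drive_select_s": 100e-6,         # wait after selecting the drive
    "post_write_s": 650e-6,           # pause after the last track is written
    "gate_to_first_bit_max_s": 8e-6,  # write gate low to first pulse
    "last_bit_to_gate_max_s": 8e-6,   # last pulse to write gate high
    "pulse_min_s": 0.1e-6,            # write pulse width limits
    "pulse_max_s": 1.1e-6,
}


def check_write_plan(gate_on, pulse_starts, pulse_width, gate_off,
                     timing=WRITE_TIMING):
    """Return a list of the constraints that a planned track write breaks.

    gate_on and gate_off are when the write gate is asserted and released,
    pulse_starts is the sorted list of write pulse start times.
    """
    problems = []
    if not timing["pulse_min_s"] <= pulse_width <= timing["pulse_max_s"]:
        problems.append("pulse width")
    if pulse_starts[0] - gate_on > timing["gate_to_first_bit_max_s"]:
        problems.append("gate to first bit")
    if gate_off - pulse_starts[-1] > timing["last_bit_to_gate_max_s"]:
        problems.append("last bit to gate release")
    return problems
```

An empty list means the plan fits the limits above. It says nothing about gaps, sync words, or pre-write compensation, which still need to be worked out.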

I’ve got to look into creating a gap, sending SYNC words, learning wtf pre-write compensation is, etc.