data rate vs throughput

Although I’m transmitting from the Parallax SX microcontroller to the PC via USB at a data rate of 2mbps, the actual throughput is much less.  First, there is some overhead, namely the start and stop bits, which add 25% on top of the 8 data bits in each byte.  Next, I actually have to read the data from the FRAM, and this takes time.

It takes approximately 1.7us to read 1 byte, and then 5us to transmit that byte.  The 5us is 500ns (1/2mbps) * 10 bits (8 data + 1 start + 1 stop).  So 6.7us per byte.  This doesn’t include the time it takes to findsync().

So my throughput is approximately 800 kbps on a data rate of 2mbps.

Kind of sucks, but getting to 2mbps is impossible unless I integrate the reading/findsync’ing into the transmit routine.  And I think that’s generally a bad idea.  I really want to protect my bit times so I have quality transmission.  I don’t want to get into the uh-oh situation where processing something else overruns the time slot, etc.

Yeah, so right now it looks like <READDATA aka pause><SEND BYTE><READ><SEND><READ><SEND> and so on.  To get to 2mbps, I’d basically have to <send><send><send><send>.  Now, if I could utilize the “dead time” between bits to actually read the data….. well… then I might be closer.  Remember, too, that I’m bit-banging so doing something interrupt driven is out of the question.
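
To make the two patterns above concrete, here is a rough Python sketch of the per-byte timing budget. It only uses the figures quoted in this post (500ns bit time, 10 bits per frame, about 1.7us per FRAM read); findsync() and any other overhead are left out, so treat the results as an upper bound and not a measurement. The names are made up for illustration.

```python
# Per-byte timing budget, using only the figures quoted in the post.
BIT_TIME_US = 0.5      # 1 / 2 Mbps
BITS_PER_FRAME = 10    # 1 start + 8 data + 1 stop
DATA_BITS = 8          # payload bits per frame
READ_US = 1.7          # approximate FRAM read time per byte


def goodput_mbps(per_byte_us):
    """Payload bits per microsecond, which is the same number as Mbit/s."""
    return DATA_BITS / per_byte_us


frame_us = BIT_TIME_US * BITS_PER_FRAME   # time on the wire for one byte
interleaved_us = READ_US + frame_us       # <READ><SEND><READ><SEND> pattern
back_to_back_us = frame_us                # <send><send><send> pattern

print(f"read+send: {interleaved_us:.1f} us/byte, "
      f"{goodput_mbps(interleaved_us):.2f} Mbit/s")
print(f"send only: {back_to_back_us:.1f} us/byte, "
      f"{goodput_mbps(back_to_back_us):.2f} Mbit/s")
```

Even with perfectly back-to-back frames, the start and stop bits cap the payload rate below the 2mbps line rate.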

I’m not 100% sure the PC isn’t introducing some of this delay.  Which is why I’ve been looking at revamping the read routines.  First, they are butt-ugly, and second, they don’t handle error cases well.  Actually, they hang in error cases.

I’m still floating around the idea of error CORRECTION by taking advantage of the inherent structure of MFM.  I really think that there is something here.

Next steps are to try to work out a better read routine, and then implement a retry system for bad tracks.

About the author


Amateur Electronics Design Engineer and Hacker


  • I think 1/2 of the theoretical maximum transfer rate is great.
    The theoretical maximum goodput with this hardware is 8 good data bits / 5 us = 1.6 Mbit/s, right?

    Your 0.8 Mbit/s (I guess this includes findsync() ?) is already most of that.
    So there’s not much blood left to squeeze out of this turnip 🙂

    And during transfers, you’re already getting a goodput of 8 good data bits / 6.7 us = 1.19 Mbit/s.

    Brainstorming a few simple things that might help a little (but might just make things worse):
    * Would it help any to do groups of 4 bytes?
    Or would that just make things even slower?

    * If you know that the read is *always* going to be at least the length of a “stop” bit, perhaps you could use a trimmed-down routine that immediately returns after starting the stop bit.
    In other words, send 9.01 bits ( 1 start + 8 data + the first 1/100 of the stop bit), and hope the read is long enough to delay for the rest of the stop bit.
    It’s OK to stretch the stop bit much longer than a normal bit, right?
    Which gives 9.01 bits * 0.5 us/bit + 1.7 us (the rest of a really long stop bit) = 6.2 us per byte, giving a goodput of 1.29 Mbit/s.
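
The trimmed stop bit arithmetic in the comment above can be checked with a small Python sketch. It assumes the 0.5 us bit time and 1.7 us read time from the post, and that the line simply stays idle-high while the read runs; `per_byte_us` and `stop_fraction` are hypothetical names used only for this illustration.

```python
BIT_TIME_US = 0.5   # 1 / 2 Mbps
READ_US = 1.7       # approximate FRAM read time per byte, from the post


def per_byte_us(stop_fraction):
    """Time per byte when the sender only times `stop_fraction` of the
    stop bit and the read that follows covers the rest of it."""
    remaining_stop_us = (1.0 - stop_fraction) * BIT_TIME_US
    # The trick only works if the read is at least as long as the part
    # of the stop bit that the sender no longer waits for.
    if READ_US < remaining_stop_us:
        raise ValueError("read is too short to finish the stop bit")
    timed_bits = 1 + 8 + stop_fraction   # start + data + timed part of stop
    return timed_bits * BIT_TIME_US + READ_US


full_us = per_byte_us(1.0)       # normal stop bit
trimmed_us = per_byte_us(0.01)   # return right after starting the stop bit

print(f"full stop bit   : {full_us:.3f} us/byte")
print(f"trimmed stop bit: {trimmed_us:.3f} us/byte")
print(f"saved per byte  : {full_us - trimmed_us:.3f} us")
print(f"goodput trimmed : {8 / trimmed_us:.2f} Mbit/s")
```

The saving is just under one bit time per byte, since the read absorbs nearly the whole stop bit.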

  • Nice to see you again, David! 🙂

    I think I was doing at least a portion of the math wrong. I’ve been counting clock cycles and I haven’t actually put a scope on it yet to verify my numbers. But yes, the 1.19 mbps does not include findsync(). I didn’t bother to really run the findsync() numbers because:

    1> It only runs 14 times per track. The per-byte time will control the average time since there are 1082 bytes per raw sector.

    2> findsync() will typically return really fast, because the next sector should be right up against the end of the last sector. The exception is in the track gap because it has to look through about 830 bytes +/- to find the next sync.

    Lemme put a scope on the TXDATA pin and get a real number.

    I don’t think a 4-byte (or any byte) grouping would help. The time to read a byte is fixed, and the time to send the byte is fixed. Does it matter if I read-read-read-read and then send-send-send-send ?? I’m thinking not. It would speed the sending of the data, but then there would be longer times between sends. I don’t think this would help. Also, my routines are all strictly one-byte-at-a-time.

    You are right. No difference at all between a stop-bit and an idle condition, so “too long” stop bits are ok. The read times, as I mentioned, are fixed times. So I could always count on there being a long enough delay.

    The problem is that my sendusb() routine is designed for all SX–>PC communication, so I’d need a “short” flag to skip the normal stop bit delay when sending data this way. I guess that’s easy enough to implement. Obviously when doing a

    sendusb byte1
    sendusb byte2

    I’d need a normal stop bit in between.

    That would probably gain me about 400ns per byte. Yeah, that is significant….

    Right now, my current problem is HEAT and I can’t continue to add or do testing until I get this HEAT problem fixed. My SX is overheating and I have yet to figure out why.

  • OK. Finally got around to putting my logic analyzer on the job.

    I’m getting 8 good data bits every 8.3us. So this is yielding 963kbps. Out of a possible 1.6mbps.

    The transfer routine is actually written in SX/B for simplicity, and I initially wasn’t taking into account the “overhead” of looping, etc. Don’t ask: it’s a nested loop with multibyte compares.

    So here’s the breakdown of the 8.3us:

    2.52us for reading memory. That’s a 3.17mbps memory read.
    5.38us for transmitting. That’s a 1.49mbps transmit.
    0.42us for looping

    If you take 8.3us * 15238 bytes = 126ms per track. Java was reporting like 156 or something milliseconds, so this tells me that RXTX and Java are relatively close, and that I’m not losing speed due to my sloppy java reading code.
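
For anyone who wants to replay the measured breakdown above, here is a short Python sketch. It uses only the logic analyzer figures from the comment (2.52us read, 5.38us transmit, 0.42us looping, 8.3us per byte, 15238 bytes per track); the variable names are made up for the example.

```python
# Measured per-byte components from the logic analyzer, in microseconds.
components_us = {"read": 2.52, "transmit": 5.38, "loop": 0.42}
MEASURED_PER_BYTE_US = 8.3   # measured spacing between bytes
BYTES_PER_TRACK = 15238
DATA_BITS = 8


def mbps(per_byte_us):
    """8 payload bits per `per_byte_us` microseconds, expressed in Mbit/s."""
    return DATA_BITS / per_byte_us


component_sum_us = sum(components_us.values())
track_ms = MEASURED_PER_BYTE_US * BYTES_PER_TRACK / 1000.0

for name, t_us in components_us.items():
    print(f"{name:8s}: {t_us:.2f} us  ({mbps(t_us):.2f} Mbit/s if it ran alone)")
print(f"sum     : {component_sum_us:.2f} us (measured per byte: {MEASURED_PER_BYTE_US} us)")
print(f"overall : {mbps(MEASURED_PER_BYTE_US) * 1000:.0f} kbit/s")
print(f"track   : {track_ms:.1f} ms for {BYTES_PER_TRACK} bytes")
```

The three components add up to the measured per-byte time to within rounding, which is a quick sanity check that nothing big is missing from the breakdown.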

  • OK. I’ve optimized my readfram() routine.

    It used to take 2.52us per byte. It now takes 2.20us per byte. Which takes memory reads from 3.17mbps to 3.64mbps. And improves overall track performance from 127ms to 122ms. Heck, I’ll take the free 4% improvement.

    Here’s my routine to read a Ramtron serial fram memory FM25256 (or similar) into a Parallax SX microcontroller, if anyone is interested:

        MOV nsb, #8     'number of bits to shift in
        CLR gotdata

    loopzz:
        SETB SCK        'outputs are latched on falling edge, this just raises the clock
                        'note that the other instructions that follow serve to allow
                        'at least 28ns for a minimum clock high time.

        MOVB C, SO      'read the bit into carry
        RL gotdata      'bring the bit via carry into gotdata, leftshifting for MSB
        DEC nsb         'decrement bit counter
        CLRB SCK        'drop clock (clock low min is 28ns, data not valid until Todv, or 24ns)
                        'data comes valid 24ns after this clock drop.

        JNZ loopzz      'delay here is long enough for Todv and Tcl(clock low time)

    I know there’s plenty of comments there. I’d rather overcomment than undercomment. Routine looks a lot more sparse w/o comments.
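
The shift-in logic of the routine above can be modeled in a few lines of Python. This is only a sketch of the data path, not of the timing: it assumes RL rotates the carry into bit 0 as the routine's comment describes, and `read_byte` and `so_bits` are hypothetical names standing in for the loop and the sequence of levels seen on the SO pin.

```python
def read_byte(so_bits):
    """Model of the read loop: 8 clock pulses, one SO bit per pulse,
    shifted into gotdata MSB first."""
    nsb = 8            # MOV nsb, #8
    gotdata = 0        # CLR gotdata
    bits = iter(so_bits)
    while nsb != 0:    # JNZ loopzz
        carry = next(bits) & 1                      # MOVB C, SO
        gotdata = ((gotdata << 1) | carry) & 0xFF   # RL gotdata
        nsb -= 1                                    # DEC nsb
    return gotdata


# The first bit clocked out of the memory ends up as the most significant bit.
value = read_byte([1, 0, 1, 0, 0, 1, 0, 1])
print(hex(value))
```

Because gotdata is cleared first and exactly 8 bits are rotated in, the rotate-through-carry behaves like a plain left shift here.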