I’ve just managed to get some key components working this year, and as a result, the machine is becoming more usable.
I wanted to measure the performance of this Frankenstein machine. Ideally, I’d use standardized benchmarks but I don’t really have a compilation toolchain setup yet. I’m mostly stuck using 68K assembly.
I am in the process of investigating vbcc to use with vasm, the assembler I’m using now to create machine code for this beast. Remember that while someone else created the J68 soft-cpu, that I’ve written the memory controllers and/or glue logic. This includes the interface with the video frame buffer.
I’m using four types of memory currently:
- 4MB SRAM for the frame buffer. During CPU access, the video driver is blocked from reading it — or more accurately, just temporarily displays the CPU data randomly on the screen. I think this side-effect looks neat.
- 8MB SDRAM using a ported memory controller. This is generic RAM.
- Some portion of FLASH memory, the so-called User Flash Memory, which currently holds the code. Which is specifically VUBUG.txt.
- Some portion of FPGA block memory, the m9k’s, which holds microcode, registers, and so on. Also an amount for working RAM.
The design decisions surrounding memory architecture, video frame buffer, and general bus connectivity seems to be a large driver of performance of this class of computers. I haven’t read enough (although I have some earlier books) to understand all the factors, but I know this much:
- Driving the video is one of the largest tasks of the computer. Since I’m currently using a large VGA monitor for convenience, instead of the LCD, I’m driving 640 x 480 x 12-bit color. Horizontal Refresh ~31.25khz, Vertical 60hz. 25mhz pixel clock. This is about 300mbps +/-. No joke, even for old school, low-res-ish stuff.
- Without dual-ported memory that takes too many pins to be practical, you have to take turns between the CPU accessing the memory, and the video driver accessing it. You’ve got to be careful not to starve your video driver.
- Every instruction has to be fetched, 16-bits at a time, from our FLASH memory. The flash memory is 32-bits wide, but for ease-of-use, and first generation, I’m only packing each flash memory word with 16-bits. I still have to figure out “where to temporarily store” the next 16-bits. And do so without causing a new memory bus cycle request. I think my memories are faster than the cycle…..so if I can’t avoid this, it might not be able to be optimized.
I’m going to study some older 68K implementations to really understand how the early designers were able to eeek out so much power.
So some early measurements include:
- I can execute about 4.1 million NOPs per second on the machine. This includes fetching them from FLASH, and decode / executing them. The NOP is a 16-bit instruction, and requires just a single fetch.
- I can also execute about 1 million MULU’s per second. These are word multiplications. Word x Word = Long. This is the more interesting result to me, because MULU’s are expensive operations. Maybe only Divides are more. So this really indicates how much work I could get done. I did a constant * register, with the result stored in the register.
These results to me are encouraging. While a real 68000 running at 66.67mhz would produce a higher level of MIPS, I think this produces enough to get some stuff done. I’m going to test out some of graphics primitives next.
How many circles can I draw per second?
How many lines can I draw per second?
I really have to understand what the timing breakdown of each instruction is if I intend to optimize this.