Category - Badge Computer


Performance of the Hope Badge Computer

I’ve just managed to get some key components working this year, and as a result, the machine is becoming more usable.

I wanted to measure the performance of this Frankenstein machine. Ideally, I’d use standardized benchmarks, but I don’t really have a compilation toolchain set up yet. I’m mostly stuck using 68K assembly.

I am in the process of investigating vbcc to use with vasm, the assembler I’m using now to create machine code for this beast. Remember that while someone else created the J68 soft-CPU, I’ve written the memory controllers and glue logic, including the interface with the video frame buffer.

I’m using four types of memory currently:

  • 4MB SRAM for the frame buffer. During CPU access, the video driver is blocked from reading it — or more accurately, just temporarily displays the CPU data randomly on the screen. I think this side-effect looks neat.
  • 8MB SDRAM using a ported memory controller. This is generic RAM.
  • Some portion of FLASH memory, the so-called User Flash Memory, which currently holds the code (specifically VUBUG.txt).
  • Some portion of FPGA block memory, the m9k’s, which holds microcode, registers, and so on. Also an amount for working RAM.

The design decisions surrounding memory architecture, video frame buffer, and general bus connectivity seem to be a large driver of performance in this class of computers. I haven’t read enough (although I have some earlier books) to understand all the factors, but I know this much:

  • Driving the video is one of the largest tasks of the computer. Since I’m currently using a large VGA monitor for convenience, instead of the LCD, I’m driving 640 x 480 x 12-bit color: horizontal refresh ~31.25 kHz, vertical 60 Hz, 25 MHz pixel clock. That’s about 300 Mbps, give or take. No joke, even for old school, low-res-ish stuff.
  • Without dual-ported memory that takes too many pins to be practical, you have to take turns between the CPU accessing the memory, and the video driver accessing it. You’ve got to be careful not to starve your video driver.
  • Every instruction has to be fetched, 16 bits at a time, from our FLASH memory. The FLASH memory is 32-bits wide, but for ease of use in this first generation, I’m only packing each FLASH word with 16 bits. I still have to figure out “where to temporarily store” the next 16 bits, and do so without causing a new memory bus cycle request. I think my memories are faster than the bus cycle, so if I can’t avoid the extra fetch, it may not be possible to optimize.
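The bandwidth figure is easy to sanity-check: blanked pixels still occupy pixel-clock cycles, so the stream rate the video driver has to sustain is just pixel clock times color depth. A quick check in C:

```c
/* Back-of-the-envelope video bandwidth: bits per second the video
   driver must stream at a given pixel clock and color depth. */
double video_bps(double pixel_clock_hz, int bits_per_pixel) {
    return pixel_clock_hz * bits_per_pixel;
}
/* 640x480 VGA here: 25 MHz pixel clock, 12-bit color -> 300 Mbps */
```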

I’m going to study some older 68K implementations to really understand how the early designers were able to eke out so much power.

So some early measurements include:

  • I can execute about 4.1 million NOPs per second on the machine. This includes fetching them from FLASH, and decode / executing them. The NOP is a 16-bit instruction, and requires just a single fetch.
  • I can also execute about 1 million MULUs per second. These are word multiplications: word x word = long. This is the more interesting result to me, because MULUs are expensive operations; maybe only divides are more so. So this really indicates how much work I could get done. I did a constant * register, with the result stored in the register.

These results are encouraging to me. While a real 68000 running at 66.67 MHz would produce more MIPS, I think this machine produces enough to get some stuff done. I’m going to test out some of the graphics primitives next.
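One way to read these numbers is as cycles per instruction: divide the clock by the measured rate. A sketch (this assumes the 66.67 MHz CPU clock mentioned in the SDRAM post below, and ignores any measurement-loop overhead):

```c
/* Rough cycles-per-instruction from a measured instruction rate.
   Assumes the 66.67 MHz J68 clock; fetch time is included in the
   measurement, so this is effective CPI, not the core's raw CPI. */
double cycles_per_instruction(double clock_hz, double instr_per_sec) {
    return clock_hz / instr_per_sec;
}
```

That works out to roughly 16 cycles per NOP and about 67 per MULU, fetch included, which is where I’d start looking when I break down instruction timing.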

How many circles can I draw per second?

How many lines can I draw per second?

I really have to understand what the timing breakdown of each instruction is if I intend to optimize this.

move.w vs move.l: The culprit of my circle problems

If you saw my recent post about malformed circles, you’ll know that I was pulling my hair out.

I identified the root cause. It turns out that I was writing 0x00000fff, which caused the pixel I was setting to be black, 0x0000. Worse, it was also writing the pixel next to it: 0x0FFF landed in the pixel to the right, turning it all white. Pretty insidious! Had it not produced anything on the display, surely I would have tracked this down sooner.

I had been playing around with a variety of solutions and had compiled the midpoint circle algorithm C code into assembly; the move.l instruction was created by the compiler from my faulty C code.

The correct instruction was the one that modifies just the 16-bit word at the address in question, move.w #$0fff,(a4). The other one, move.l #4095,(a5), was the problem.

UPDATE: Just to be clear, there’s no problem or bug with the compiler. I used the wrong data type in the C code, likely a 32-bit int or similar (or perhaps an uncast constant assumed to be 32-bit), which caused the unwanted move.l instruction to be generated. This was entirely my fault. For what it’s worth, this is my first time writing more than 10 lines of 68K assembly in decades. I wasn’t good back then, either. 🙂
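In C terms, the mistake looks something like this (a sketch with hypothetical names; the two functions spell out what the big-endian move.l and move.w stores do to adjacent 16-bit pixels):

```c
#include <stdint.h>

/* What move.l #$00000FFF,(aN) does on the big-endian 68K:
   the 32-bit value lands across TWO 16-bit pixels. */
void move_l_bigendian(uint16_t *pixel, uint32_t value) {
    pixel[0] = (uint16_t)(value >> 16);     /* 0x0000: target pixel goes black */
    pixel[1] = (uint16_t)(value & 0xFFFF);  /* 0x0FFF: neighbor goes white */
}

/* What the C code should have generated: move.w #$0FFF,(aN). */
void move_w(uint16_t *pixel, uint16_t value) {
    pixel[0] = value;                       /* just the one pixel */
}
```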

I still have to create a toolchain for my machine. Remember there are no sample programs, no documentation, no faq’s, so I’m still figuring out how all of this works. I don’t have a properly configured C compiler, with memory areas dedicated for stack, heap, and lord knows whatever else I have to configure. I’m happy to be turning the corner from hardware development to something that is actually USABLE from a software developer perspective.

 

Graphics Primitives are working, at least mostly.

So we’ve got a frame buffer now which means we can start writing 68K routines to write graphics into it.

So here I’m playing around with lines and circles. I’m using a random line drawing routine from the Atari that I found, and the C code for the Midpoint Circle Algorithm on Wikipedia, which I compiled and then hand-modified at the assembly level to work without support for negative numbers.

I’m seeing some graphical defects, but I’m not sure if they’re due to

  • A defect in the assembly of the Midpoint Circle algorithm
  • An existing CPU write problem (sharing SRAM)
  • A video display issue (also sharing SRAM)
  • An artifact of the monitor’s built-in upscaling algorithm

The C code runs fine in Windows without any similar effect.
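For reference, the Wikipedia midpoint circle algorithm in C looks roughly like this (a sketch; putpixel here is a stand-in that just counts plotted points, and the real routine obviously writes the frame buffer instead):

```c
/* Stand-in pixel sink for testing: count plotted points. */
int plotted = 0;
void putpixel(int x, int y) { (void)x; (void)y; plotted++; }

/* Integer-only midpoint circle algorithm: walk one octant and
   mirror each point into the other seven. */
void drawcircle(short x0, short y0, short radius) {
    short x = radius, y = 0, err = 0;
    while (x >= y) {
        putpixel(x0 + x, y0 + y);
        putpixel(x0 + y, y0 + x);
        putpixel(x0 - y, y0 + x);
        putpixel(x0 - x, y0 + y);
        putpixel(x0 - x, y0 - y);
        putpixel(x0 - y, y0 - x);
        putpixel(x0 + y, y0 - x);
        putpixel(x0 + x, y0 - y);
        if (err <= 0) { y += 1; err += 2 * y + 1; }
        if (err > 0)  { x -= 1; err -= 2 * x + 1; }
    }
}
```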

Here’s the code:

; }
; void drawcircle(short x0,short y0,short radius)
; {
_drawcircle:
lea _putpixel.L,A2
; short y = 0;
clr.w D2
; short err = 0;
;clr.w D6
move.l #$8000, D6 ; this is an arbitrary number since we can't easily handle negative numbers.
; while (x >= y)
drawcircle_1:
cmp.w D2,D3
blt drawcircle_3
; {
; putpixel(x0 + x, y0 + y);
move.w D5,D0
add.w D2,D0
ext.l D0
move.w D4,D1
add.w D3,D1
ext.l D1
jsr (A2)

; putpixel(x0 + y, y0 + x);
move.w D5,D0
add.w D3,D0
ext.l D0
move.w D4,D1
add.w D2,D1
ext.l D1
jsr (A2)

; putpixel(x0 - y, y0 + x);
move.w D5,D0
add.w D3,D0
ext.l D0
move.w D4,D1
sub.w D2,D1
ext.l D1
jsr (A2)

; putpixel(x0 - x, y0 + y);
move.w D5,D0
add.w D2,D0
ext.l D0
move.w D4,D1
sub.w D3,D1
ext.l D1
jsr (A2)

; putpixel(x0 - x, y0 - y);
move.w D5,D0
sub.w D2,D0
ext.l D0
move.w D4,D1
sub.w D3,D1
ext.l D1
jsr (A2)

; putpixel(x0 - y, y0 - x);
move.w D5,D0
sub.w D3,D0
ext.l D0

move.w D4,D1
sub.w D2,D1
ext.l D1
jsr (A2)

; putpixel(x0 + y, y0 - x);
move.w D5,D0
sub.w D3,D0
ext.l D0
move.w D4,D1
add.w D2,D1
ext.l D1
jsr (A2)

; putpixel(x0 + x, y0 - y);
move.w D5,D0
sub.w D2,D0
ext.l D0
move.w D4,D1
add.w D3,D1
ext.l D1
jsr (A2)

; if (err <= 0)
cmp.l #$8000,D6
bgt.s drawcircle_4
; {
; y += 1;
addq.w #1,D2
; err += 2*y + 1;
move.w D2,D0
asl.l #1,D0
addq.w #1,D0
add.w D0,D6
drawcircle_4:
; }
; if (err > 0)
cmp.l #$8000,D6
ble.s drawcircle_6
; {
; x -= 1;
subq.w #1,D3
; err -= 2*x + 1;
move.w D3,D0
asl.l #1,D0
addq.w #1,D0
sub.w D0,D6
drawcircle_6:
bra drawcircle_1
drawcircle_3:
rts

You also see a remnant from my previous text generation in the upper left hand corner.

I’m open to thoughts on the circle problem!

Frame Buffer and Text Generation now working…

Now that the 4MB SRAM board is installed, I have a frame buffer that can keep up with the video bandwidth rate.

The resolution and color depth are currently 800 x 480 x 12-bit. The computing shield I’m using only supports 12 bits, but my LCD can do 24-bit.

Essentially how my frame buffer works is that you write the 12-bit color value into an address in memory that corresponds to a location on the screen.

See Framebuffer at Wikipedia.

The 4MB SRAM gets mapped to $820000. The color values are stored as 16-bit words, like this: 0000_RRRR_GGGG_BBBB. The top 4 bits are not used, for now.

One 68K instruction, for example “move.w #$0F00,$820000”, will put a single red pixel at location 0,0.
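In C terms, the mapping works out like this (a sketch; FB_WIDTH assumes a linear 800-pixel stride with no padding between lines, which matches the move.w example above):

```c
#include <stdint.h>

#define FB_BASE  0x820000UL  /* SRAM frame buffer base address */
#define FB_WIDTH 800         /* pixels per scan line (assumed stride) */

/* Address of pixel (x, y), one 16-bit word per pixel. */
uint32_t pixel_addr(int x, int y) {
    return FB_BASE + 2UL * ((uint32_t)y * FB_WIDTH + (uint32_t)x);
}

/* Pack 4-bit R, G, B into the 0000_RRRR_GGGG_BBBB word format. */
uint16_t pack_rgb444(unsigned r, unsigned g, unsigned b) {
    return (uint16_t)(((r & 0xF) << 8) | ((g & 0xF) << 4) | (b & 0xF));
}
```

So pure red is pack_rgb444(0xF, 0, 0) = $0F00, written to pixel_addr(0, 0) = $820000, exactly the move.w example.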

I also have text generation working:

This works in a similar way, with a character buffer. Another part of RAM is used to store a 100 x 30 character buffer. If you write a 7-bit ASCII value to the lower portion of the 16-bit word at the address that corresponds to a location, that character will be displayed there.

The text layer is independent of the frame buffer, and is overlaid on top of that graphics layer.

This adds pretty powerful capabilities to the badge computer, and now I have to start writing some utility routines to do higher level functions like drawing a line or a circle.

 

Interesting snippet from FPGA Prototyping by Chu

This is exactly the problem that I’ve been facing for the last couple of weeks. I could get the memory tests to pass at 100% as long as only the CPU was talking with the external SRAM. As soon as I added in my time-slot system, where cycles alternate between the CPU and the video display circuit, the test would fail miserably.

What I settled on was simply disabling the video display’s access to the SRAM during the fairly short interval that the CPU is accessing it. This guarantees the CPU relatively error-free access. The suggestion above is to only allow writing video memory during the blanking interval, and I think that’s a good idea too.
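A sketch of the two policies in C (hypothetical names; the real logic is in the FPGA): the current scheme gives the CPU the SRAM whenever it asks, which is what produces the on-screen noise, while Chu’s suggestion would additionally confine CPU access to the blanking interval:

```c
typedef enum { GRANT_VIDEO, GRANT_CPU } grant_t;

/* Current policy: CPU wins whenever it requests; the video driver
   is simply blocked for that slot (hence the brief screen noise). */
grant_t arbitrate(int cpu_request) {
    return cpu_request ? GRANT_CPU : GRANT_VIDEO;
}

/* Chu's suggestion layered on top: only honor CPU access during
   blanking, so active video reads are never disturbed. */
grant_t arbitrate_blanking(int cpu_request, int in_blanking) {
    return (cpu_request && in_blanking) ? GRANT_CPU : GRANT_VIDEO;
}
```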

I still see some small percentage of errors both on the CPU side and the video driver side, but the rate is small enough to not cause any problems currently.

I’ll revisit this noise issue later, but for now it gives a bit of a Max Headroom vibe to the project, and I like that!

For what it’s worth, knowing that the “struggle is real” and that what I’m dealing with is a conventional problem is refreshing. I’m sure the early computer designers ran into the same problems…

SRAM Frame Buffer is now up and running!

Exciting times for the Hope Badge Computer.

Tonight, I’ve worked out most of the kinks in interfacing the new SRAM module (which you can see in the posts below) with the J68 soft-core CPU.

This means that I can now write to memory locations using 68000 assembly, and display color graphics on the screen.

The current resolution is 800 x 480, which matches the LCD. I have attached a VGA LCD monitor to the FPGA. That interface, on the computing shield you can also see a couple of posts down, uses a 12-bit DAC, so the current color depth is 12 bits, which works out to 4096 colors. I could support 16 bits on the 7.0 touchscreen, or 16 bits using a better/different DAC for VGA, without any other modifications.

The plan is to get to 24-bit color, but everything in due time!

Time to start practicing my 68K assembly!

SRAM PCB built, populated, and is testing good!

The circuit board arrived from Elecrow PCB in China, and it works without modification! My friend Brian from Canada helped me out again, laying out this PCB in short order. He did a great job!

Soldering the half-mm pitch parts and the MEC6-140 connector turned out to be quite a challenge, mostly because I’m out of practice. I have since added the necessary filtering caps to ensure a clean signal.

Here’s an image of the board attached to the FPGA eval board. It’s tiny. 1.9 x 1.6 inches!

I think it looks sharp, but more importantly, the SRAM has 10ns access time consistently, and the interface is easy peasy! No more messing around with SDRAM. We still have the 8MB SDRAM available, albeit with a longer latency.

This circuit board will be our frame buffer. The CPU will (mostly) write to it, and the video driver will read from it. The 4MB size is plenty of room for high resolution and color depth. Even better, there’s room for another 4MB if need be!

New SRAM PCB is being built

So up to two SRAMs can be installed on this tiny 2″ x 2″ board. To the right is the connector, and the SRAMs are on the left. With current SRAM sizes, you can install 32-megabit SRAMs; that’s 4 megabytes each. This will be in addition to the 8MB onboard SDRAM.

Our current memory controller, while integrated and functioning fine, is just too slow. I already have the SRAMs and the connectors from my previous hardwire attempt; see the bottom photo at the previous link. But there were hardware “bugs”, maybe solder bridges or wiring mistakes, that I simply didn’t feel like messing around with. This PCB is a heck of a lot cleaner solution, the chance of problems is lower, and the signal quality will be much better.

I’m in the process of writing a Finite State Machine to perform an independent test of the memory. Then I’ve got to find a place in the memory map for it, and write some glue logic to integrate it. This should be easier now that I’ve done the FLASH (the UFM) and the SDRAM.

The SDRAM controller has really high latency (on the order of 8+ cycles), and while there were probably workarounds, the single-cycle 10ns latency of these SRAMs is just so attractive that I think I was shooting myself in the foot by not chasing this solution down earlier.

I submitted this PCB to Elecrow PCB in China. I had originally tried using 3pcb/pcbway, but their lack of communication and bait-and-switch pricing quote practically ensured my lack of business. Elecrow, on the other hand, has been great with communication, and their pricing is great. I ordered red-colored PCBs for no extra charge, and I’m really looking forward to getting them. I paid extra for their rush service and for FedEx, so I could have them as early as the end of the week, but we’ll see!

Again, I have to thank my friend Brian for his KiCad skills. It would have taken me much longer, with questionable results.

J68 can now speak with BEMICRO MAX 10 onboard SDRAM

So for the longest time, I’ve wanted to use the onboard 8MB SDRAM present on the BEMICRO MAX 10 FPGA eval board. This is now a reality!

I’ve successfully integrated a controller with some glue logic to connect to the J68.

I’ve mapped $20000-$81FFFE to the SDRAM. All calls to ROM (stored in on-FPGA FLASH) are retrieved properly, some low-level RAM calls access the on-FPGA M9K memory blocks, and the UARTs are all handled fairly seamlessly.

Here’s the very simple 68K memory check routine. Obviously this can be expanded on, but it’s passing!!

lea $81fffe,a5
lea $20000,a6

chklop:

move.w a5,(a5)
andi.w #$0000,d5
move.w (a5),d5
cmp.w a5,d5
bne.w printfailz
suba.l #$000002,a5
cmpa.l a5,a6
bne.w chklop
lea passesz,a0
bsr writs
rts
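In C terms, the routine above does roughly this (a sketch over a stand-in buffer; sdram_check is a hypothetical name). Like the assembly, it walks word addresses from the top down, stores each address’s low 16 bits at that address, reads it back, and stops one word short of the base (the cmpa/bne pair exits when a5 reaches a6, so $20000 itself is never tested):

```c
#include <stdint.h>
#include <stddef.h>

/* Model of the 68K check loop; `mem` stands in for the SDRAM window
   starting at `base`. Returns 1 on pass, 0 on the first mismatch. */
int sdram_check(uint16_t *mem, uint32_t base, uint32_t top) {
    for (uint32_t a = top; a != base; a -= 2) {  /* suba.l #2,a5 / bne */
        size_t i = (size_t)(a - base) / 2;
        mem[i] = (uint16_t)a;         /* move.w a5,(a5) */
        if (mem[i] != (uint16_t)a)    /* move.w (a5),d5 ; cmp.w a5,d5 */
            return 0;                 /* bne.w printfailz */
    }
    return 1;                         /* lea passesz,a0 ; bsr writs */
}
```

Note that since only the low 16 bits of the address are stored, aliased addresses 128KB apart would read back identically; it’s a smoke test, not an exhaustive one, which is why I call it “very simple” above.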

sdram_pass

Sometimes success comes with minimal fanfare. That’s OK with me. I understand how important this is to the project. With the J68 now able to speak with the memory, there’s no limit to the number of applications.

There’s much more to do, though.

  • There is so much room for optimization everywhere.
    • The glue logic is probably very conservative. Working takes priority over speed.
    • I’ve got the J68 CPU at 66.67 MHz, but I’ve probably got room to take it to around 90 MHz.
    • The memory controller itself doesn’t allow for queued up reads.
    • There’s no cache yet; adding one should really help things.
  • I’ve got to add a simple priority arbiter for the wishbone interface to the memory. Other things have got to have access too. Like the video driver.
  • I’m currently using the ROM monitor (see previous posts for the link), VUBUG.TXT, to boot. This is unnecessary, but we barely know what we’re doing here. Eventually, I’d like to pare down the monitor, keep the UART setup code for the console port (only a few lines of assembly), and keep some of the utility routines. Booting mostly our own code is the goal.
  • I’ve managed to add a new command to the ROM monitor: the “f” command, for finally fu*king working. It calls a batch of assembly, and that’s where my SDRAM test code from above sits. This gives us a way of calling our code while having access to some form of library routines.
  • Right now, I’ve got to recompile and reprogram the whole system for every change. I should be able to rewrite just the MIF in on-FPGA flash (UFM, as it’s called), which contains the memory initialization data holding the 68K machine code spit out by the assembler.
  • and much more……..

I did manage to write a module (or two) that drives a small SainSmart serial LCD:

minilcd

This allows me to display up to (8) 32-bit numbers in hex on the display at one time. Could be really useful in the future.

I also bought one of these Papilio computing shields, because we need connectors! This is the cleanest presentation I’ve found, and it should interface easily enough.

computingshield

The most important interfaces are probably the VGA, for a monitor, and a couple of PS/2 ports for keyboard and mouse. The serial ports and audio could be really useful, too. Certainly the SDCARD slot.

So the project is moving along nicely!

 

Modified memory controller now functioning on BEMICRO MAX 10 board

So after some minor heartache, I’ve managed to get a working, easy-to-use memory controller on the BEMICRO Max 10 board. It uses a Wishbone interface.

The heart of the controller is from here, but there were a few problems with it:

  • It uses a burst mode of 2. For a 16-bit RAM like the IS42S16400 chip found onboard the Max 10, this means the user interface is 32-bits wide. The planned use for this memory controller is with the J68 softcore 68000 CPU in our Hope Badge Computer, and that CPU needs a 16-bit-wide interface.
  • It didn’t support byte masking. Byte masking is where you ask the controller to return just a single byte from the lower or upper portion of the particular column location. This means you might want the [15:8] portion of the 16-bit word, or the [7:0] portion. The CPU needs to support opcodes like “MOVE.B”, and that needs a memory subsystem that can support it.
  • There was a bug in the synthesizable test bench code sdram_rw.v, involving the maximum number of reads/writes during the test. This was set to an arbitrary 200,000 32-bit writes, which works out to about 800KB. The chip is 8MB, so clearly that isn’t right. The right number is 2,097,152: at 4 bytes (32 bits) per write, that’s 8,388,608 bytes, the full 8MB.

The changes were relatively minor for the controller itself:

  • Changing the MODE register from 12’b000000110001 to 12’b000000110000 (set burst length from 2 to 1).
  • Removing the WRITE1_ST and READ_PRE_ST states from the “data” state machine, simply skipping to the next state.
  • Changing the wishbone interface bus widths to 16-bit instead of 32-bit
  • Adding the SEL_I() Wishbone interface to support byte masking. I think this is the right choice, looking at the Wishbone spec.
  • Passing the Wishbone byte selection through to the DQM pins on the memory chip.
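The MODE register change is easy to decode: per common SDR SDRAM datasheets such as the IS42S16400’s, the low three bits of the mode register encode the burst length as a power of two, so 12’b000000110001 means burst 2 and 12’b000000110000 means burst 1 (the 011 in bits [6:4] is the CAS latency, unchanged). A sketch:

```c
/* Decode the burst-length field (bits [2:0]) of an SDR SDRAM mode
   register. Valid for the power-of-two encodings 0-3 used here;
   the full-page encoding (111) is not modeled. */
unsigned burst_length(unsigned mode_reg) {
    return 1u << (mode_reg & 0x7u);
}
```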

These changes decrease the latency of a completed single-cycle read by one cycle. I have some future plans to add a cache in front of the memory. I’ve got to read more about how these interfaces work, and potentially add some priority arbitration in front of the controller.

This brings me one step closer to integrating the onboard SDRAM to the J68 softcore.