Author - keith

move.w vs move.l: The culprit of my circle problems

If you saw my recent post about malformed circles, you’ll know that I was pulling my hair out.

I identified the root cause. It turns out that I was writing the 32-bit value 0x00000fff, so the upper word, 0x0000, landed on the pixel I was setting and turned it black. Worst of all, the lower word, 0x0FFF, was written to the pixel to the right, which turned that pixel all white. Pretty insidious! Had it produced nothing at all on the display, I surely would have tracked this down sooner.

I’ve been playing around with a variety of solutions. I had compiled the midpoint circle algorithm C code into assembly, and the compiler generated the move.l instruction from my faulty C code.

The correct instruction, move.w #$0fff,(a4), modifies only the 16-bit word at the address in question. The problem instruction was move.l #4095,(a5).

UPDATE: Just to be clear, there’s no problem or bug with the compiler. I used the wrong data type in the C code, likely a 32-bit int or similar (or perhaps an uncast constant assumed to be 32-bit), which caused the unwanted move.l instruction to be generated. This was entirely my fault. For what it’s worth, this is my first time with more than 10 lines of 68K assembly in decades. I wasn’t good back then, either. 🙂
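
To make the failure mode concrete, here is a minimal C sketch of it. It simulates big-endian 68K stores into a plain byte array rather than touching real hardware. The helper names (store_be16, store_be32, load_be16) are hypothetical and are not taken from the project's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Store a 16-bit word big-endian, the way the 68K's move.w does. */
static void store_be16(uint8_t *mem, size_t addr, uint16_t value)
{
    mem[addr]     = (uint8_t)(value >> 8);
    mem[addr + 1] = (uint8_t)(value & 0xFF);
}

/* Store a 32-bit long big-endian, the way move.l does: the high word
   lands at addr and the low word lands at addr + 2. */
static void store_be32(uint8_t *mem, size_t addr, uint32_t value)
{
    store_be16(mem, addr,     (uint16_t)(value >> 16));
    store_be16(mem, addr + 2, (uint16_t)(value & 0xFFFF));
}

/* Read a 16-bit big-endian word back, i.e. one pixel. */
static uint16_t load_be16(const uint8_t *mem, size_t addr)
{
    return (uint16_t)((mem[addr] << 8) | mem[addr + 1]);
}
```

With 16-bit pixels, store_be32(fb, 0, 0x00000FFF) leaves pixel 0 at 0x0000 (black) and pixel 1 at 0x0FFF (white), while store_be16(fb, 0, 0x0FFF) touches only pixel 0. In C on the target, the equivalent fix would be to write through a 16-bit type such as volatile uint16_t * instead of a 32-bit one.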

I still have to create a toolchain for my machine. Remember, there are no sample programs, no documentation, and no FAQs, so I’m still figuring out how all of this works. I don’t have a properly configured C compiler, with memory areas dedicated for stack, heap, and lord knows whatever else I have to configure. I’m happy to be turning the corner from hardware development to something that is actually USABLE from a software developer’s perspective.


Graphics Primitives are working, at least mostly.

So we’ve got a frame buffer now which means we can start writing 68K routines to write graphics into it.

So here I’m playing around with lines and circles. I’m using a random line drawing routine from the Atari that I found. For circles, I’m using the C code from the Midpoint Circle Algorithm article on Wikipedia, compiled to assembly and then modified by hand to work without support for negative numbers.

I’m seeing some graphical defects, but I’m not sure if they’re due to

  • a defect in the assembly of the Midpoint Circle algorithm
  • an existing CPU write problem (sharing SRAM)
  • a video display issue (also sharing SRAM)
  • an artifact of the built-in monitor upscaling algorithm

The C code runs fine in Windows without any similar effect.

Here’s the code:

; }
; void drawcircle(short x0,short y0,short radius)
; {
lea _putpixel.L,A2
; short y = 0;
clr.w D2
; short err = 0;
;clr.w D6
move.l #$8000, D6 ; this is an arbitrary number since we can't easily handle negative numbers.
; while (x >= y)
cmp.w D2,D3
blt drawcircle_3
; {
; putpixel(x0 + x, y0 + y);
move.w D5,D0
add.w D2,D0
ext.l D0
move.w D4,D1
add.w D3,D1
ext.l D1
jsr (A2)

; putpixel(x0 + y, y0 + x);
move.w D5,D0
add.w D3,D0
ext.l D0
move.w D4,D1
add.w D2,D1
ext.l D1
jsr (A2)

; putpixel(x0 - y, y0 + x);
move.w D5,D0
add.w D3,D0
ext.l D0
move.w D4,D1
sub.w D2,D1
ext.l D1
jsr (A2)

; putpixel(x0 - x, y0 + y);
move.w D5,D0
add.w D2,D0
ext.l D0
move.w D4,D1
sub.w D3,D1
ext.l D1
jsr (A2)

; putpixel(x0 - x, y0 - y);
move.w D5,D0
sub.w D2,D0
ext.l D0
move.w D4,D1
sub.w D3,D1
ext.l D1
jsr (A2)

; putpixel(x0 - y, y0 - x);
move.w D5,D0
sub.w D3,D0
ext.l D0

move.w D4,D1
sub.w D2,D1
ext.l D1
jsr (A2)

; putpixel(x0 + y, y0 - x);
move.w D5,D0
sub.w D3,D0
ext.l D0
move.w D4,D1
add.w D2,D1
ext.l D1
jsr (A2)

; putpixel(x0 + x, y0 - y);
move.w D5,D0
sub.w D2,D0
ext.l D0
move.w D4,D1
add.w D3,D1
ext.l D1
jsr (A2)

; if (err <= 0)
cmp.l #$8000,D6
bgt.s drawcircle_4
; {
; y += 1;
addq.w #1,D2
; err += 2*y + 1;
move.w D2,D0
asl.l #1,D0
addq.w #1,D0
add.w D0,D6
drawcircle_4:
; }
; if (err > 0)
cmp.l #$8000,D6
ble.s drawcircle_6
; {
; x -= 1;
subq.w #1,D3
; err -= 2*x + 1;
move.w D3,D0
asl.l #1,D0
addq.w #1,D0
sub.w D0,D6
bra drawcircle_1
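
For reference, the routine named in the comments above looks roughly like this in plain C. This is a sketch of the well-known midpoint circle algorithm with the same statement order as those comments, not the exact file that was compiled. The 16 x 16 grid array and the clipping putpixel are hypothetical stand-ins so that it can run on a desktop instead of the real frame buffer.

```c
#include <stdint.h>

#define GRID_W 16
#define GRID_H 16

/* Tiny stand-in for the frame buffer: one byte per pixel. */
static uint8_t grid[GRID_H][GRID_W];

static void putpixel(short x, short y)
{
    /* Clip so an off-screen point cannot corrupt neighbouring memory. */
    if (x >= 0 && x < GRID_W && y >= 0 && y < GRID_H)
        grid[y][x] = 1;
}

static void drawcircle(short x0, short y0, short radius)
{
    short x = radius;
    short y = 0;
    short err = 0;   /* signed here; the assembly biases it by $8000 instead */

    while (x >= y) {
        /* One computed point is mirrored into all eight octants. */
        putpixel(x0 + x, y0 + y);
        putpixel(x0 + y, y0 + x);
        putpixel(x0 - y, y0 + x);
        putpixel(x0 - x, y0 + y);
        putpixel(x0 - x, y0 - y);
        putpixel(x0 - y, y0 - x);
        putpixel(x0 + y, y0 - x);
        putpixel(x0 + x, y0 - y);

        if (err <= 0) {
            y += 1;
            err += 2 * y + 1;
        }
        if (err > 0) {
            x -= 1;
            err -= 2 * x + 1;
        }
    }
}
```

Calling drawcircle(8, 8, 3) sets the four axis points (11,8), (8,11), (5,8) and (8,5) on the first pass through the loop and leaves the centre untouched, which makes it a handy known-good pattern to compare against what the hardware draws.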

You also see a remnant from my previous text generation in the upper left hand corner.

I’m open to thoughts on the circle problem!

Frame Buffer and Text Generation now working…

With the 4MB SRAM board installed, I now have a frame buffer that can keep up with the video bandwidth rate.

The resolution and color depth is currently 800 x 480 x 12-bit. The computing shield I’m using only supports 12-bits, but my LCD can do 24-bit.

Essentially how my frame buffer works is that you write the 12-bit color value into an address in memory that corresponds to a location on the screen.

See Framebuffer at Wikipedia.

The 4MB SRAM gets mapped to $820000. The color values are stored as 16-bit words like this: 0000_RRRR_GGGG_BBBB. The top 4 bits are not used, for now.

One 68K instruction, for example “move.w #$0F00,$820000”, will put a single red pixel at location 0,0.
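
The address and color arithmetic can be sketched in C as follows. The base address, resolution and bit layout come from the text above, but the row-major, no-padding pixel order is an assumption on my part, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FB_BASE   0x820000u  /* SRAM base address from the text */
#define FB_WIDTH  800u
#define FB_HEIGHT 480u

/* Pack 4-bit red, green and blue into 0000_RRRR_GGGG_BBBB. */
static uint16_t pack_rgb444(unsigned r, unsigned g, unsigned b)
{
    return (uint16_t)(((r & 0xFu) << 8) | ((g & 0xFu) << 4) | (b & 0xFu));
}

/* Byte address of pixel (x, y).
   ASSUMPTION: pixels are stored row by row, 2 bytes each, no padding. */
static uint32_t pixel_address(unsigned x, unsigned y)
{
    assert(x < FB_WIDTH && y < FB_HEIGHT);
    return FB_BASE + (y * FB_WIDTH + x) * 2u;
}
```

Under that assumption, pixel (0,0) is at $820000 and pack_rgb444(15, 0, 0) gives $0F00, matching the move.w example. On the target, a store would look like *(volatile uint16_t *)pixel_address(x, y) = pack_rgb444(15, 0, 0); with the 16-bit pointer type keeping the compiler on move.w.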

I also have text generation working:

This works a similar way with a character buffer. Another part of ram is being used to store a 100 x 30 character buffer. If you write a 7-bit ASCII value to the lower portion of a 16-bit word at an address that corresponds to the location, then that character will be displayed.
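
The character buffer can be indexed the same way. This is only a sketch: the text does not give the buffer's base address, so the function returns an offset from wherever it starts, and the row-major order with one 16-bit word per cell is an assumption. The names are hypothetical.

```c
#include <stdint.h>

#define TEXT_COLS 100u
#define TEXT_ROWS 30u

/* Byte offset of character cell (col, row) from the start of the buffer.
   ASSUMPTION: row-major order, one 16-bit word per cell. */
static uint32_t char_cell_offset(unsigned col, unsigned row)
{
    return (row * TEXT_COLS + col) * 2u;
}

/* Word to store in a cell: the 7-bit ASCII code in the low bits. */
static uint16_t char_cell_word(char c)
{
    return (uint16_t)((unsigned char)c & 0x7Fu);
}
```

With these assumptions the whole 100 x 30 buffer takes 6000 bytes, and the last cell (99, 29) sits at offset 5998.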

The text layer is independent of the frame buffer, and is overlaid on top of that graphics layer.

This adds pretty powerful capabilities to the badge computer, and now I have to start writing some utility routines to do higher level functions like drawing a line or a circle.


Interesting snippet from FPGA Prototyping by Chu

This is exactly the problem that I’ve been facing for the last couple of weeks. I could get the memory tests to pass at 100% if only the CPU was talking with the external SRAM. As soon as I added in my time slot system, where the CPU and the video display circuit take alternate cycles, the test would fail miserably.

What I settled on was simply disabling access to the SRAM from the video display during the fairly short interval that the CPU was accessing. This guarantees that the CPU has relatively error free access. The suggestion above is to only allow writing video memory during the blanking interval, and I think that’s a good idea too.
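
The policy I settled on amounts to a fixed-priority arbiter. The real thing is FPGA logic, so treat the following C model as an illustration of the decision only. The type and function names are hypothetical, and it ignores all timing detail.

```c
#include <stdbool.h>

typedef enum { GRANT_NONE, GRANT_CPU, GRANT_VIDEO } sram_grant_t;

/* Fixed priority: a CPU request always wins; video gets the SRAM otherwise. */
static sram_grant_t arbitrate(bool cpu_req, bool video_req)
{
    if (cpu_req)
        return GRANT_CPU;
    if (video_req)
        return GRANT_VIDEO;
    return GRANT_NONE;
}

/* Count the cycles in a trace where a video fetch wanted the SRAM
   but was locked out by a CPU access. */
static unsigned blocked_video_cycles(const bool *cpu_req,
                                     const bool *video_req, unsigned n)
{
    unsigned blocked = 0;
    for (unsigned i = 0; i < n; i++) {
        if (video_req[i] && arbitrate(cpu_req[i], video_req[i]) != GRANT_VIDEO)
            blocked++;
    }
    return blocked;
}
```

Every blocked cycle is a pixel fetch the video side has to do without, so as long as CPU accesses stay short and infrequent the cost on screen stays small.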

I still see some small percentage of errors both on the CPU side and the video driver side, but the rate is small enough to not cause any problems currently.

I’ll revisit this noise issue later, but for now, it kinda gives a little Max Headroom kinda vibe to the project, and I like this!

For what it’s worth, knowing that the “struggle is real” and that what I’m dealing with is a conventional problem is refreshing. I’m sure the early computer designers also ran into problems…..

SRAM Frame Buffer is now up and running!

Exciting times for the Hope Badge Computer.

Tonight, I’ve worked out most of the kinks with interfacing the new SRAM module which you can see in posts below this, to the J68 soft-core CPU.

This means that I can now write to memory locations using 68000 assembly, and display color graphics on the screen.

The current resolution is 800 x 480, which matches the LCD. I have attached a VGA LCD monitor to the FPGA. That interface, the computing shield that you can also see a couple posts down, uses a 12-bit DAC, so the current color depth is 12 bits, which works out to 4096 colors. I can support 16 bits on the 7.0 touchscreen, or 16 bits using a better/different DAC for VGA, without any other modifications.

The plan is to get to 24-bit color, but everything in due time!

Time to start practicing my 68K assembly!

SRAM PCB built, populated, and is testing good!

The circuit board arrived from Elecrow PCB in China, and it works without modification! My friend Brian from Canada helped me out again, laying out this PCB in short order. He did a great job!

Soldering the half-mm pitch and MEC6-140 connector turned out to be quite a challenge. This is mostly because I’m out of practice. I have since also added the necessary filtering caps to ensure a clean signal.

Here’s an image of the board attached to the FPGA eval board. It’s tiny. 1.9 x 1.6 inches!

I think it looks sharp, but more importantly, the SRAM has 10ns access time consistently, and the interface is easy peasy! No more messing around with SDRAM. We still have the 8MB SDRAM available, albeit with a longer latency.

This circuit board will be our frame buffer. The CPU will (mostly) write to it, and the video driver will read from it. The 4MB size is plenty of room for high resolution and color depth. Even better there’s room for another 4MB if need be!
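
A quick back-of-the-envelope check of the "plenty of room" claim, written as C so the numbers are easy to replay. It uses the 800 x 480 resolution and 16-bit pixel words described in the newer posts above. The 32-bit-per-pixel case for 24-bit color is my assumption about how such pixels might be stored, and the function names are hypothetical.

```c
#include <stdint.h>

/* Bytes needed for one frame at the given geometry. */
static uint32_t frame_bytes(uint32_t width, uint32_t height,
                            uint32_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Whole frames of size frame that fit in mem_bytes of memory. */
static uint32_t frames_that_fit(uint32_t mem_bytes, uint32_t frame)
{
    return mem_bytes / frame;
}
```

One 800 x 480 frame at 2 bytes per pixel is 768,000 bytes, so a single 4MB SRAM (4,194,304 bytes) holds five of them. At 4 bytes per pixel a frame is 1,536,000 bytes and two still fit.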

New SRAM PCB is being built

So up to two SRAMs can be installed on this tiny 2″ x 2″ board. To the right is the connector, and the SRAMs are on the left. With current SRAM sizes, you can install 32-megabit SRAMs. That’s 4 megabytes each. This will be in addition to the 8MB onboard SDRAM.

Our current memory controller, while integrated and functioning fine, is just too slow. I already have the SRAMs and the connectors from my previous hardwire attempt. See bottom photo at previous link. But there were hardware “bugs”, maybe solder bridges or wiring mistakes, that I simply didn’t feel like messing around with. This PCB is a heck of a lot cleaner solution, and the chance for problems will be lower. The signal quality will be much better.

I’m in the process of writing a Finite State Machine to perform an independent test of the memory. And then I’ve got to find a place in the memory map for it, and then write some glue logic to integrate it. This should be easier now that I’ve done the FLASH (the UFM) and the SDRAM.

The SDRAM controller has really high latency (on the order of 8+ cycles), and while there were probably workarounds, the single-cycle 10ns latency of these SRAMs is just so attractive that I think I shot myself in the foot by not chasing this solution down earlier.

I submitted this PCB to Elecrow PCB in China. I had originally tried using 3pcb/pcbway, but their lack of communication and bait-and-switch pricing quote practically ensured my lack of business. Elecrow, on the other hand, has been great with communication. Their pricing is great. I ordered red-colored PCBs for no extra charge and I’m really looking forward to getting them. I paid extra for their rush service, and for FedEx. I could have them as early as the end of the week, but we’ll see!

Again, I have to thank my friend Brian for his KiCad skills. It would have taken me much longer, with questionable results.

J68 can now speak with BEMICRO MAX 10 onboard SDRAM

So for the longest time, I’ve wanted to use the onboard 8MB SDRAM that is present on the BEMICRO MAX 10 FPGA eval board. This is now a reality!

I’ve successfully integrated a controller with some glue logic to connect to the J68.

I’ve mapped $20000-$81FFFE to the SDRAM. All calls to ROM (stored in on-FPGA FLASH) are retrieved properly, some low-level RAM calls access the on-FPGA M9K memory blocks, and the UARTs are all handled fairly seamlessly.
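
As a sketch of what the glue logic has to decide, here is the address decode written as a C function. Only the SDRAM range and the $820000 SRAM base (from the frame buffer post above) come from the text. Treating the SRAM as one contiguous 4MB window is an assumption, the ROM, M9K and UART ranges are not given so they are lumped together as "other", and the names are hypothetical.

```c
#include <stdint.h>

typedef enum { REGION_OTHER, REGION_SDRAM, REGION_SRAM } region_t;

#define SDRAM_FIRST 0x020000u
#define SDRAM_LAST  0x81FFFFu  /* last byte of the word at $81FFFE */
#define SRAM_FIRST  0x820000u
#define SRAM_LAST   0xC1FFFFu  /* ASSUMED: 4MB window starting at $820000 */

/* Classify a byte address by the memory that should answer it. */
static region_t region_of(uint32_t addr)
{
    if (addr >= SDRAM_FIRST && addr <= SDRAM_LAST)
        return REGION_SDRAM;
    if (addr >= SRAM_FIRST && addr <= SRAM_LAST)
        return REGION_SRAM;
    return REGION_OTHER;  /* ROM, on-FPGA RAM, UARTs, unmapped */
}
```

The SDRAM window works out to exactly 8MB ($81FFFF - $20000 + 1 = $800000 bytes), which matches the size of the onboard chip.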

Here’s the very simple 68K memory check routine. Obviously this can be expanded on, but it’s passing!!

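lea $81fffe,a5          ; a5 = last word of the SDRAM window
lea $20000,a6           ; a6 = first word of the SDRAM window

chklop:
move.w a5,(a5)          ; write the low 16 bits of the address to itself
andi.w #$0000,d5        ; clear the low word of d5
move.w (a5),d5          ; read the word back
cmp.w a5,d5             ; does it match what was written?
bne.w printfailz        ; no: report the failure
suba.l #$000002,a5      ; step down to the previous word
cmpa.l a5,a6            ; reached the bottom of the window?
bne.w chklop            ; no: keep testing
lea passesz,a0          ; yes: print the pass message
bsr writs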
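
The same check can be sketched in C, which makes it easier to try on a desktop against an ordinary array. This mirrors the assembly loop as I read it (write the low 16 bits of the address, read it straight back, walk downward, stop before the first word). The function name and parameters are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Test words [1, words) of mem, from the top down.
   base_addr is the byte address that mem[0] corresponds to on the target.
   Returns true on success; on failure stores the bad index in *fail_index. */
static bool mem_check(volatile uint16_t *mem, size_t words,
                      uint32_t base_addr, size_t *fail_index)
{
    if (words < 2)
        return true;

    for (size_t i = words - 1; i > 0; i--) {
        /* Low 16 bits of this word's byte address, like move.w a5,(a5). */
        uint16_t pattern = (uint16_t)(base_addr + 2u * (uint32_t)i);

        mem[i] = pattern;
        if (mem[i] != pattern) {
            if (fail_index)
                *fail_index = i;
            return false;
        }
    }
    return true;
}
```

Like the assembly, this reads each word back immediately, so it catches stuck data bits but not two addresses aliasing to the same cell. Writing the whole range first and verifying it in a second pass would be one way to expand it.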


Sometimes the results of success come with minimal fanfare. That’s ok with me. I understand how important this is to the project. With the J68 now being able to speak with the memory, there’s no limit to the number of applications.

There’s much more to do, though.

  • There is so much room for optimization everywhere.
    • The glue logic is probably very conservative. Working takes priority over speed.
    • I’ve got the J68 CPU at 66.67MHz, but I’ve probably got room to take it to around 90MHz.
    • The memory controller itself doesn’t allow for queued up reads.
    • There’s no cache, which should really help things.
  • I’ve got to add a simple priority arbiter for the wishbone interface to the memory. Other things have got to have access too. Like the video driver.
  • I’m currently using the ROM monitor (see previous posts for link) VUBUG.TXT to boot. This is unnecessary but we barely know what we’re doing here. Eventually, I’d like to pare down the monitor, and get the UART setup code for the console port (only a few lines of assembly), and keep some of the utility routines. Booting mostly our own code is the goal.
  • I’ve managed to add a new command the “f”-command for finally fu*king working to the rom monitor. It calls a batch of assembly, and that’s where my SDRAM test code from above sits. This gives us a way of calling our code and having access to some form of library routines.
  • Right now, I’ve got to recompile the whole system, and reprogram the whole system. I should be able to rewrite just the MIF in on-FPGA flash (UFM, as it’s called), which is the memory initialization file containing the 68K machine code that the assembler spits out.
  • and much more……..

I did manage to write a module (or two) that drives a small sainsmart serial LCD:


This allows me to display up to (8) 32-bit numbers in hex on the display at one time. Could be really useful in the future.

I also bought one of these Papilio Computing shields, because we need connectors! This is the cleanest presentation I’ve found, and it should interface easily enough.


The most important interfaces are probably the VGA, for a monitor, and a couple of PS/2’s for keyboard and mice. The serial ports and audio could be real useful, too. Certainly the SDCARD slot.

So the project is moving along nicely!


Modified memory controller now functioning on BEMICRO MAX 10 board

So after some minor heartache, I’ve managed to get a working memory controller on the BEMICRO Max 10 board that is easy to use. It uses a wishbone interface.

The heart of the controller is from here but there were a few problems with it:

  • It uses a burst length of 2. For 16-bit RAM like the IS42S16400 chip found onboard the Max 10, this means the user interface is 32 bits wide. The planned use for this memory controller is with the J68 softcore 68000 CPU in our Hope Badge Computer, and this CPU needs a 16-bit wide interface.
  • It didn’t support byte masking. Byte masking is where you ask the controller to access just a single byte from the lower or upper portion of the particular column location. This means that you might want the [15:8] portion of the 16-bit word, or the [7:0] portion of the 16-bit word. The CPU needs to support opcodes like “MOVE.B”, and needs a memory subsystem that can support it.
  • There was a bug in the synthesizable test bench code sdram_rw.v. The bug involves the maximum number of reads/writes during the test. This was set to an arbitrary 200,000 32-bit writes, which works out to be about 800KB. Well, the chip is 8MB, so clearly this isn’t right. The right number is 2,097,152 writes: 2,097,152 × 4 bytes (32 bits each) = 8,388,608 bytes, the full 8MB.
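
The byte-masking behaviour described in the list above can be modelled in a few lines of C. This is a sketch of the semantics only, not the controller's logic. The SEL bit assignment (bit 1 for [15:8], bit 0 for [7:0]) and the function names are assumptions for illustration. The even-address-is-upper-byte rule follows from the 68000 being big-endian.

```c
#include <stdint.h>

#define SEL_UPPER 0x2u  /* selects data bits [15:8] */
#define SEL_LOWER 0x1u  /* selects data bits [7:0]  */

/* Byte lane a 68K byte access uses: big-endian, so the byte at an
   even address travels on [15:8] and the odd one on [7:0]. */
static unsigned byte_sel_for(uint32_t addr)
{
    return (addr & 1u) ? SEL_LOWER : SEL_UPPER;
}

/* Merge a write into the stored word, changing only the selected lanes. */
static uint16_t masked_write(uint16_t stored, uint16_t data, unsigned sel)
{
    uint16_t mask = 0;

    if (sel & SEL_UPPER)
        mask |= 0xFF00u;
    if (sel & SEL_LOWER)
        mask |= 0x00FFu;

    return (uint16_t)((stored & ~mask) | (data & mask));
}
```

For example, a MOVE.B to an even address over a stored word of $1234 with $AB on the upper lane leaves $AB34. Without masking, the whole word would be overwritten and the neighbouring byte lost.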

The changes were relatively minor for the controller itself:

  • Changing the MODE register from 12’b000000110001 to 12’b000000110000. (Set burst length from 2 to 1)
  • Removing the WRITE1_ST and READ_PRE_ST states from the “data” state machine, simply skipping to the next state.
  • Changing the wishbone interface bus widths to 16-bit instead of 32-bit
  • Adding the SEL_I() wishbone interface to support byte-masking. I think this is the right choice, looking at the WishBone Spec.
  • Pass the wishbone byte selection through to the DQM pins on the memory chip.
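
Two of the changes above are easy to sanity-check with a little C. The field positions follow the standard SDR SDRAM mode register layout (burst length in bits [2:0], CAS latency in bits [6:4]). The SEL-to-DQM mapping assumes DQM is an active-high mask, so it is simply the inverse of the byte select. The function names are hypothetical, and this is a sketch rather than the controller's actual code.

```c
#include <stdint.h>

/* Burst length from mode register bits [2:0]: 000 = 1, 001 = 2,
   010 = 4, 011 = 8. Returns 0 for codes this sketch does not decode
   (for example full-page burst). */
static unsigned mode_burst_length(uint16_t mode)
{
    unsigned code = mode & 0x7u;
    return (code <= 3u) ? (1u << code) : 0u;
}

/* Raw CAS latency field from mode register bits [6:4]. */
static unsigned mode_cas_latency(uint16_t mode)
{
    return (mode >> 4) & 0x7u;
}

/* Two-bit byte select (1 = lane active) to two-bit DQM (1 = lane masked). */
static unsigned sel_to_dqm(unsigned sel)
{
    return (~sel) & 0x3u;
}
```

The old value 12’b000000110001 ($031) decodes to burst length 2 with a CAS latency field of 3, and the new value 12’b000000110000 ($030) decodes to burst length 1 with the same latency, so only the burst length changes.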

These changes will decrease the latency for a completed single-word read by one cycle. I have some future plans to add a cache to the front of the memory. I’ve got to read more about how these interfaces work and potentially add some priority arbitration in front of the controller.

This brings me one step closer to integrating the onboard SDRAM to the J68 softcore.

Something old, something new: building a NAS with a few older parts


About eight years ago, I bought a Dell XPS 420, pictured above. This was an Intel Core 2 Quad Q9550 running at 2.83GHz with 8GB of RAM. Despite running reliably for over 8 years, the main problem was that the case was a piece of crap. The ventilation on it was horrible, and the (2) supported hard drive slots just ran too hot. I modded the case, adding multiple 120mm fans for an improvement, but this case was never going to support more than a couple of drives. Never mind, also, that the motherboard was a BTX format. You’ve read right, BTX. Not ATX. Not microATX. Not ITX. BTX. As in the supposed next-gen motherboard format. It didn’t pan out. Despite being (8) years old, this processor still ran like a champ, though, and since I had previously virtualized the physical Windows 10 machine with VMware Workstation, I had no other use for the hardware.

Now enter These guys are my heroes. I feel at home with other people storing tens to hundreds of terabytes at home, for no reason other than to do it! I was in heaven! This was my inspiration.

Here were my goals:

  • Use my existing processor and 8GB ram. There’s nothing wrong with them, plus this processor can be OC’d, if necessary.
  • Build something with reasonable performance and obvious bottlenecks removed.
  • Make sure that the solution that I implement can be upgraded at a later date. Don’t spend money that will simply be thrown away.

Here’s what I ended up with:

PCPartPicker part list / Price breakdown by merchant

Type | Item | Price
CPU | Intel Core 2 Quad Q9550 2.83GHz Quad-Core Processor | Purchased For $0.00
CPU Cooler | Cooler Master Hyper 212 EVO 82.9 CFM Sleeve Bearing CPU Cooler | $24.88 @ OutletPC
Motherboard | Intel DQ45CB Micro ATX LGA775 Motherboard | Purchased For $22.50
Memory | Wintec Value 4GB (2 x 2GB) DDR2-800 Memory | Purchased For $0.00
Memory | Wintec Value 4GB (2 x 2GB) DDR2-800 Memory | Purchased For $0.00
Storage | PNY CS1311 120GB 2.5″ Solid State Drive | $39.99 @ Best Buy
Case | Cooler Master Storm Scout 2 (Black) ATX Mid Tower Case | $96.99 @ Best Buy
Power Supply | Antec 550W 80+ Gold Certified Fully-Modular ATX Power Supply | $89.99 @ SuperBiiz
Other | SAS9211-8I 8PORT Int 6GB Sata+sas Pcie 2.0 | $99.35 @ Amazon
Other | iStarUSA BPN-DE340SS-RED Red Color 3×5.25 to 4×3.5 SAS / SATA Trayless Hot-Swap Cage | $91.00

Prices include shipping, taxes, rebates, and discounts
Total (before mail-in rebates) | $474.70
Mail-in rebates | -$10.00
Total | $464.70
Generated by PCPartPicker 2016-11-30 17:28 EST-0500

Rationale for new parts:

CPU Cooler: Pretty popular one, and could easily dissipate the 95W from that processor. I might overclock, so that cooler gives me headroom.

Motherboard: It was $22.50 shipped, supported my processor, supported my DDR2-800 RAM, had (5) onboard SATA II’s which should support most hard drives just fine. I don’t think new compatible motherboards are made any longer for this processor. Onboard video is nice because then I can use the PCI-e 2.0 x16 slot for the SATA card below. Gigabit LAN. This part would be thrown away (along with the CPU and RAM) once I decide to upgrade.

Storage: PNY CS1311. This is basically the cheapest SSD that still got decent reviews. It was about $40 for the 120GB model. I’m planning on running Linux. I think my current install takes around 9GB. I’ll have plenty of room. Besides, I’m not actually copying any large amounts of data to or from it. I don’t want to use a spinning drive for the OS boot device.

Case: Although I’m not a big fan of the shape, it has a few things going for it. It has a nice carry handle at the top. It has plenty of venting, including room for (9) fans, (3) of which are included with the purchase. It has a side window. The INTERNAL hard drive slots number about 7. The hard drive slots face OUT of the case, which makes AMCs (adds, moves, changes) very easy. The (3) external slots are very useful for the drive cage below. It supports both USB 2.0 AND 3.0 on the front panel. This is perfect: 2.0 for now, 3.0 for later!

Trayless hot-swap cage: This is a very nice feature. I bought the 4-banger instead of the 5-banger because the 4 inserts better than the 5. It’s great to add, move, or change a drive without taking the case side off. It fits because it has specific support for rails in between the three external slots. The other versions will likely require you to mod your case! Pay attention to the slot cuts in the sides of certain units.

Power Supply: 550W should be sufficient. The processor takes 95W. I wanted to add plenty of headroom for powering drives. Fully modular is really nice for cable management.

LSI 9211-8i SATA card: This card supposedly has great Linux support, and many people in the datahoarding community recommended it. I sent an email recently about support and got a useful answer back within 5 minutes!! This card supports (8) SATA ports at 6.0Gbps each. I’m willing to bet this interface is faster than the onboard ports!


  • (4) hot-swap trayless drive bays available with easy in/out. I’m a little worried about the small 80mm fan on the back of this. I’m going to monitor temps on these drives.
  • Physical support for about (10) drives. I’m using one of the bays for the OS boot drive SSD.
  • Proper power supply. Those drives might take as much as 250 watts PEAK on startup.
  • I think a 2.83GHz quad-core (non-HT though!) should be beefy enough for a NAS box. We’ll see!
  • 8GB of RAM should also be sufficient for a mostly headless Ubuntu install.
  • While this is the physical SETUP, what’s the logical one going to be? I’m not sure. I think Ubuntu 16.04 LTS, SnapRaid, mergerfs, with rclone/crypt going to Amazon Cloud Drive.

Wish list for next upgrade

  • USB 3.0 for fast flash drive access. I might buy a PCIe 2.0 x1 card for this soon!
  • Modern hyperthreaded processor. I have to learn where the bottlenecks for this current/new system would be!
  • Maybe dual gigabit with some bonding capability. Yes, I’d have to upgrade the switch too. Not sure how cost effective this is. I can see how 125MB/s isn’t going to cut it if I end up striping somehow.
  • ECC RAM for sure. Again, how much does that increase my overall costs?
  • I’d love to have most/all drives be external drive cages. Support for 10-12 drives externally!

Please feel free to comment below. Especially if you have ideas for how to provision this box. My LSI card comes in a couple days.

UPDATE Dec 2016

I had major compatibility problems with the DQ45CB motherboard and the LSI 9211-8i card. The main problem was that the mobo detected the LSI card as a video card, and shut off the needed onboard video. When I added a PCIe x1 video card, then Ubuntu started throwing a bunch of errors. And Windows 10 wouldn’t even boot. After days of fighting to resolve the various conflicts, I simply ended up buying some new stuff. More expensive, yes, but simpler with a better end result. I ended up adding:

  • Intel E3-1275v5 processor at 3.6GHz. Quad Core / 8 Threads. Supports ECC RAM
  • Supermicro motherboard x11ssh-F-O. Supports ECC, 8 onboard SATA ports, plenty of PCIe slots, USB 3.0, and everything else you’d expect
  • 32GB of DDR4 ECC. It’s the slowest DDR available, but this is what’s supported with the CPU/mobo combination
  • A couple WD 4TB Gold drives, and a couple WD 8TB Gold drives. These are probably the best drives in the world for my application, have long 5-year warranties, high MTBFs, and I’m really hoping I get the ROI I think I’m paying for.

This is now probably more than just a NAS. It could easily host a few VMs, with processor time to spare.


  • Windows 10. Ubuntu 16.04 LTS isn’t directly supported by my motherboard, and again, I’m taking the path of least resistance. Windows 10 is fine and supports my preferred pooling solution.
  • StableBit DrivePool. Really sweet software, and it’s driving my OS choice. mergerfs on Linux seems to still have many issues to work out, and StableBit (despite being ~$56 commercial) just seems more reliable with tons of configuration features to really dial in what I need. Love it so far.
  • rclone with encrypted Amazon Cloud Drive for backup. This is my current backup solution, although it’s lacking in many ways. ACD will accept my full 150Mbps pipe, which just kicks butt for fast uploading. It’s free. I still have to come up with versioning and better backup support. rclone is fantastic, but it just lacks some key features that I’d like to layer on top.
  • SnapRaid for my hashing, silent bit corruption detection, and repair needs. This software seems plenty mature, and it essentially works but I’m simply not happy with how it’s working with the rest of my setup. Maybe my setup needs to stabilize a bit before SnapRaid will make sense?