Feasibility & Architecture Sprint (best first step)
$3k–6k · 1–2 weeks. Spec, architecture, resource & timing budget, and a clear go/no-go before you commit a dollar to development.
Verified FPGA acceleration
I design, verify, and deliver production-ready FPGA accelerators for compute-heavy workloads. The flow is AI-augmented for speed and held to sign-off rigor for hardware: every module passes synthesis and bit-exact golden-model simulation before it ships.
module gemm_accel // 8×8 systolic · Q1.15
ghdl analyze ........ PASS
ghdl elaborate ...... PASS
vivado ooc synth ..... PASS
sim vs golden ........ BIT-EXACT
timing @200 MHz ....... MET
Why teams call me
Your CPU/GPU path is too slow, too power-hungry, or too costly to scale. On the right workload an FPGA can deliver an order-of-magnitude win, but design, verification, and timing closure are where projects stall. I own those, end to end.
Output-stationary systolic and streaming dataflows that keep the silicon fed and hit your cycle budget.
Fixed-point, resource-aware designs that fit cheap parts and run cool — no datacenter GPU required.
Every block verified against a golden model and closed in timing — so it works on the board, not just in a demo.
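To make "fixed-point" concrete, here is a minimal Python sketch of Q1.15 arithmetic, the format used in the GEMM example under Selected work. A Q1.15 value is a signed 16-bit integer scaled by 2^-15, so it spans [-1, 1 - 2^-15]. The rounding mode (round-half-up on the product) and the clamp-to-range saturation are assumptions chosen for illustration, and the helper names are hypothetical; a real design would match whatever scheme its spec fixes.

```python
Q = 15                       # fractional bits in Q1.15
Q15_MAX = (1 << Q) - 1       # 32767  ->  1 - 2**-15
Q15_MIN = -(1 << Q)          # -32768 -> -1.0

def saturate(x: int) -> int:
    """Clamp an integer into the signed 16-bit Q1.15 range."""
    return max(Q15_MIN, min(Q15_MAX, x))

def to_q15(value: float) -> int:
    """Quantize a real number: round to nearest (Python's round, ties to even), then saturate."""
    return saturate(round(value * (1 << Q)))

def from_q15(raw: int) -> float:
    """Convert a raw Q1.15 integer back to a real value."""
    return raw / (1 << Q)

def q15_mul(a: int, b: int) -> int:
    """Multiply two Q1.15 values (assumed rounding: half up).

    The exact product is Q2.30; adding half an LSB and shifting right by 15
    rounds half up back to Q1.15. Only (-1) * (-1) = +1 falls outside the
    range, and saturation clamps it to Q15_MAX.
    """
    product = a * b                              # exact, no rounding yet
    rounded = (product + (1 << (Q - 1))) >> Q    # >> on Python ints is an arithmetic (flooring) shift
    return saturate(rounded)

print(from_q15(q15_mul(to_q15(0.5), to_q15(-0.5))))  # -0.25
```

The 16×16-bit multiply fits in one DSP48 multiplier (25×18 on 7-series parts), which is part of why Q1.15 maps cheaply onto small devices.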
Services
Start with a low-risk feasibility sprint; scale up to full delivery, verification, or ongoing capacity.
Best first step: the Feasibility & Architecture Sprint ($3k–6k, 1–2 weeks).
Most popular: synthesizable, verified RTL + block design + simulation, delivered and integrated to your board.
De-risk a build: take your existing RTL to a verified, timing-closed, golden-checked, sign-off-ready state.
Ongoing: senior FPGA capacity on a monthly retainer, covering architecture reviews, design, and mentoring.

The approach
I run an AI-augmented RTL pipeline that generates modules fast — then gates every one through real verification before it's accepted. You don't get "AI code." You get verified designs, delivered faster because verification is automated and continuous.
AI-assisted RTL from a precise, human-authored spec.
GHDL analyze + elaborate; Vivado out-of-context synthesis.
Simulate bit-exact against a Python golden model — with regression gates.
Timing closure and resource sign-off on the target part.
Nothing ships unverified. If it doesn't pass, it doesn't go in the build.
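The gating idea above can be sketched as a small Python driver that runs each check in order and refuses to go further after the first failure. This is an illustrative sketch, not the actual pipeline: the file names, the Tcl script (which would presumably call Vivado's `synth_design -mode out_of_context`), and `compare_golden.py` are hypothetical placeholders. The `ghdl -a` / `ghdl -e` and `vivado -mode batch -source` command lines are standard invocations of those tools.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple

Gate = Tuple[str, Sequence[str]]  # (gate name, command line)

def run_gates(gates: List[Gate]) -> List[Tuple[str, bool]]:
    """Run gates in order and stop at the first failure, so nothing
    downstream of a failed check is ever reported as accepted."""
    results: List[Tuple[str, bool]] = []
    for name, argv in gates:
        try:
            proc = subprocess.run(list(argv), capture_output=True, text=True)
            passed = proc.returncode == 0
        except OSError:  # tool missing or not executable counts as a failure
            passed = False
        results.append((name, passed))
        print(f"{name + ' ':.<22} {'PASS' if passed else 'FAIL'}")
        if not passed:
            break
    return results

def accepted(results: List[Tuple[str, bool]], gates: List[Gate]) -> bool:
    """A module is accepted only if every gate ran and passed."""
    return len(results) == len(gates) and all(ok for _, ok in results)

# Hypothetical gate list for one VHDL-2008 module; file and script names
# are placeholders, not part of any published flow.
EXAMPLE_GATES: List[Gate] = [
    ("ghdl analyze", ["ghdl", "-a", "--std=08", "gemm_accel.vhd"]),
    ("ghdl elaborate", ["ghdl", "-e", "--std=08", "gemm_accel"]),
    ("vivado ooc synth", ["vivado", "-mode", "batch", "-source", "synth_ooc.tcl"]),
    ("sim vs golden", [sys.executable, "compare_golden.py"]),
]
```

On a machine with GHDL and Vivado on the PATH, `run_gates(EXAMPLE_GATES)` would print a report in the same shape as the status panel at the top of the page; a missing tool is counted as a failure rather than silently skipped.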
Selected work
A dense matrix-multiply (GEMM) accelerator for the Zynq-7020 (PYNQ-Z2 / Zybo Z7-20), a part the free Vivado toolchain supports. The core is an 8×8 output-stationary systolic array of 64 DSP slices computing in Q1.15 fixed point, fed from DDR over the AXI3 HP ports with ping-pong double-buffering to hide memory latency.
Verified end to end: each module is checked in synthesis and simulated bit-exact against a Python golden model with identical rounding and saturation. The design targets timing closure at 200 MHz on a -1 speed grade and builds in the free Vivado Standard Edition, with no paid license needed.
Architecture, RTL, verification harness, and block-design integration — delivered with my AI-augmented pipeline.
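A bit-exact golden model only works if it makes the same arithmetic decisions as the hardware. Below is a minimal Python sketch of a Q1.15 GEMM reference plus the element-by-element comparison a regression gate would run. Where rounding and saturation happen is an assumption here (exact products summed in a wide accumulator, then one round-half-up shift and one clamp at the output), as is the 48-bit accumulator width chosen to mirror the DSP48; the function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[int]]
Q = 15
Q15_MIN, Q15_MAX = -(1 << Q), (1 << Q) - 1
ACC_BITS = 48  # assumed accumulator width, mirroring the DSP48's 48-bit accumulator

def golden_gemm_q15(A: Matrix, B: Matrix) -> Matrix:
    """Reference Q1.15 GEMM: exact products summed in a wide accumulator,
    then one round-half-up shift back to Q1.15 and a saturating clamp."""
    M, K, N = len(A), len(B), len(B[0])
    if any(len(row) != K for row in A):
        raise ValueError("inner dimensions must match")
    C: Matrix = []
    for i in range(M):
        out_row = []
        for j in range(N):
            acc = sum(A[i][k] * B[k][j] for k in range(K))  # Q2.30 terms, no rounding yet
            if not -(1 << (ACC_BITS - 1)) <= acc < (1 << (ACC_BITS - 1)):
                raise OverflowError(f"accumulator overflow at ({i}, {j})")
            rounded = (acc + (1 << (Q - 1))) >> Q          # round half up
            out_row.append(max(Q15_MIN, min(Q15_MAX, rounded)))
        C.append(out_row)
    return C

def compare_bit_exact(dut: Matrix, golden: Matrix) -> List[Tuple[int, int, int, int]]:
    """Return (row, col, dut_value, golden_value) for every mismatch.
    An empty list means the two outputs are bit-exact."""
    if len(dut) != len(golden) or any(len(d) != len(g) for d, g in zip(dut, golden)):
        raise ValueError("shape mismatch between DUT and golden output")
    return [
        (i, j, d, g)
        for i, (d_row, g_row) in enumerate(zip(dut, golden))
        for j, (d, g) in enumerate(zip(d_row, g_row))
        if d != g
    ]
```

In a real flow the `dut` matrix would be parsed from a simulation dump; any non-empty mismatch list fails the gate.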
How we work
We pin down the workload, constraints, and target part. Fixed-price, low risk.
Spec, dataflow, and a resource + timing budget you can trust before building.
AI-augmented RTL implementation — fast, modular, and reviewable.
Synthesis + bit-exact golden simulation + timing closure. Sign-off quality.
Integrated to your board, documented, with support through bring-up.
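The resource + timing budget from the second step is, to first order, simple arithmetic. A sketch in Python, assuming the 8×8 / 200 MHz figures from the GEMM example, one DSP slice per PE, and the XC7Z020's 220 DSP48E1 slices. The ideal-runtime figure is an optimistic lower bound that ignores memory stalls, array fill and drain, and control overhead; the function names are hypothetical.

```python
from typing import Dict

def array_budget(rows: int, cols: int, f_clk_hz: float, dsp_available: int) -> Dict[str, float]:
    """First-order budget for a systolic array with one DSP slice per PE."""
    pes = rows * cols
    return {
        "dsp_used": pes,
        "dsp_util_pct": 100.0 * pes / dsp_available,
        "clock_period_ns": 1e9 / f_clk_hz,
        "peak_gmac_per_s": pes * f_clk_hz / 1e9,  # one MAC per PE per cycle
    }

def ideal_gemm_time_s(m: int, n: int, k: int, pes: int, f_clk_hz: float) -> float:
    """Lower bound on GEMM runtime if every PE does one useful MAC every cycle."""
    return (m * n * k) / (pes * f_clk_hz)

# Assumed inputs: 8x8 array at 200 MHz on an XC7Z020 (220 DSP48E1 slices).
budget = array_budget(8, 8, 200e6, 220)
for key, value in budget.items():
    print(f"{key:<16} {value:g}")
print(f"ideal 1024^3 GEMM: {ideal_gemm_time_s(1024, 1024, 1024, 64, 200e6) * 1e3:.1f} ms")
```

With these inputs the array uses about 29% of the part's DSP slices, implies a 5 ns clock period, and peaks at 12.8 GMAC/s, so an ideal 1024×1024×1024 GEMM takes roughly 84 ms. A real budget also covers BRAM, LUTs, routing, and DDR bandwidth, which this sketch leaves out.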
About
I'm an FPGA / RTL engineer with [X]+ years building acceleration and signal-processing hardware on Zynq and AMD-Xilinx devices [add 1–2 past roles or domains]. I specialize in the hard parts teams get stuck on: systolic and streaming dataflows, fixed-point DSP, AXI memory paths, clock-domain crossing, and timing closure.
My edge is method. I pair deep hardware judgment with an AI-augmented pipeline that makes delivery fast without cutting corners on verification — because in silicon, "looks right" isn't good enough. One engineer owns your result and signs off on it.
Let's talk →

FAQ
Unverified HDL is risky — whoever writes it. That's the whole point of my process: generation is fast, but every module is gated through synthesis and bit-exact golden-model simulation before it's accepted. You get the speed without the risk.
Most start with a fixed-price 1–2 week feasibility sprint: I deliver a spec, architecture, and a resource/timing budget with an honest go/no-go. If we proceed, that work credits toward full design & delivery.
AMD-Xilinx Zynq / 7-series and the Vivado + GHDL toolchain, VHDL-2008, AXI3/AXI4, DSP48, fixed-point DSP, MMCM/CDC. [Add Verilog/Intel-Quartus/etc. if relevant to you.]
Yes — fully remote, working with teams worldwide. Engagements are scoped and priced up front so there are no surprises.
By outcome, not by the hour: fixed-price sprints and project quotes, or a monthly retainer for ongoing capacity. You always know the number before we start.
Get started
Book a 20-minute call, or send a line about your workload and constraints. I'll tell you honestly whether FPGA is the right move.