Compare commits


10 Commits

Author SHA1 Message Date
cah
f2901c6366 docs: add OpenLane hardening results and critical path analysis
Documents 4 hardening runs with timing/area/DRC results. Identifies
SYNDROME state as critical path bottleneck (222 logic levels, 49 ns)
and proposes 2-stage pipeline fix to meet 50 MHz target.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 17:03:35 -07:00
cah
3e797fd5ab fix: sync Yosys-compatible sat_add/sat_sub from chip_ignite
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 21:18:19 -07:00
cah
a83f05cf82 fix: sync Yosys-compatible packed LLR interface from chip_ignite
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 21:08:49 -07:00
cah
6cc13829c8 fix: sync cn_min_sum iverilog compatibility fix from chip_ignite
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 20:54:21 -07:00
cah
ab9ef9ca30 test: add vector-driven Verilator testbench with Python model cross-check
Add gen_verilator_vectors.py to convert test_vectors.json into hex files
for $readmemh, and tb_ldpc_vectors.sv to drive 20 test vectors through
the RTL decoder and verify bit-exact matching against the Python model.

All 11 converged vectors pass with exact decoded word, convergence flag,
and zero syndrome weight. All 9 non-converged vectors match the Python
model's decoded word, iteration count, and syndrome weight exactly.

Three RTL bugs fixed in ldpc_decoder_core.sv during testing:
- Magnitude overflow: -32 (6'b100000) negation overflowed 5-bit field
  to 0; now clamped to max magnitude 31
- Converged flag persistence: moved clearing from IDLE to INIT so host
  can read results after decode completes
- msg_cn2vn zeroing: bypass stale array reads on first iteration
  (iter_cnt==0) to avoid Verilator scheduling issues with large 3D
  array initialization

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 19:50:09 -07:00
cah
1520f4da5b chore: add simulation artifacts to gitignore
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 19:06:25 -07:00
cah
b7449a6191 fix: RTL bugs in decoder core + add standalone Verilator testbench
RTL fixes:
- Skip unconnected columns (H_BASE=-1) in LAYER_READ/WRITE/SYNDROME
- Set unconnected VN->CN messages to +MAX (not 0) to avoid
  corrupting min-sum minimum computation
- Add SYNDROME_DONE state to fix timing race on syndrome_ok
- Fix VERSION_ID hex literal (0xLD01 -> 0x1D01)

Testbench verifies VERSION register read and clean all-zero decode.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 19:05:52 -07:00
cah
3372f84a3a test: add standalone Verilator testbench for LDPC decoder
Add tb/tb_ldpc_decoder.sv with Wishbone read/write tasks, version
register test, and all-zero codeword decode test. Add tb/Makefile
with lint and sim targets.

Fix three RTL bugs found during testbench bring-up:
- ldpc_decoder_core.sv: skip unconnected H_BASE columns (shift=-1)
  in LAYER_READ, LAYER_WRITE, and SYNDROME states to prevent
  out-of-bounds array access and belief corruption
- ldpc_decoder_core.sv: fix syndrome_ok timing race by adding
  SYNDROME_DONE state so the registered result is available before
  the early-termination decision
- wishbone_interface.sv: fix VERSION_ID typo (0xLD01 -> 0x1D01)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 19:00:41 -07:00
cah
74baf3cd05 feat: add test vector generation for RTL verification
Improve generate_test_vectors() to use mixed SNR levels (high SNR for
first half, nominal for second half) ensuring a mix of converged and
non-converged test cases. Add gen_firmware_vectors.py converter that
reads test_vectors.json and produces packed LLR data matching the
RTL wishbone interface format (5 LLRs per 32-bit word, 6-bit two's
complement).

Generated 20 vectors: 11 converged, 9 non-converged.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 18:36:26 -07:00
cah
9a28e30bed docs: add detailed implementation plan for ChipFoundry contest
20 tasks across 8 weeks covering RTL integration, verification,
OpenLane hardening, firmware, PCBA, and final submission.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 18:10:38 -07:00
16 changed files with 26013 additions and 69 deletions

4
.gitignore vendored

@@ -5,5 +5,9 @@
 __pycache__/
 *.pyc

+# Simulation artifacts
+tb/obj_dir/
+tb/*.vcd
+
 # Contest submission repo (separate GitHub repo)
 chip_ignite/

22162
data/test_vectors.json Normal file

File diff suppressed because it is too large

197
docs/hardening-results.md Normal file

@@ -0,0 +1,197 @@
# LDPC Decoder Hardening Results
## Run 1: `26_02_25_21_11` (Feb 25, 2026) — FAILED
- **RTL**: Original (unpipelined CN update)
- **Config**: `CLOCK_PERIOD=20` (50 MHz), `RUN_HEURISTIC_DIODE_INSERTION=true`, `HEURISTIC_ANTENNA_THRESHOLD=110`
- **Die area**: 2800 x 1760 µm (4.93 mm²)
- **Failure**: `GRT-0118` routing congestion after heuristic diode insertion (66,016 diodes added)
- **Notes**: Initial global routing passed (0 overflow, 39% routing utilization). Diode insertion nearly doubled cell count, causing re-routing congestion failure.
## Run 2: `reuse_synth` (Feb 27, 2026) — COMPLETED (timing violations)
- **RTL**: Original (unpipelined CN update) — reused synthesis netlist from Run 1
- **Config**: `CLOCK_PERIOD=20` (50 MHz), `RUN_HEURISTIC_DIODE_INSERTION=false`, `RUN_ANTENNA_REPAIR=true`
- **Die area**: 2800 x 1760 µm (4.93 mm²)
- **Result**: All 70 steps completed. GDS generated. Deferred timing errors.
### Physical Results
| Metric | Result |
|--------|--------|
| Magic DRC | **Clean** |
| KLayout DRC | **Clean** |
| LVS | **Clean** (0 errors, 0 unmatched) |
| XOR (Magic vs KLayout) | **Clean** |
| Illegal overlap | **Clean** |
| Power grid violations | **0** |
| Antenna violating nets | 658 |
| Antenna violating pins | 905 |
### Area & Utilization
| Metric | Value |
|--------|-------|
| Die area | 4,928,000 µm² (4.93 mm²) |
| Core area | 4,846,670 µm² |
| Instance count | 184,663 |
| Instance area | 1,303,260 µm² (1.30 mm²) |
| Core utilization | 26.9% |
| Sequential cells | 16,967 |
| Combinational cells | 61,366 |
| Timing repair buffers | 23,709 |
| Fill cells | 415,149 |
| Tap cells | 69,228 |
### Timing (post-route, CLOCK_PERIOD = 20 ns / 50 MHz target)
| Corner | Setup WNS (ns) | Setup TNS (ns) | Hold WNS (ns) | Hold TNS (ns) | Setup Violations |
|--------|----------------|-----------------|----------------|----------------|------------------|
| nom_tt_025C_1v80 | **-27.13** | -234.9 | -0.32 | -3.76 | 9 |
| nom_ss_100C_1v60 | **-70.58** | -29,946.3 | 0.06 | 0 | 5,463 |
| nom_ff_n40C_1v95 | **-10.18** | -86.3 | -0.26 | -12.4 | — |
| **Worst across all** | **-71.40** | -34,329.1 | -0.47 | -26.4 | — |
### Estimated Max Frequency
- **TT corner**: Critical path ~47 ns → **~21 MHz**
- **SS corner**: Critical path ~91 ns → **~11 MHz**
- **FF corner**: Critical path ~30 ns → **~33 MHz**
### Power (TT corner)
| Component | Power (W) |
|-----------|-----------|
| Internal | 0.0554 |
| Switching | 0.0273 |
| Leakage | ~0.000002 (about 0.002 mW) |
| **Total** | **0.0827** |
### Key Observations
1. Disabling heuristic diode insertion fixed the routing congestion failure from Run 1
2. 658 antenna violations remain — iterative antenna repair was not sufficient. May need to re-enable heuristic insertion with a higher threshold or use `DIODE_ON_PORTS`
3. Setup timing is severely violated — critical path is ~47 ns at TT, far from 20 ns target
4. This run used the **unpipelined** RTL (synthesis reused from Run 1 which predated the CN pipeline split)
5. Next run should re-synthesize with pipelined CN update RTL to see if timing improves
## Run 3: `pipelined_pnr` (Mar 1, 2026) — FAILED
- **RTL**: Pipelined CN update (CN_STAGE1 + CN_STAGE2)
- **Config**: `CLOCK_PERIOD=20` (50 MHz), `SYNTH_STRATEGY=AREA 0`, `RUN_HEURISTIC_DIODE_INSERTION=false`, `RUN_ANTENNA_REPAIR=true`
- **Die area**: 2800 x 1760 µm (4.93 mm²)
- **Failure**: `GRT-0118` routing congestion during iterative antenna repair (step 36), after 13+ hours of repair loops
- **Notes**: Iterative antenna repair kept inserting diodes and re-routing until congestion became too high. Same root cause as Run 1 but via different mechanism.
## Run 3b: `pipelined_synth` (Feb 28, 2026) — STILL RUNNING
- **RTL**: Pipelined CN update
- **Config**: `SYNTH_STRATEGY=AREA 2` — synthesis only
- **Status**: ABC pass 2 (tech mapping) running 20+ hours. `AREA 2` is far too aggressive for this design size. **Do not use AREA 2 for this design.**
## Run 4: `pipelined_noantenna` (Mar 2, 2026) — COMPLETED (timing violations)
- **RTL**: Pipelined CN update (CN_STAGE1 + CN_STAGE2)
- **Config**: `CLOCK_PERIOD=20` (50 MHz), `SYNTH_STRATEGY=AREA 0`, `RUN_HEURISTIC_DIODE_INSERTION=false`, `RUN_ANTENNA_REPAIR=false`
- **Die area**: 2800 x 1760 µm (4.93 mm²)
- **Result**: All 69 steps completed. GDS generated. Deferred timing errors. No antenna repair attempted.
### Physical Results
| Metric | Result |
|--------|--------|
| Magic DRC | **Clean** |
| KLayout DRC | **Clean** |
| LVS | **Clean** (0 errors, 0 unmatched) |
| XOR (Magic vs KLayout) | **Clean** |
| Illegal overlap | **Clean** |
| Antenna violating nets | 1,707 (no repair attempted) |
| Antenna violating pins | 3,319 (no repair attempted) |
### Area & Utilization
| Metric | Value |
|--------|-------|
| Die area | 4,928,000 µm² (4.93 mm²) |
| Instance count | 183,774 |
| Instance area | 1,351,790 µm² (1.35 mm²) |
| Core utilization | 27.9% |
### Timing (post-route, CLOCK_PERIOD = 20 ns / 50 MHz target)
| Corner | Setup WNS (ns) | Setup TNS (ns) | Hold WNS (ns) | Hold TNS (ns) |
|--------|----------------|-----------------|----------------|----------------|
| nom_tt_025C_1v80 | **-28.86** | -348.0 | -0.08 | -0.15 |
| nom_ss_100C_1v60 | **-74.22** | -20,536.0 | -0.07 | -0.07 |
| nom_ff_n40C_1v95 | **-11.04** | -93.8 | -0.12 | -2.15 |
| min_tt_025C_1v80 | -28.39 | -251.0 | 0 | 0 |
| max_tt_025C_1v80 | -29.36 | -725.1 | -0.24 | -2.15 |
### Estimated Max Frequency
- **TT corner**: Critical path ~49 ns → **~20 MHz**
- **SS corner**: Critical path ~94 ns → **~11 MHz**
- **FF corner**: Critical path ~31 ns → **~32 MHz**
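
The frequency estimates above follow from the setup slack: the critical path is roughly the clock period minus the worst negative slack, and the maximum frequency is its reciprocal. A minimal sketch of that arithmetic, using the Run 4 WNS values from the timing table (the helper name `est_fmax_mhz` is illustrative and not part of the OpenLane flow):

```python
CLOCK_PERIOD_NS = 20.0  # 50 MHz target, from the run config


def est_fmax_mhz(setup_wns_ns, period_ns=CLOCK_PERIOD_NS):
    """Estimate max frequency (MHz) from the worst setup slack (ns)."""
    # Negative slack means the path is longer than the clock period.
    critical_path_ns = period_ns - setup_wns_ns
    return 1000.0 / critical_path_ns


# Run 4 setup WNS per corner, copied from the timing table
run4_wns = {"tt": -28.86, "ss": -74.22, "ff": -11.04}
for corner, wns in run4_wns.items():
    path_ns = CLOCK_PERIOD_NS - wns
    print(f"{corner}: path ~{path_ns:.1f} ns, fmax ~{est_fmax_mhz(wns):.1f} MHz")
```

This ignores clock uncertainty and other margins, so it should be read as a rough estimate rather than a sign-off number.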
### Power (TT corner)
| Metric | Value |
|--------|-------|
| **Total** | **0.0858 W** |
### Key Observations
1. Pipelined CN update did NOT improve timing — TT WNS is -28.86 ns vs -27.13 ns (unpipelined Run 2). Slightly worse, possibly due to AREA 0 vs AREA 2 synth strategy difference.
2. Hold violations are much smaller than Run 2 (-0.08 vs -0.32 ns), nearly clean.
3. Antenna violations increased to 1,707 nets (vs 658 in Run 2) without any repair — AREA 0 produces a less antenna-friendly netlist.
4. The critical path is still ~47-49 ns, suggesting the bottleneck is NOT the CN update pipeline stage but something else (likely the large mux/barrel shifter or belief update logic).
5. `SYNTH_STRATEGY=AREA 2` takes 20+ hours for ABC tech mapping on this design — **never use it**. `AREA 0` completed in reasonable time.
## Summary Table
| Run | RTL | Synth | Antenna | Status | TT Setup WNS | Max Freq (TT) |
|-----|-----|-------|---------|--------|-------------|---------------|
| 1 | Unpipelined | AREA 2 | Heuristic 110µm | **FAILED** (congestion) | — | — |
| 2 | Unpipelined | AREA 2 | Iterative | **COMPLETED** | -27.13 ns | ~21 MHz |
| 3 | Pipelined | AREA 0 | Iterative | **FAILED** (congestion) | — | — |
| 3b | Pipelined | AREA 2 | — (synth only) | Still running (20+ hrs) | — | — |
| 4 | Pipelined | AREA 0 | None | **COMPLETED** | -28.86 ns | ~20 MHz |
## Critical Path Analysis (from Run 4, pipelined_noantenna)
### Path Summary
| Item | Value |
|------|-------|
| Startpoint | `u_core.beliefs[0][5]` (beliefs register, bit 5 of element 0) |
| Endpoint | `syndrome_weight[7]` (MSB of syndrome weight counter) |
| RTL location | `SYNDROME` state in `ldpc_decoder_core.sv`, lines 363-385 |
| Slack | **-28.859 ns** (VIOLATED) |
| Total combinational delay | **47.67 ns** (logic only; 48.61 ns including clock-to-Q and wire) |
| Logic levels | **222** (171 XOR/XNOR + 51 adder/mux) |
| Logic vs wire delay | 99.7% logic / 0.3% wire |

All 8 worst setup violators fan out from `beliefs[0][5]` to `syndrome_weight[7:0]`.
### What the Critical Path Computes
The `SYNDROME` state computes the full syndrome check in a **single clock cycle**:
1. **Parity computation** (171 XOR levels, 33.9 ns): XOR the sign bits of all beliefs connected to each check node. 7 rows x 32 z-elements = 224 parity bits, each XORing the sign bits of up to 3 connected columns, reading from 256 belief sign bits.
2. **Population count** (51 adder levels, 13.6 ns): Sum all 224 parity results into an 8-bit `syndrome_cnt`.

The `syndrome_cnt = syndrome_cnt + 1` accumulation pattern creates a carry chain dependency that serializes everything.
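
To illustrate why the accumulation pattern matters, the sketch below compares the dependency depth of a one-bit-at-a-time accumulation against a balanced adder tree for 224 inputs. It counts adder stages only, not gate levels, and the input pattern is made up for the example, so the numbers are not comparable to the 51 adder levels reported by STA.

```python
def serial_popcount(bits):
    """Accumulate one bit at a time; each add depends on the previous one."""
    count, depth = 0, 0
    for b in bits:
        count += b
        depth += 1  # one more adder on the dependency chain
    return count, depth


def tree_popcount(bits):
    """Sum pairwise in a balanced tree; depth grows as ceil(log2(n))."""
    level = list(bits)
    depth = 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element passes through unchanged
        level = nxt
        depth += 1
    return (level[0] if level else 0), depth


parity = [1, 0, 1, 1] * 56  # 224 hypothetical parity bits
print(serial_popcount(parity))  # (count, chain length)
print(tree_popcount(parity))    # (count, tree depth)
```

Both functions return the same count; only the depth of the dependency chain differs, which is the property the proposed pipeline relies on.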
### Delay Breakdown
| Segment | Delay (ns) | Cells | Description |
|---------|-----------|-------|-------------|
| Source CLK-to-Q | 0.795 | 1 (dfxtp_4) | beliefs[0][5] register output |
| Parity XOR chain | 33.888 | 171 (xor2/xnor2) | XOR reduction across belief sign bits |
| Popcount adder tree | 13.634 | 51 (and/or/aoi/oai) | 224-bit popcount to 8-bit count |
| State MUX | 0.148 | 1 (mux2_1) | FSM output mux |
| Wire (interconnect) | 0.149 | — | 0.3% of total — negligible |
| **Total** | **48.614** | **222 levels** | |
### Proposed Fix: 2-3 Stage Syndrome Pipeline
**SYNDROME_S1** (cycle 1, ~16 ns): Compute all 224 parity bits in parallel. Each parity is only 2-3 XOR operations deep (one per connected column). Register the 224-bit `parity_vec`.

**SYNDROME_S2** (cycle 2, ~14 ns): Popcount the 224-bit parity vector via balanced adder tree. Register the 8-bit `syndrome_weight` and `syndrome_ok` flag.

**SYNDROME_DONE** (cycle 3): Already exists; reads `syndrome_ok`.

**Estimated post-fix critical path**: ~14-16 ns (comfortably under 20 ns / 50 MHz).

**Latency impact**: +1-2 cycles per iteration (negligible at 30 iterations).
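
A behavioral sketch of the two proposed stages, written in Python to match the project's reference model rather than as RTL. The function names and the toy connectivity are assumptions for illustration; the real decoder would derive the connected bit indices from `H_BASE`.

```python
def syndrome_s1(sign_bits, checks):
    """Stage 1: one parity bit per check, XOR of the connected sign bits."""
    parity_vec = []
    for connected in checks:
        p = 0
        for idx in connected:
            p ^= sign_bits[idx]
        parity_vec.append(p)
    return parity_vec  # in hardware this vector would be registered


def syndrome_s2(parity_vec):
    """Stage 2: popcount of the registered parity vector."""
    weight = sum(parity_vec)
    return weight, weight == 0


# Hypothetical toy connectivity: 4 checks over 6 bits (not the real H matrix)
checks = [(0, 1, 2), (1, 3), (2, 4, 5), (0, 5)]
signs = [1, 0, 1, 0, 0, 1]
weight, ok = syndrome_s2(syndrome_s1(signs, checks))
print(weight, ok)
```

Splitting the computation this way keeps each stage's logic depth independent of the other, at the cost of one extra register bank for the parity vector.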
### Secondary Violations
Wishbone address input (`wb_adr_i`) has -2.47 ns setup violation. Fixable by registering the address at the decoder boundary.
## Next Steps
- Implement syndrome pipeline (SYNDROME_S1 + SYNDROME_S2) to cut critical path from ~49 ns to ~16 ns
- Register Wishbone address input to fix secondary violation
- Re-synthesize with AREA 0 and run PnR to verify timing improvement
- Consider increasing die area for antenna repair headroom
- Consider `SYNTH_STRATEGY=AREA 1` as middle ground between AREA 0 and AREA 2

File diff suppressed because it is too large


@@ -0,0 +1,288 @@
#!/usr/bin/env python3
"""
Generate test vector files for cocotb and firmware from the Python model output.

Reads data/test_vectors.json and produces:
  1. chip_ignite/verilog/dv/cocotb/ldpc_tests/test_data.py (cocotb Python module)
  2. chip_ignite/firmware/ldpc_demo/test_vectors.h (C header for PicoRV32)

LLR packing format (matches wishbone_interface.sv):
  Each 32-bit word holds 5 LLRs, 6 bits each, in two's complement.
    Word[i] bits [5:0]   = LLR[5*i+0]
    Word[i] bits [11:6]  = LLR[5*i+1]
    Word[i] bits [17:12] = LLR[5*i+2]
    Word[i] bits [23:18] = LLR[5*i+3]
    Word[i] bits [29:24] = LLR[5*i+4]
  52 words cover 260 LLRs (256 used, last 4 are zero-padded).
"""

import json
import os
import sys

# Paths relative to this script's directory
SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
PROJECT_DIR = os.path.dirname(SCRIPT_DIR)
INPUT_FILE = os.path.join(PROJECT_DIR, 'data', 'test_vectors.json')
COCOTB_OUTPUT = os.path.join(PROJECT_DIR, 'chip_ignite', 'verilog', 'dv', 'cocotb',
                             'ldpc_tests', 'test_data.py')
FIRMWARE_OUTPUT = os.path.join(PROJECT_DIR, 'chip_ignite', 'firmware', 'ldpc_demo',
                               'test_vectors.h')

Q_BITS = 6
LLRS_PER_WORD = 5
N_LLR = 256
N_WORDS = (N_LLR + LLRS_PER_WORD - 1) // LLRS_PER_WORD  # 52
K = 32


def signed_to_twos_complement(val, bits=Q_BITS):
    """Convert signed integer to two's complement unsigned representation."""
    if val < 0:
        return val + (1 << bits)
    return val & ((1 << bits) - 1)


def pack_llr_words(llr_quantized):
    """
    Pack 256 signed LLRs into 52 uint32 words.

    Each word contains 5 LLRs, 6 bits each:
      bits[5:0]   = LLR[5*word + 0]
      bits[11:6]  = LLR[5*word + 1]
      bits[17:12] = LLR[5*word + 2]
      bits[23:18] = LLR[5*word + 3]
      bits[29:24] = LLR[5*word + 4]
    """
    # Pad to 260 entries (52 * 5)
    padded = list(llr_quantized) + [0] * (N_WORDS * LLRS_PER_WORD - N_LLR)
    words = []
    for w in range(N_WORDS):
        word = 0
        for p in range(LLRS_PER_WORD):
            llr_idx = w * LLRS_PER_WORD + p
            tc = signed_to_twos_complement(padded[llr_idx])
            word |= (tc & 0x3F) << (p * Q_BITS)
        words.append(word)
    return words


def bits_to_uint32(bits):
    """Convert a list of 32 binary values to a single uint32 (bit 0 = LSB)."""
    val = 0
    for i, b in enumerate(bits):
        if b:
            val |= (1 << i)
    return val


def generate_cocotb_test_data(vectors):
    """Generate Python module for cocotb tests."""
    lines = []
    lines.append('"""')
    lines.append('Auto-generated test vector data for LDPC decoder cocotb tests.')
    lines.append('Generated by model/gen_firmware_vectors.py')
    lines.append('')
    lines.append('LLR packing: 5 LLRs per 32-bit word, 6 bits each (two\'s complement)')
    lines.append(' Word bits [5:0] = LLR[5*i+0]')
    lines.append(' Word bits [11:6] = LLR[5*i+1]')
    lines.append(' Word bits [17:12] = LLR[5*i+2]')
    lines.append(' Word bits [23:18] = LLR[5*i+3]')
    lines.append(' Word bits [29:24] = LLR[5*i+4]')
    lines.append('"""')
    lines.append('')
    lines.append(f'# Number of test vectors')
    lines.append(f'NUM_VECTORS = {len(vectors)}')
    lines.append(f'LLR_WORDS_PER_VECTOR = {N_WORDS}')
    lines.append('')
    lines.append('# Wishbone register offsets (byte-addressed)')
    lines.append('REG_CTRL = 0x00')
    lines.append('REG_STATUS = 0x04')
    lines.append('REG_LLR_BASE = 0x10 # 52 words: 0x10, 0x14, ..., 0xDC')
    lines.append('REG_DECODED = 0x50')
    lines.append('REG_VERSION = 0x54')
    lines.append('')
    lines.append('')
    lines.append('TEST_VECTORS = [')

    for vec in vectors:
        llr_words = pack_llr_words(vec['llr_quantized'])
        decoded_word = bits_to_uint32(vec['decoded_bits'])
        lines.append(' {')
        lines.append(f' \'index\': {vec["index"]},')
        # Format LLR words as hex, 8 per line
        lines.append(f' \'llr_words\': [')
        for chunk_start in range(0, len(llr_words), 8):
            chunk = llr_words[chunk_start:chunk_start + 8]
            hex_str = ', '.join(f'0x{w:08X}' for w in chunk)
            comma = ',' if chunk_start + 8 < len(llr_words) else ''
            lines.append(f' {hex_str}{comma}')
        lines.append(f' ],')
        lines.append(f' \'decoded_word\': 0x{decoded_word:08X},')
        lines.append(f' \'info_bits\': {vec["info_bits"]},')
        lines.append(f' \'converged\': {vec["converged"]},')
        lines.append(f' \'iterations\': {vec["iterations"]},')
        lines.append(f' \'syndrome_weight\': {vec["syndrome_weight"]},')
        lines.append(f' \'bit_errors\': {vec["bit_errors"]},')
        lines.append(' },')

    lines.append(']')
    lines.append('')
    lines.append('')
    lines.append('def get_converged_vectors():')
    lines.append(' """Return only vectors that converged (for positive testing)."""')
    lines.append(' return [v for v in TEST_VECTORS if v[\'converged\']]')
    lines.append('')
    lines.append('')
    lines.append('def get_failed_vectors():')
    lines.append(' """Return only vectors that did not converge (for negative testing)."""')
    lines.append(' return [v for v in TEST_VECTORS if not v[\'converged\']]')
    lines.append('')
    return '\n'.join(lines)


def generate_firmware_header(vectors):
    """Generate C header for PicoRV32 firmware."""
    lines = []
    lines.append('/*')
    lines.append(' * Auto-generated test vectors for LDPC decoder firmware')
    lines.append(' * Generated by model/gen_firmware_vectors.py')
    lines.append(' *')
    lines.append(' * LLR packing: 5 LLRs per 32-bit word, 6 bits each (two\'s complement)')
    lines.append(' * Word bits [5:0] = LLR[5*i+0]')
    lines.append(' * Word bits [11:6] = LLR[5*i+1]')
    lines.append(' * Word bits [17:12] = LLR[5*i+2]')
    lines.append(' * Word bits [23:18] = LLR[5*i+3]')
    lines.append(' * Word bits [29:24] = LLR[5*i+4]')
    lines.append(' */')
    lines.append('')
    lines.append('#ifndef TEST_VECTORS_H')
    lines.append('#define TEST_VECTORS_H')
    lines.append('')
    lines.append('#include <stdint.h>')
    lines.append('')
    lines.append(f'#define NUM_TEST_VECTORS {len(vectors)}')
    lines.append(f'#define LLR_WORDS_PER_VECTOR {N_WORDS}')
    lines.append('')

    # Generate per-vector arrays
    for vec in vectors:
        idx = vec['index']
        llr_words = pack_llr_words(vec['llr_quantized'])
        decoded_word = bits_to_uint32(vec['decoded_bits'])
        lines.append(f'/* Vector {idx}: converged={vec["converged"]}, '
                     f'iterations={vec["iterations"]}, '
                     f'syndrome_weight={vec["syndrome_weight"]}, '
                     f'bit_errors={vec["bit_errors"]} */')
        lines.append(f'static const uint32_t tv{idx}_llr[{N_WORDS}] = {{')
        for chunk_start in range(0, len(llr_words), 4):
            chunk = llr_words[chunk_start:chunk_start + 4]
            hex_str = ', '.join(f'0x{w:08X}' for w in chunk)
            comma = ',' if chunk_start + 4 < len(llr_words) else ''
            lines.append(f' {hex_str}{comma}')
        lines.append('};')
        lines.append(f'static const uint32_t tv{idx}_decoded = 0x{decoded_word:08X};')
        lines.append(f'static const int tv{idx}_converged = {1 if vec["converged"] else 0};')
        lines.append(f'static const int tv{idx}_iterations = {vec["iterations"]};')
        lines.append(f'static const int tv{idx}_syndrome_weight = {vec["syndrome_weight"]};')
        lines.append('')

    # Generate array-of-pointers for easy iteration
    lines.append('/* Array of LLR pointers for iteration */')
    lines.append(f'static const uint32_t * const tv_llr[NUM_TEST_VECTORS] = {{')
    for i, vec in enumerate(vectors):
        comma = ',' if i < len(vectors) - 1 else ''
        lines.append(f' tv{vec["index"]}_llr{comma}')
    lines.append('};')
    lines.append('')
    lines.append(f'static const uint32_t tv_decoded[NUM_TEST_VECTORS] = {{')
    for i, vec in enumerate(vectors):
        decoded_word = bits_to_uint32(vec['decoded_bits'])
        comma = ',' if i < len(vectors) - 1 else ''
        lines.append(f' 0x{decoded_word:08X}{comma} /* tv{vec["index"]} */')
    lines.append('};')
    lines.append('')
    lines.append(f'static const int tv_converged[NUM_TEST_VECTORS] = {{')
    vals = ', '.join(str(1 if v['converged'] else 0) for v in vectors)
    lines.append(f' {vals}')
    lines.append('};')
    lines.append('')
    lines.append(f'static const int tv_iterations[NUM_TEST_VECTORS] = {{')
    vals = ', '.join(str(v['iterations']) for v in vectors)
    lines.append(f' {vals}')
    lines.append('};')
    lines.append('')
    lines.append(f'static const int tv_syndrome_weight[NUM_TEST_VECTORS] = {{')
    vals = ', '.join(str(v['syndrome_weight']) for v in vectors)
    lines.append(f' {vals}')
    lines.append('};')
    lines.append('')
    lines.append('#endif /* TEST_VECTORS_H */')
    lines.append('')
    return '\n'.join(lines)


def main():
    # Load test vectors
    print(f'Reading {INPUT_FILE}...')
    with open(INPUT_FILE) as f:
        vectors = json.load(f)
    print(f' Loaded {len(vectors)} vectors')
    converged = sum(1 for v in vectors if v['converged'])
    print(f' Converged: {converged}/{len(vectors)}')

    # Generate cocotb test data
    cocotb_content = generate_cocotb_test_data(vectors)
    os.makedirs(os.path.dirname(COCOTB_OUTPUT), exist_ok=True)
    with open(COCOTB_OUTPUT, 'w') as f:
        f.write(cocotb_content)
    print(f' Wrote {COCOTB_OUTPUT}')

    # Generate firmware header
    firmware_content = generate_firmware_header(vectors)
    os.makedirs(os.path.dirname(FIRMWARE_OUTPUT), exist_ok=True)
    with open(FIRMWARE_OUTPUT, 'w') as f:
        f.write(firmware_content)
    print(f' Wrote {FIRMWARE_OUTPUT}')

    # Verify: check roundtrip of LLR packing
    print('\nVerifying LLR packing roundtrip...')
    for vec in vectors:
        llr_q = vec['llr_quantized']
        words = pack_llr_words(llr_q)
        # Unpack and compare
        for w_idx, word in enumerate(words):
            for p in range(LLRS_PER_WORD):
                llr_idx = w_idx * LLRS_PER_WORD + p
                if llr_idx >= N_LLR:
                    break
                tc_val = (word >> (p * Q_BITS)) & 0x3F
                # Convert back to signed
                if tc_val >= 32:
                    signed_val = tc_val - 64
                else:
                    signed_val = tc_val
                expected = llr_q[llr_idx]
                assert signed_val == expected, (
                    f'Vec {vec["index"]}, LLR[{llr_idx}]: '
                    f'packed={signed_val}, expected={expected}'
                )
    print(' LLR packing roundtrip OK for all vectors')
    print('\nDone.')


if __name__ == '__main__':
    main()
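
The packing format documented in this script (5 LLRs per 32-bit word, 6-bit two's complement, lowest LLR in bits [5:0]) can be checked in isolation with a small inverse function. The following is a standalone sketch; `unpack_llr_words` is a hypothetical helper that is not part of this changeset, and its defaults simply mirror the constants above.

```python
def unpack_llr_words(words, n_llr=256, q_bits=6, llrs_per_word=5):
    """Unpack 32-bit words into signed LLRs (inverse of the packing format)."""
    mask = (1 << q_bits) - 1
    sign_bit = 1 << (q_bits - 1)
    llrs = []
    for word in words:
        for p in range(llrs_per_word):
            tc = (word >> (p * q_bits)) & mask
            # Values with the sign bit set are negative in two's complement
            llrs.append(tc - (1 << q_bits) if tc & sign_bit else tc)
    return llrs[:n_llr]  # drop zero padding past the last real LLR


# One word holding the LLRs [-32, -1, 0, 1, 31]
word = 0x20 | (0x3F << 6) | (0x00 << 12) | (0x01 << 18) | (0x1F << 24)
print(unpack_llr_words([word], n_llr=5))
```

The example word exercises both extremes of the 6-bit range, which is where sign handling is most likely to go wrong.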


@@ -0,0 +1,188 @@
#!/usr/bin/env python3
"""
Generate hex files for Verilator $readmemh from Python model test vectors.

Reads data/test_vectors.json and produces:
  tb/vectors/llr_words.hex   - LLR data packed as 32-bit hex words
  tb/vectors/expected.hex    - Expected decode results
  tb/vectors/num_vectors.txt - Vector count

LLR packing format (matches wishbone_interface.sv):
  Each 32-bit word holds 5 LLRs, 6 bits each, in two's complement.
    Word[i] bits [5:0]   = LLR[5*i+0]
    Word[i] bits [11:6]  = LLR[5*i+1]
    Word[i] bits [17:12] = LLR[5*i+2]
    Word[i] bits [23:18] = LLR[5*i+3]
    Word[i] bits [29:24] = LLR[5*i+4]
  52 words cover 260 LLRs (256 used, last 4 are zero-padded).

Expected output format (per vector, 4 lines):
  Line 0: decoded_word (32-bit hex, info bits packed LSB-first)
  Line 1: converged (00000000 or 00000001)
  Line 2: iterations (32-bit hex)
  Line 3: syndrome_weight (32-bit hex)
"""

import json
import os
import sys

# Paths relative to this script's directory
SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
PROJECT_DIR = os.path.dirname(SCRIPT_DIR)
INPUT_FILE = os.path.join(PROJECT_DIR, 'data', 'test_vectors.json')
OUTPUT_DIR = os.path.join(PROJECT_DIR, 'tb', 'vectors')

Q_BITS = 6
LLRS_PER_WORD = 5
N_LLR = 256
N_WORDS = (N_LLR + LLRS_PER_WORD - 1) // LLRS_PER_WORD  # 52
K = 32
LINES_PER_EXPECTED = 4  # decoded_word, converged, iterations, syndrome_weight


def signed_to_twos_complement(val, bits=Q_BITS):
    """Convert signed integer to two's complement unsigned representation."""
    if val < 0:
        return val + (1 << bits)
    return val & ((1 << bits) - 1)


def pack_llr_words(llr_quantized):
    """
    Pack 256 signed LLRs into 52 uint32 words.

    Each word contains 5 LLRs, 6 bits each:
      bits[5:0]   = LLR[5*word + 0]
      bits[11:6]  = LLR[5*word + 1]
      bits[17:12] = LLR[5*word + 2]
      bits[23:18] = LLR[5*word + 3]
      bits[29:24] = LLR[5*word + 4]
    """
    # Pad to 260 entries (52 * 5)
    padded = list(llr_quantized) + [0] * (N_WORDS * LLRS_PER_WORD - N_LLR)
    words = []
    for w in range(N_WORDS):
        word = 0
        for p in range(LLRS_PER_WORD):
            llr_idx = w * LLRS_PER_WORD + p
            tc = signed_to_twos_complement(padded[llr_idx])
            word |= (tc & 0x3F) << (p * Q_BITS)
        words.append(word)
    return words


def bits_to_uint32(bits):
    """Convert a list of 32 binary values to a single uint32 (bit 0 = LSB)."""
    val = 0
    for i, b in enumerate(bits):
        if b:
            val |= (1 << i)
    return val


def main():
    # Load test vectors
    print(f'Reading {INPUT_FILE}...')
    with open(INPUT_FILE) as f:
        vectors = json.load(f)
    num_vectors = len(vectors)
    converged_count = sum(1 for v in vectors if v['converged'])
    print(f' Loaded {num_vectors} vectors ({converged_count} converged, '
          f'{num_vectors - converged_count} non-converged)')

    # Create output directory
    os.makedirs(OUTPUT_DIR, exist_ok=True)

    # =========================================================================
    # Generate llr_words.hex
    # =========================================================================
    # Format: one 32-bit hex word per line, 52 words per vector
    # Total lines = 52 * num_vectors
    llr_lines = []
    for vec in vectors:
        llr_words = pack_llr_words(vec['llr_quantized'])
        assert len(llr_words) == N_WORDS
        for word in llr_words:
            llr_lines.append(f'{word:08X}')
    llr_path = os.path.join(OUTPUT_DIR, 'llr_words.hex')
    with open(llr_path, 'w') as f:
        f.write('\n'.join(llr_lines) + '\n')
    print(f' Wrote {llr_path} ({len(llr_lines)} lines, {N_WORDS} words/vector)')

    # =========================================================================
    # Generate expected.hex
    # =========================================================================
    # Format: 4 lines per vector (all 32-bit hex)
    # Line 0: decoded_word (info bits packed LSB-first)
    # Line 1: converged (00000000 or 00000001)
    # Line 2: iterations
    # Line 3: syndrome_weight
    expected_lines = []
    for vec in vectors:
        decoded_word = bits_to_uint32(vec['decoded_bits'])
        converged = 1 if vec['converged'] else 0
        iterations = vec['iterations']
        syndrome_weight = vec['syndrome_weight']
        expected_lines.append(f'{decoded_word:08X}')
        expected_lines.append(f'{converged:08X}')
        expected_lines.append(f'{iterations:08X}')
        expected_lines.append(f'{syndrome_weight:08X}')
    expected_path = os.path.join(OUTPUT_DIR, 'expected.hex')
    with open(expected_path, 'w') as f:
        f.write('\n'.join(expected_lines) + '\n')
    print(f' Wrote {expected_path} ({len(expected_lines)} lines, '
          f'{LINES_PER_EXPECTED} lines/vector)')

    # =========================================================================
    # Generate num_vectors.txt
    # =========================================================================
    num_path = os.path.join(OUTPUT_DIR, 'num_vectors.txt')
    with open(num_path, 'w') as f:
        f.write(f'{num_vectors}\n')
    print(f' Wrote {num_path} ({num_vectors})')

    # =========================================================================
    # Verify LLR packing roundtrip
    # =========================================================================
    print('\nVerifying LLR packing roundtrip...')
    for vec in vectors:
        llr_q = vec['llr_quantized']
        words = pack_llr_words(llr_q)
        for w_idx, word in enumerate(words):
            for p in range(LLRS_PER_WORD):
                llr_idx = w_idx * LLRS_PER_WORD + p
                if llr_idx >= N_LLR:
                    break
                tc_val = (word >> (p * Q_BITS)) & 0x3F
                # Convert back to signed
                if tc_val >= 32:
                    signed_val = tc_val - 64
                else:
                    signed_val = tc_val
                expected = llr_q[llr_idx]
                assert signed_val == expected, (
                    f'Vec {vec["index"]}, LLR[{llr_idx}]: '
                    f'packed={signed_val}, expected={expected}'
                )
    print(' LLR packing roundtrip OK for all vectors')

    # Print summary of expected results
    print('\nExpected results summary:')
    for vec in vectors:
        decoded_word = bits_to_uint32(vec['decoded_bits'])
        print(f' Vec {vec["index"]:2d}: decoded=0x{decoded_word:08X}, '
              f'converged={vec["converged"]}, '
              f'iter={vec["iterations"]}, '
              f'syn_wt={vec["syndrome_weight"]}')
    print('\nDone.')


if __name__ == '__main__':
    main()
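
For debugging a mismatch outside the simulator, the `expected.hex` layout described in this script (4 hex lines per vector) can be read back with a few lines of Python. This is a sketch only; `parse_expected` and the sample lines are hypothetical and not part of this changeset.

```python
def parse_expected(lines, lines_per_vector=4):
    """Group expected.hex lines into one record per test vector."""
    values = [int(s, 16) for s in lines if s.strip()]
    assert len(values) % lines_per_vector == 0, 'truncated expected.hex'
    records = []
    for i in range(0, len(values), lines_per_vector):
        decoded, converged, iterations, syn_wt = values[i:i + lines_per_vector]
        records.append({
            'decoded_word': decoded,
            'converged': bool(converged),
            'iterations': iterations,
            'syndrome_weight': syn_wt,
        })
    return records


# Made-up sample: one converged and one non-converged vector
sample = ['DEADBEEF', '00000001', '00000003', '00000000',
          '0000FFFF', '00000000', '0000001E', '0000000C']
for rec in parse_expected(sample):
    print(rec)
```

Reading from the real file would only require passing `open(path).read().splitlines()` as `lines`.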


@@ -397,14 +397,26 @@ def run_ber_simulation(lam_s_db_range, lam_b=0.1, n_frames=1000, max_iter=30):
 def generate_test_vectors(n_vectors=10, lam_s=2.0, lam_b=0.1, max_iter=30):
-    """Generate test vectors for RTL verification."""
+    """
+    Generate test vectors for RTL verification.
+
+    Uses a mix of signal levels to ensure we get both converged and
+    non-converged vectors. First half uses high SNR (lam_s * 3) for
+    reliable convergence, then uses the specified lam_s for realistic
+    channel conditions.
+    """
     H = build_full_h_matrix()
     vectors = []

+    # Use high SNR for first half to guarantee converged vectors
+    n_high_snr = n_vectors // 2
+    lam_schedule = [lam_s * 3.0] * n_high_snr + [lam_s] * (n_vectors - n_high_snr)
+
     for i in range(n_vectors):
         info = np.random.randint(0, 2, K)
         codeword = ldpc_encode(info, H)
-        llr_float, photons = poisson_channel(codeword, lam_s, lam_b)
+        cur_lam_s = lam_schedule[i]
+        llr_float, photons = poisson_channel(codeword, cur_lam_s, lam_b)
         llr_q = quantize_llr(llr_float)

         decoded, converged, iters, syn_wt = decode_layered_min_sum(llr_q, max_iter)
@@ -420,10 +432,11 @@ def generate_test_vectors(n_vectors=10, lam_s=2.0, lam_b=0.1, max_iter=30):
             'iterations': iters,
             'syndrome_weight': syn_wt,
             'bit_errors': int(np.sum(decoded != info)),
+            'lam_s': cur_lam_s,
         }
         vectors.append(vec)
         status = "PASS" if np.array_equal(decoded, info) else f"FAIL ({vec['bit_errors']} errs)"
-        print(f" Vector {i}: {status} (iter={iters}, converged={converged})")
+        print(f" Vector {i}: {status} (lam_s={cur_lam_s:.1f}, iter={iters}, converged={converged})")

     return vectors
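
The signal-level schedule introduced in this hunk can be exercised on its own. The sketch below factors it into a hypothetical helper, `build_lam_schedule`, which does not exist in the model; the expression is the same one used in `generate_test_vectors`.

```python
def build_lam_schedule(n_vectors, lam_s):
    """First half at 3x signal level, remainder at the nominal level."""
    n_high_snr = n_vectors // 2
    return [lam_s * 3.0] * n_high_snr + [lam_s] * (n_vectors - n_high_snr)


schedule = build_lam_schedule(20, 2.0)
print(schedule[:10])  # high-SNR half
print(schedule[10:])  # nominal half
```

With an odd `n_vectors`, integer division puts the extra vector in the nominal half.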


@@ -30,8 +30,8 @@ module ldpc_decoder_core #(
     input logic early_term_en,
     input logic [4:0] max_iter,

-    // Channel LLRs (loaded before start)
-    input logic signed [Q-1:0] llr_in [N],
+    // Channel LLRs (loaded before start) - packed vector for Yosys compatibility
+    input logic [N*Q-1:0] llr_in,

     // Status
     output logic busy,
@@ -112,13 +112,14 @@ module ldpc_decoder_core #(
     // Decoder FSM
     // =========================================================================

-    typedef enum logic [2:0] {
+    typedef enum logic [3:0] {
         IDLE,
         INIT, // Initialize beliefs from channel LLRs, zero messages
         LAYER_READ, // Read Z beliefs for each of DC columns in current row
         CN_UPDATE, // Run min-sum CN update on gathered messages
         LAYER_WRITE, // Write updated beliefs and new CN->VN messages
         SYNDROME, // Check syndrome after full iteration
+        SYNDROME_DONE, // Read registered syndrome result
         DONE
     } state_t;

@@ -167,7 +168,8 @@ module ldpc_decoder_core #(
                     state_next = LAYER_READ; // next row
                 end
             end
-            SYNDROME: begin
+            SYNDROME: state_next = SYNDROME_DONE;
+            SYNDROME_DONE: begin
                 if (syndrome_ok && early_term_en)
                     state_next = DONE;
                 else if (iter_cnt >= effective_max_iter)
@@ -192,28 +194,34 @@ module ldpc_decoder_core #(
             converged <= 1'b0;
             iter_used <= '0;
             syndrome_weight <= '0;
+            syndrome_ok <= 1'b0;
         end else begin
             case (state)
                 IDLE: begin
                     iter_cnt <= '0;
                     row_idx <= '0;
                     col_idx <= '0;
-                    converged <= 1'b0;
+                    // Note: converged, iter_used, syndrome_weight, decoded_bits
+                    // are NOT cleared here so the host can read them after decode.
+                    // They are cleared in INIT when a new decode starts.
                 end

                 INIT: begin
                     // Initialize beliefs from channel LLRs
+                    // Use blocking assignment for array in loop (Verilator requirement)
                     for (int j = 0; j < N; j++) begin
-                        beliefs[j] <= llr_in[j];
+                        beliefs[j] = $signed(llr_in[j*Q +: Q]);
                     end
                     // Zero all CN->VN messages
                     for (int r = 0; r < M_BASE; r++)
                         for (int c = 0; c < N_BASE; c++)
                             for (int z = 0; z < Z; z++)
-                                msg_cn2vn[r][c][z] <= '0;
+                                msg_cn2vn[r][c][z] = {Q{1'b0}};
                     row_idx <= '0;
                     col_idx <= '0;
                     iter_cnt <= '0;
+                    converged <= 1'b0;
+                    syndrome_ok <= 1'b0;
                 end

                 LAYER_READ: begin
@@ -221,18 +229,30 @@ module ldpc_decoder_core #(
// VN->CN = belief - old CN->VN message
// (belief already contains the sum of ALL CN->VN messages,
// so subtracting the current row's message gives the extrinsic)
for (int z = 0; z < Z; z++) begin
int bit_idx;
int shifted_z;
logic signed [Q-1:0] old_msg;
logic signed [Q-1:0] belief_val;
// Skip unconnected columns (H_BASE == -1)
if (H_BASE[row_idx][col_idx] >= 0) begin
for (int z = 0; z < Z; z++) begin
int bit_idx;
int shifted_z;
logic signed [Q-1:0] old_msg;
logic signed [Q-1:0] belief_val;
shifted_z = (z + H_BASE[row_idx][col_idx]) % Z;
bit_idx = int'(col_idx) * Z + shifted_z;
old_msg = msg_cn2vn[row_idx][col_idx][z];
belief_val = beliefs[bit_idx];
shifted_z = (z + H_BASE[row_idx][col_idx]) % Z;
bit_idx = int'(col_idx) * Z + shifted_z;
// On first iteration (iter_cnt==0), old messages are zero
// since no CN update has run yet. Use 0 directly rather
// than reading msg_cn2vn, which may not be reliably zeroed
// by the INIT state in all simulation tools.
old_msg = (iter_cnt == 0) ?
{Q{1'b0}} : msg_cn2vn[row_idx][col_idx][z];
belief_val = beliefs[bit_idx];
vn_to_cn[col_idx][z] <= sat_sub(belief_val, old_msg);
vn_to_cn[col_idx][z] <= sat_sub(belief_val, old_msg);
end
end else begin
// Unconnected: set to +MAX so magnitude doesn't affect min-sum
for (int z = 0; z < Z; z++)
vn_to_cn[col_idx][z] <= {1'b0, {(Q-1){1'b1}}}; // +31
end
if (col_idx == N_BASE - 1)
@@ -245,13 +265,12 @@ module ldpc_decoder_core #(
// Min-sum update for all Z check nodes in current row
// Each CN has DC=8 incoming messages (one per column)
for (int z = 0; z < Z; z++) begin
// Gather DC messages for check node z
logic signed [Q-1:0] msgs [DC];
for (int d = 0; d < DC; d++)
msgs[d] = vn_to_cn[d][z];
// Min-sum: find min1, min2, sign product, min1 index
cn_min_sum(msgs, cn_to_vn[0][z], cn_to_vn[1][z],
// Min-sum: pass individual VN->CN messages directly
cn_min_sum(vn_to_cn[0][z], vn_to_cn[1][z],
vn_to_cn[2][z], vn_to_cn[3][z],
vn_to_cn[4][z], vn_to_cn[5][z],
vn_to_cn[6][z], vn_to_cn[7][z],
cn_to_vn[0][z], cn_to_vn[1][z],
cn_to_vn[2][z], cn_to_vn[3][z],
cn_to_vn[4][z], cn_to_vn[5][z],
cn_to_vn[6][z], cn_to_vn[7][z]);
@@ -261,22 +280,25 @@ module ldpc_decoder_core #(
LAYER_WRITE: begin
// Write back: update beliefs and store new CN->VN messages
for (int z = 0; z < Z; z++) begin
int bit_idx;
int shifted_z;
logic signed [Q-1:0] new_msg;
logic signed [Q-1:0] old_extrinsic;
// Skip unconnected columns (H_BASE == -1)
if (H_BASE[row_idx][col_idx] >= 0) begin
for (int z = 0; z < Z; z++) begin
int bit_idx;
int shifted_z;
logic signed [Q-1:0] new_msg;
logic signed [Q-1:0] old_extrinsic;
shifted_z = (z + H_BASE[row_idx][col_idx]) % Z;
bit_idx = int'(col_idx) * Z + shifted_z;
new_msg = cn_to_vn[col_idx][z];
old_extrinsic = vn_to_cn[col_idx][z];
shifted_z = (z + H_BASE[row_idx][col_idx]) % Z;
bit_idx = int'(col_idx) * Z + shifted_z;
new_msg = cn_to_vn[col_idx][z];
old_extrinsic = vn_to_cn[col_idx][z];
// belief = extrinsic (VN->CN) + new CN->VN message
beliefs[bit_idx] <= sat_add(old_extrinsic, new_msg);
// belief = extrinsic (VN->CN) + new CN->VN message
beliefs[bit_idx] <= sat_add(old_extrinsic, new_msg);
// Store new message for next iteration
msg_cn2vn[row_idx][col_idx][z] <= new_msg;
// Store new message for next iteration
msg_cn2vn[row_idx][col_idx][z] <= new_msg;
end
end
if (col_idx == N_BASE - 1) begin
@@ -292,25 +314,32 @@ module ldpc_decoder_core #(
SYNDROME: begin
// Check H * c_hat == 0 (compute syndrome weight)
// Only include connected columns (H_BASE >= 0)
syndrome_cnt = '0;
for (int r = 0; r < M_BASE; r++) begin
for (int z = 0; z < Z; z++) begin
logic parity;
parity = 1'b0;
for (int c = 0; c < N_BASE; c++) begin
int shifted_z, bit_idx;
shifted_z = (z + H_BASE[r][c]) % Z;
bit_idx = c * Z + shifted_z;
parity = parity ^ beliefs[bit_idx][Q-1]; // sign bit = hard decision
if (H_BASE[r][c] >= 0) begin
int shifted_z, bit_idx;
shifted_z = (z + H_BASE[r][c]) % Z;
bit_idx = c * Z + shifted_z;
parity = parity ^ beliefs[bit_idx][Q-1];
end
end
if (parity) syndrome_cnt = syndrome_cnt + 1;
end
end
syndrome_weight <= syndrome_cnt;
syndrome_ok = (syndrome_cnt == 0);
syndrome_ok <= (syndrome_cnt == 0);
iter_cnt <= iter_cnt + 1;
iter_used <= iter_cnt + 1;
end
SYNDROME_DONE: begin
// Check registered syndrome result
if (syndrome_ok) converged <= 1'b1;
end
@@ -327,13 +356,15 @@ module ldpc_decoder_core #(
// Min-sum CN update function
// =========================================================================
// Offset min-sum for DC=8 inputs
// Offset min-sum for DC=8 inputs (individual ports for iverilog compatibility)
// For each output j: sign = XOR of all other signs, magnitude = min of all other magnitudes - offset
task automatic cn_min_sum(
input logic signed [Q-1:0] in [DC],
input logic signed [Q-1:0] in0, in1, in2, in3,
in4, in5, in6, in7,
output logic signed [Q-1:0] out0, out1, out2, out3,
out4, out5, out6, out7
);
logic signed [Q-1:0] ins [DC];
logic [DC-1:0] signs;
logic [Q-2:0] mags [DC];
logic sign_xor;
@@ -341,11 +372,23 @@ module ldpc_decoder_core #(
int min1_idx;
logic signed [Q-1:0] outs [DC];
ins[0] = in0; ins[1] = in1; ins[2] = in2; ins[3] = in3;
ins[4] = in4; ins[5] = in5; ins[6] = in6; ins[7] = in7;
// Extract signs and magnitudes
// Note: -32 (100000) has magnitude 32 which overflows 5-bit field to 0.
// Clamp to 31 (max representable magnitude) to avoid corruption.
sign_xor = 1'b0;
for (int i = 0; i < DC; i++) begin
signs[i] = in[i][Q-1];
mags[i] = in[i][Q-1] ? (~in[i][Q-2:0] + 1) : in[i][Q-2:0];
logic [Q-1:0] abs_val;
signs[i] = ins[i][Q-1];
if (ins[i][Q-1]) begin
abs_val = ~ins[i] + 1'b1;
// If abs_val overflowed (input was most negative), clamp
mags[i] = (abs_val[Q-1]) ? {(Q-1){1'b1}} : abs_val[Q-2:0];
end else begin
mags[i] = ins[i][Q-2:0];
end
sign_xor = sign_xor ^ signs[i];
end
@@ -381,26 +424,32 @@ module ldpc_decoder_core #(
endtask
// =========================================================================
// Saturating arithmetic helpers
// Saturating arithmetic helpers (Yosys-compatible: no return, no complex concat)
// =========================================================================
function automatic logic signed [Q-1:0] sat_add(
logic signed [Q-1:0] a, logic signed [Q-1:0] b
input logic signed [Q-1:0] a,
input logic signed [Q-1:0] b
);
logic signed [Q:0] sum;
sum = {a[Q-1], a} + {b[Q-1], b}; // sign-extend and add
if (sum > $signed({1'b0, {(Q-1){1'b1}}}))
return {1'b0, {(Q-1){1'b1}}}; // +max
else if (sum < $signed({1'b1, {(Q-1){1'b0}}}))
return {1'b1, {(Q-1){1'b0}}}; // -max
else
return sum[Q-1:0];
reg signed [Q:0] sum;
begin
sum = {a[Q-1], a} + {b[Q-1], b};
if (!sum[Q] && sum[Q-1]) // positive overflow
sat_add = {1'b0, {(Q-1){1'b1}}};
else if (sum[Q] && !sum[Q-1]) // negative overflow
sat_add = {1'b1, {(Q-1){1'b0}}};
else
sat_add = sum[Q-1:0];
end
endfunction
function automatic logic signed [Q-1:0] sat_sub(
logic signed [Q-1:0] a, logic signed [Q-1:0] b
input logic signed [Q-1:0] a,
input logic signed [Q-1:0] b
);
return sat_add(a, -b);
begin
sat_sub = sat_add(a, -b);
end
endfunction
endmodule
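
The rewritten `sat_add` detects overflow from the top two bits of a (Q+1)-bit sum instead of comparing against constants. The sketch below is a Python model of that idea, not the project's reference model: it assumes Q = 6 and checks the bit test against a plain clamp for every input pair. The function names are hypothetical. It also models the magnitude clamp for the most negative input described in the `cn_min_sum` comments.

```python
Q = 6
MAX_POS = (1 << (Q - 1)) - 1   # +31
MIN_NEG = -(1 << (Q - 1))      # -32


def sat_add_bits(a, b):
    """Saturating add using the top-two-bit test from the Yosys-friendly RTL."""
    total = (a + b) & ((1 << (Q + 1)) - 1)   # (Q+1)-bit two's complement sum
    top = (total >> Q) & 1                   # sum[Q], the sign of the wide sum
    nxt = (total >> (Q - 1)) & 1             # sum[Q-1]
    if not top and nxt:                      # positive overflow
        return MAX_POS
    if top and not nxt:                      # negative overflow
        return MIN_NEG
    low = total & ((1 << Q) - 1)             # in range: reinterpret Q bits as signed
    return low - (1 << Q) if low & (1 << (Q - 1)) else low


def sat_add_clamp(a, b):
    """Reference behaviour: add with unlimited precision, then clamp."""
    return max(MIN_NEG, min(MAX_POS, a + b))


def clamped_magnitude(x):
    """Magnitude for min-sum: |-32| does not fit in Q-1 bits, so clamp to 31."""
    return min(abs(x), MAX_POS)


def all_pairs_agree():
    """Exhaustively compare both adders over the full signed Q-bit range."""
    values = range(MIN_NEG, MAX_POS + 1)
    return all(sat_add_bits(a, b) == sat_add_clamp(a, b)
               for a in values for b in values)


if __name__ == "__main__":
    print("bit test matches clamp:", all_pairs_agree())
```

This sketch does not model `sat_sub`. That function negates `b` before calling `sat_add`, and negating the most negative value can wrap if the expression is evaluated at Q-bit width, so that input may deserve a dedicated test vector.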

View File

@@ -15,7 +15,6 @@ module ldpc_decoder_top #(
parameter Z = 32, // lifting factor
parameter N = N_BASE * Z, // codeword length = 256
parameter K = Z, // info bits = 32 (rate 1/8)
parameter M = M_BASE * Z, // parity checks = 224
parameter Q = 6, // LLR quantization bits (signed)
parameter MAX_ITER = 30, // maximum decoding iterations
parameter DC = 8, // check node degree (= N_BASE for regular)
@@ -50,8 +49,8 @@ module ldpc_decoder_top #(
logic stat_converged;
logic [4:0] stat_iter_used;
// LLR input buffer (written by host before starting decode)
logic signed [Q-1:0] llr_input [N];
// LLR input buffer (packed vector for Yosys compatibility)
logic [N*Q-1:0] llr_input_flat;
// Decoded output
logic [K-1:0] decoded_bits;
@@ -75,7 +74,7 @@ module ldpc_decoder_top #(
.stat_busy (stat_busy),
.stat_converged (stat_converged),
.stat_iter_used (stat_iter_used),
.llr_input (llr_input),
.llr_input (llr_input_flat),
.decoded_bits (decoded_bits),
.syndrome_weight(syndrome_weight),
.irq_o (irq_o)
@@ -99,7 +98,7 @@ module ldpc_decoder_top #(
.start (ctrl_start),
.early_term_en (ctrl_early_term),
.max_iter (ctrl_max_iter),
.llr_in (llr_input),
.llr_in (llr_input_flat),
.busy (stat_busy),
.converged (stat_converged),
.iter_used (stat_iter_used),

View File

@@ -32,7 +32,7 @@ module wishbone_interface #(
input logic stat_busy,
input logic stat_converged,
input logic [4:0] stat_iter_used,
output logic signed [Q-1:0] llr_input [N],
output logic [N*Q-1:0] llr_input, // packed LLR vector
input logic [K-1:0] decoded_bits,
input logic [7:0] syndrome_weight,
@@ -40,7 +40,7 @@ module wishbone_interface #(
output logic irq_o
);
localparam VERSION_ID = 32'hLD01_0001; // LDPC v0.1 build 1
localparam VERSION_ID = 32'h1D01_0001; // LDPC v0.1 build 1
// Wishbone handshake: ack on valid cycle
logic wb_valid;
@@ -99,7 +99,7 @@ module wishbone_interface #(
int llr_idx;
llr_idx = word_idx * 5 + p;
if (llr_idx < N)
llr_input[llr_idx] <= wb_dat_i[p*Q +: Q];
llr_input[llr_idx*Q +: Q] <= wb_dat_i[p*Q +: Q];
end
end
end
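
The packed-vector change replaces `llr_input[llr_idx]` with the part-select `llr_input[llr_idx*Q +: Q]`, and the core recovers signed values with `$signed(llr_in[j*Q +: Q])`. The sketch below models that indexing on a Python integer, assuming Q = 6, N = 256 and five LLRs per 32-bit word as in the diff. The helper names are hypothetical.

```python
Q = 6                      # LLR width in bits
N = 256                    # number of LLRs in the packed vector
LLRS_PER_WORD = 5          # five Q-bit fields per 32-bit Wishbone word
FIELD_MASK = (1 << Q) - 1


def write_llr_word(flat, word_idx, wb_dat):
    """Unpack one Wishbone word into the flat N*Q-bit vector (as an int)."""
    for p in range(LLRS_PER_WORD):
        llr_idx = word_idx * LLRS_PER_WORD + p
        if llr_idx < N:                       # last word is only partly used
            field = (wb_dat >> (p * Q)) & FIELD_MASK
            shift = llr_idx * Q               # same offset as llr_idx*Q +: Q
            flat = (flat & ~(FIELD_MASK << shift)) | (field << shift)
    return flat


def read_llr(flat, j):
    """Extract LLR j and sign-extend it, like $signed(llr_in[j*Q +: Q])."""
    raw = (flat >> (j * Q)) & FIELD_MASK
    return raw - (1 << Q) if raw & (1 << (Q - 1)) else raw


if __name__ == "__main__":
    flat = write_llr_word(0, 0, 0x1F7DF7DF)   # five +31 values in word 0
    print([read_llr(flat, j) for j in range(5)])
```

The `llr_idx < N` guard matters only for the final word: 52 words hold 260 fields, so the last four are dropped.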

40
tb/Makefile Normal file

@@ -0,0 +1,40 @@
RTL_DIR = ../rtl
RTL_FILES = $(RTL_DIR)/ldpc_decoder_top.sv \
$(RTL_DIR)/ldpc_decoder_core.sv \
$(RTL_DIR)/wishbone_interface.sv
.PHONY: lint sim sim_vectors clean
lint:
verilator --lint-only -Wall \
-Wno-WIDTHEXPAND -Wno-WIDTHTRUNC -Wno-CASEINCOMPLETE \
-Wno-BLKSEQ -Wno-BLKLOOPINIT -Wno-UNUSEDSIGNAL -Wno-UNUSEDPARAM \
--unroll-count 1024 \
$(RTL_FILES) --top-module ldpc_decoder_top
sim: obj_dir/Vtb_ldpc_decoder
./obj_dir/Vtb_ldpc_decoder
obj_dir/Vtb_ldpc_decoder: tb_ldpc_decoder.sv $(RTL_FILES)
verilator --binary --timing --trace \
-o Vtb_ldpc_decoder \
-Wno-WIDTHEXPAND -Wno-WIDTHTRUNC -Wno-CASEINCOMPLETE \
-Wno-BLKSEQ -Wno-BLKLOOPINIT -Wno-UNUSEDSIGNAL -Wno-UNUSEDPARAM \
--unroll-count 1024 \
tb_ldpc_decoder.sv $(RTL_FILES) \
--top-module tb_ldpc_decoder
sim_vectors: obj_dir/Vtb_ldpc_vectors
./obj_dir/Vtb_ldpc_vectors
obj_dir/Vtb_ldpc_vectors: tb_ldpc_vectors.sv $(RTL_FILES)
verilator --binary --timing --trace \
-o Vtb_ldpc_vectors \
-Wno-WIDTHEXPAND -Wno-WIDTHTRUNC -Wno-CASEINCOMPLETE \
-Wno-BLKSEQ -Wno-BLKLOOPINIT -Wno-UNUSEDSIGNAL -Wno-UNUSEDPARAM \
--unroll-count 1024 \
tb_ldpc_vectors.sv $(RTL_FILES) \
--top-module tb_ldpc_vectors
clean:
rm -rf obj_dir *.vcd

245
tb/tb_ldpc_decoder.sv Normal file

@@ -0,0 +1,245 @@
// Standalone Verilator testbench for LDPC decoder
// Tests the decoder top level (ldpc_decoder_top) via its Wishbone interface (no Caravel dependency)
//
// Test 1: Read VERSION register (expect 0x1D010001)
// Test 2: Decode all-zero codeword with strong +31 LLRs
`timescale 1ns / 1ps
module tb_ldpc_decoder;
// =========================================================================
// Clock and reset
// =========================================================================
logic clk;
logic rst_n;
logic wb_cyc_i;
logic wb_stb_i;
logic wb_we_i;
logic [7:0] wb_adr_i;
logic [31:0] wb_dat_i;
logic [31:0] wb_dat_o;
logic wb_ack_o;
logic irq_o;
// 50 MHz clock (20 ns period)
initial clk = 0;
always #10 clk = ~clk;
// =========================================================================
// DUT instantiation
// =========================================================================
ldpc_decoder_top dut (
.clk (clk),
.rst_n (rst_n),
.wb_cyc_i (wb_cyc_i),
.wb_stb_i (wb_stb_i),
.wb_we_i (wb_we_i),
.wb_adr_i (wb_adr_i),
.wb_dat_i (wb_dat_i),
.wb_dat_o (wb_dat_o),
.wb_ack_o (wb_ack_o),
.irq_o (irq_o)
);
// =========================================================================
// VCD dump
// =========================================================================
initial begin
$dumpfile("tb_ldpc_decoder.vcd");
$dumpvars(0, tb_ldpc_decoder);
end
// =========================================================================
// Watchdog timeout
// =========================================================================
int cycle_cnt;
initial begin
cycle_cnt = 0;
forever begin
@(posedge clk);
cycle_cnt++;
if (cycle_cnt > 100000) begin
$display("TIMEOUT: exceeded 100000 cycles");
$finish;
end
end
end
// =========================================================================
// Wishbone tasks
// =========================================================================
task automatic wb_write(input logic [7:0] addr, input logic [31:0] data);
@(posedge clk);
wb_cyc_i = 1'b1;
wb_stb_i = 1'b1;
wb_we_i = 1'b1;
wb_adr_i = addr;
wb_dat_i = data;
// Wait for ack
do begin
@(posedge clk);
end while (!wb_ack_o);
// Deassert
wb_cyc_i = 1'b0;
wb_stb_i = 1'b0;
wb_we_i = 1'b0;
endtask
task automatic wb_read(input logic [7:0] addr, output logic [31:0] data);
@(posedge clk);
wb_cyc_i = 1'b1;
wb_stb_i = 1'b1;
wb_we_i = 1'b0;
wb_adr_i = addr;
// Wait for ack
do begin
@(posedge clk);
end while (!wb_ack_o);
data = wb_dat_o;
// Deassert
wb_cyc_i = 1'b0;
wb_stb_i = 1'b0;
endtask
// =========================================================================
// Test variables
// =========================================================================
int pass_cnt;
int fail_cnt;
logic [31:0] rd_data;
// =========================================================================
// Main test sequence
// =========================================================================
initial begin
pass_cnt = 0;
fail_cnt = 0;
// Initialize Wishbone signals
wb_cyc_i = 1'b0;
wb_stb_i = 1'b0;
wb_we_i = 1'b0;
wb_adr_i = 8'h00;
wb_dat_i = 32'h0;
// Reset
rst_n = 1'b0;
repeat (10) @(posedge clk);
rst_n = 1'b1;
repeat (5) @(posedge clk);
// =================================================================
// TEST 1: Read VERSION register
// =================================================================
$display("[TEST 1] Read VERSION register");
wb_read(8'h54, rd_data);
if (rd_data === 32'h1D01_0001) begin
$display(" PASS: VERSION = 0x%08X", rd_data);
pass_cnt++;
end else begin
$display(" FAIL: VERSION = 0x%08X (expected 0x1D010001)", rd_data);
fail_cnt++;
end
// =================================================================
// TEST 2: Decode clean all-zero codeword
// =================================================================
$display("[TEST 2] Decode clean all-zero codeword");
// Write 52 LLR words at addresses 0x10..0xDC
// Each word = 5x +31 packed: {6'h1F, 6'h1F, 6'h1F, 6'h1F, 6'h1F}
// = 0x1F | (0x1F<<6) | (0x1F<<12) | (0x1F<<18) | (0x1F<<24)
// = 0x1F7DF7DF
begin
int i;
for (i = 0; i < 52; i++) begin
wb_write(8'h10 + i * 4, 32'h1F7D_F7DF);
end
end
// Start decode: write CTRL
// bit[0]=1 (start), bit[1]=1 (early_term), bits[12:8]=0x1E=30 (max_iter)
// 0x00001E03
wb_write(8'h00, 32'h0000_1E03);
// Poll STATUS (addr 0x04) until busy (bit[0]) = 0
// Allow a few cycles for busy to assert first
repeat (5) @(posedge clk);
begin
int poll_cnt;
poll_cnt = 0;
do begin
wb_read(8'h04, rd_data);
poll_cnt++;
if (poll_cnt > 10000) begin
$display(" FAIL: decoder stuck busy after %0d polls", poll_cnt);
fail_cnt++;
$display("=== %0d PASSED, %0d FAILED ===", pass_cnt, fail_cnt);
$finish;
end
end while (rd_data[0] == 1'b1);
end
// Check convergence: bit[1] of STATUS
if (rd_data[1] == 1'b1) begin
$display(" converged=1 (OK)");
end else begin
$display(" FAIL: converged=0 (expected 1)");
fail_cnt++;
end
// Check syndrome weight: bits[23:16] of STATUS
if (rd_data[23:16] == 8'd0) begin
$display(" syndrome_weight=0 (OK)");
end else begin
$display(" FAIL: syndrome_weight=%0d (expected 0)", rd_data[23:16]);
fail_cnt++;
end
// Check iterations used: bits[12:8] of STATUS
$display(" iterations_used=%0d", rd_data[12:8]);
// Read DECODED register (addr 0x50)
wb_read(8'h50, rd_data);
if (rd_data === 32'h0000_0000) begin
$display(" PASS: decoded=0x%08X", rd_data);
pass_cnt++;
end else begin
$display(" FAIL: decoded=0x%08X (expected 0x00000000)", rd_data);
fail_cnt++;
end
// =================================================================
// Summary
// =================================================================
$display("");
if (fail_cnt == 0) begin
$display("=== ALL %0d TESTS PASSED ===", pass_cnt);
end else begin
$display("=== %0d PASSED, %0d FAILED ===", pass_cnt, fail_cnt);
end
$finish;
end
endmodule
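
The register constants used in this testbench (`0x1F7DF7DF`, `0x00001E03`) and the STATUS field positions can be cross-checked with a few lines of Python. This is a sketch based only on the bit layout stated in the testbench comments; the helper names are hypothetical and nothing here is taken from the project's Python model.

```python
def pack_llr_word(llrs, q=6):
    """Pack up to five signed q-bit LLRs into one 32-bit word, field 0 in the LSBs."""
    word = 0
    for p, value in enumerate(llrs):
        word |= (value & ((1 << q) - 1)) << (p * q)   # two's complement field
    return word


def encode_ctrl(start, early_term, max_iter):
    """CTRL layout from the comments: bit 0 start, bit 1 early_term, bits 12:8 max_iter."""
    return (start & 1) | ((early_term & 1) << 1) | ((max_iter & 0x1F) << 8)


def decode_status(status):
    """STATUS layout from the comments: bit 0 busy, bit 1 converged,
    bits 12:8 iterations used, bits 23:16 syndrome weight."""
    return {
        "busy": status & 1,
        "converged": (status >> 1) & 1,
        "iter_used": (status >> 8) & 0x1F,
        "syndrome_weight": (status >> 16) & 0xFF,
    }


if __name__ == "__main__":
    print(hex(pack_llr_word([31] * 5)))
    print(hex(encode_ctrl(start=1, early_term=1, max_iter=30)))
    print(decode_status(0x00000102))
```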

356
tb/tb_ldpc_vectors.sv Normal file

@@ -0,0 +1,356 @@
// Vector-driven Verilator testbench for LDPC decoder
// Loads test vectors from hex files generated by model/gen_verilator_vectors.py
// Verifies RTL decoder produces bit-exact results matching Python behavioral model
//
// Files loaded:
// vectors/llr_words.hex - 52 words per vector, packed 5x6-bit LLRs
// vectors/expected.hex - 4 lines per vector: decoded_word, converged, iterations, syndrome_weight
// vectors/num_vectors.txt - single line with vector count (written at generation time; not read here, must match NUM_VECTORS)
`timescale 1ns / 1ps
module tb_ldpc_vectors;
// =========================================================================
// Parameters
// =========================================================================
localparam int NUM_VECTORS = 20;
localparam int LLR_WORDS = 52; // 256 LLRs / 5 per word, rounded up
localparam int EXPECTED_LINES = 4; // per vector: decoded, converged, iter, syn_wt
// Wishbone register addresses (byte-addressed)
localparam logic [7:0] REG_CTRL = 8'h00;
localparam logic [7:0] REG_STATUS = 8'h04;
localparam logic [7:0] REG_LLR_BASE = 8'h10;
localparam logic [7:0] REG_DECODED = 8'h50;
localparam logic [7:0] REG_VERSION = 8'h54;
// CTRL register fields
localparam int MAX_ITER = 30;
// =========================================================================
// Clock and reset
// =========================================================================
logic clk;
logic rst_n;
logic wb_cyc_i;
logic wb_stb_i;
logic wb_we_i;
logic [7:0] wb_adr_i;
logic [31:0] wb_dat_i;
logic [31:0] wb_dat_o;
logic wb_ack_o;
logic irq_o;
// 50 MHz clock (20 ns period)
initial clk = 0;
always #10 clk = ~clk;
// =========================================================================
// DUT instantiation
// =========================================================================
ldpc_decoder_top dut (
.clk (clk),
.rst_n (rst_n),
.wb_cyc_i (wb_cyc_i),
.wb_stb_i (wb_stb_i),
.wb_we_i (wb_we_i),
.wb_adr_i (wb_adr_i),
.wb_dat_i (wb_dat_i),
.wb_dat_o (wb_dat_o),
.wb_ack_o (wb_ack_o),
.irq_o (irq_o)
);
// =========================================================================
// VCD dump
// =========================================================================
initial begin
$dumpfile("tb_ldpc_vectors.vcd");
$dumpvars(0, tb_ldpc_vectors);
end
// =========================================================================
// Watchdog timeout (generous for 20 vectors * 30 iterations each)
// =========================================================================
int cycle_cnt;
initial begin
cycle_cnt = 0;
forever begin
@(posedge clk);
cycle_cnt++;
if (cycle_cnt > 2000000) begin
$display("TIMEOUT: exceeded 2000000 cycles");
$finish;
end
end
end
// =========================================================================
// Test vector memory
// =========================================================================
// LLR words: 52 words per vector, total 52 * NUM_VECTORS = 1040
logic [31:0] llr_mem [LLR_WORDS * NUM_VECTORS];
// Expected results: 4 words per vector, total 4 * NUM_VECTORS = 80
logic [31:0] expected_mem [EXPECTED_LINES * NUM_VECTORS];
initial begin
$readmemh("vectors/llr_words.hex", llr_mem);
$readmemh("vectors/expected.hex", expected_mem);
end
// =========================================================================
// Wishbone tasks (same as standalone testbench)
// =========================================================================
task automatic wb_write(input logic [7:0] addr, input logic [31:0] data);
@(posedge clk);
wb_cyc_i = 1'b1;
wb_stb_i = 1'b1;
wb_we_i = 1'b1;
wb_adr_i = addr;
wb_dat_i = data;
// Wait for ack
do begin
@(posedge clk);
end while (!wb_ack_o);
// Deassert
wb_cyc_i = 1'b0;
wb_stb_i = 1'b0;
wb_we_i = 1'b0;
endtask
task automatic wb_read(input logic [7:0] addr, output logic [31:0] data);
@(posedge clk);
wb_cyc_i = 1'b1;
wb_stb_i = 1'b1;
wb_we_i = 1'b0;
wb_adr_i = addr;
// Wait for ack
do begin
@(posedge clk);
end while (!wb_ack_o);
data = wb_dat_o;
// Deassert
wb_cyc_i = 1'b0;
wb_stb_i = 1'b0;
endtask
// =========================================================================
// Test variables
// =========================================================================
int pass_cnt;
int fail_cnt;
int vec_pass; // per-vector pass flag
logic [31:0] rd_data;
// Expected values for current vector
logic [31:0] exp_decoded;
logic [31:0] exp_converged;
logic [31:0] exp_iterations;
logic [31:0] exp_syndrome_wt;
// Actual values from RTL
logic [31:0] act_decoded;
logic act_converged;
logic [4:0] act_iter_used;
logic [7:0] act_syndrome_wt;
// =========================================================================
// Main test sequence
// =========================================================================
initial begin
pass_cnt = 0;
fail_cnt = 0;
// Initialize Wishbone signals
wb_cyc_i = 1'b0;
wb_stb_i = 1'b0;
wb_we_i = 1'b0;
wb_adr_i = 8'h00;
wb_dat_i = 32'h0;
// Reset
rst_n = 1'b0;
repeat (10) @(posedge clk);
rst_n = 1'b1;
repeat (5) @(posedge clk);
// =================================================================
// Sanity check: Read VERSION register
// =================================================================
$display("=== LDPC Vector-Driven Testbench ===");
$display("Vectors: %0d, LLR words/vector: %0d", NUM_VECTORS, LLR_WORDS);
$display("");
wb_read(REG_VERSION, rd_data);
if (rd_data === 32'h1D01_0001) begin
$display("[SANITY] VERSION = 0x%08X (OK)", rd_data);
end else begin
$display("[SANITY] VERSION = 0x%08X (UNEXPECTED, expected 0x1D010001)", rd_data);
end
$display("");
// =================================================================
// Process each test vector
// =================================================================
for (int v = 0; v < NUM_VECTORS; v++) begin
vec_pass = 1;
// Load expected values
exp_decoded = expected_mem[v * EXPECTED_LINES + 0];
exp_converged = expected_mem[v * EXPECTED_LINES + 1];
exp_iterations = expected_mem[v * EXPECTED_LINES + 2];
exp_syndrome_wt = expected_mem[v * EXPECTED_LINES + 3];
$display("[VEC %0d] Expected: decoded=0x%08X, converged=%0d, iter=%0d, syn_wt=%0d",
v, exp_decoded, exp_converged[0], exp_iterations, exp_syndrome_wt);
// ---------------------------------------------------------
// Step 1: Write 52 LLR words via Wishbone
// ---------------------------------------------------------
for (int w = 0; w < LLR_WORDS; w++) begin
wb_write(REG_LLR_BASE + w * 4, llr_mem[v * LLR_WORDS + w]);
end
// ---------------------------------------------------------
// Step 2: Start decode
// CTRL: bit[0]=start, bit[1]=early_term, bits[12:8]=max_iter
// max_iter=30 -> 0x1E, so CTRL = 0x00001E03
// ---------------------------------------------------------
wb_write(REG_CTRL, {19'b0, 5'(MAX_ITER), 6'b0, 1'b1, 1'b1});
// Wait a few cycles for busy to assert
repeat (5) @(posedge clk);
// ---------------------------------------------------------
// Step 3: Poll STATUS until busy=0
// ---------------------------------------------------------
begin
int poll_cnt;
poll_cnt = 0;
do begin
wb_read(REG_STATUS, rd_data);
poll_cnt++;
if (poll_cnt > 50000) begin
$display(" FAIL: decoder stuck busy after %0d polls", poll_cnt);
fail_cnt++;
$display("");
$display("=== ABORTED: %0d PASSED, %0d FAILED ===", pass_cnt, fail_cnt);
$finish;
end
end while (rd_data[0] == 1'b1);
end
// ---------------------------------------------------------
// Step 4: Read results
// ---------------------------------------------------------
// STATUS fields (from last poll read)
act_converged = rd_data[1];
act_iter_used = rd_data[12:8];
act_syndrome_wt = rd_data[23:16];
// Read DECODED register
wb_read(REG_DECODED, act_decoded);
$display(" Actual: decoded=0x%08X, converged=%0d, iter=%0d, syn_wt=%0d",
act_decoded, act_converged, act_iter_used, act_syndrome_wt);
// ---------------------------------------------------------
// Step 5: Compare results
// ---------------------------------------------------------
if (exp_converged[0]) begin
// CONVERGED vector: decoded_word MUST match (bit-exact)
if (act_decoded !== exp_decoded) begin
$display(" FAIL: decoded mismatch (expected 0x%08X, got 0x%08X)",
exp_decoded, act_decoded);
vec_pass = 0;
end
// Converged: RTL must also report converged
if (!act_converged) begin
$display(" FAIL: RTL did not converge (Python model converged)");
vec_pass = 0;
end
// Converged: syndrome weight must be 0
if (act_syndrome_wt !== 8'd0) begin
$display(" FAIL: syndrome_weight=%0d (expected 0 for converged)",
act_syndrome_wt);
vec_pass = 0;
end
// Iteration count: informational (allow +/- 2 tolerance)
if (act_iter_used > exp_iterations[4:0] + 2 ||
(exp_iterations[4:0] > 2 && act_iter_used < exp_iterations[4:0] - 2)) begin
$display(" NOTE: iteration count differs (expected %0d, got %0d)",
exp_iterations, act_iter_used);
end
end else begin
// NON-CONVERGED vector
// Decoded word comparison is informational only
if (act_decoded !== exp_decoded) begin
$display(" INFO: decoded differs from Python model (expected for non-converged)");
end
// Convergence status: RTL should also report non-converged
if (act_converged) begin
// Interesting: RTL converged but Python didn't. Could happen with
// fixed-point vs floating-point differences. Report but don't fail.
$display(" NOTE: RTL converged but Python model did not");
end
// Syndrome weight should be non-zero for non-converged
if (!act_converged && act_syndrome_wt == 8'd0) begin
$display(" FAIL: syndrome_weight=0 but converged=0 (inconsistent)");
vec_pass = 0;
end
end
// ---------------------------------------------------------
// Step 6: Record result
// ---------------------------------------------------------
if (vec_pass) begin
$display(" PASS");
pass_cnt++;
end else begin
$display(" FAIL");
fail_cnt++;
end
$display("");
end // for each vector
// =================================================================
// Summary
// =================================================================
$display("=== RESULTS: %0d PASSED, %0d FAILED out of %0d vectors ===",
pass_cnt, fail_cnt, NUM_VECTORS);
if (fail_cnt == 0) begin
$display("=== ALL VECTORS PASSED ===");
end else begin
$display("=== SOME VECTORS FAILED ===");
end
$finish;
end
endmodule
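
The pass/fail policy in Step 5 is easier to review when written as a pure function. The sketch below restates it in Python, assuming expected and actual results are plain dictionaries with the keys shown. The function name and dictionary shape are hypothetical. The informational `NOTE`/`INFO` messages, including the iteration-count tolerance, are deliberately left out because they never fail a vector.

```python
def check_vector(exp, act):
    """Return a list of failure strings; an empty list means the vector passes."""
    failures = []
    if exp["converged"]:
        # Converged in the model: the RTL must match bit-exactly and agree on status.
        if act["decoded"] != exp["decoded"]:
            failures.append("decoded mismatch")
        if not act["converged"]:
            failures.append("RTL did not converge")
        if act["syndrome_weight"] != 0:
            failures.append("nonzero syndrome weight")
    else:
        # Not converged in the model: only internal consistency is enforced.
        if not act["converged"] and act["syndrome_weight"] == 0:
            failures.append("syndrome_weight=0 but converged=0")
    return failures


if __name__ == "__main__":
    exp = {"decoded": 0x3FD74222, "converged": 1}
    act = {"decoded": 0x3FD74222, "converged": 1, "syndrome_weight": 0}
    print(check_vector(exp, act))
```

Note that for non-converged vectors this policy accepts any decoded word, so a bit-exact match against the model there is reported but not enforced.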

80
tb/vectors/expected.hex Normal file

@@ -0,0 +1,80 @@
3FD74222
00000001
00000001
00000000
09A5626C
00000001
00000001
00000000
2FFC25FC
00000001
00000001
00000000
5DABF50B
00000001
00000001
00000000
05D8EA33
00000001
00000001
00000000
19AF1473
00000001
00000001
00000000
34D925D3
00000001
00000001
00000000
45C1E650
00000001
00000001
00000000
A4CA7D49
00000001
00000001
00000000
D849EB80
00000001
00000001
00000000
9BCA9A40
00000001
00000001
00000000
79FFC352
00000000
0000001E
00000043
5D2534DC
00000000
0000001E
0000003B
F21718ED
00000000
0000001E
0000003D
7FE0197C
00000000
0000001E
00000041
9E869CC2
00000000
0000001E
0000004B
4E7507D9
00000000
0000001E
00000038
BB5F2BF1
00000000
0000001E
00000033
AA500741
00000000
0000001E
0000004C
F98E6EFE
00000000
0000001E
0000002A
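
The file above stores four hex words per vector in the order given in the testbench header: decoded word, converged flag, iteration count, syndrome weight. A small sketch of how those lines group into records follows. The function name is hypothetical and the sample uses the first and twelfth vectors from the listing.

```python
def parse_expected(lines, words_per_vector=4):
    """Group hex lines into per-vector records (decoded, converged, iterations, syndrome_weight)."""
    values = [int(line, 16) for line in lines if line.strip()]
    if len(values) % words_per_vector != 0:
        raise ValueError("expected.hex length is not a multiple of 4 words")
    records = []
    for i in range(0, len(values), words_per_vector):
        decoded, converged, iterations, syn_wt = values[i:i + words_per_vector]
        records.append({
            "decoded": decoded,
            "converged": bool(converged),
            "iterations": iterations,
            "syndrome_weight": syn_wt,
        })
    return records


if __name__ == "__main__":
    sample = ["3FD74222", "00000001", "00000001", "00000000",
              "79FFC352", "00000000", "0000001E", "00000043"]
    for record in parse_expected(sample):
        print(record)
```

Read this way, the listing holds 20 records: the first 11 converge in one iteration with zero syndrome weight, and the remaining 9 run all 30 (0x1E) iterations without converging.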

1040
tb/vectors/llr_words.hex Normal file

File diff suppressed because it is too large

View File

@@ -0,0 +1 @@
20