Initial LDPC optical decoder project scaffold

Rate-1/8 QC-LDPC decoder for photon-starved optical communication. Target: Efabless chipIgnite (SkyWater 130nm, Caravel harness). - RTL: decoder top, core (layered min-sum), Wishbone interface - Python behavioral model with Poisson channel simulation - 7x8 base matrix, Z=32, n=256, k=32 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 21:47:40 -07:00
commit b93a6f5769
5 changed files with 1261 additions and 0 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,135 @@
+# ldpc_optical - LDPC Decoder for Photon-Starved Optical Communication
+
+## Overview
+
+Low-Density Parity Check (LDPC) decoder targeting the Efabless chipIgnite program (SkyWater 130nm, Caravel harness). Designed for photon-starved optical communication links where received signals are soft probabilities (partial bits), not hard 0/1 decisions.
+
+## Target Application
+
+- **Channel**: Photon-counting optical (Poisson channel, single-photon detectors)
+- **Use case**: Deep space optical, underwater optical, or any photon-starved link
+- **Input format**: Soft LLR (log-likelihood ratios) representing probability of 0 vs 1
+- **Code rate**: 1/8 (32 info bits -> 256 coded bits) for maximum coding gain
+- **Decoding**: Offset min-sum (hardware-friendly approximation of belief propagation)
+
+## Architecture
+
+```
+Caravel SoC (Sky130, ~10 mm^2 user area)
+==============================================+
+|  PicoRV32 (Caravel)                          |
+|      |                                       |
+|      | Wishbone B4                           |
+|      |                                       |
+|  +---v--------------------------------------+|
+|  | ldpc_decoder_top                         ||
+|  |   |                                     ||
+|  |   +-- wishbone_interface                 ||
+|  |   +-- llr_ram (256 x 6-bit)             ||
+|  |   +-- msg_ram (1792 x 6-bit)            ||
+|  |   +-- vn_update_array [Z=32]            ||
+|  |   +-- cn_update_array [Z=32]            ||
+|  |   +-- barrel_shifter_z32                 ||
+|  |   +-- iteration_controller               ||
+|  |   +-- syndrome_checker                   ||
+|  |   +-- hard_decision_out                  ||
+|  +------------------------------------------+|
+==============================================+
+```
+
+## Code Parameters
+
+| Parameter | Value | Notes |
+|-----------|-------|-------|
+| Code type | QC-LDPC | Quasi-cyclic for hardware efficiency |
+| Rate | 1/8 (R = 0.125) | k=32 info bits, n=256 coded bits |
+| Base matrix | 7 x 8 | M_BASE=7 rows, N_BASE=8 cols |
+| Lifting factor Z | 32 | n = N_BASE * Z = 256 |
+| Quantization | 6-bit signed | 1 sign + 5 magnitude |
+| Max iterations | 30 | With early termination on syndrome check |
+| Decoding algorithm | Offset min-sum | Offset ~0.5, ~0.2 dB from sum-product |
+| Scheduling | Layered (row-serial) | ~2x faster convergence than flooding |
+
+## Fabrication Target
+
+| Parameter | Value |
+|-----------|-------|
+| Process | SkyWater 130nm (Sky130) |
+| Platform | Efabless Caravel harness |
+| User area | ~10.3 mm^2 (2.92 x 3.52 mm) |
+| Target clock | 150 MHz (aggressive for Sky130) |
+| Estimated area | ~1.5 mm^2 (decoder only) |
+| Interface | Wishbone B4 slave |
+
+## Directory Structure
+
+- `rtl/` - SystemVerilog RTL sources
+- `tb/` - Verilator testbenches
+- `model/` - Python behavioral model (bit-exact reference)
+- `data/` - H-matrix definitions, test vectors
+- `openlane/` - OpenLane ASIC flow configuration (future)
+- `docs/` - Design documentation
+
+## Channel Model (Photon-Counting Optical)
+
+The receiver uses single-photon detectors. Each time slot produces a photon count (or binary click/no-click). The channel LLR is:
+
+```
+LLR(y) = log(P(y | bit=1) / P(y | bit=0))
+```
+
+For binary detection (click/no-click):
+- P(click | bit=1) = 1 - exp(-(lambda_s + lambda_b))
+- P(click | bit=0) = 1 - exp(-lambda_b)
+
+where lambda_s = signal photons/slot, lambda_b = background photons/slot.
+
+LLR computation is done in software (PicoRV32). The decoder only sees quantized 6-bit LLRs.
+
+## Simulation
+
+```bash
+# Verilator lint check
+verilator --lint-only -Wall rtl/*.sv
+
+# Run testbench
+verilator --binary --timing -o sim_ldpc tb/tb_ldpc_decoder.sv rtl/*.sv
+./obj_dir/sim_ldpc
+
+# Python behavioral model
+cd model && python3 ldpc_sim.py
+```
+
+## Key Design Decisions
+
+1. **Soft input (LLR), not hard bits**: The whole point of LDPC for photon-starved channels. Hard-decision decoding would lose ~2-3 dB of coding gain.
+2. **Rate 1/8**: Extreme redundancy for very low SNR. Shannon limit at R=1/8 is Eb/N0 ~ -1.59 dB; practical LDPC can approach 0 to +1 dB.
+3. **Min-sum over sum-product**: No multipliers or LUT-based tanh needed. Just comparators and adders. Critical for area on Sky130.
+4. **Layered scheduling**: Process one row of base matrix at a time, updating messages immediately. Converges in ~half the iterations of flooding schedule.
+5. **Z=32 parallelism**: 32 VN/CN processors working in parallel. Matches lifting factor for natural throughput.
+
+## Performance Estimates
+
+- Codeword decode: ~630 cycles (30 iterations x 21 cycles/iter)
+- At 150 MHz: ~238K codewords/sec
+- Decoded throughput: 238K x 32 bits = 7.6 Mbps
+- Latency: ~4.2 us per codeword
+- Area: ~1.5 mm^2 at Sky130 (leaves ~8.5 mm^2 for additional blocks)
+
+## Register Map (Wishbone, word-addressed)
+
+| Offset | Name | R/W | Description |
+|--------|------|-----|-------------|
+| 0x00 | CTRL | R/W | [0]=start, [1]=early_term_en, [12:8]=max_iter |
+| 0x04 | STATUS | R | [0]=busy, [1]=converged, [12:8]=iterations_used |
+| 0x08 | CONFIG | R/W | [2:0]=code_sel (future: multiple H matrices) |
+| 0x10-0x4F | LLR_IN | W | Channel LLRs, packed 5x6-bit per word |
+| 0x50-0x57 | DECODED | R | 32 decoded bits (1 word) |
+| 0x5C | SYNDROME_WT | R | Syndrome weight (0 = valid codeword) |
+
+## Notes
+
+- No multipliers in the entire design (add/compare/select only)
+- 150 MHz is aggressive for Sky130 — may need to relax to 100 MHz depending on synthesis results
+- Error floor at rate 1/8 expected around BER 10^-7 to 10^-9 — may need outer RS code for optical comm BER requirements
+- Base matrix H must be carefully designed (PEG algorithm or density evolution optimization)
--- a/model/ldpc_sim.py
+++ b/model/ldpc_sim.py
@@ -0,0 +1,474 @@
+#!/usr/bin/env python3
+"""
+LDPC Decoder Behavioral Model - Bit-Exact Reference for RTL Verification
+
+Implements offset min-sum decoding with layered scheduling for a
+rate-1/8 QC-LDPC code (n=256, k=32, Z=32, base matrix 7x8).
+
+Channel model: Poisson photon-counting (optical communication)
+
+Usage:
+    python3 ldpc_sim.py                    # Run BER simulation
+    python3 ldpc_sim.py --gen-vectors      # Generate RTL test vectors
+    python3 ldpc_sim.py --sweep-snr        # SNR sweep for BER curve
+"""
+
+import numpy as np
+import argparse
+import json
+import os
+
+# =============================================================================
+# Code parameters
+# =============================================================================
+N_BASE = 8       # base matrix columns
+M_BASE = 7       # base matrix rows
+Z = 32           # lifting factor
+N = N_BASE * Z   # 256 codeword bits
+K = Z            # 32 info bits (rate 1/8)
+M = M_BASE * Z   # 224 parity checks
+Q_BITS = 6       # quantization bits (signed)
+Q_MAX = 2**(Q_BITS-1) - 1   # +31
+Q_MIN = -(2**(Q_BITS-1))    # -32
+OFFSET = 1       # min-sum offset (integer)
+
+# Base matrix: H_BASE[row][col] = cyclic shift, -1 = no connection
+# This must match the RTL exactly!
+H_BASE = np.array([
+    [ 0,  5, 11, 17, 23, 29,  3,  9],
+    [15,  0, 21,  7, 13, 19, 25, 31],
+    [10, 20,  0, 30,  8, 16, 24,  2],
+    [27, 14,  1,  0, 18,  6, 12, 22],
+    [ 4, 28, 16, 12,  0, 26,  8, 20],
+    [19,  9, 31, 25, 15,  0, 21, 11],
+    [22, 26,  6, 14, 30, 10,  0, 18],
+], dtype=np.int8)
+
+
+def build_full_h_matrix():
+    """Expand QC base matrix to full binary parity-check matrix H (M x N)."""
+    H = np.zeros((M, N), dtype=np.int8)
+    for r in range(M_BASE):
+        for c in range(N_BASE):
+            shift = H_BASE[r, c]
+            if shift < 0:
+                continue  # null sub-matrix
+            # Cyclic permutation matrix of size Z with shift
+            for z in range(Z):
+                H[r * Z + z, c * Z + (z + shift) % Z] = 1
+    return H
+
+
+def ldpc_encode(info_bits, H):
+    """
+    Systematic encoding: info bits are the first K bits of codeword.
+    Solve H * c^T = 0 for parity bits given info bits.
+
+    For a systematic code, H = [H_p | H_i] where H_p is invertible.
+    c = [info | parity], H_p * parity^T = H_i * info^T (mod 2)
+
+    This uses dense GF(2) Gaussian elimination. Fine for small codes.
+    """
+    # info_bits goes in columns 0..K-1 (first base column = info)
+    # Parity bits in columns K..N-1
+
+    # We need to solve: H[:,K:] * p = H[:,:K] * info (mod 2)
+    H_p = H[:, K:].copy()  # M x (N-K) = 224 x 224
+    H_i = H[:, :K].copy()  # M x K = 224 x 32
+
+    syndrome = H_i @ info_bits % 2  # M-vector
+
+    # Gaussian elimination on H_p to solve for parity
+    n_parity = N - K  # 224
+    assert H_p.shape == (M, n_parity)
+
+    # Augmented matrix [H_p | syndrome]
+    aug = np.hstack([H_p, syndrome.reshape(-1, 1)]).astype(np.int8)
+
+    # Forward elimination
+    pivot_row = 0
+    for col in range(n_parity):
+        # Find pivot
+        found = False
+        for row in range(pivot_row, M):
+            if aug[row, col] == 1:
+                aug[[pivot_row, row]] = aug[[row, pivot_row]]
+                found = True
+                break
+        if not found:
+            continue  # skip this column (rank deficient)
+
+        # Eliminate
+        for row in range(M):
+            if row != pivot_row and aug[row, col] == 1:
+                aug[row] = (aug[row] + aug[pivot_row]) % 2
+        pivot_row += 1
+
+    parity = aug[:n_parity, -1]  # solution
+    codeword = np.concatenate([info_bits, parity])
+
+    # Verify
+    check = H @ codeword % 2
+    assert np.all(check == 0), f"Encoding failed: syndrome weight = {check.sum()}"
+
+    return codeword
+
+
+def poisson_channel(codeword, lam_s, lam_b):
+    """
+    Simulate photon-counting optical channel.
+
+    For each bit:
+      bit=1: transmit pulse -> expected photons = lam_s + lam_b
+      bit=0: no pulse -> expected photons = lam_b (background only)
+
+    Receiver counts photons (Poisson distributed).
+    Output: LLR = log(P(y|1) / P(y|0)) for each received symbol.
+
+    For binary (click/no-click) detector:
+      P(click|1) = 1 - exp(-(lam_s + lam_b))
+      P(click|0) = 1 - exp(-lam_b)
+    """
+    n = len(codeword)
+
+    # Expected photon counts
+    lam = np.where(codeword == 1, lam_s + lam_b, lam_b)
+
+    # Poisson photon counts
+    photon_counts = np.random.poisson(lam)
+
+    # Compute exact LLR for each observation
+    # P(y|1) = (lam_s+lam_b)^y * exp(-(lam_s+lam_b)) / y!
+    # P(y|0) = lam_b^y * exp(-lam_b) / y!
+    # LLR = y * log((lam_s+lam_b)/lam_b) - lam_s
+    llr = np.zeros(n, dtype=np.float64)
+    for i in range(n):
+        y = photon_counts[i]
+        if lam_b > 0:
+            llr[i] = y * np.log((lam_s + lam_b) / lam_b) - lam_s
+        else:
+            # No background: click = definitely bit 1, no click = definitely bit 0
+            if y > 0:
+                llr[i] = 100.0  # strong positive (bit=1)
+            else:
+                llr[i] = -lam_s  # no photons, likely bit=0
+
+    return llr, photon_counts
+
+
+def quantize_llr(llr_float, q_bits=Q_BITS):
+    """Quantize floating-point LLR to signed integer."""
+    q_max = 2**(q_bits-1) - 1
+    q_min = -(2**(q_bits-1))
+    # Scale: map typical LLR range to integer range
+    # For photon channel, LLRs are typically in [-5, +5] range
+    scale = q_max / 5.0
+    llr_scaled = np.round(llr_float * scale).astype(np.int32)
+    return np.clip(llr_scaled, q_min, q_max).astype(np.int8)
+
+
+def sat_add_q(a, b):
+    """Saturating add in Q-bit signed arithmetic."""
+    s = int(a) + int(b)
+    return max(Q_MIN, min(Q_MAX, s))
+
+
+def sat_sub_q(a, b):
+    """Saturating subtract in Q-bit signed arithmetic."""
+    return sat_add_q(a, -b)
+
+
+def min_sum_cn_update(msgs_in, offset=OFFSET):
+    """
+    Offset min-sum check node update.
+
+    For each output j:
+      sign = XOR of all other input signs
+      magnitude = min of all other magnitudes - offset (clamp to 0)
+
+    Args:
+        msgs_in: list of DC signed integers (Q-bit)
+        offset: offset correction value
+
+    Returns:
+        msgs_out: list of DC signed integers (Q-bit)
+    """
+    dc = len(msgs_in)
+    signs = [1 if m < 0 else 0 for m in msgs_in]
+    mags = [abs(m) for m in msgs_in]
+    sign_xor = sum(signs) % 2
+
+    # Find min1, min2, and index of min1
+    min1 = Q_MAX
+    min2 = Q_MAX
+    min1_idx = 0
+    for i in range(dc):
+        if mags[i] < min1:
+            min2 = min1
+            min1 = mags[i]
+            min1_idx = i
+        elif mags[i] < min2:
+            min2 = mags[i]
+
+    msgs_out = []
+    for j in range(dc):
+        mag = min2 if j == min1_idx else min1
+        mag = max(0, mag - offset)  # offset correction
+        sgn = sign_xor ^ signs[j]   # extrinsic sign
+        val = -mag if sgn else mag
+        msgs_out.append(val)
+
+    return msgs_out
+
+
+def decode_layered_min_sum(llr_q, max_iter=30, early_term=True):
+    """
+    Layered offset min-sum LDPC decoder (bit-exact reference for RTL).
+
+    Args:
+        llr_q: quantized channel LLRs (N-length array of signed Q-bit integers)
+        max_iter: maximum iterations
+        early_term: stop when syndrome is zero
+
+    Returns:
+        decoded_bits: hard decisions (N-length binary array)
+        converged: True if syndrome == 0
+        iterations: number of iterations performed
+        syndrome_weight: final syndrome weight
+    """
+    # Initialize beliefs from channel LLRs
+    beliefs = [int(x) for x in llr_q]
+
+    # Initialize CN->VN messages to zero
+    # msg[row][col][z] = message from CN (row*Z+z) to VN at shifted position
+    msg = [[[0 for _ in range(Z)] for _ in range(N_BASE)] for _ in range(M_BASE)]
+
+    for iteration in range(max_iter):
+        # Process each base matrix row (layer)
+        for row in range(M_BASE):
+            # Step 1: Compute VN->CN messages by subtracting old CN->VN
+            vn_to_cn = [[0]*Z for _ in range(N_BASE)]
+            for col in range(N_BASE):
+                shift = int(H_BASE[row, col])
+                if shift < 0:
+                    continue
+                for z in range(Z):
+                    shifted_z = (z + shift) % Z
+                    bit_idx = col * Z + shifted_z
+                    old_msg = msg[row][col][z]
+                    vn_to_cn[col][z] = sat_sub_q(beliefs[bit_idx], old_msg)
+
+            # Step 2: CN min-sum update
+            cn_to_vn = [[0]*Z for _ in range(N_BASE)]
+            for z in range(Z):
+                # Gather messages from all columns for this check node
+                cn_inputs = [vn_to_cn[col][z] for col in range(N_BASE)]
+                cn_outputs = min_sum_cn_update(cn_inputs)
+                for col in range(N_BASE):
+                    cn_to_vn[col][z] = cn_outputs[col]
+
+            # Step 3: Update beliefs and store new messages
+            for col in range(N_BASE):
+                shift = int(H_BASE[row, col])
+                if shift < 0:
+                    continue
+                for z in range(Z):
+                    shifted_z = (z + shift) % Z
+                    bit_idx = col * Z + shifted_z
+                    new_msg = cn_to_vn[col][z]
+                    extrinsic = vn_to_cn[col][z]
+                    beliefs[bit_idx] = sat_add_q(extrinsic, new_msg)
+                    msg[row][col][z] = new_msg
+
+        # Syndrome check
+        hard = [1 if b < 0 else 0 for b in beliefs]
+        syndrome_weight = compute_syndrome_weight(hard)
+
+        if early_term and syndrome_weight == 0:
+            return np.array(hard[:K]), True, iteration + 1, 0
+
+    hard = [1 if b < 0 else 0 for b in beliefs]
+    syndrome_weight = compute_syndrome_weight(hard)
+    return np.array(hard[:K]), syndrome_weight == 0, max_iter, syndrome_weight
+
+
+def compute_syndrome_weight(hard_bits):
+    """Compute syndrome weight = number of unsatisfied parity checks."""
+    weight = 0
+    for r in range(M_BASE):
+        for z in range(Z):
+            parity = 0
+            for c in range(N_BASE):
+                shift = int(H_BASE[r, c])
+                if shift < 0:
+                    continue
+                shifted_z = (z + shift) % Z
+                bit_idx = c * Z + shifted_z
+                parity ^= hard_bits[bit_idx]
+            if parity:
+                weight += 1
+    return weight
+
+
+def run_ber_simulation(lam_s_db_range, lam_b=0.1, n_frames=1000, max_iter=30):
+    """
+    Run BER simulation over a range of signal photon counts.
+
+    Args:
+        lam_s_db_range: signal photons/slot in dB (10*log10(lam_s))
+        lam_b: background photon rate
+        n_frames: number of codewords per SNR point
+        max_iter: decoder iterations
+    """
+    H = build_full_h_matrix()
+    print(f"H matrix: {H.shape}, rank = {np.linalg.matrix_rank(H.astype(float))}")
+    print(f"Code: ({N},{K}) rate {K/N:.3f}, Z={Z}")
+    print(f"Background photons: {lam_b}")
+    print(f"{'lam_s_dB':>10s} {'lam_s':>8s} {'BER':>10s} {'FER':>10s} {'avg_iter':>10s}")
+    print("-" * 55)
+
+    results = []
+    for lam_s_db in lam_s_db_range:
+        lam_s = 10**(lam_s_db / 10)
+        bit_errors = 0
+        frame_errors = 0
+        total_bits = 0
+        total_iter = 0
+
+        for frame in range(n_frames):
+            # Random info bits
+            info = np.random.randint(0, 2, K)
+
+            # Encode
+            codeword = ldpc_encode(info, H)
+
+            # Channel
+            llr_float, _ = poisson_channel(codeword, lam_s, lam_b)
+            llr_q = quantize_llr(llr_float)
+
+            # Decode
+            decoded, converged, iters, _ = decode_layered_min_sum(llr_q, max_iter)
+            total_iter += iters
+
+            # Count errors
+            errs = np.sum(decoded != info)
+            bit_errors += errs
+            total_bits += K
+            if errs > 0:
+                frame_errors += 1
+
+        ber = bit_errors / total_bits if total_bits > 0 else 0
+        fer = frame_errors / n_frames
+        avg_iter = total_iter / n_frames
+
+        print(f"{lam_s_db:10.1f} {lam_s:8.3f} {ber:10.6f} {fer:10.4f} {avg_iter:10.1f}")
+        results.append({
+            'lam_s_db': lam_s_db, 'lam_s': lam_s,
+            'ber': ber, 'fer': fer, 'avg_iter': avg_iter
+        })
+
+    return results
+
+
+def generate_test_vectors(n_vectors=10, lam_s=2.0, lam_b=0.1, max_iter=30):
+    """Generate test vectors for RTL verification."""
+    H = build_full_h_matrix()
+    vectors = []
+
+    for i in range(n_vectors):
+        info = np.random.randint(0, 2, K)
+        codeword = ldpc_encode(info, H)
+        llr_float, photons = poisson_channel(codeword, lam_s, lam_b)
+        llr_q = quantize_llr(llr_float)
+        decoded, converged, iters, syn_wt = decode_layered_min_sum(llr_q, max_iter)
+
+        vec = {
+            'index': i,
+            'info_bits': info.tolist(),
+            'codeword': codeword.tolist(),
+            'photon_counts': photons.tolist(),
+            'llr_float': llr_float.tolist(),
+            'llr_quantized': llr_q.tolist(),
+            'decoded_bits': decoded.tolist(),
+            'converged': bool(converged),
+            'iterations': iters,
+            'syndrome_weight': syn_wt,
+            'bit_errors': int(np.sum(decoded != info)),
+        }
+        vectors.append(vec)
+        status = "PASS" if np.array_equal(decoded, info) else f"FAIL ({vec['bit_errors']} errs)"
+        print(f"  Vector {i}: {status} (iter={iters}, converged={converged})")
+
+    return vectors
+
+
+def main():
+    parser = argparse.ArgumentParser(description='LDPC Decoder Behavioral Model')
+    parser.add_argument('--gen-vectors', action='store_true',
+                        help='Generate RTL test vectors')
+    parser.add_argument('--sweep-snr', action='store_true',
+                        help='Run BER vs SNR sweep')
+    parser.add_argument('--n-frames', type=int, default=1000,
+                        help='Frames per SNR point (default: 1000)')
+    parser.add_argument('--max-iter', type=int, default=30,
+                        help='Max decoder iterations (default: 30)')
+    parser.add_argument('--lam-s', type=float, default=2.0,
+                        help='Signal photons/slot for test vectors (default: 2.0)')
+    parser.add_argument('--lam-b', type=float, default=0.1,
+                        help='Background photons/slot (default: 0.1)')
+    parser.add_argument('--seed', type=int, default=42,
+                        help='Random seed (default: 42)')
+    args = parser.parse_args()
+
+    np.random.seed(args.seed)
+
+    if args.gen_vectors:
+        print(f"Generating test vectors (lam_s={args.lam_s}, lam_b={args.lam_b})...")
+        vectors = generate_test_vectors(
+            n_vectors=20, lam_s=args.lam_s, lam_b=args.lam_b,
+            max_iter=args.max_iter
+        )
+        out_path = os.path.join(os.path.dirname(__file__), '..', 'data', 'test_vectors.json')
+        with open(out_path, 'w') as f:
+            json.dump(vectors, f, indent=2)
+        print(f"\nWrote {len(vectors)} vectors to {out_path}")
+
+    elif args.sweep_snr:
+        print("BER Sweep: Poisson photon-counting channel, rate-1/8 QC-LDPC")
+        lam_s_db_range = np.arange(-6, 10, 1.0)  # -6 to +9 dB
+        results = run_ber_simulation(
+            lam_s_db_range, lam_b=args.lam_b,
+            n_frames=args.n_frames, max_iter=args.max_iter
+        )
+        out_path = os.path.join(os.path.dirname(__file__), '..', 'data', 'ber_results.json')
+        with open(out_path, 'w') as f:
+            json.dump(results, f, indent=2)
+        print(f"\nWrote results to {out_path}")
+
+    else:
+        # Quick demo
+        print("=== LDPC Rate-1/8 Decoder Demo ===")
+        print(f"Code: ({N},{K}), rate {K/N:.3f}, Z={Z}")
+
+        H = build_full_h_matrix()
+        print(f"H matrix: {H.shape}, density: {H.sum()/(H.shape[0]*H.shape[1]):.4f}")
+
+        info = np.random.randint(0, 2, K)
+        print(f"\nInfo bits ({K}): {info}")
+
+        codeword = ldpc_encode(info, H)
+        print(f"Codeword ({N} bits), weight: {codeword.sum()}")
+
+        # Simulate at a few photon levels
+        for lam_s in [0.5, 1.0, 2.0, 5.0]:
+            np.random.seed(args.seed)
+            llr_float, photons = poisson_channel(codeword, lam_s, args.lam_b)
+            llr_q = quantize_llr(llr_float)
+            decoded, converged, iters, syn_wt = decode_layered_min_sum(llr_q)
+            errors = np.sum(decoded != info)
+            print(f"  lam_s={lam_s:.1f}: decoded in {iters} iter, "
+                  f"converged={converged}, errors={errors}")
+
+
+if __name__ == '__main__':
+    main()
--- a/rtl/ldpc_decoder_core.sv
+++ b/rtl/ldpc_decoder_core.sv
@@ -0,0 +1,403 @@
+// LDPC Decoder Core - Layered Min-Sum with QC structure
+//
+// Layered scheduling processes one base-matrix row at a time.
+// For each row, we:
+//   1. Read VN beliefs for all Z columns connected to this row
+//   2. Subtract old CN->VN messages to get VN->CN messages
+//   3. Run CN min-sum update
+//   4. Add new CN->VN messages back to VN beliefs
+//   5. Write updated beliefs back
+//
+// This converges ~2x faster than flooding and needs only one message memory
+// (CN->VN messages for current layer, overwritten each layer).
+
+module ldpc_decoder_core #(
+    parameter N_BASE    = 8,
+    parameter M_BASE    = 7,
+    parameter Z         = 32,
+    parameter N         = N_BASE * Z,
+    parameter M         = M_BASE * Z,
+    parameter Q         = 6,
+    parameter MAX_ITER  = 30,
+    parameter DC        = 8,        // check node degree
+    parameter DV_MAX    = 7         // max variable node degree
+)(
+    input  logic                    clk,
+    input  logic                    rst_n,
+
+    // Control
+    input  logic                    start,
+    input  logic                    early_term_en,
+    input  logic [4:0]              max_iter,
+
+    // Channel LLRs (loaded before start)
+    input  logic signed [Q-1:0]     llr_in [N],
+
+    // Status
+    output logic                    busy,
+    output logic                    converged,
+    output logic [4:0]              iter_used,
+
+    // Results
+    output logic [Z-1:0]            decoded_bits,   // first Z bits = info bits
+    output logic [7:0]              syndrome_weight
+);
+
+    // =========================================================================
+    // Base matrix H stored as shift values (-1 = no connection)
+    // H_BASE[row][col] = cyclic shift amount, or -1 if zero sub-matrix
+    // =========================================================================
+
+    // This is a placeholder base matrix for rate-1/8 QC-LDPC.
+    // Must be replaced with a properly designed matrix (PEG algorithm or
+    // density evolution optimized). All entries >= 0 means fully connected
+    // (regular dv=7, dc=8). For irregular codes, some entries would be -1.
+    //
+    // TODO: Replace with optimized base matrix from model/design_h_matrix.py
+
+    logic signed [5:0] H_BASE [M_BASE][N_BASE];
+
+    // Shift values for 7x8 base matrix (Z=32, values 0..31, -1=null)
+    // This is a regular (7,8) code - every entry is connected
+    initial begin
+        // Row 0
+        H_BASE[0][0] =  0; H_BASE[0][1] =  5; H_BASE[0][2] = 11;
+        H_BASE[0][3] = 17; H_BASE[0][4] = 23; H_BASE[0][5] = 29;
+        H_BASE[0][6] =  3; H_BASE[0][7] =  9;
+        // Row 1
+        H_BASE[1][0] = 15; H_BASE[1][1] =  0; H_BASE[1][2] = 21;
+        H_BASE[1][3] =  7; H_BASE[1][4] = 13; H_BASE[1][5] = 19;
+        H_BASE[1][6] = 25; H_BASE[1][7] = 31;
+        // Row 2
+        H_BASE[2][0] = 10; H_BASE[2][1] = 20; H_BASE[2][2] =  0;
+        H_BASE[2][3] = 30; H_BASE[2][4] =  8; H_BASE[2][5] = 16;
+        H_BASE[2][6] = 24; H_BASE[2][7] =  2;
+        // Row 3
+        H_BASE[3][0] = 27; H_BASE[3][1] = 14; H_BASE[3][2] =  1;
+        H_BASE[3][3] =  0; H_BASE[3][4] = 18; H_BASE[3][5] =  6;
+        H_BASE[3][6] = 12; H_BASE[3][7] = 22;
+        // Row 4
+        H_BASE[4][0] =  4; H_BASE[4][1] = 28; H_BASE[4][2] = 16;
+        H_BASE[4][3] = 12; H_BASE[4][4] =  0; H_BASE[4][5] = 26;
+        H_BASE[4][6] =  8; H_BASE[4][7] = 20;
+        // Row 5
+        H_BASE[5][0] = 19; H_BASE[5][1] =  9; H_BASE[5][2] = 31;
+        H_BASE[5][3] = 25; H_BASE[5][4] = 15; H_BASE[5][5] =  0;
+        H_BASE[5][6] = 21; H_BASE[5][7] = 11;
+        // Row 6
+        H_BASE[6][0] = 22; H_BASE[6][1] = 26; H_BASE[6][2] =  6;
+        H_BASE[6][3] = 14; H_BASE[6][4] = 30; H_BASE[6][5] = 10;
+        H_BASE[6][6] =  0; H_BASE[6][7] = 18;
+    end
+
+    // =========================================================================
+    // Memory: VN beliefs (total posterior LLR per bit)
+    // beliefs[j] = channel_llr[j] + sum of all CN->VN messages to j
+    // =========================================================================
+
+    logic signed [Q-1:0] beliefs [N];
+
+    // =========================================================================
+    // Memory: CN->VN messages for layered update
+    // msg_cn2vn[row][col][z] = message from check (row*Z+z) to variable (col*Z+shift(z))
+    // Stored as [M_BASE][N_BASE] banks of Z entries each
+    // =========================================================================
+
+    logic signed [Q-1:0] msg_cn2vn [M_BASE][N_BASE][Z];
+
+    // =========================================================================
+    // Decoder FSM
+    // =========================================================================
+
+    typedef enum logic [2:0] {
+        IDLE,
+        INIT,           // Initialize beliefs from channel LLRs, zero messages
+        LAYER_READ,     // Read Z beliefs for each of DC columns in current row
+        CN_UPDATE,      // Run min-sum CN update on gathered messages
+        LAYER_WRITE,    // Write updated beliefs and new CN->VN messages
+        SYNDROME,       // Check syndrome after full iteration
+        DONE
+    } state_t;
+
+    state_t state, state_next;
+
+    logic [4:0]  iter_cnt;
+    logic [2:0]  row_idx;       // current base matrix row (0..M_BASE-1)
+    logic [2:0]  col_idx;       // current column being read/written (0..N_BASE-1)
+    logic [4:0]  effective_max_iter;
+
+    // Working registers for current layer CN update
+    logic signed [Q-1:0] vn_to_cn [DC][Z];  // VN->CN messages for current row
+    logic signed [Q-1:0] cn_to_vn [DC][Z];  // new CN->VN messages (output of min-sum)
+
+    // Syndrome check
+    logic [7:0] syndrome_cnt;
+    logic       syndrome_ok;
+
+    assign effective_max_iter = (max_iter == 0) ? MAX_ITER[4:0] : max_iter;
+    assign busy = (state != IDLE) && (state != DONE);
+
+    // =========================================================================
+    // State machine
+    // =========================================================================
+
+    always_ff @(posedge clk or negedge rst_n) begin
+        if (!rst_n) begin
+            state <= IDLE;
+        end else begin
+            state <= state_next;
+        end
+    end
+
+    always_comb begin
+        state_next = state;
+        case (state)
+            IDLE:        if (start) state_next = INIT;
+            INIT:        state_next = LAYER_READ;
+            LAYER_READ:  if (col_idx == N_BASE - 1) state_next = CN_UPDATE;
+            CN_UPDATE:   state_next = LAYER_WRITE;
+            LAYER_WRITE: begin
+                if (col_idx == N_BASE - 1) begin
+                    if (row_idx == M_BASE - 1)
+                        state_next = SYNDROME;
+                    else
+                        state_next = LAYER_READ;  // next row
+                end
+            end
+            SYNDROME: begin
+                if (syndrome_ok && early_term_en)
+                    state_next = DONE;
+                else if (iter_cnt >= effective_max_iter)
+                    state_next = DONE;
+                else
+                    state_next = LAYER_READ;  // next iteration
+                end
+            DONE:        if (!start) state_next = IDLE;
+            default:     state_next = IDLE;
+        endcase
+    end
+
+    // =========================================================================
+    // Datapath
+    // =========================================================================
+
+    always_ff @(posedge clk or negedge rst_n) begin
+        if (!rst_n) begin
+            iter_cnt   <= '0;
+            row_idx    <= '0;
+            col_idx    <= '0;
+            converged  <= 1'b0;
+            iter_used  <= '0;
+            syndrome_weight <= '0;
+        end else begin
+            case (state)
+                IDLE: begin
+                    iter_cnt  <= '0;
+                    row_idx   <= '0;
+                    col_idx   <= '0;
+                    converged <= 1'b0;
+                end
+
+                INIT: begin
+                    // Initialize beliefs from channel LLRs
+                    for (int j = 0; j < N; j++) begin
+                        beliefs[j] <= llr_in[j];
+                    end
+                    // Zero all CN->VN messages
+                    for (int r = 0; r < M_BASE; r++)
+                        for (int c = 0; c < N_BASE; c++)
+                            for (int z = 0; z < Z; z++)
+                                msg_cn2vn[r][c][z] <= '0;
+                    row_idx <= '0;
+                    col_idx <= '0;
+                    iter_cnt <= '0;
+                end
+
+                LAYER_READ: begin
+                    // For column col_idx in current row_idx:
+                    // VN->CN = belief - old CN->VN message
+                    // (belief already contains the sum of ALL CN->VN messages,
+                    //  so subtracting the current row's message gives the extrinsic)
+                    for (int z = 0; z < Z; z++) begin
+                        int bit_idx;
+                        int shifted_z;
+                        logic signed [Q-1:0] old_msg;
+                        logic signed [Q-1:0] belief_val;
+
+                        shifted_z = (z + H_BASE[row_idx][col_idx]) % Z;
+                        bit_idx   = int'(col_idx) * Z + shifted_z;
+                        old_msg   = msg_cn2vn[row_idx][col_idx][z];
+                        belief_val = beliefs[bit_idx];
+
+                        vn_to_cn[col_idx][z] <= sat_sub(belief_val, old_msg);
+                    end
+
+                    if (col_idx == N_BASE - 1)
+                        col_idx <= '0;
+                    else
+                        col_idx <= col_idx + 1;
+                end
+
+                CN_UPDATE: begin
+                    // Min-sum update for all Z check nodes in current row
+                    // Each CN has DC=8 incoming messages (one per column)
+                    for (int z = 0; z < Z; z++) begin
+                        // Gather DC messages for check node z
+                        logic signed [Q-1:0] msgs [DC];
+                        for (int d = 0; d < DC; d++)
+                            msgs[d] = vn_to_cn[d][z];
+
+                        // Min-sum: find min1, min2, sign product, min1 index
+                        cn_min_sum(msgs, cn_to_vn[0][z], cn_to_vn[1][z],
+                                   cn_to_vn[2][z], cn_to_vn[3][z],
+                                   cn_to_vn[4][z], cn_to_vn[5][z],
+                                   cn_to_vn[6][z], cn_to_vn[7][z]);
+                    end
+                    col_idx <= '0;  // prepare for LAYER_WRITE
+                end
+
+                LAYER_WRITE: begin
+                    // Write back: update beliefs and store new CN->VN messages
+                    for (int z = 0; z < Z; z++) begin
+                        int bit_idx;
+                        int shifted_z;
+                        logic signed [Q-1:0] new_msg;
+                        logic signed [Q-1:0] old_extrinsic;
+
+                        shifted_z = (z + H_BASE[row_idx][col_idx]) % Z;
+                        bit_idx   = int'(col_idx) * Z + shifted_z;
+                        new_msg   = cn_to_vn[col_idx][z];
+                        old_extrinsic = vn_to_cn[col_idx][z];
+
+                        // belief = extrinsic (VN->CN) + new CN->VN message
+                        beliefs[bit_idx] <= sat_add(old_extrinsic, new_msg);
+
+                        // Store new message for next iteration
+                        msg_cn2vn[row_idx][col_idx][z] <= new_msg;
+                    end
+
+                    if (col_idx == N_BASE - 1) begin
+                        col_idx <= '0;
+                        if (row_idx == M_BASE - 1)
+                            row_idx <= '0;
+                        else
+                            row_idx <= row_idx + 1;
+                    end else begin
+                        col_idx <= col_idx + 1;
+                    end
+                end
+
+                SYNDROME: begin
+                    // Check H * c_hat == 0 (compute syndrome weight)
+                    syndrome_cnt = '0;
+                    for (int r = 0; r < M_BASE; r++) begin
+                        for (int z = 0; z < Z; z++) begin
+                            logic parity;
+                            parity = 1'b0;
+                            for (int c = 0; c < N_BASE; c++) begin
+                                int shifted_z, bit_idx;
+                                shifted_z = (z + H_BASE[r][c]) % Z;
+                                bit_idx = c * Z + shifted_z;
+                                parity = parity ^ beliefs[bit_idx][Q-1]; // sign bit = hard decision
+                            end
+                            if (parity) syndrome_cnt = syndrome_cnt + 1;
+                        end
+                    end
+                    syndrome_weight <= syndrome_cnt;
+                    syndrome_ok = (syndrome_cnt == 0);
+
+                    iter_cnt <= iter_cnt + 1;
+                    iter_used <= iter_cnt + 1;
+                    if (syndrome_ok) converged <= 1'b1;
+                end
+
+                DONE: begin
+                    // Output decoded info bits (first Z=32 bits, column 0)
+                    for (int z = 0; z < Z; z++)
+                        decoded_bits[z] <= beliefs[z][Q-1]; // sign bit = hard decision
+                end
+            endcase
+        end
+    end
+
+    // =========================================================================
+    // Min-sum CN update function
+    // =========================================================================
+
+    // Offset min-sum for DC=8 inputs
+    // For each output j: sign = XOR of all other signs, magnitude = min of all other magnitudes - offset
+    task automatic cn_min_sum(
+        input  logic signed [Q-1:0] in [DC],
+        output logic signed [Q-1:0] out0, out1, out2, out3,
+                                     out4, out5, out6, out7
+    );
+        logic [DC-1:0] signs;
+        logic [Q-2:0]  mags [DC];
+        logic          sign_xor;
+        logic [Q-2:0]  min1, min2;
+        int            min1_idx;
+        logic signed [Q-1:0] outs [DC];
+
+        // Extract signs and magnitudes
+        sign_xor = 1'b0;
+        for (int i = 0; i < DC; i++) begin
+            signs[i] = in[i][Q-1];
+            mags[i]  = in[i][Q-1] ? (~in[i][Q-2:0] + 1) : in[i][Q-2:0];
+            sign_xor = sign_xor ^ signs[i];
+        end
+
+        // Find two smallest magnitudes
+        min1 = {(Q-1){1'b1}};
+        min2 = {(Q-1){1'b1}};
+        min1_idx = 0;
+        for (int i = 0; i < DC; i++) begin
+            if (mags[i] < min1) begin
+                min2     = min1;
+                min1     = mags[i];
+                min1_idx = i;
+            end else if (mags[i] < min2) begin
+                min2 = mags[i];
+            end
+        end
+
+        // Compute extrinsic outputs with offset correction
+        for (int j = 0; j < DC; j++) begin
+            logic [Q-2:0] mag_out;
+            logic          sign_out;
+
+            mag_out  = (j == min1_idx) ? min2 : min1;
+            // Offset correction (subtract 1 in integer representation)
+            mag_out  = (mag_out > 1) ? (mag_out - 1) : {(Q-1){1'b0}};
+            sign_out = sign_xor ^ signs[j];
+
+            outs[j] = sign_out ? (~{1'b0, mag_out} + 1) : {1'b0, mag_out};
+        end
+
+        out0 = outs[0]; out1 = outs[1]; out2 = outs[2]; out3 = outs[3];
+        out4 = outs[4]; out5 = outs[5]; out6 = outs[6]; out7 = outs[7];
+    endtask
+
+    // =========================================================================
+    // Saturating arithmetic helpers
+    // =========================================================================
+
+    function automatic logic signed [Q-1:0] sat_add(
+        logic signed [Q-1:0] a, logic signed [Q-1:0] b
+    );
+        logic signed [Q:0] sum;
+        sum = {a[Q-1], a} + {b[Q-1], b};  // sign-extend and add
+        if (sum > $signed({1'b0, {(Q-1){1'b1}}}))
+            return {1'b0, {(Q-1){1'b1}}};  // +max
+        else if (sum < $signed({1'b1, {(Q-1){1'b0}}}))
+            return {1'b1, {(Q-1){1'b0}}};  // -max
+        else
+            return sum[Q-1:0];
+    endfunction
+
+    function automatic logic signed [Q-1:0] sat_sub(
+        logic signed [Q-1:0] a, logic signed [Q-1:0] b
+    );
+        return sat_add(a, -b);
+    endfunction
+
+endmodule
--- a/rtl/ldpc_decoder_top.sv
+++ b/rtl/ldpc_decoder_top.sv
@@ -0,0 +1,110 @@
+// LDPC Decoder Top - QC-LDPC Rate 1/8 for Photon-Starved Optical Communication
+// Target: Efabless chipIgnite (SkyWater 130nm, Caravel harness)
+//
+// Code parameters:
+//   Rate 1/8, n=256 coded bits, k=32 info bits
+//   QC-LDPC with 7x8 base matrix, lifting factor Z=32
+//   Offset min-sum decoding, layered scheduling
+//
+// Input:  6-bit signed LLRs (log-likelihood ratios from photon detector)
+// Output: 32 decoded information bits + convergence status
+
+module ldpc_decoder_top #(
+    parameter N_BASE    = 8,        // base matrix columns
+    parameter M_BASE    = 7,        // base matrix rows
+    parameter Z         = 32,       // lifting factor
+    parameter N         = N_BASE * Z, // codeword length = 256
+    parameter K         = Z,        // info bits = 32 (rate 1/8)
+    parameter M         = M_BASE * Z, // parity checks = 224
+    parameter Q         = 6,        // LLR quantization bits (signed)
+    parameter MAX_ITER  = 30,       // maximum decoding iterations
+    parameter DC        = 8,        // check node degree (= N_BASE for regular)
+    parameter DV_MAX    = 7         // max variable node degree (= M_BASE for regular)
+)(
+    input  logic        clk,
+    input  logic        rst_n,
+
+    // Wishbone B4 pipelined slave interface
+    input  logic        wb_cyc_i,
+    input  logic        wb_stb_i,
+    input  logic        wb_we_i,
+    input  logic [7:0]  wb_adr_i,   // byte address (256 bytes address space)
+    input  logic [31:0] wb_dat_i,
+    output logic [31:0] wb_dat_o,
+    output logic        wb_ack_o,
+
+    // Interrupt (active high, directly to Caravel IRQ)
+    output logic        irq_o
+);
+
+    // =========================================================================
+    // Wishbone register interface
+    // =========================================================================
+
+    // Control/status registers
+    logic        ctrl_start;         // pulse: begin decoding
+    logic        ctrl_early_term;    // enable early termination
+    logic [4:0]  ctrl_max_iter;      // max iterations (0 = use MAX_ITER)
+
+    logic        stat_busy;
+    logic        stat_converged;
+    logic [4:0]  stat_iter_used;
+
+    // LLR input buffer (written by host before starting decode)
+    logic signed [Q-1:0] llr_input [N];
+
+    // Decoded output
+    logic [K-1:0]  decoded_bits;
+    logic [7:0]    syndrome_weight;
+
+    wishbone_interface #(
+        .N(N), .K(K), .Q(Q)
+    ) u_wb (
+        .clk            (clk),
+        .rst_n          (rst_n),
+        .wb_cyc_i       (wb_cyc_i),
+        .wb_stb_i       (wb_stb_i),
+        .wb_we_i        (wb_we_i),
+        .wb_adr_i       (wb_adr_i),
+        .wb_dat_i       (wb_dat_i),
+        .wb_dat_o       (wb_dat_o),
+        .wb_ack_o       (wb_ack_o),
+        .ctrl_start     (ctrl_start),
+        .ctrl_early_term(ctrl_early_term),
+        .ctrl_max_iter  (ctrl_max_iter),
+        .stat_busy      (stat_busy),
+        .stat_converged (stat_converged),
+        .stat_iter_used (stat_iter_used),
+        .llr_input      (llr_input),
+        .decoded_bits   (decoded_bits),
+        .syndrome_weight(syndrome_weight),
+        .irq_o          (irq_o)
+    );
+
+    // =========================================================================
+    // Decoder core
+    // =========================================================================
+
+    ldpc_decoder_core #(
+        .N_BASE   (N_BASE),
+        .M_BASE   (M_BASE),
+        .Z        (Z),
+        .Q        (Q),
+        .MAX_ITER (MAX_ITER),
+        .DC       (DC),
+        .DV_MAX   (DV_MAX)
+    ) u_core (
+        .clk            (clk),
+        .rst_n          (rst_n),
+        .start          (ctrl_start),
+        .early_term_en  (ctrl_early_term),
+        .max_iter       (ctrl_max_iter),
+        .llr_in         (llr_input),
+        .busy           (stat_busy),
+        .converged      (stat_converged),
+        .iter_used      (stat_iter_used),
+        .decoded_bits   (decoded_bits),
+        .syndrome_weight(syndrome_weight)
+    );
+
+endmodule
--- a/rtl/wishbone_interface.sv
+++ b/rtl/wishbone_interface.sv
@@ -0,0 +1,139 @@
+// Wishbone B4 slave interface for LDPC decoder
+// Compatible with Caravel SoC Wishbone interconnect
+//
+// Register map (byte-addressed):
+//   0x00 CTRL     R/W  [0]=start (auto-clear), [1]=early_term_en, [12:8]=max_iter
+//   0x04 STATUS   R    [0]=busy, [1]=converged, [12:8]=iterations_used, [23:16]=syndrome_wt
+//   0x10-0x4F LLR  W   Channel LLRs packed 5x6-bit per 32-bit word (52 words for 256 LLRs)
+//   0x50 DECODED  R    32 decoded info bits
+//   0x54 VERSION  R    Version/ID register
+
+module wishbone_interface #(
+    parameter N = 256,
+    parameter K = 32,
+    parameter Q = 6
+)(
+    input  logic        clk,
+    input  logic        rst_n,
+
+    // Wishbone slave
+    input  logic        wb_cyc_i,
+    input  logic        wb_stb_i,
+    input  logic        wb_we_i,
+    input  logic [7:0]  wb_adr_i,
+    input  logic [31:0] wb_dat_i,
+    output logic [31:0] wb_dat_o,
+    output logic        wb_ack_o,
+
+    // To/from decoder core
+    output logic                    ctrl_start,
+    output logic                    ctrl_early_term,
+    output logic [4:0]              ctrl_max_iter,
+    input  logic                    stat_busy,
+    input  logic                    stat_converged,
+    input  logic [4:0]              stat_iter_used,
+    output logic signed [Q-1:0]     llr_input [N],
+    input  logic [K-1:0]            decoded_bits,
+    input  logic [7:0]              syndrome_weight,
+
+    // Interrupt
+    output logic                    irq_o
+);
+
+    localparam VERSION_ID = 32'hLD01_0001;  // LDPC v0.1 build 1
+
+    // Wishbone handshake: ack on valid cycle
+    logic wb_valid;
+    assign wb_valid = wb_cyc_i && wb_stb_i;
+
+    always_ff @(posedge clk or negedge rst_n) begin
+        if (!rst_n)
+            wb_ack_o <= 1'b0;
+        else
+            wb_ack_o <= wb_valid && !wb_ack_o;  // single-cycle ack
+    end
+
+    // =========================================================================
+    // Control register
+    // =========================================================================
+
+    logic start_pending;
+    logic early_term_reg;
+    logic [4:0] max_iter_reg;
+
+    // Start is a pulse: set on write, cleared after one cycle
+    always_ff @(posedge clk or negedge rst_n) begin
+        if (!rst_n) begin
+            start_pending  <= 1'b0;
+            early_term_reg <= 1'b1;   // early termination on by default
+            max_iter_reg   <= 5'd0;   // 0 = use MAX_ITER default
+        end else begin
+            if (ctrl_start)
+                start_pending <= 1'b0;
+
+            if (wb_valid && wb_we_i && !wb_ack_o && wb_adr_i == 8'h00) begin
+                start_pending  <= wb_dat_i[0];
+                early_term_reg <= wb_dat_i[1];
+                max_iter_reg   <= wb_dat_i[12:8];
+            end
+        end
+    end
+
+    assign ctrl_start     = start_pending && !stat_busy;
+    assign ctrl_early_term = early_term_reg;
+    assign ctrl_max_iter   = max_iter_reg;
+
+    // =========================================================================
+    // LLR input: pack 5 LLRs per 32-bit word
+    // Word at offset 0x10 + 4*i contains LLRs [5*i] through [5*i+4]
+    // Bits [5:0] = LLR[5*i], [11:6] = LLR[5*i+1], ... [29:24] = LLR[5*i+4]
+    // 52 words cover 260 LLRs (256 used, 4 padding)
+    // =========================================================================
+
+    always_ff @(posedge clk) begin
+        if (wb_valid && wb_we_i && !wb_ack_o) begin
+            if (wb_adr_i >= 8'h10 && wb_adr_i < 8'hE0) begin
+                int word_idx;
+                word_idx = (wb_adr_i - 8'h10) >> 2;
+                for (int p = 0; p < 5; p++) begin
+                    int llr_idx;
+                    llr_idx = word_idx * 5 + p;
+                    if (llr_idx < N)
+                        llr_input[llr_idx] <= wb_dat_i[p*Q +: Q];
+                end
+            end
+        end
+    end
+
+    // =========================================================================
+    // Read mux
+    // =========================================================================
+
+    always_comb begin
+        wb_dat_o = 32'h0;
+        case (wb_adr_i)
+            8'h00: wb_dat_o = {19'b0, max_iter_reg, 6'b0, early_term_reg, start_pending};
+            8'h04: wb_dat_o = {8'b0, syndrome_weight, 3'b0, stat_iter_used, 6'b0, stat_converged, stat_busy};
+            8'h50: wb_dat_o = decoded_bits;
+            8'h54: wb_dat_o = VERSION_ID;
+            default: wb_dat_o = 32'h0;
+        endcase
+    end
+
+    // =========================================================================
+    // Interrupt: assert when decode completes (busy falls)
+    // =========================================================================
+
+    logic busy_d1;
+    always_ff @(posedge clk or negedge rst_n) begin
+        if (!rst_n) begin
+            busy_d1 <= 1'b0;
+            irq_o   <= 1'b0;
+        end else begin
+            busy_d1 <= stat_busy;
+            // Pulse IRQ on falling edge of busy
+            irq_o <= busy_d1 && !stat_busy;
+        end
+    end
+
+endmodule