# LDPC Decoder Hardening Results

## Run 1: `26_02_25_21_11` (Feb 25, 2026) — FAILED
- **RTL**: Original (unpipelined CN update)
- **Config**: `CLOCK_PERIOD=20` (50 MHz), `RUN_HEURISTIC_DIODE_INSERTION=true`, `HEURISTIC_ANTENNA_THRESHOLD=110`
- **Die area**: 2800 x 1760 µm (4.93 mm²)
- **Failure**: `GRT-0118` routing congestion after heuristic diode insertion (66,016 diodes added)
- **Notes**: Initial global routing passed (0 overflow, 39% routing utilization). Diode insertion nearly doubled the cell count, and the subsequent re-route failed on congestion.

## Run 2: `reuse_synth` (Feb 27, 2026) — COMPLETED (timing violations)
- **RTL**: Original (unpipelined CN update); reused the synthesis netlist from Run 1
- **Config**: `CLOCK_PERIOD=20` (50 MHz), `RUN_HEURISTIC_DIODE_INSERTION=false`, `RUN_ANTENNA_REPAIR=true`
- **Die area**: 2800 x 1760 µm (4.93 mm²)
- **Result**: All 70 steps completed and GDS was generated. Timing violations were reported as deferred errors.
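
The configs listed for Runs 1 and 2 differ only in their antenna settings. Storing each run's overrides as data makes that kind of change easy to audit. The following is a minimal sketch that assumes OpenLane 2-style JSON config overrides. The keys and values are copied from the run notes. `diff_configs` is a hypothetical helper, not an OpenLane API. A `None` means the notes give no value for that key, so the flow default presumably applies.

```python
import json

# Per-run overrides copied from the run notes (only keys the notes state).
RUN1 = {
    "CLOCK_PERIOD": 20,
    "RUN_HEURISTIC_DIODE_INSERTION": True,
    "HEURISTIC_ANTENNA_THRESHOLD": 110,
}
RUN2 = {
    "CLOCK_PERIOD": 20,
    "RUN_HEURISTIC_DIODE_INSERTION": False,
    "RUN_ANTENNA_REPAIR": True,
}


def diff_configs(old, new):
    """Return {key: (old_value, new_value)} for every key whose value differs.

    A key missing from one side is reported as None on that side.
    """
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


if __name__ == "__main__":
    for key, (before, after) in diff_configs(RUN1, RUN2).items():
        print(f"{key}: {before} -> {after}")
    # Serialize Run 2's overrides as JSON (Python True/False become true/false).
    print(json.dumps(RUN2, indent=2))
```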

### Physical Results
| Metric | Result |
|--------|--------|
| Magic DRC | **Clean** |
| KLayout DRC | **Clean** |
| LVS | **Clean** (0 errors, 0 unmatched) |
| XOR (Magic vs KLayout) | **Clean** |
| Illegal overlap | **Clean** |
| Power grid violations | **0** |
| Antenna violating nets | 658 |
| Antenna violating pins | 905 |

### Area & Utilization
| Metric | Value |
|--------|-------|
| Die area | 4,928,000 µm² (4.93 mm²) |
| Core area | 4,846,670 µm² |
| Instance count (excl. fill cells) | 184,663 |
| Instance area | 1,303,260 µm² (1.30 mm²) |
| Core utilization | 26.9% |
| Sequential cells | 16,967 |
| Combinational cells | 61,366 |
| Timing repair buffers | 23,709 |
| Fill cells | 415,149 |
| Tap cells | 69,228 |

### Timing (post-route, CLOCK_PERIOD = 20 ns / 50 MHz target)
| Corner | Setup WNS (ns) | Setup TNS (ns) | Hold WNS (ns) | Hold TNS (ns) | Setup Violations |
|--------|----------------|-----------------|----------------|----------------|------------------|
| nom_tt_025C_1v80 | **-27.13** | -234.9 | -0.32 | -3.76 | 9 |
| nom_ss_100C_1v60 | **-70.58** | -29,946.3 | 0.06 | 0 | 5,463 |
| nom_ff_n40C_1v95 | **-10.18** | -86.3 | -0.26 | -12.4 | — |
| **Worst across all corners** (incl. corners not listed) | **-71.40** | -34,329.1 | -0.47 | -26.4 | — |

### Estimated Max Frequency
- **TT corner**: Critical path ~47 ns → **~21 MHz**
- **SS corner**: Critical path ~91 ns → **~11 MHz**
- **FF corner**: Critical path ~30 ns → **~33 MHz**

### Power (TT corner)
| Component | Power (W) |
|-----------|-----------|
| Internal | 0.0554 |
| Switching | 0.0273 |
| Leakage | ~0.000002 (~0.002 mW) |
| **Total** | **0.0827** |

### Key Observations
1. Disabling heuristic diode insertion fixed the routing congestion failure from Run 1.
2. 658 antenna-violating nets (905 pins) remain, so iterative antenna repair alone was not sufficient. Possible fixes: re-enable heuristic insertion with a higher threshold, or use `DIODE_ON_PORTS`.
3. Setup timing is severely violated. The critical path is ~47 ns at TT, more than twice the 20 ns target.
4. This run used the **unpipelined** RTL (synthesis was reused from Run 1, which predated the CN pipeline split).
5. The next run should re-synthesize with the pipelined CN update RTL to see whether timing improves.

## Run 3: `pipelined_pnr` (Mar 1, 2026) — FAILED
- **RTL**: Pipelined CN update (CN_STAGE1 + CN_STAGE2)
- **Config**: `CLOCK_PERIOD=20` (50 MHz), `SYNTH_STRATEGY=AREA 0`, `RUN_HEURISTIC_DIODE_INSERTION=false`, `RUN_ANTENNA_REPAIR=true`
- **Die area**: 2800 x 1760 µm (4.93 mm²)
- **Failure**: `GRT-0118` routing congestion during iterative antenna repair (step 36), after 13+ hours of repair loops
- **Notes**: Iterative antenna repair kept inserting diodes and re-routing until congestion became too high. The end result matches Run 1 (diode-induced congestion), but it was reached through iterative repair instead of heuristic insertion.

## Run 3b: `pipelined_synth` (Feb 28, 2026) — STILL RUNNING
- **RTL**: Pipelined CN update
- **Config**: `SYNTH_STRATEGY=AREA 2` (synthesis only)
- **Status**: ABC pass 2 (tech mapping) has been running for 20+ hours. `AREA 2` is far too aggressive for this design size. **Do not use AREA 2 for this design.**

## Run 4: `pipelined_noantenna` (Mar 2, 2026) — COMPLETED (timing violations)
- **RTL**: Pipelined CN update (CN_STAGE1 + CN_STAGE2)
- **Config**: `CLOCK_PERIOD=20` (50 MHz), `SYNTH_STRATEGY=AREA 0`, `RUN_HEURISTIC_DIODE_INSERTION=false`, `RUN_ANTENNA_REPAIR=false`
- **Die area**: 2800 x 1760 µm (4.93 mm²)
- **Result**: All 69 steps completed and GDS was generated. Timing violations were reported as deferred errors. No antenna repair was attempted.
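
The Estimated Max Frequency figures for Run 2 come straight from the setup WNS. With a negative WNS, the worst path needs `CLOCK_PERIOD - WNS` ns, and fmax is the reciprocal of that. Core utilization is instance area divided by core area. The following is a minimal sketch of both calculations. It assumes path delay, setup time and clock skew stay fixed when the period changes, so the results are first-order estimates like the ones in this log. The function names are illustrative.

```python
def min_period_ns(clock_period_ns: float, setup_wns_ns: float) -> float:
    """Smallest clock period that would close setup on the worst path.

    With a negative WNS the worst path needs (period - WNS) ns. Assumes the
    path delay, setup time and clock skew do not change with the period.
    """
    return clock_period_ns - setup_wns_ns


def fmax_mhz(clock_period_ns: float, setup_wns_ns: float) -> float:
    """Estimated maximum clock frequency in MHz (1000 / period in ns)."""
    return 1000.0 / min_period_ns(clock_period_ns, setup_wns_ns)


def core_utilization(instance_area_um2: float, core_area_um2: float) -> float:
    """Fraction of the core area occupied by standard-cell instances."""
    return instance_area_um2 / core_area_um2


if __name__ == "__main__":
    # Run 2 setup WNS per corner at CLOCK_PERIOD = 20 ns.
    run2_wns = {"tt": -27.13, "ss": -70.58, "ff": -10.18}
    for corner, wns in run2_wns.items():
        print(f"{corner}: {min_period_ns(20.0, wns):.1f} ns -> {fmax_mhz(20.0, wns):.1f} MHz")
    print(f"Run 2 core utilization: {core_utilization(1_303_260, 4_846_670):.1%}")
```

With Run 2's numbers this reproduces roughly 21 / 11 / 33 MHz and 26.9%. The same call with Run 4's TT WNS of -28.86 ns gives about 20 MHz.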

### Physical Results
| Metric | Result |
|--------|--------|
| Magic DRC | **Clean** |
| KLayout DRC | **Clean** |
| LVS | **Clean** (0 errors, 0 unmatched) |
| XOR (Magic vs KLayout) | **Clean** |
| Illegal overlap | **Clean** |
| Antenna violating nets | 1,707 (no repair attempted) |
| Antenna violating pins | 3,319 (no repair attempted) |

### Area & Utilization
| Metric | Value |
|--------|-------|
| Die area | 4,928,000 µm² (4.93 mm²) |
| Instance count | 183,774 |
| Instance area | 1,351,790 µm² (1.35 mm²) |
| Core utilization | 27.9% |

### Timing (post-route, CLOCK_PERIOD = 20 ns / 50 MHz target)
| Corner | Setup WNS (ns) | Setup TNS (ns) | Hold WNS (ns) | Hold TNS (ns) |
|--------|----------------|-----------------|----------------|----------------|
| nom_tt_025C_1v80 | **-28.86** | -348.0 | -0.08 | -0.15 |
| nom_ss_100C_1v60 | **-74.22** | -20,536.0 | -0.07 | -0.07 |
| nom_ff_n40C_1v95 | **-11.04** | -93.8 | -0.12 | -2.15 |
| min_tt_025C_1v80 | -28.39 | -251.0 | 0 | 0 |
| max_tt_025C_1v80 | -29.36 | -725.1 | -0.24 | -2.15 |

### Estimated Max Frequency
- **TT corner**: Critical path ~49 ns → **~20 MHz**
- **SS corner**: Critical path ~94 ns → **~11 MHz**
- **FF corner**: Critical path ~31 ns → **~32 MHz**

### Power (TT corner)
| Metric | Value |
|--------|-------|
| **Total** | **0.0858 W** |

### Key Observations
1. The pipelined CN update did NOT improve timing. TT WNS is -28.86 ns here vs -27.13 ns for the unpipelined Run 2. The slight regression may come from the different synthesis strategy (AREA 0 here vs AREA 2 in Run 2).
2. Hold violations are much smaller than in Run 2 (TT hold WNS -0.08 vs -0.32 ns) and nearly clean.
3. Antenna violations rose to 1,707 nets (vs 658 in Run 2). Run 4 ran no antenna repair while Run 2 did, so much of this increase is expected. This comparison alone cannot tell whether AREA 0 also produces a less antenna-friendly netlist.
4. The critical path is still ~47-49 ns, which suggests the bottleneck is NOT the CN update pipeline stage. The initial suspects were the large mux/barrel shifter or the belief update logic. The Critical Path Analysis below traces it to the `SYNDROME` state.
5. `SYNTH_STRATEGY=AREA 2` takes 20+ hours for ABC tech mapping on the pipelined RTL (Run 3b), so **never use it** for this design. `AREA 0` completed in reasonable time.

## Summary Table

| Run | RTL | Synth | Antenna | Status | TT Setup WNS | Max Freq (TT) |
|-----|-----|-------|---------|--------|-------------|---------------|
| 1 | Unpipelined | AREA 2 | Heuristic (110 µm threshold) | **FAILED** (congestion) | — | — |
| 2 | Unpipelined | AREA 2 | Iterative | **COMPLETED** | -27.13 ns | ~21 MHz |
| 3 | Pipelined | AREA 0 | Iterative | **FAILED** (congestion) | — | — |
| 3b | Pipelined | AREA 2 | — (synth only) | Still running (20+ hrs) | — | — |
| 4 | Pipelined | AREA 0 | None | **COMPLETED** | -28.86 ns | ~20 MHz |

## Critical Path Analysis (from Run 4, pipelined_noantenna)

### Path Summary
| Item | Value |
|------|-------|
| Startpoint | `u_core.beliefs[0][5]` (beliefs register, bit 5 of element 0) |
| Endpoint | `syndrome_weight[7]` (MSB of syndrome weight counter) |
| RTL location | `SYNDROME` state in `ldpc_decoder_core.sv`, lines 363-385 |
| Slack | **-28.859 ns** (VIOLATED) |
| Combinational cell delay (excl. CLK-to-Q and wire) | **47.67 ns** |
| Logic levels | **222** (171 XOR/XNOR + 51 adder/mux) |
| Logic vs wire delay | 99.7% logic / 0.3% wire |

All 8 worst setup violators fan out from `beliefs[0][5]` to `syndrome_weight[7:0]`.

### What the Critical Path Computes

The `SYNDROME` state computes the full syndrome check in a **single clock cycle**:

1. **Parity computation** (171 XOR levels, 33.9 ns): XOR the sign bits of the beliefs connected to each check node. That gives 7 block rows x 32 z-elements = 224 parity bits. Each bit is the XOR of up to 3 belief sign bits (one per connected block column), read from 256 belief sign bits in total.
2. **Population count** (51 adder levels, 13.6 ns): Sum all 224 parity results into an 8-bit `syndrome_cnt`.

The `syndrome_cnt = syndrome_cnt + 1` accumulation pattern creates a carry-chain dependency that serializes everything. Each individual parity is only a few XORs deep, so the 171-level XOR depth most likely comes from this serialization rather than from the parity equations themselves.
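
A small behavioral model can compare the two ways of forming the syndrome weight. It first computes per-check parity from a quasi-cyclic (QC) base matrix. It then forms the weight either by serial accumulation, with one dependent add per check like the current RTL pattern, or by a balanced pairwise-sum tree. This is a minimal sketch under several assumptions:
- The base matrix below is hypothetical, since these notes do not list the real shift values.
- The matrix has 7 block rows, 8 block columns (256 / 32), z = 32, and 3 connected columns per row.
- It uses the common shifted-identity circulant convention.

The decoder's actual belief layout may differ.

```python
import random
from typing import List, Sequence, Tuple

Z = 32  # lifting size (z-elements per block), from the notes

# Hypothetical 7x8 base matrix: entry = circulant shift, -1 = all-zero block.
# Each row connects to 3 block columns, matching "up to 3 columns" above.
BASE: List[List[int]] = [
    [0, 5, -1, -1, 11, -1, -1, -1],
    [-1, 3, 7, -1, -1, 20, -1, -1],
    [-1, -1, 1, 9, -1, -1, 30, -1],
    [2, -1, -1, 4, -1, -1, -1, 17],
    [-1, 8, -1, -1, 6, -1, 13, -1],
    [-1, -1, 25, -1, -1, 14, -1, 0],
    [19, -1, -1, -1, 22, -1, -1, 12],
]


def parity_bits(base: Sequence[Sequence[int]], z: int, signs: Sequence[int]) -> List[int]:
    """One parity bit per check node: XOR of the connected belief sign bits."""
    out = []
    for row in base:
        for i in range(z):
            p = 0
            for col, shift in enumerate(row):
                if shift >= 0:  # non-zero circulant block
                    p ^= signs[col * z + (i + shift) % z]
            out.append(p)
    return out


def weight_serial(parity: Sequence[int]) -> Tuple[int, int]:
    """Serial pattern (cnt = cnt + 1 per set bit). Returns (count, dependent adds)."""
    cnt = 0
    for p in parity:
        cnt = cnt + p  # every step waits on the previous cnt
    return cnt, len(parity)


def weight_tree(parity: Sequence[int]) -> Tuple[int, int]:
    """Balanced pairwise-sum tree. Returns (count, adder levels)."""
    level = list(parity)
    depth = 0
    while len(level) > 1:
        nxt = [level[k] + level[k + 1] for k in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element passes through to the next level
        level = nxt
        depth += 1
    return level[0], depth


if __name__ == "__main__":
    random.seed(0)
    signs = [random.randint(0, 1) for _ in range(len(BASE[0]) * Z)]
    par = parity_bits(BASE, Z, signs)
    print(len(par), weight_serial(par), weight_tree(par))
```

Both functions return the same weight. The tree needs ceil(log2(224)) = 8 adder levels instead of 224 dependent increments, and each level is at most 8 bits wide. That is consistent with the proposal's expectation that a balanced popcount fits in one cycle.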

### Delay Breakdown
| Segment | Delay (ns) | Cells | Description |
|---------|-----------|-------|-------------|
| Source CLK-to-Q | 0.795 | 1 (dfxtp_4) | beliefs[0][5] register output |
| Parity XOR chain | 33.888 | 171 (xor2/xnor2) | XOR reduction across belief sign bits |
| Popcount adders | 13.634 | 51 (and/or/aoi/oai) | 224-bit popcount to 8-bit count |
| State MUX | 0.148 | 1 (mux2_1) | FSM output mux |
| Wire (interconnect) | 0.149 | — | 0.3% of total (negligible) |
| **Total (data arrival)** | **48.614** | **222 levels** | |

### Proposed Fix: 2-3 Stage Syndrome Pipeline

**SYNDROME_S1** (cycle 1, ~16 ns): Compute all 224 parity bits in parallel. Each parity combines at most 3 belief sign bits (one per connected block column), so it needs at most 2 XOR2 levels. Register the 224-bit `parity_vec`.

**SYNDROME_S2** (cycle 2, ~14 ns): Popcount the 224-bit parity vector with a balanced adder tree. Register the 8-bit `syndrome_weight` and the `syndrome_ok` flag.

**SYNDROME_DONE** (cycle 3): Already exists and reads `syndrome_ok`.

**Estimated post-fix critical path**: ~14-16 ns (comfortably under the 20 ns / 50 MHz target).
**Latency impact**: +1-2 cycles per iteration, i.e. at most +30 to +60 cycles per codeword over 30 iterations (expected to be negligible).

### Secondary Violations

The Wishbone address input (`wb_adr_i`) has a -2.47 ns setup violation. It can be fixed by registering the address at the decoder boundary.

## Next Steps
- Implement the syndrome pipeline (SYNDROME_S1 + SYNDROME_S2) to cut the critical path from ~49 ns to ~16 ns
- Register the Wishbone address input to fix the secondary violation
- Re-synthesize with AREA 0 and run PnR to verify the timing improvement
- Consider increasing die area for antenna repair headroom
- Consider `SYNTH_STRATEGY=AREA 1` as a middle ground between AREA 0 and AREA 2
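
The proposed SYNDROME_S1 / SYNDROME_S2 split can be checked at cycle level before `ldpc_decoder_core.sv` is edited. The pipelined sequence should produce the same `syndrome_weight` and `syndrome_ok` as the single-cycle `SYNDROME` state, one cycle later. The following is a minimal Python sketch. The state and register names come from the proposal above. Three things are assumptions:
- The FSM moves straight to SYNDROME_DONE after the last compute state.
- `syndrome_ok` means a zero syndrome weight.
- `toy_parity` is a hypothetical stand-in for the real check equations.

```python
from typing import Callable, List, Tuple

WEIGHT_MASK = 0xFF  # syndrome_weight is 8 bits; 224 checks fit without overflow

ParityFn = Callable[[List[int]], List[int]]


def single_cycle(signs: List[int], parity_fn: ParityFn) -> Tuple[int, bool, int]:
    """Current SYNDROME state: parity and popcount settle in the same cycle.

    Returns (syndrome_weight, syndrome_ok, cycles spent before SYNDROME_DONE).
    """
    weight = sum(parity_fn(signs)) & WEIGHT_MASK
    return weight, weight == 0, 1


def pipelined(signs: List[int], parity_fn: ParityFn) -> Tuple[int, bool, int]:
    """Proposed SYNDROME_S1 -> SYNDROME_S2 -> SYNDROME_DONE sequence."""
    parity_vec: List[int] = []  # stage-1 register (224 bits in the real design)
    syndrome_weight, syndrome_ok = 0, False
    state, cycles = "SYNDROME_S1", 0
    while state != "SYNDROME_DONE":
        if state == "SYNDROME_S1":
            # Stage 1: shallow XOR logic only; register the whole parity vector.
            parity_vec = parity_fn(signs)
            state = "SYNDROME_S2"
        elif state == "SYNDROME_S2":
            # Stage 2: popcount of the registered vector; no belief reads.
            syndrome_weight = sum(parity_vec) & WEIGHT_MASK
            syndrome_ok = syndrome_weight == 0
            state = "SYNDROME_DONE"
        cycles += 1
    return syndrome_weight, syndrome_ok, cycles


def toy_parity(signs: List[int]) -> List[int]:
    """Hypothetical stand-in for the real check equations: check k = s[k] ^ s[k+1]."""
    return [signs[k] ^ signs[k + 1] for k in range(len(signs) - 1)]


if __name__ == "__main__":
    for signs in ([0] * 8, [1] * 8, [0, 1, 1, 0, 1, 0, 0, 0]):
        ref, new = single_cycle(signs, toy_parity), pipelined(signs, toy_parity)
        assert ref[:2] == new[:2]  # same weight and ok flag
        print(signs, "weight", new[0], "ok", new[1], "extra cycles", new[2] - ref[2])
```

The model covers only result equivalence and the +1 cycle of latency. It says nothing about timing, which still has to come from the next PnR run.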