ldpc_optical/docs/hardening-results.md
cah f2901c6366 docs: add OpenLane hardening results and critical path analysis
Documents 4 hardening runs with timing/area/DRC results. Identifies
SYNDROME state as critical path bottleneck (222 logic levels, 49 ns)
and proposes 2-stage pipeline fix to meet 50 MHz target.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 17:03:35 -07:00

LDPC Decoder Hardening Results

Run 1: 26_02_25_21_11 (Feb 25, 2026) — FAILED

  • RTL: Original (unpipelined CN update)
  • Config: CLOCK_PERIOD=20 (50 MHz), RUN_HEURISTIC_DIODE_INSERTION=true, HEURISTIC_ANTENNA_THRESHOLD=110
  • Die area: 2800 x 1760 µm (4.93 mm²)
  • Failure: GRT-0118 routing congestion after heuristic diode insertion (66,016 diodes added)
  • Notes: Initial global routing passed (0 overflow, 39% routing utilization). Diode insertion nearly doubled cell count, causing re-routing congestion failure.

Run 2: reuse_synth (Feb 27, 2026) — COMPLETED (timing violations)

  • RTL: Original (unpipelined CN update) — reused synthesis netlist from Run 1
  • Config: CLOCK_PERIOD=20 (50 MHz), RUN_HEURISTIC_DIODE_INSERTION=false, RUN_ANTENNA_REPAIR=true
  • Die area: 2800 x 1760 µm (4.93 mm²)
  • Result: All 70 steps completed. GDS generated. Deferred timing errors.

Physical Results

| Metric | Result |
|---|---|
| Magic DRC | Clean |
| KLayout DRC | Clean |
| LVS | Clean (0 errors, 0 unmatched) |
| XOR (Magic vs KLayout) | Clean |
| Illegal overlap | Clean |
| Power grid violations | 0 |
| Antenna violating nets | 658 |
| Antenna violating pins | 905 |

Area & Utilization

| Metric | Value |
|---|---|
| Die area | 4,928,000 µm² (4.93 mm²) |
| Core area | 4,846,670 µm² |
| Instance count | 184,663 |
| Instance area | 1,303,260 µm² (1.30 mm²) |
| Core utilization | 26.9% |
| Sequential cells | 16,967 |
| Combinational cells | 61,366 |
| Timing repair buffers | 23,709 |
| Fill cells | 415,149 |
| Tap cells | 69,228 |
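
The area figures can be cross-checked with a few lines of arithmetic. This is a minimal sketch that assumes core utilization is defined as instance area divided by core area; that definition is not stated in the report, but it reproduces the reported 26.9%. The variable names are illustrative only.

```python
# Figures copied from the Run 2 area table, in square micrometres.
die_w_um, die_h_um = 2800, 1760
core_area_um2 = 4_846_670
instance_area_um2 = 1_303_260

# Die area from the floorplan dimensions.
die_area_um2 = die_w_um * die_h_um

# ASSUMPTION: utilization = instance area / core area.
core_utilization = instance_area_um2 / core_area_um2

print(f"die area: {die_area_um2:,} um^2 ({die_area_um2 / 1e6:.2f} mm^2)")
print(f"core utilization: {core_utilization:.1%}")
```

The same ratio applied to the Run 4 instance area (1,351,790 µm²) gives the 27.9% reported for that run, provided the core area is unchanged.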

Timing (post-route, CLOCK_PERIOD = 20 ns / 50 MHz target)

| Corner | Setup WNS (ns) | Setup TNS (ns) | Hold WNS (ns) | Hold TNS (ns) | Setup Violations |
|---|---|---|---|---|---|
| nom_tt_025C_1v80 | -27.13 | -234.9 | -0.32 | -3.76 | 9 |
| nom_ss_100C_1v60 | -70.58 | -29,946.3 | 0.06 | 0 | 5,463 |
| nom_ff_n40C_1v95 | -10.18 | -86.3 | -0.26 | -12.4 | |
| Worst across all | -71.40 | -34,329.1 | -0.47 | -26.4 | |

Estimated Max Frequency

  • TT corner: Critical path ~47 ns → ~21 MHz
  • SS corner: Critical path ~91 ns → ~11 MHz
  • FF corner: Critical path ~30 ns → ~33 MHz
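
The frequency estimates above follow from the setup WNS: the critical path is roughly the clock period minus the (negative) slack, and the maximum frequency is its reciprocal. The sketch below reproduces that arithmetic. It is a simplification that ignores clock skew, uncertainty, and setup-time margins, and the helper names are made up for illustration.

```python
CLOCK_PERIOD_NS = 20.0  # CLOCK_PERIOD from the run config (50 MHz target)

def critical_path_ns(setup_wns_ns, period_ns=CLOCK_PERIOD_NS):
    """Approximate critical path: period minus slack (slack is negative here)."""
    return period_ns - setup_wns_ns

def fmax_mhz(setup_wns_ns, period_ns=CLOCK_PERIOD_NS):
    """Approximate maximum clock frequency in MHz for a given setup WNS."""
    return 1000.0 / critical_path_ns(setup_wns_ns, period_ns)

# Setup WNS per corner, copied from the Run 2 timing table.
run2_setup_wns_ns = {
    "nom_tt_025C_1v80": -27.13,
    "nom_ss_100C_1v60": -70.58,
    "nom_ff_n40C_1v95": -10.18,
}

for corner, wns in run2_setup_wns_ns.items():
    print(f"{corner}: ~{critical_path_ns(wns):.0f} ns -> ~{fmax_mhz(wns):.0f} MHz")
```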

Power (TT corner)

| Component | Power (W) |
|---|---|
| Internal | 0.0554 |
| Switching | 0.0273 |
| Leakage | ~0.000002 (~0.002 mW) |
| Total | 0.0827 |

Key Observations

  1. Disabling heuristic diode insertion fixed the routing congestion failure from Run 1
  2. 658 antenna violations remain — iterative antenna repair was not sufficient. May need to re-enable heuristic insertion with a higher threshold or use DIODE_ON_PORTS
  3. Setup timing is severely violated — critical path is ~47 ns at TT, far from 20 ns target
  4. This run used the unpipelined RTL (synthesis reused from Run 1 which predated the CN pipeline split)
  5. Next run should re-synthesize with pipelined CN update RTL to see if timing improves

Run 3: pipelined_pnr (Mar 1, 2026) — FAILED

  • RTL: Pipelined CN update (CN_STAGE1 + CN_STAGE2)
  • Config: CLOCK_PERIOD=20 (50 MHz), SYNTH_STRATEGY=AREA 0, RUN_HEURISTIC_DIODE_INSERTION=false, RUN_ANTENNA_REPAIR=true
  • Die area: 2800 x 1760 µm (4.93 mm²)
  • Failure: GRT-0118 routing congestion during iterative antenna repair (step 36), after 13+ hours of repair loops
  • Notes: Iterative antenna repair kept inserting diodes and re-routing until congestion became too high. Same root cause as Run 1 but via different mechanism.

Run 3b: pipelined_synth (Feb 28, 2026) — STILL RUNNING

  • RTL: Pipelined CN update
  • Config: SYNTH_STRATEGY=AREA 2 — synthesis only
  • Status: ABC pass 2 (tech mapping) running 20+ hours. AREA 2 is far too aggressive for this design size. Do not use AREA 2 for this design.

Run 4: pipelined_noantenna (Mar 2, 2026) — COMPLETED (timing violations)

  • RTL: Pipelined CN update (CN_STAGE1 + CN_STAGE2)
  • Config: CLOCK_PERIOD=20 (50 MHz), SYNTH_STRATEGY=AREA 0, RUN_HEURISTIC_DIODE_INSERTION=false, RUN_ANTENNA_REPAIR=false
  • Die area: 2800 x 1760 µm (4.93 mm²)
  • Result: All 69 steps completed. GDS generated. Deferred timing errors. No antenna repair attempted.

Physical Results

| Metric | Result |
|---|---|
| Magic DRC | Clean |
| KLayout DRC | Clean |
| LVS | Clean (0 errors, 0 unmatched) |
| XOR (Magic vs KLayout) | Clean |
| Illegal overlap | Clean |
| Antenna violating nets | 1,707 (no repair attempted) |
| Antenna violating pins | 3,319 (no repair attempted) |

Area & Utilization

| Metric | Value |
|---|---|
| Die area | 4,928,000 µm² (4.93 mm²) |
| Instance count | 183,774 |
| Instance area | 1,351,790 µm² (1.35 mm²) |
| Core utilization | 27.9% |

Timing (post-route, CLOCK_PERIOD = 20 ns / 50 MHz target)

| Corner | Setup WNS (ns) | Setup TNS (ns) | Hold WNS (ns) | Hold TNS (ns) |
|---|---|---|---|---|
| nom_tt_025C_1v80 | -28.86 | -348.0 | -0.08 | -0.15 |
| nom_ss_100C_1v60 | -74.22 | -20,536.0 | -0.07 | -0.07 |
| nom_ff_n40C_1v95 | -11.04 | -93.8 | -0.12 | -2.15 |
| min_tt_025C_1v80 | -28.39 | -251.0 | 0 | 0 |
| max_tt_025C_1v80 | -29.36 | -725.1 | -0.24 | -2.15 |

Estimated Max Frequency

  • TT corner: Critical path ~49 ns → ~20 MHz
  • SS corner: Critical path ~94 ns → ~11 MHz
  • FF corner: Critical path ~31 ns → ~32 MHz

Power (TT corner)

| Metric | Value |
|---|---|
| Total | 0.0858 W |

Key Observations

  1. Pipelined CN update did NOT improve timing — TT WNS is -28.86 ns vs -27.13 ns (unpipelined Run 2). Slightly worse, possibly due to AREA 0 vs AREA 2 synth strategy difference.
  2. Hold violations are much smaller than Run 2 (-0.08 vs -0.32 ns), nearly clean.
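  3. Antenna violations rose to 1,707 nets (vs 658 in Run 2). The two counts are not directly comparable: Run 2's figure is what remained after iterative repair, while Run 4 ran no repair at all. AREA 0 may also produce a less antenna-friendly netlist, but these runs do not isolate that effect.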
  4. The critical path is still ~47-49 ns, so the bottleneck is NOT the CN update pipeline stage. The initial suspects were the large mux/barrel shifter or the belief update logic; the Critical Path Analysis section instead traces it to the SYNDROME state.
  5. SYNTH_STRATEGY=AREA 2 takes 20+ hours for ABC tech mapping on this design — never use it. AREA 0 completed in reasonable time.

Summary Table

| Run | RTL | Synth | Antenna | Status | TT Setup WNS | Max Freq (TT) |
|---|---|---|---|---|---|---|
| 1 | Unpipelined | AREA 2 | Heuristic 110µm | FAILED (congestion) | | |
| 2 | Unpipelined | AREA 2 | Iterative | COMPLETED | -27.13 ns | ~21 MHz |
| 3 | Pipelined | AREA 0 | Iterative | FAILED (congestion) | | |
| 3b | Pipelined | AREA 2 | — (synth only) | Still running (20+ hrs) | | |
| 4 | Pipelined | AREA 0 | None | COMPLETED | -28.86 ns | ~20 MHz |

Critical Path Analysis (from Run 4, pipelined_noantenna)

Path Summary

| Item | Value |
|---|---|
| Startpoint | u_core.beliefs[0][5] (beliefs register, bit 5 of element 0) |
| Endpoint | syndrome_weight[7] (MSB of syndrome weight counter) |
| RTL location | SYNDROME state in ldpc_decoder_core.sv, lines 363-385 |
| Slack | -28.859 ns (VIOLATED) |
| Total combinational delay | 47.67 ns |
| Logic levels | 222 (171 XOR/XNOR + 51 adder/mux) |
| Logic vs wire delay | 99.7% logic / 0.3% wire |

All 8 worst setup violators fan out from beliefs[0][5] to syndrome_weight[7:0].

What the Critical Path Computes

The SYNDROME state computes the full syndrome check in a single clock cycle:

  1. Parity computation (171 XOR levels, 33.9 ns): XOR the sign bits of all beliefs connected to each check node. 7 rows x 32 z-elements = 224 parity bits, each XORing sign bits from up to 3 connected columns, reading from 256 belief sign bits in total.
  2. Population count (51 adder levels, 13.6 ns): Sum all 224 parity results into an 8-bit syndrome_cnt.

The syndrome_cnt = syndrome_cnt + 1 accumulation pattern creates a carry chain dependency that serializes everything.
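
A small behavioural model makes the structure of this computation easier to see. The sketch below is not the RTL. The connectivity table `H_BASE` and its shift values are invented for illustration, and only the dimensions (7 rows, 32 z-elements, up to 3 columns per row, 256 sign bits) come from the analysis above. It contrasts the running-counter form with a form that first builds the 224-bit parity vector and then counts it, and shows that the two give the same syndrome weight.

```python
import random

N_COLS, N_ROWS, Z = 8, 7, 32  # 8*32 = 256 sign bits, 7*32 = 224 parity bits

# HYPOTHETICAL connectivity, NOT the decoder's real parity-check matrix:
# each check row lists (column, cyclic shift) pairs, at most 3 per row.
H_BASE = [
    [(0, 0), (1, 5), (2, 11)],
    [(1, 3), (2, 0), (3, 17)],
    [(2, 7), (3, 1), (4, 0)],
    [(3, 9), (4, 2), (5, 21)],
    [(4, 13), (5, 0), (6, 4)],
    [(5, 6), (6, 19), (7, 0)],
    [(6, 8), (7, 25)],
]

def parity_bit(sign, row, z):
    """XOR of the sign bits feeding one check (at most 3 inputs)."""
    p = 0
    for col, shift in row:
        p ^= sign[col][(z + shift) % Z]
    return p

def parity_vector(sign):
    """All 224 parity bits; each one is independent of the others."""
    return [parity_bit(sign, row, z) for row in H_BASE for z in range(Z)]

def syndrome_weight_serial(sign):
    """Running-counter form: every increment depends on the previous one."""
    syndrome_cnt = 0
    for row in H_BASE:
        for z in range(Z):
            if parity_bit(sign, row, z):
                syndrome_cnt = syndrome_cnt + 1
    return syndrome_cnt

def syndrome_weight_split(sign):
    """Parity vector first, then a separate population count."""
    return sum(parity_vector(sign))

random.seed(0)
sign = [[random.getrandbits(1) for _ in range(Z)] for _ in range(N_COLS)]
assert len(parity_vector(sign)) == N_ROWS * Z == 224
assert syndrome_weight_serial(sign) == syndrome_weight_split(sign)
```

In hardware, the serial form maps to a long dependent chain when evaluated in one cycle, whereas the split form exposes a natural register boundary between the parity vector and the count.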

Delay Breakdown

| Segment | Delay (ns) | Cells | Description |
|---|---|---|---|
| Source CLK-to-Q | 0.795 | 1 (dfxtp_4) | beliefs[0][5] register output |
| Parity XOR chain | 33.888 | 171 (xor2/xnor2) | XOR reduction across belief sign bits |
| Popcount adder tree | 13.634 | 51 (and/or/aoi/oai) | 224-bit popcount to 8-bit count |
| State MUX | 0.148 | 1 (mux2_1) | FSM output mux |
| Wire (interconnect) | 0.149 | | 0.3% of total (negligible) |
| Total | 48.614 | 222 levels | |
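
The breakdown is internally consistent, which a short check confirms: the five segments sum to the 48.614 ns total, and the XOR chain, adder tree, and mux together give the 47.67 ns combinational delay from the path summary. The sketch below also derives an average delay per logic level. The roughly 0.25 ns gap between (20 − 48.614) ns and the reported −28.859 ns slack is presumably setup time plus clock skew or uncertainty; that attribution is an assumption, not something the report states.

```python
# Segment delays in ns, copied from the delay breakdown.
segments_ns = {
    "clk_to_q": 0.795,
    "parity_xor_chain": 33.888,
    "popcount_adders": 13.634,
    "state_mux": 0.148,
    "wire": 0.149,
}

total_ns = sum(segments_ns.values())
combinational_ns = (
    segments_ns["parity_xor_chain"]
    + segments_ns["popcount_adders"]
    + segments_ns["state_mux"]
)
wire_share = segments_ns["wire"] / total_ns

# Average delay per level, using the reported cell counts (171 and 51).
ns_per_xor_level = segments_ns["parity_xor_chain"] / 171
ns_per_adder_level = segments_ns["popcount_adders"] / 51

print(f"total: {total_ns:.3f} ns, combinational: {combinational_ns:.2f} ns")
print(f"wire share: {wire_share:.1%}")
print(f"avg per XOR level: {ns_per_xor_level:.3f} ns")
print(f"avg per adder level: {ns_per_adder_level:.3f} ns")
```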

Proposed Fix: 2-3 Stage Syndrome Pipeline

SYNDROME_S1 (cycle 1, ~16 ns): Compute all 224 parity bits in parallel. Each parity is only 2-3 XOR operations deep (one per connected column). Register the 224-bit parity_vec.

SYNDROME_S2 (cycle 2, ~14 ns): Popcount the 224-bit parity vector via balanced adder tree. Register the 8-bit syndrome_weight and syndrome_ok flag.

SYNDROME_DONE (cycle 3): Already exists — reads syndrome_ok.

Estimated post-fix critical path: ~14-16 ns (comfortably under 20 ns / 50 MHz). Latency impact: +1-2 cycles per iteration (negligible at 30 iterations).
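
To illustrate why a balanced adder tree shortens the popcount stage, the sketch below models SYNDROME_S2 in Python. It sums the registered 224-bit parity vector pairwise and counts the number of word-level adder levels. This is a behavioural illustration only: `popcount_tree` and `syndrome_s2` are hypothetical names, and word-level adder levels are not the same as the gate-level logic depth reported by STA.

```python
def popcount_tree(bits):
    """Sum bits pairwise; return (count, number of word-level adder levels)."""
    level = list(bits)
    if not level:
        return 0, 0
    depth = 0
    while len(level) > 1:
        # Add neighbouring pairs; an odd element passes through unchanged.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
        depth += 1
    return level[0], depth

def syndrome_s2(parity_vec_reg):
    """Model of stage 2: popcount the registered parity vector, derive the flag."""
    syndrome_weight, depth = popcount_tree(parity_vec_reg)
    syndrome_ok = syndrome_weight == 0
    return syndrome_weight, syndrome_ok, depth

weight, ok, depth = syndrome_s2([1] * 224)
print(weight, ok, depth)  # worst case: every check fails
assert weight < 2 ** 8    # the count fits in the 8-bit syndrome_weight
```

For 224 inputs the tree needs 8 word-level adder levels (224, 112, 56, 28, 14, 7, 4, 2, 1), compared with up to 224 dependent increments in a running-counter formulation.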

Secondary Violations

Wishbone address input (wb_adr_i) has -2.47 ns setup violation. Fixable by registering the address at the decoder boundary.

Next Steps

  • Implement syndrome pipeline (SYNDROME_S1 + SYNDROME_S2) to cut critical path from ~49 ns to ~16 ns
  • Register Wishbone address input to fix secondary violation
  • Re-synthesize with AREA 0 and run PnR to verify timing improvement
  • Consider increasing die area for antenna repair headroom
  • Consider SYNTH_STRATEGY=AREA 1 as middle ground between AREA 0 and AREA 2