LDPC Decoder Hardening Results
Run 1: 26_02_25_21_11 (Feb 25, 2026) — FAILED
- RTL: Original (unpipelined CN update)
- Config: CLOCK_PERIOD=20 (50 MHz), RUN_HEURISTIC_DIODE_INSERTION=true, HEURISTIC_ANTENNA_THRESHOLD=110
- Die area: 2800 x 1760 µm (4.93 mm²)
- Failure: GRT-0118 routing congestion after heuristic diode insertion (66,016 diodes added)
- Notes: Initial global routing passed (0 overflow, 39% routing utilization). Diode insertion nearly doubled cell count, causing re-routing congestion failure.
Run 2: reuse_synth (Feb 27, 2026) — COMPLETED (timing violations)
- RTL: Original (unpipelined CN update) — reused synthesis netlist from Run 1
- Config: CLOCK_PERIOD=20 (50 MHz), RUN_HEURISTIC_DIODE_INSERTION=false, RUN_ANTENNA_REPAIR=true
- Die area: 2800 x 1760 µm (4.93 mm²)
- Result: All 70 steps completed. GDS generated. Deferred timing errors.
Physical Results
| Metric | Result |
|---|---|
| Magic DRC | Clean |
| KLayout DRC | Clean |
| LVS | Clean (0 errors, 0 unmatched) |
| XOR (Magic vs KLayout) | Clean |
| Illegal overlap | Clean |
| Power grid violations | 0 |
| Antenna violating nets | 658 |
| Antenna violating pins | 905 |
Area & Utilization
| Metric | Value |
|---|---|
| Die area | 4,928,000 µm² (4.93 mm²) |
| Core area | 4,846,670 µm² |
| Instance count | 184,663 |
| Instance area | 1,303,260 µm² (1.30 mm²) |
| Core utilization | 26.9% |
| Sequential cells | 16,967 |
| Combinational cells | 61,366 |
| Timing repair buffers | 23,709 |
| Fill cells | 415,149 |
| Tap cells | 69,228 |
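The die area and core utilization figures above follow directly from the reported dimensions and areas. A minimal sketch of that arithmetic, using only numbers from this run (the variable names are illustrative):

```python
# Die dimensions and areas as reported for Run 2.
die_width_um, die_height_um = 2800, 1760
core_area_um2 = 4_846_670
instance_area_um2 = 1_303_260

die_area_um2 = die_width_um * die_height_um
# Core utilization is the placed instance area divided by the core area.
core_utilization = instance_area_um2 / core_area_um2

print(f"die area: {die_area_um2:,} um^2")           # 4,928,000 um^2
print(f"core utilization: {core_utilization:.1%}")  # 26.9%
```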
Timing (post-route, CLOCK_PERIOD = 20 ns / 50 MHz target)
| Corner | Setup WNS (ns) | Setup TNS (ns) | Hold WNS (ns) | Hold TNS (ns) | Setup Violations |
|---|---|---|---|---|---|
| nom_tt_025C_1v80 | -27.13 | -234.9 | -0.32 | -3.76 | 9 |
| nom_ss_100C_1v60 | -70.58 | -29,946.3 | 0.06 | 0 | 5,463 |
| nom_ff_n40C_1v95 | -10.18 | -86.3 | -0.26 | -12.4 | n/a |
| Worst across all | -71.40 | -34,329.1 | -0.47 | -26.4 | n/a |
Estimated Max Frequency
- TT corner: Critical path ~47 ns → ~21 MHz
- SS corner: Critical path ~91 ns → ~11 MHz
- FF corner: Critical path ~30 ns → ~33 MHz
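The frequency estimates above can be reproduced from the timing table: the critical path is approximately the target clock period minus the (negative) setup WNS, and the maximum frequency is its reciprocal. A minimal sketch, assuming that approximation (the helper name `max_freq_mhz` is illustrative, not part of the flow):

```python
def max_freq_mhz(clock_period_ns: float, setup_wns_ns: float) -> float:
    """Estimate fmax from the target period and the worst setup slack."""
    # WNS is negative when timing is violated, so subtracting it lengthens the period.
    critical_path_ns = clock_period_ns - setup_wns_ns
    return 1000.0 / critical_path_ns


# Setup WNS per corner, taken from the Run 2 timing table.
run2_setup_wns = {
    "nom_tt_025C_1v80": -27.13,
    "nom_ss_100C_1v60": -70.58,
    "nom_ff_n40C_1v95": -10.18,
}

for corner, wns in run2_setup_wns.items():
    path_ns = 20.0 - wns
    print(f"{corner}: ~{path_ns:.0f} ns -> ~{max_freq_mhz(20.0, wns):.0f} MHz")
```

The same function applies to later runs by swapping in their WNS values.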
Power (TT corner)
| Component | Power (W) |
|---|---|
| Internal | 0.0554 |
| Switching | 0.0273 |
| Leakage | ~0.000002 (~0.002 mW) |
| Total | 0.0827 |
Key Observations
- Disabling heuristic diode insertion fixed the routing congestion failure from Run 1
- 658 antenna-violating nets remain, so iterative antenna repair alone was not sufficient. Options: re-enable heuristic insertion with a higher threshold, or use DIODE_ON_PORTS
- Setup timing is severely violated — critical path is ~47 ns at TT, far from 20 ns target
- This run used the unpipelined RTL (synthesis reused from Run 1 which predated the CN pipeline split)
- Next run should re-synthesize with pipelined CN update RTL to see if timing improves
Run 3: pipelined_pnr (Mar 1, 2026) — FAILED
- RTL: Pipelined CN update (CN_STAGE1 + CN_STAGE2)
- Config: CLOCK_PERIOD=20 (50 MHz), SYNTH_STRATEGY=AREA 0, RUN_HEURISTIC_DIODE_INSERTION=false, RUN_ANTENNA_REPAIR=true
- Die area: 2800 x 1760 µm (4.93 mm²)
- Failure: GRT-0118 routing congestion during iterative antenna repair (step 36), after 13+ hours of repair loops
- Notes: Iterative antenna repair kept inserting diodes and re-routing until congestion became too high. Same root cause as Run 1, but via a different mechanism.
Run 3b: pipelined_synth (Feb 28, 2026) — STILL RUNNING
- RTL: Pipelined CN update
- Config: SYNTH_STRATEGY=AREA 2 (synthesis only)
- Status: ABC pass 2 (tech mapping) has been running for 20+ hours. AREA 2 is far too aggressive for this design size; do not use it for this design.
Run 4: pipelined_noantenna (Mar 2, 2026) — COMPLETED (timing violations)
- RTL: Pipelined CN update (CN_STAGE1 + CN_STAGE2)
- Config: CLOCK_PERIOD=20 (50 MHz), SYNTH_STRATEGY=AREA 0, RUN_HEURISTIC_DIODE_INSERTION=false, RUN_ANTENNA_REPAIR=false
- Die area: 2800 x 1760 µm (4.93 mm²)
- Result: All 69 steps completed. GDS generated. Deferred timing errors. No antenna repair attempted.
Physical Results
| Metric | Result |
|---|---|
| Magic DRC | Clean |
| KLayout DRC | Clean |
| LVS | Clean (0 errors, 0 unmatched) |
| XOR (Magic vs KLayout) | Clean |
| Illegal overlap | Clean |
| Antenna violating nets | 1,707 (no repair attempted) |
| Antenna violating pins | 3,319 (no repair attempted) |
Area & Utilization
| Metric | Value |
|---|---|
| Die area | 4,928,000 µm² (4.93 mm²) |
| Instance count | 183,774 |
| Instance area | 1,351,790 µm² (1.35 mm²) |
| Core utilization | 27.9% |
Timing (post-route, CLOCK_PERIOD = 20 ns / 50 MHz target)
| Corner | Setup WNS (ns) | Setup TNS (ns) | Hold WNS (ns) | Hold TNS (ns) |
|---|---|---|---|---|
| nom_tt_025C_1v80 | -28.86 | -348.0 | -0.08 | -0.15 |
| nom_ss_100C_1v60 | -74.22 | -20,536.0 | -0.07 | -0.07 |
| nom_ff_n40C_1v95 | -11.04 | -93.8 | -0.12 | -2.15 |
| min_tt_025C_1v80 | -28.39 | -251.0 | 0 | 0 |
| max_tt_025C_1v80 | -29.36 | -725.1 | -0.24 | -2.15 |
Estimated Max Frequency
- TT corner: Critical path ~49 ns → ~20 MHz
- SS corner: Critical path ~94 ns → ~11 MHz
- FF corner: Critical path ~31 ns → ~32 MHz
Power (TT corner)
| Metric | Value |
|---|---|
| Total | 0.0858 W |
Key Observations
- Pipelined CN update did NOT improve timing — TT WNS is -28.86 ns vs -27.13 ns (unpipelined Run 2). Slightly worse, possibly due to AREA 0 vs AREA 2 synth strategy difference.
- Hold violations are much smaller than Run 2 (-0.08 vs -0.32 ns), nearly clean.
- Antenna violations increased to 1,707 nets (vs 658 in Run 2). The comparison is not like-for-like: Run 2's count is after iterative repair, while Run 4 attempted none. The AREA 0 netlist may also be less antenna-friendly.
- The critical path is still ~47-49 ns, suggesting the bottleneck is NOT the CN update pipeline stage but something else (likely the large mux/barrel shifter or belief update logic).
- SYNTH_STRATEGY=AREA 2 takes 20+ hours for ABC tech mapping on this design; never use it. AREA 0 completed in reasonable time.
Summary Table
| Run | RTL | Synth | Antenna | Status | TT Setup WNS | Max Freq (TT) |
|---|---|---|---|---|---|---|
| 1 | Unpipelined | AREA 2 | Heuristic 110µm | FAILED (congestion) | n/a | n/a |
| 2 | Unpipelined | AREA 2 | Iterative | COMPLETED | -27.13 ns | ~21 MHz |
| 3 | Pipelined | AREA 0 | Iterative | FAILED (congestion) | n/a | n/a |
| 3b | Pipelined | AREA 2 | n/a (synth only) | Still running (20+ hrs) | n/a | n/a |
| 4 | Pipelined | AREA 0 | None | COMPLETED | -28.86 ns | ~20 MHz |
Critical Path Analysis (from Run 4, pipelined_noantenna)
Path Summary
| Item | Value |
|---|---|
| Startpoint | u_core.beliefs[0][5] (beliefs register, bit 5 of element 0) |
| Endpoint | syndrome_weight[7] (MSB of syndrome weight counter) |
| RTL location | SYNDROME state in ldpc_decoder_core.sv, lines 363-385 |
| Slack | -28.859 ns (VIOLATED) |
| Total combinational delay | 47.67 ns |
| Logic levels | 222 (171 XOR/XNOR + 51 adder/mux) |
| Logic vs wire delay | 99.7% logic / 0.3% wire |
All 8 worst setup violators fan out from beliefs[0][5] to syndrome_weight[7:0].
What the Critical Path Computes
The SYNDROME state computes the full syndrome check in a single clock cycle:
- Parity computation (171 XOR levels, 33.9 ns): XOR the sign bits of all beliefs connected to each check node. 7 rows x 32 z-elements = 224 parity bits, each XORing up to 3 columns, read from 256 belief sign bits.
- Population count (51 adder levels, 13.6 ns): Sum all 224 parity results into an 8-bit syndrome_cnt.
The syndrome_cnt = syndrome_cnt + 1 accumulation pattern creates a carry-chain dependency: each increment waits on the previous one, so the additions are serialized instead of running in parallel.
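To make the dependency concrete, here is a behavioral sketch of the single-cycle computation in Python. It models function only, not timing. The connectivity table `BASE`, the column count `COLS`, and the function name are assumptions made for illustration; the real check-node connections live in ldpc_decoder_core.sv and are not reproduced here.

```python
import random

Z = 32      # z-elements per row (from the text)
ROWS = 7    # check rows (from the text)
COLS = 8    # assumed: 256 belief sign bits / 32 z-elements

# Hypothetical connectivity: each row touches 3 distinct columns, each with a cyclic shift.
rng = random.Random(0)
BASE = [[(c, rng.randrange(Z)) for c in rng.sample(range(COLS), 3)]
        for _ in range(ROWS)]


def syndrome_weight_single_cycle(sign_bits):
    """Model of the one-cycle SYNDROME computation.

    sign_bits[c][z] is the sign bit (0 or 1) of belief z in column c.
    """
    syndrome_cnt = 0
    for r in range(ROWS):
        for z in range(Z):
            # Parity of one check: XOR of the connected sign bits.
            parity = 0
            for c, shift in BASE[r]:
                parity ^= sign_bits[c][(z + shift) % Z]
            if parity:
                # Each increment depends on the previous value of syndrome_cnt,
                # which is what chains the 224 checks together.
                syndrome_cnt = syndrome_cnt + 1
    return syndrome_cnt


all_zero = [[0] * Z for _ in range(COLS)]
print(syndrome_weight_single_cycle(all_zero))  # 0: every check is satisfied
```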
Delay Breakdown
| Segment | Delay (ns) | Cells | Description |
|---|---|---|---|
| Source CLK-to-Q | 0.795 | 1 (dfxtp_4) | beliefs[0][5] register output |
| Parity XOR chain | 33.888 | 171 (xor2/xnor2) | XOR reduction across belief sign bits |
| Popcount adder tree | 13.634 | 51 (and/or/aoi/oai) | 224-bit popcount to 8-bit count |
| State MUX | 0.148 | 1 (mux2_1) | FSM output mux |
| Wire (interconnect) | 0.149 | n/a | 0.3% of total, negligible |
| Total | 48.614 | 222 levels | |
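The breakdown can be cross-checked against the Path Summary: the segments sum to 48.614 ns, and the 47.67 ns "total combinational delay" is the XOR chain, popcount, and state mux without the CLK-to-Q and wire contributions. A short sketch of that check (the dictionary keys are illustrative labels):

```python
# Segment delays in ns, taken from the Delay Breakdown table.
segments_ns = {
    "clk_to_q": 0.795,
    "parity_xor_chain": 33.888,
    "popcount": 13.634,
    "state_mux": 0.148,
    "wire": 0.149,
}

total_ns = sum(segments_ns.values())
# Combinational logic only: excludes the launching flop and interconnect.
combinational_ns = (segments_ns["parity_xor_chain"]
                    + segments_ns["popcount"]
                    + segments_ns["state_mux"])
wire_share = segments_ns["wire"] / total_ns

print(f"total: {total_ns:.3f} ns")                  # 48.614 ns
print(f"combinational: {combinational_ns:.2f} ns")  # 47.67 ns
print(f"wire share: {wire_share:.1%}")              # 0.3%
```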
Proposed Fix: 2-3 Stage Syndrome Pipeline
SYNDROME_S1 (cycle 1, ~16 ns): Compute all 224 parity bits in parallel. Each parity is only 2-3 XOR operations deep (one per connected column). Register the 224-bit parity_vec.
SYNDROME_S2 (cycle 2, ~14 ns): Popcount the 224-bit parity vector via balanced adder tree. Register the 8-bit syndrome_weight and syndrome_ok flag.
SYNDROME_DONE (cycle 3): Already exists — reads syndrome_ok.
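Estimated post-fix critical path: ~14-16 ns at the TT corner, under the 20 ns / 50 MHz target. The SS corner was about 1.9x slower than TT in Run 4 (~94 ns vs ~49 ns), so it may still miss 20 ns.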
Latency impact: +1-2 cycles per iteration (negligible at 30 iterations).
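The two stages can be sketched behaviorally in Python to show why the split shortens the path: stage 1 computes every parity bit independently, and stage 2 sums them with a balanced tree instead of a serial increment chain. This is a functional model under stated assumptions, not the RTL. The `checks` connectivity and the function names are hypothetical.

```python
def stage1_parity(sign_bits, checks):
    """SYNDROME_S1: each parity bit is an independent XOR of its connected sign bits."""
    return [sum(sign_bits[i] for i in idx) & 1 for idx in checks]


def popcount_tree(bits):
    """SYNDROME_S2: balanced adder tree. Returns (count, number of adder levels)."""
    level = list(bits)
    depth = 0
    while len(level) > 1:
        # Add neighbors pairwise; an odd leftover element passes through unchanged.
        pairs = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            pairs.append(level[-1])
        level = pairs
        depth += 1
    return level[0], depth


# Hypothetical connectivity: 224 checks, each reading 3 of the 256 sign bits.
checks = [[(k + 37 * j) % 256 for j in range(3)] for k in range(224)]

sign_bits = [0] * 256
sign_bits[0] = 1  # flip one belief sign to create unsatisfied checks

parity_vec = stage1_parity(sign_bits, checks)       # registered at the end of cycle 1
syndrome_weight, depth = popcount_tree(parity_vec)  # registered at the end of cycle 2
syndrome_ok = syndrome_weight == 0
print(syndrome_weight, syndrome_ok, depth)
```

For 224 inputs the tree is 8 adder levels deep, compared with up to 224 dependent increments in the serial form. These are adder levels, not gate levels, so the actual post-synthesis depth will be higher.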
Secondary Violations
The Wishbone address input (wb_adr_i) has a -2.47 ns setup violation. It is fixable by registering the address at the decoder boundary.
Next Steps
- Implement syndrome pipeline (SYNDROME_S1 + SYNDROME_S2) to cut critical path from ~49 ns to ~16 ns
- Register Wishbone address input to fix secondary violation
- Re-synthesize with AREA 0 and run PnR to verify timing improvement
- Consider increasing die area for antenna repair headroom
- Consider SYNTH_STRATEGY=AREA 1 as middle ground between AREA 0 and AREA 2