LDPC Decoder Hardening Results
Run 1: 26_02_25_21_11 (Feb 25, 2026) — FAILED
- RTL: Original (unpipelined CN update)
- Config: CLOCK_PERIOD=20 (50 MHz), RUN_HEURISTIC_DIODE_INSERTION=true, HEURISTIC_ANTENNA_THRESHOLD=110
- Die area: 2800 x 1760 µm (4.93 mm²)
- Failure: GRT-0118 routing congestion after heuristic diode insertion (66,016 diodes added)
- Notes: Initial global routing passed (0 overflow, 39% routing utilization). Diode insertion nearly doubled cell count, causing re-routing congestion failure.
Run 2: reuse_synth (Feb 27, 2026) — COMPLETED (timing violations)
- RTL: Original (unpipelined CN update) — reused synthesis netlist from Run 1
- Config: CLOCK_PERIOD=20 (50 MHz), RUN_HEURISTIC_DIODE_INSERTION=false, RUN_ANTENNA_REPAIR=true
- Die area: 2800 x 1760 µm (4.93 mm²)
- Result: All 70 steps completed. GDS generated. Deferred timing errors.
Physical Results
| Metric | Result |
| --- | --- |
| Magic DRC | Clean |
| KLayout DRC | Clean |
| LVS | Clean (0 errors, 0 unmatched) |
| XOR (Magic vs KLayout) | Clean |
| Illegal overlap | Clean |
| Power grid violations | 0 |
| Antenna violating nets | 658 |
| Antenna violating pins | 905 |
Area & Utilization
| Metric | Value |
| --- | --- |
| Die area | 4,928,000 µm² (4.93 mm²) |
| Core area | 4,846,670 µm² |
| Instance count | 184,663 |
| Instance area | 1,303,260 µm² (1.30 mm²) |
| Core utilization | 26.9% |
| Sequential cells | 16,967 |
| Combinational cells | 61,366 |
| Timing repair buffers | 23,709 |
| Fill cells | 415,149 |
| Tap cells | 69,228 |
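As a quick cross-check of the table above, the sketch below recomputes die area and core utilization from the reported figures. It assumes utilization is simply instance area divided by core area; the constant and function names are illustrative only.

```python
# Figures copied from the Run 2 "Area & Utilization" table (all in µm or µm²).
DIE_W_UM, DIE_H_UM = 2800, 1760
CORE_AREA_UM2 = 4_846_670
INSTANCE_AREA_UM2 = 1_303_260


def utilization_pct(instance_area: float, core_area: float) -> float:
    """Core utilization as a percentage, assuming instance area / core area."""
    return 100.0 * instance_area / core_area


die_area_um2 = DIE_W_UM * DIE_H_UM
core_util = utilization_pct(INSTANCE_AREA_UM2, CORE_AREA_UM2)

print(f"die area: {die_area_um2:,} µm²")
print(f"core utilization: {core_util:.1f}%")
```

Both values agree with the table after rounding, which suggests the reported utilization excludes fill and tap cells.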
Timing (post-route, CLOCK_PERIOD = 20 ns / 50 MHz target)
| Corner | Setup WNS (ns) | Setup TNS (ns) | Hold WNS (ns) | Hold TNS (ns) | Setup Violations |
| --- | --- | --- | --- | --- | --- |
| nom_tt_025C_1v80 | -27.13 | -234.9 | -0.32 | -3.76 | 9 |
| nom_ss_100C_1v60 | -70.58 | -29,946.3 | 0.06 | 0 | 5,463 |
| nom_ff_n40C_1v95 | -10.18 | -86.3 | -0.26 | -12.4 | — |
| Worst across all | -71.40 | -34,329.1 | -0.47 | -26.4 | — |
Estimated Max Frequency
- TT corner: Critical path ~47 ns → ~21 MHz
- SS corner: Critical path ~91 ns → ~11 MHz
- FF corner: Critical path ~30 ns → ~33 MHz
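The estimates above appear to follow from treating the critical path as the clock period minus the setup WNS. A minimal sketch of that arithmetic, assuming this simple model (it ignores any change in clock skew or uncertainty at a different period); the helper names are illustrative:

```python
CLOCK_PERIOD_NS = 20.0

# Post-route setup WNS per corner, copied from the Run 2 timing table.
RUN2_SETUP_WNS_NS = {
    "nom_tt_025C_1v80": -27.13,
    "nom_ss_100C_1v60": -70.58,
    "nom_ff_n40C_1v95": -10.18,
}


def critical_path_ns(clock_period_ns: float, setup_wns_ns: float) -> float:
    """Shortest period that would close timing: period minus (negative) slack."""
    return clock_period_ns - setup_wns_ns


def fmax_mhz(clock_period_ns: float, setup_wns_ns: float) -> float:
    """Estimated maximum frequency in MHz for the given period and WNS."""
    return 1000.0 / critical_path_ns(clock_period_ns, setup_wns_ns)


for corner, wns in RUN2_SETUP_WNS_NS.items():
    path = critical_path_ns(CLOCK_PERIOD_NS, wns)
    print(f"{corner}: ~{path:.0f} ns -> ~{fmax_mhz(CLOCK_PERIOD_NS, wns):.0f} MHz")
```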
Power (TT corner)
| Component | Power |
| --- | --- |
| Internal | 0.0554 W |
| Switching | 0.0273 W |
| Leakage | ~0.002 mW |
| Total | 0.0827 W |
Key Observations
- Disabling heuristic diode insertion fixed the routing congestion failure from Run 1
- 658 antenna violations remain; iterative antenna repair was not sufficient. May need to re-enable heuristic insertion with a higher threshold or use DIODE_ON_PORTS
- Setup timing is severely violated — critical path is ~47 ns at TT, far from 20 ns target
- This run used the unpipelined RTL (synthesis reused from Run 1 which predated the CN pipeline split)
- Next run should re-synthesize with pipelined CN update RTL to see if timing improves
Run 3: pipelined_pnr (Mar 1, 2026) — FAILED
- RTL: Pipelined CN update (CN_STAGE1 + CN_STAGE2)
- Config: CLOCK_PERIOD=20 (50 MHz), SYNTH_STRATEGY=AREA 0, RUN_HEURISTIC_DIODE_INSERTION=false, RUN_ANTENNA_REPAIR=true
- Die area: 2800 x 1760 µm (4.93 mm²)
- Failure: GRT-0118 routing congestion during iterative antenna repair (step 36), after 13+ hours of repair loops
- Notes: Iterative antenna repair kept inserting diodes and re-routing until congestion became too high. Same root cause as Run 1 but via different mechanism.
Run 3b: pipelined_synth (Feb 28, 2026) — STILL RUNNING
- RTL: Pipelined CN update
- Config: SYNTH_STRATEGY=AREA 2 (synthesis only)
- Status: ABC pass 2 (tech mapping) running 20+ hours. AREA 2 is far too aggressive for this design size. Do not use AREA 2 for this design.
Run 4: pipelined_noantenna (Mar 2, 2026) — COMPLETED (timing violations)
- RTL: Pipelined CN update (CN_STAGE1 + CN_STAGE2)
- Config: CLOCK_PERIOD=20 (50 MHz), SYNTH_STRATEGY=AREA 0, RUN_HEURISTIC_DIODE_INSERTION=false, RUN_ANTENNA_REPAIR=false
- Die area: 2800 x 1760 µm (4.93 mm²)
- Result: All 69 steps completed. GDS generated. Deferred timing errors. No antenna repair attempted.
Physical Results
| Metric | Result |
| --- | --- |
| Magic DRC | Clean |
| KLayout DRC | Clean |
| LVS | Clean (0 errors, 0 unmatched) |
| XOR (Magic vs KLayout) | Clean |
| Illegal overlap | Clean |
| Antenna violating nets | 1,707 (no repair attempted) |
| Antenna violating pins | 3,319 (no repair attempted) |
Area & Utilization
| Metric | Value |
| --- | --- |
| Die area | 4,928,000 µm² (4.93 mm²) |
| Instance count | 183,774 |
| Instance area | 1,351,790 µm² (1.35 mm²) |
| Core utilization | 27.9% |
Timing (post-route, CLOCK_PERIOD = 20 ns / 50 MHz target)
| Corner | Setup WNS (ns) | Setup TNS (ns) | Hold WNS (ns) | Hold TNS (ns) |
| --- | --- | --- | --- | --- |
| nom_tt_025C_1v80 | -28.86 | -348.0 | -0.08 | -0.15 |
| nom_ss_100C_1v60 | -74.22 | -20,536.0 | -0.07 | -0.07 |
| nom_ff_n40C_1v95 | -11.04 | -93.8 | -0.12 | -2.15 |
| min_tt_025C_1v80 | -28.39 | -251.0 | 0 | 0 |
| max_tt_025C_1v80 | -29.36 | -725.1 | -0.24 | -2.15 |
Estimated Max Frequency
- TT corner: Critical path ~49 ns → ~20 MHz
- SS corner: Critical path ~94 ns → ~11 MHz
- FF corner: Critical path ~31 ns → ~32 MHz
Power (TT corner)
| Metric | Value |
| --- | --- |
| Total | 0.0858 W |
Key Observations
- Pipelined CN update did NOT improve timing — TT WNS is -28.86 ns vs -27.13 ns (unpipelined Run 2). Slightly worse, possibly due to AREA 0 vs AREA 2 synth strategy difference.
- Hold violations are much smaller than Run 2 (-0.08 vs -0.32 ns), nearly clean.
- Antenna violations increased to 1,707 nets (vs 658 in Run 2) without any repair — AREA 0 produces a less antenna-friendly netlist.
- The critical path is still ~47-49 ns, suggesting the bottleneck is NOT the CN update pipeline stage but something else (likely the large mux/barrel shifter or belief update logic).
- SYNTH_STRATEGY=AREA 2 takes 20+ hours for ABC tech mapping on this design; never use it. AREA 0 completed in reasonable time.
Summary Table
| Run | RTL | Synth | Antenna | Status | TT Setup WNS | Max Freq (TT) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Unpipelined | AREA 2 | Heuristic 110µm | FAILED (congestion) | — | — |
| 2 | Unpipelined | AREA 2 | Iterative | COMPLETED | -27.13 ns | ~21 MHz |
| 3 | Pipelined | AREA 0 | Iterative | FAILED (congestion) | — | — |
| 3b | Pipelined | AREA 2 | — (synth only) | Still running (20+ hrs) | — | — |
| 4 | Pipelined | AREA 0 | None | COMPLETED | -28.86 ns | ~20 MHz |
Critical Path Analysis (from Run 4, pipelined_noantenna)
Path Summary
| Item | Value |
| --- | --- |
| Startpoint | u_core.beliefs[0][5] (beliefs register, bit 5 of element 0) |
| Endpoint | syndrome_weight[7] (MSB of syndrome weight counter) |
| RTL location | SYNDROME state in ldpc_decoder_core.sv, lines 363-385 |
| Slack | -28.859 ns (VIOLATED) |
| Total combinational delay | 47.67 ns |
| Logic levels | 222 (171 XOR/XNOR + 51 adder/mux) |
| Logic vs wire delay | 99.7% logic / 0.3% wire |
All 8 worst setup violators fan out from beliefs[0][5] to syndrome_weight[7:0].
What the Critical Path Computes
The SYNDROME state computes the full syndrome check in a single clock cycle:
- Parity computation (171 XOR levels, 33.9 ns): XOR the sign bits of all beliefs connected to each check node. 7 rows x 32 z-elements = 224 parity bits, each XORing up to 3 columns, reading from 256 belief sign bits.
- Population count (51 adder levels, 13.6 ns): Sum all 224 parity results into an 8-bit syndrome_cnt.
The syndrome_cnt = syndrome_cnt + 1 accumulation pattern creates a carry chain dependency that serializes everything.
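To make the two steps concrete, here is a small behavioural model in Python. It is only a sketch: the base-matrix entries in H_BASE_EXAMPLE are made up for illustration (the real H_BASE values are not given here), and the 8-column layout is an assumption derived from 256 sign bits / 32 z-elements.

```python
Z = 32      # z-elements per row, from the text
ROWS = 7    # check-node rows, from the text
COLS = 8    # assumption: 256 belief sign bits / 32 z-elements

# Hypothetical base matrix: each row lists up to 3 (column, cyclic shift) pairs.
# These values are invented for the example, not the decoder's real H_BASE.
H_BASE_EXAMPLE = [
    [(r % COLS, r), ((r + 1) % COLS, 2 * r + 1), ((r + 3) % COLS, 3 * r + 5)]
    for r in range(ROWS)
]


def parity_bits(signs):
    """signs[c][z] is the sign bit of belief z in column c; returns 224 parity bits."""
    out = []
    for row in H_BASE_EXAMPLE:
        for z in range(Z):
            p = 0
            for col, shift in row:
                p ^= signs[col][(z + shift) % Z]
            out.append(p)
    return out


def syndrome_weight_serial(parity):
    """Mirrors the syndrome_cnt = syndrome_cnt + 1 pattern: one dependent add per bit."""
    cnt = 0
    for bit in parity:
        if bit:
            cnt = cnt + 1
    return cnt


all_zero = [[0] * Z for _ in range(COLS)]
one_flip = [col[:] for col in all_zero]
one_flip[0][0] = 1  # flip a single belief sign bit

print(len(parity_bits(all_zero)), syndrome_weight_serial(parity_bits(all_zero)))
print(syndrome_weight_serial(parity_bits(one_flip)))
```

In this model each parity bit depends on at most three sign bits, while the count depends on every parity bit in sequence, which is where the serialization comes from.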
Delay Breakdown
| Segment | Delay (ns) | Cells | Description |
| --- | --- | --- | --- |
| Source CLK-to-Q | 0.795 | 1 (dfxtp_4) | beliefs[0][5] register output |
| Parity XOR chain | 33.888 | 171 (xor2/xnor2) | XOR reduction across belief sign bits |
| Popcount adder tree | 13.634 | 51 (and/or/aoi/oai) | 224-bit popcount to 8-bit count |
| State MUX | 0.148 | 1 (mux2_1) | FSM output mux |
| Wire (interconnect) | 0.149 | — | 0.3% of total (negligible) |
| Total | 48.614 | 222 levels | |
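The breakdown can be checked with a few lines of arithmetic. The sketch below sums the segments, derives the combinational total quoted in the path summary, and estimates the average delay per logic level; the dictionary keys are illustrative names, and the per-level averages are derived here rather than reported by the tool.

```python
# Segment delays in ns, copied from the delay breakdown table.
SEGMENTS_NS = {
    "clk_to_q": 0.795,
    "xor_chain": 33.888,
    "popcount": 13.634,
    "state_mux": 0.148,
    "wire": 0.149,
}
XOR_LEVELS = 171
ADDER_LEVELS = 51

total_ns = sum(SEGMENTS_NS.values())
# Combinational logic only: excludes the launching flop's CLK-to-Q and wire delay.
comb_ns = SEGMENTS_NS["xor_chain"] + SEGMENTS_NS["popcount"] + SEGMENTS_NS["state_mux"]
wire_share_pct = 100.0 * SEGMENTS_NS["wire"] / total_ns
ns_per_xor_level = SEGMENTS_NS["xor_chain"] / XOR_LEVELS
ns_per_adder_level = SEGMENTS_NS["popcount"] / ADDER_LEVELS

print(f"total: {total_ns:.3f} ns, combinational: {comb_ns:.2f} ns")
print(f"wire share: {wire_share_pct:.1f}%")
print(f"avg per XOR level: {ns_per_xor_level:.3f} ns, per adder level: {ns_per_adder_level:.3f} ns")
```

At roughly 0.2 ns per XOR level, any structure more than a few dozen levels deep cannot fit a 20 ns period, so the fix has to reduce depth rather than speed up individual cells.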
Proposed Fix: 2-3 Stage Syndrome Pipeline
SYNDROME_S1 (cycle 1, ~16 ns): Compute all 224 parity bits in parallel. Each parity is only 2-3 XOR operations deep (one per connected column). Register the 224-bit parity_vec.
SYNDROME_S2 (cycle 2, ~14 ns): Popcount the 224-bit parity vector via balanced adder tree. Register the 8-bit syndrome_weight and syndrome_ok flag.
SYNDROME_DONE (cycle 3): Already exists — reads syndrome_ok.
Estimated post-fix critical path: ~14-16 ns (comfortably under 20 ns / 50 MHz).
Latency impact: +1-2 cycles per iteration (negligible at 30 iterations).
Secondary Violations
Wishbone address input (wb_adr_i) has -2.47 ns setup violation. Fixable by registering the address at the decoder boundary.
Run 5: syndrome_pipeline (Mar 3, 2026) — COMPLETED (timing violations)
- RTL: Pipelined CN + syndrome pipeline (SYNDROME_S1 + SYNDROME_S2 with serial popcount)
- Config: CLOCK_PERIOD=20 (50 MHz), SYNTH_STRATEGY=AREA 0, RUN_ANTENNA_REPAIR=false
- Die area: 2800 x 1760 µm (4.93 mm²)
- Result: All 75 steps completed. DRC/LVS clean.
- TT Setup WNS: -28.98 ns — no improvement from Run 4
- Root cause: Yosys serializes the syndrome_cnt = syndrome_cnt + 1 loop-carried dependency into a ~48 ns chain
- Lesson: Splitting parity + popcount into 2 cycles helps nothing if the popcount itself is still serial
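The lesson can be illustrated by comparing dependency depth. This is a naive structural model, not what Yosys actually emits: it counts one dependent add per input bit for the serial loop and one adder level per reduction step for a balanced tree. The function names are illustrative.

```python
def serial_depth(n_bits: int) -> int:
    """Dependent increments when each bit is added to a running count in turn."""
    return n_bits


def tree_depth(n_bits: int, fan_in: int) -> int:
    """Adder levels when groups of `fan_in` partial sums are combined per level."""
    levels = 0
    while n_bits > 1:
        n_bits = -(-n_bits // fan_in)  # ceiling division
        levels += 1
    return levels


N_PARITY = 224  # parity bits feeding the popcount

print("serial chain:", serial_depth(N_PARITY), "dependent adds")
print("2-wide tree: ", tree_depth(N_PARITY, 2), "levels")
print("4-wide tree: ", tree_depth(N_PARITY, 4), "levels")
```

A 4-input adder level is deeper in gates than a 2-input one, so level counts are not directly comparable across fan-ins, but either tree is far shallower than the serial chain.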
Run 6: balanced_popcount (Mar 4, 2026) — COMPLETED (TT timing MET!)
- RTL: Pipelined CN + syndrome pipeline with balanced 4-wide adder tree popcount
- Config: CLOCK_PERIOD=20 (50 MHz), SYNTH_STRATEGY=AREA 0, RUN_ANTENNA_REPAIR=false
- Die area: 2800 x 1760 µm (4.93 mm²)
- Result: All 75 steps completed. DRC/LVS clean. TT timing met!
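For reference, a balanced 4-wide reduction can be modelled as below. This is a behavioural sketch in Python, not the decoder's SystemVerilog; popcount_tree and popcount_serial are illustrative names. It checks that the tree produces the same count as the serial accumulation.

```python
import random


def popcount_serial(bits):
    """Reference: running count, one dependent add per bit."""
    cnt = 0
    for b in bits:
        cnt = cnt + b
    return cnt


def popcount_tree(bits, fan_in=4):
    """Sum groups of `fan_in` values per level until one value remains."""
    level = list(bits)
    if not level:
        return 0
    while len(level) > 1:
        level = [sum(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0]


rng = random.Random(0)  # fixed seed so the check is repeatable
parity_vec = [rng.randint(0, 1) for _ in range(224)]

assert popcount_tree(parity_vec) == popcount_serial(parity_vec)
print("tree popcount matches serial popcount:", popcount_tree(parity_vec))
```

The sums within a level are independent of each other, which is what lets synthesis build them in parallel instead of as one long chain.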
Physical Results
| Metric | Result |
| --- | --- |
| Magic DRC | Clean |
| KLayout DRC | Clean |
| LVS | Clean (0 errors, 0 unmatched) |
| Antenna violating nets | 1,687 (no repair attempted) |
Area & Utilization
| Metric | Value |
| --- | --- |
| Die area | 4,928,000 µm² (4.93 mm²) |
| Instance count | 186,915 |
| Instance area | 1,367,580 µm² (1.37 mm²) |
| Core utilization | 28.2% |
| Sequential cells | 18,056 |
| Timing repair buffers | 27,864 |
Timing (post-route, CLOCK_PERIOD = 20 ns / 50 MHz target)
| Corner | Setup WNS (ns) | Setup TNS (ns) | Hold WNS (ns) | Hold TNS (ns) |
| --- | --- | --- | --- | --- |
| nom_tt_025C_1v80 | 0.0 | 0 | -0.45 | -10.5 |
| nom_ss_100C_1v60 | -9.18 | -12,474.4 | -0.17 | -0.21 |
| nom_ff_n40C_1v95 | 0.0 | 0 | -0.37 | -38.6 |
| max_ss_100C_1v60 | -10.45 | -15,896.8 | -0.44 | -0.87 |
Estimated Max Frequency
- TT corner: 50 MHz — TIMING MET
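- SS corner: Critical path ~40 ns (data arrival time) → ~25 MHz (up from ~11 MHz). Using period minus WNS as in the earlier runs (20 + 9.18 ns ≈ 29 ns) would give ~34 MHz.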
- FF corner: 50 MHz — TIMING MET
New Critical Path (SS corner)
| Item | Value |
| --- | --- |
| Startpoint | u_core.col_idx[0] (column index register) |
| Endpoint | u_core.beliefs registers |
| Slack | -9.18 ns (nom_ss) |
| Data arrival time | 40.15 ns |
| Description | Belief update mux path during LAYER_READ/LAYER_WRITE |
The syndrome path is NO LONGER critical. The new bottleneck is the column-indexed mux/barrel-shifter path used during belief reads and writes.
Key Observations
- Balanced popcount tree eliminated the syndrome bottleneck — WNS improved from -28.98 ns to 0.0 ns at TT
- TT and FF corners now fully meet 50 MHz timing
- SS corner still fails (-9.18 ns) due to a different path: belief update mux indexed by col_idx
- Hold violations are minor (-0.45 ns) and can be fixed with post-route optimization
- 1,687 antenna violations need to be addressed (antenna repair was disabled)
Updated Summary Table
| Run | RTL | Key Change | Antenna | Status | TT Setup WNS | Max Freq (TT) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Unpipelined | — | Heuristic | FAILED | — | — |
| 2 | Unpipelined | — | Iterative | COMPLETED | -27.13 ns | ~21 MHz |
| 3 | Pipelined CN | CN pipeline | Iterative | FAILED | — | — |
| 4 | Pipelined CN | CN pipeline | None | COMPLETED | -28.86 ns | ~20 MHz |
| 5 | + Syndrome pipeline | Serial popcount | None | COMPLETED | -28.98 ns | ~20 MHz |
| 6 | + Balanced popcount | Adder tree | None | COMPLETED | 0.0 ns | 50 MHz |
Run 7a: pipelined_layer2 (Mar 9, 2026) — FAILED
- RTL: Run 6 + LAYER_WRITE split into LAYER_WRITE_ADDR + LAYER_WRITE_DATA
- Config: CLOCK_PERIOD=20, DIODE_ON_PORTS=in, HEURISTIC_ANTENNA_THRESHOLD=200
- Failure: GRT-0118 routing congestion; heuristic diode insertion on input ports added too many cells
- Lesson: Any heuristic diode insertion causes GRT failure on this design
Run 7b: pipelined_layer3 (Mar 9, 2026) — FAILED
- RTL: Same as 7a (LAYER_WRITE_ADDR/DATA split)
- Config: DIODE_ON_PORTS=none, RUN_HEURISTIC_DIODE_INSERTION=false
- Failure: Post-CTS resizer diverged — 2.5+ hours at 100% CPU, memory climbing linearly, never converging
- Lesson: LAYER_WRITE pipeline split creates too many paths for OpenROAD resizer
Run 7c: pre_shift (Mar 9, 2026) — FAILED
- RTL: Run 6 + pre-registered H_BASE shift lookahead (H_BASE[row_idx][col_idx+1])
- Config: Same as 7b
- Failure: GPL-0302 placement density overflow; 150K cells at 41.3% exceeded the 40% target
- Root cause: Yosys cannot fold H_BASE constants through registers → full 256:1 write mux explosion (~2x cell count vs Run 6's 83K)
- Lesson: Registering H_BASE shift values prevents Yosys constant folding
Run 7d: run6_baseline (Mar 9, 2026) — FAILED
- RTL: Reverted to Run 6 baseline (identical RTL)
- Config: DIODE_ON_PORTS=in (inadvertently left from earlier runs), RUN_HEURISTIC_DIODE_INSERTION=false
- Cells: 85,500
- Failure: GRT-0118 routing congestion
- Root cause: DIODE_ON_PORTS=in inserts diodes on input ports even when heuristic insertion is disabled
Run 7e: run6b_nodiode (Mar 10, 2026) — FAILED
- RTL: Run 6 baseline
- Config: DIODE_ON_PORTS=none, hold margins 0.5/0.3 (from config.json), reused run6_baseline synthesis
- Failure: Post-CTS resizer diverged (9+ GiB memory, 3+ hours, never converged)
- Root cause: Reusing synthesis from a run with different config (DIODE_ON_PORTS=in) produces a subtly different netlist that causes PnR divergence
Run 7f: run6_clean (Mar 10, 2026) — FAILED
- RTL: Run 6 baseline, clean full run from scratch
- Config: DIODE_ON_PORTS=none, hold margins 0.5/0.3
- Cells: 85,500
- Hold buffers inserted: 35,506
- Failure: GRT-0118 routing congestion
- Root cause: Higher hold slack margins (0.5/0.3 vs balanced_popcount's 0.4/0.2) caused 13K extra hold buffers (35K vs 22K), pushing routing congestion over GRT threshold
Run 7g: run6_fixhold (Mar 10, 2026) — FAILED
- RTL: Run 6 baseline, reused run6_clean synthesis
- Config: DIODE_ON_PORTS=none, hold margins 0.4/0.2 (matching balanced_popcount)
- Failure: Post-CTS resizer diverged (14+ GiB, 3.5+ hours)
- Root cause: Yosys non-determinism; run6_clean synthesis produced a slightly different cell mix that didn't route cleanly despite identical config
Run 7h: run6_reuse_bp (Mar 10, 2026) — COMPLETED (reproduces Run 6!)
- RTL: Run 6 baseline, reused balanced_popcount's actual synthesis netlist
- Config: DIODE_ON_PORTS=none, hold margins 0.4/0.2
- Result: All stages completed. DRC/LVS clean. TT timing met!
- Hold buffers: 22,095 (identical to balanced_popcount)
Physical Results
| Metric | Result |
| --- | --- |
| Magic DRC | Clean |
| KLayout DRC | Clean |
| LVS | Clean (circuits match uniquely) |
| Antenna violating nets | 1,687 (repair disabled) |
| Antenna violating pins | 3,416 (repair disabled) |
Area & Utilization
| Metric | Value |
| --- | --- |
| Die area | 4,928,000 µm² (4.93 mm²) |
| Instance count | 186,915 |
| Instance area | 1,367,580 µm² (1.37 mm²) |
| Core utilization | 28.2% |
Timing (post-route, CLOCK_PERIOD = 20 ns / 50 MHz target)
| Corner | Setup WNS (ns) | Setup TNS (ns) | Hold WNS (ns) | Hold TNS (ns) |
| --- | --- | --- | --- | --- |
| nom_tt_025C_1v80 | +3.28 | 0 | -0.45 | -10.5 |
| nom_ss_100C_1v60 | -9.18 | -12,474 | -0.17 | -0.21 |
| nom_ff_n40C_1v95 | +5.93 | 0 | -0.37 | -38.6 |
| max_ss_100C_1v60 | -10.45 | -15,897 | -0.44 | -0.87 |
| min_tt_025C_1v80 | +3.71 | 0 | -0.26 | -1.66 |
| max_tt_025C_1v80 | +2.90 | 0 | -0.62 | -29.5 |
Key Observations
- Results identical to Run 6 — confirms that the balanced_popcount synthesis netlist is the key ingredient
- Yosys non-determinism is significant: re-synthesizing the same RTL with same config produces netlists that fail PnR
- Hold violations (1,543 total) are all on input port paths (wb_dat_i, wb_adr_i), zero reg-to-reg; fixable with input delay constraints
- Max slew violations (4,112) and max cap violations (655) concentrated in SS corner
Updated Summary Table
| Run | RTL | Key Change | Antenna | Status | TT Setup WNS | Max Freq (TT) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Unpipelined | — | Heuristic | FAILED | — | — |
| 2 | Unpipelined | — | Iterative | COMPLETED | -27.13 ns | ~21 MHz |
| 3 | Pipelined CN | CN pipeline | Iterative | FAILED | — | — |
| 4 | Pipelined CN | CN pipeline | None | COMPLETED | -28.86 ns | ~20 MHz |
| 5 | + Syndrome pipeline | Serial popcount | None | COMPLETED | -28.98 ns | ~20 MHz |
| 6 | + Balanced popcount | Adder tree | None | COMPLETED | 0.0 ns | 50 MHz |
| 7a | + LAYER_WRITE split | ADDR/DATA pipeline | Heuristic | FAILED | — | — |
| 7b | + LAYER_WRITE split | ADDR/DATA pipeline | None | FAILED (resizer) | — | — |
| 7c | + pre_shift | H_BASE lookahead | None | FAILED (GPL) | — | — |
| 7d | Run 6 baseline | DIODE_ON_PORTS=in | None | FAILED (GRT) | — | — |
| 7e | Run 6 baseline | Reuse wrong synth | None | FAILED (resizer) | — | — |
| 7f | Run 6 baseline | Hold margins 0.5/0.3 | None | FAILED (GRT) | — | — |
| 7g | Run 6 baseline | Reuse run6_clean synth | None | FAILED (resizer) | — | — |
| 7h | Run 6 baseline | Reuse BP synth | None | COMPLETED | +3.28 ns | 50 MHz |
Key Lessons Learned (Run 7 Series)
- LAYER_WRITE pipeline is not viable: Any register between col_idx and H_BASE causes either cell explosion (Yosys can't fold constants through registers) or PnR divergence (too many paths for resizer)
- Heuristic diode insertion always fails: Both RUN_HEURISTIC_DIODE_INSERTION=true and DIODE_ON_PORTS=in cause GRT-0118 congestion
- Hold slack margins matter: 0.5/0.3 inserts 35K hold buffers → GRT failure. 0.4/0.2 inserts 22K → passes
- Yosys synthesis is non-deterministic: Re-synthesizing identical RTL+config produces different netlists with different PnR outcomes. The balanced_popcount synthesis netlist is the only one proven to complete
- Config must be consistent: Reusing synthesis from a run with different config settings causes PnR divergence
- Run 6's balanced_popcount synthesis netlist is the golden reference — all future PnR runs should reuse it
Next Steps
- Address antenna violations (1,687 nets) for tapeout; try GRT_ANTENNA_ITERS with reused BP synthesis
- Fix hold violations via input delay constraints (all are input port paths)
- Consider relaxing SS target or adding pipeline stage to belief update mux for SS corner improvement
- Investigate making Yosys synthesis deterministic (fixed random seed, etc.) for reproducible builds
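One low-cost way to support the reproducibility investigation is to fingerprint each synthesis netlist and compare it against the golden one before starting PnR. The sketch below is an assumption about how that could be done, not part of the existing flow; the function names are hypothetical and no file paths from the runs are assumed. A byte-level hash will also flag harmless differences such as header comments or timestamps, so a mismatch is only a prompt to look closer.

```python
import hashlib


def netlist_digest(path: str) -> str:
    """SHA-256 of a netlist file, read in 1 MiB chunks to handle large files."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def same_netlist(golden_path: str, candidate_path: str) -> bool:
    """True when both files are byte-identical according to their digests."""
    return netlist_digest(golden_path) == netlist_digest(candidate_path)
```

Recording the digest of the balanced_popcount netlist alongside each run's results would make it easy to confirm later which netlist a given PnR run actually used.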