ldpc_optical/docs/hardening-results.md
cah f22ee197ab docs(hardening): add wrapper attempt history through v8-v11 + LVS-fix lessons
Document the full wrapper hardening trail:
- Mar 12-13 wrapper_v2/v3/v4 results, mpw_precheck 17/19, and 5/5 GLS pass
- May 7-11 v6-v11 LVS-cosmetic-fix attempts (all seven failed)

The v6-v11 series tried to eliminate the 208 cosmetic LVS pin-match
errors via per-pin conb_1 tieoffs and placement tweaks. All failed
because the errors are a Magic SPICE-extraction limitation (constant-
tied output nets collapse into shared power/ground at extract time),
not a hardening defect. Documented so future sessions don't re-explore
this dead end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 23:13:11 -06:00

# LDPC Decoder Hardening Results
## Run 1: `26_02_25_21_11` (Feb 25, 2026) — FAILED
- **RTL**: Original (unpipelined CN update)
- **Config**: `CLOCK_PERIOD=20` (50 MHz), `RUN_HEURISTIC_DIODE_INSERTION=true`, `HEURISTIC_ANTENNA_THRESHOLD=110`
- **Die area**: 2800 x 1760 µm (4.93 mm²)
- **Failure**: `GRT-0118` routing congestion after heuristic diode insertion (66,016 diodes added)
- **Notes**: Initial global routing passed (0 overflow, 39% routing utilization). Diode insertion nearly doubled cell count, causing re-routing congestion failure.
## Run 2: `reuse_synth` (Feb 27, 2026) — COMPLETED (timing violations)
- **RTL**: Original (unpipelined CN update) — reused synthesis netlist from Run 1
- **Config**: `CLOCK_PERIOD=20` (50 MHz), `RUN_HEURISTIC_DIODE_INSERTION=false`, `RUN_ANTENNA_REPAIR=true`
- **Die area**: 2800 x 1760 µm (4.93 mm²)
- **Result**: All 70 steps completed. GDS generated. Deferred timing errors.
### Physical Results
| Metric | Result |
|--------|--------|
| Magic DRC | **Clean** |
| KLayout DRC | **Clean** |
| LVS | **Clean** (0 errors, 0 unmatched) |
| XOR (Magic vs KLayout) | **Clean** |
| Illegal overlap | **Clean** |
| Power grid violations | **0** |
| Antenna violating nets | 658 |
| Antenna violating pins | 905 |
### Area & Utilization
| Metric | Value |
|--------|-------|
| Die area | 4,928,000 µm² (4.93 mm²) |
| Core area | 4,846,670 µm² |
| Instance count | 184,663 |
| Instance area | 1,303,260 µm² (1.30 mm²) |
| Core utilization | 26.9% |
| Sequential cells | 16,967 |
| Combinational cells | 61,366 |
| Timing repair buffers | 23,709 |
| Fill cells | 415,149 |
| Tap cells | 69,228 |
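As a quick consistency check on the table above, the snippet below recomputes the die area and core utilization. It is only a sketch, and it assumes utilization is defined as instance area divided by core area, which happens to reproduce the reported figure; the constant names are illustrative.

```python
# Values copied from the Run 2 report above.
DIE_WIDTH_UM, DIE_HEIGHT_UM = 2800, 1760
CORE_AREA_UM2 = 4_846_670
INSTANCE_AREA_UM2 = 1_303_260

die_area_um2 = DIE_WIDTH_UM * DIE_HEIGHT_UM
# Assumed definition: utilization = instance area / core area.
core_utilization_pct = round(100.0 * INSTANCE_AREA_UM2 / CORE_AREA_UM2, 1)

print(die_area_um2, core_utilization_pct)
```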
### Timing (post-route, CLOCK_PERIOD = 20 ns / 50 MHz target)
| Corner | Setup WNS (ns) | Setup TNS (ns) | Hold WNS (ns) | Hold TNS (ns) | Setup Violations |
|--------|----------------|-----------------|----------------|----------------|------------------|
| nom_tt_025C_1v80 | **-27.13** | -234.9 | -0.32 | -3.76 | 9 |
| nom_ss_100C_1v60 | **-70.58** | -29,946.3 | 0.06 | 0 | 5,463 |
| nom_ff_n40C_1v95 | **-10.18** | -86.3 | -0.26 | -12.4 | — |
| **Worst across all** | **-71.40** | -34,329.1 | -0.47 | -26.4 | — |
### Estimated Max Frequency
- **TT corner**: Critical path ~47 ns → **~21 MHz**
- **SS corner**: Critical path ~91 ns → **~11 MHz**
- **FF corner**: Critical path ~30 ns → **~33 MHz**
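The frequency estimates above appear to follow from the clock period and the worst setup slack: the effective critical path is roughly the period minus WNS, and the maximum frequency is its reciprocal. A minimal sketch under that assumption (it ignores clock uncertainty and latency effects; `fmax_mhz` is an illustrative helper, not part of the flow):

```python
def fmax_mhz(clock_period_ns: float, setup_wns_ns: float) -> float:
    """Rough max frequency implied by a target period and worst setup slack."""
    # Negative slack means the path is longer than the period by |WNS|.
    critical_path_ns = clock_period_ns - setup_wns_ns
    return 1000.0 / critical_path_ns  # 1000 / ns gives MHz


# Run 2 post-route setup WNS per corner, taken from the timing table above.
RUN2_SETUP_WNS_NS = {"tt": -27.13, "ss": -70.58, "ff": -10.18}

run2_fmax = {
    corner: round(fmax_mhz(20.0, wns), 1)
    for corner, wns in RUN2_SETUP_WNS_NS.items()
}
print(run2_fmax)
```

The results (about 21, 11, and 33 MHz) agree with the rounded figures listed for the TT, SS, and FF corners.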
### Power (TT corner)
| Component | Power (W) |
|-----------|-----------|
| Internal | 0.0554 |
| Switching | 0.0273 |
| Leakage | ~0.000002 (≈ 0.002 mW) |
| **Total** | **0.0827** |
### Key Observations
1. Disabling heuristic diode insertion fixed the routing congestion failure from Run 1
2. 658 antenna violations remain — iterative antenna repair was not sufficient. May need to re-enable heuristic insertion with a higher threshold or use `DIODE_ON_PORTS`
3. Setup timing is severely violated — critical path is ~47 ns at TT, far from 20 ns target
4. This run used the **unpipelined** RTL (synthesis reused from Run 1 which predated the CN pipeline split)
5. Next run should re-synthesize with pipelined CN update RTL to see if timing improves
## Run 3: `pipelined_pnr` (Mar 1, 2026) — FAILED
- **RTL**: Pipelined CN update (CN_STAGE1 + CN_STAGE2)
- **Config**: `CLOCK_PERIOD=20` (50 MHz), `SYNTH_STRATEGY=AREA 0`, `RUN_HEURISTIC_DIODE_INSERTION=false`, `RUN_ANTENNA_REPAIR=true`
- **Die area**: 2800 x 1760 µm (4.93 mm²)
- **Failure**: `GRT-0118` routing congestion during iterative antenna repair (step 36), after 13+ hours of repair loops
- **Notes**: Iterative antenna repair kept inserting diodes and re-routing until congestion became too high. Same root cause as Run 1 but via different mechanism.
## Run 3b: `pipelined_synth` (Feb 28, 2026) — STILL RUNNING
- **RTL**: Pipelined CN update
- **Config**: `SYNTH_STRATEGY=AREA 2` — synthesis only
- **Status**: ABC pass 2 (tech mapping) running 20+ hours. `AREA 2` is far too aggressive for this design size. **Do not use AREA 2 for this design.**
## Run 4: `pipelined_noantenna` (Mar 2, 2026) — COMPLETED (timing violations)
- **RTL**: Pipelined CN update (CN_STAGE1 + CN_STAGE2)
- **Config**: `CLOCK_PERIOD=20` (50 MHz), `SYNTH_STRATEGY=AREA 0`, `RUN_HEURISTIC_DIODE_INSERTION=false`, `RUN_ANTENNA_REPAIR=false`
- **Die area**: 2800 x 1760 µm (4.93 mm²)
- **Result**: All 69 steps completed. GDS generated. Deferred timing errors. No antenna repair attempted.
### Physical Results
| Metric | Result |
|--------|--------|
| Magic DRC | **Clean** |
| KLayout DRC | **Clean** |
| LVS | **Clean** (0 errors, 0 unmatched) |
| XOR (Magic vs KLayout) | **Clean** |
| Illegal overlap | **Clean** |
| Antenna violating nets | 1,707 (no repair attempted) |
| Antenna violating pins | 3,319 (no repair attempted) |
### Area & Utilization
| Metric | Value |
|--------|-------|
| Die area | 4,928,000 µm² (4.93 mm²) |
| Instance count | 183,774 |
| Instance area | 1,351,790 µm² (1.35 mm²) |
| Core utilization | 27.9% |
### Timing (post-route, CLOCK_PERIOD = 20 ns / 50 MHz target)
| Corner | Setup WNS (ns) | Setup TNS (ns) | Hold WNS (ns) | Hold TNS (ns) |
|--------|----------------|-----------------|----------------|----------------|
| nom_tt_025C_1v80 | **-28.86** | -348.0 | -0.08 | -0.15 |
| nom_ss_100C_1v60 | **-74.22** | -20,536.0 | -0.07 | -0.07 |
| nom_ff_n40C_1v95 | **-11.04** | -93.8 | -0.12 | -2.15 |
| min_tt_025C_1v80 | -28.39 | -251.0 | 0 | 0 |
| max_tt_025C_1v80 | -29.36 | -725.1 | -0.24 | -2.15 |
### Estimated Max Frequency
- **TT corner**: Critical path ~49 ns → **~20 MHz**
- **SS corner**: Critical path ~94 ns → **~11 MHz**
- **FF corner**: Critical path ~31 ns → **~32 MHz**
### Power (TT corner)
| Metric | Value |
|--------|-------|
| **Total** | **0.0858 W** |
### Key Observations
1. Pipelined CN update did NOT improve timing — TT WNS is -28.86 ns vs -27.13 ns (unpipelined Run 2). Slightly worse, possibly due to AREA 0 vs AREA 2 synth strategy difference.
2. Hold violations are much smaller than Run 2 (-0.08 vs -0.32 ns), nearly clean.
3. Antenna violations increased to 1,707 nets (vs 658 in Run 2) without any repair — AREA 0 produces a less antenna-friendly netlist.
4. The critical path is still ~47-49 ns, suggesting the bottleneck is NOT the CN update pipeline stage but something else (likely the large mux/barrel shifter or belief update logic).
5. `SYNTH_STRATEGY=AREA 2` takes 20+ hours for ABC tech mapping on this design — **never use it**. `AREA 0` completed in reasonable time.
## Summary Table
| Run | RTL | Synth | Antenna | Status | TT Setup WNS | Max Freq (TT) |
|-----|-----|-------|---------|--------|-------------|---------------|
| 1 | Unpipelined | AREA 2 | Heuristic 110µm | **FAILED** (congestion) | — | — |
| 2 | Unpipelined | AREA 2 | Iterative | **COMPLETED** | -27.13 ns | ~21 MHz |
| 3 | Pipelined | AREA 0 | Iterative | **FAILED** (congestion) | — | — |
| 3b | Pipelined | AREA 2 | — (synth only) | Still running (20+ hrs) | — | — |
| 4 | Pipelined | AREA 0 | None | **COMPLETED** | -28.86 ns | ~20 MHz |
## Critical Path Analysis (from Run 4, pipelined_noantenna)
### Path Summary
| Item | Value |
|------|-------|
| Startpoint | `u_core.beliefs[0][5]` (beliefs register, bit 5 of element 0) |
| Endpoint | `syndrome_weight[7]` (MSB of syndrome weight counter) |
| RTL location | `SYNDROME` state in `ldpc_decoder_core.sv`, lines 363-385 |
| Slack | **-28.859 ns** (VIOLATED) |
| Total combinational delay | **47.67 ns** |
| Logic levels | **222** (171 XOR/XNOR + 51 adder/mux) |
| Logic vs wire delay | 99.7% logic / 0.3% wire |
All 8 worst setup violators fan out from `beliefs[0][5]` to `syndrome_weight[7:0]`.
### What the Critical Path Computes
The `SYNDROME` state computes the full syndrome check in a **single clock cycle**:
1. **Parity computation** (171 XOR levels, 33.9 ns): XOR the sign bits of all beliefs connected to each check node. With 7 rows x 32 z-elements there are 224 parity bits, each combining up to 3 columns and reading from the 256 belief sign bits.
2. **Population count** (51 adder levels, 13.6 ns): Sum all 224 parity results into an 8-bit `syndrome_cnt`.
The `syndrome_cnt = syndrome_cnt + 1` accumulation pattern creates a carry chain dependency that serializes everything.
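The computation described above can be sketched as a small Python reference model. The base matrix contents and the belief indexing convention are placeholders invented for illustration (the real `H_BASE` is not shown here); only the dimensions (7 rows, 32 z-elements, 256 sign bits, up to 3 columns per row) come from the text.

```python
N_ROWS, N_COLS, Z = 7, 8, 32  # 8 columns x 32 = 256 belief sign bits


def placeholder_h_base():
    """HYPOTHETICAL base matrix: -1 = no connection, otherwise a cyclic shift."""
    h = [[-1] * N_COLS for _ in range(N_ROWS)]
    for r in range(N_ROWS):
        for c in (r % N_COLS, (r + 1) % N_COLS, (r + 3) % N_COLS):
            h[r][c] = (5 * r + 3 * c) % Z  # arbitrary illustrative shift
    return h


def parity_bits(sign_bits, h_base):
    """One parity bit per (row, z): XOR of the connected belief sign bits."""
    parity = []
    for row in h_base:
        for z in range(Z):
            p = 0
            for col, shift in enumerate(row):
                if shift >= 0:
                    # Assumed indexing: column block plus cyclically shifted offset.
                    p ^= sign_bits[col * Z + (z + shift) % Z]
            parity.append(p)
    return parity


def syndrome_weight_serial(parity):
    """Mirror of the serial accumulation pattern: each add depends on the last."""
    syndrome_cnt = 0
    for p in parity:
        if p:
            syndrome_cnt = syndrome_cnt + 1
    return syndrome_cnt


h_base = placeholder_h_base()
signs = [0] * (N_COLS * Z)
signs[0] = 1  # flip a single belief sign bit
print(len(parity_bits(signs, h_base)), syndrome_weight_serial(parity_bits(signs, h_base)))
```

In hardware, the loop in `syndrome_weight_serial` unrolls into a chain in which every increment waits on the previous one, which is the serialization the path report shows.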
### Delay Breakdown
| Segment | Delay (ns) | Cells | Description |
|---------|-----------|-------|-------------|
| Source CLK-to-Q | 0.795 | 1 (dfxtp_4) | beliefs[0][5] register output |
| Parity XOR chain | 33.888 | 171 (xor2/xnor2) | XOR reduction across belief sign bits |
| Popcount adder tree | 13.634 | 51 (and/or/aoi/oai) | 224-bit popcount to 8-bit count |
| State MUX | 0.148 | 1 (mux2_1) | FSM output mux |
| Wire (interconnect) | 0.149 | — | 0.3% of total — negligible |
| **Total** | **48.614** | **222 levels** | |
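The relationship between this table and the path summary can be checked with a few lines of arithmetic. The sketch below sums the segments as listed; it suggests that the 47.67 ns "total combinational delay" in the summary is the logic portion only (XOR chain, popcount adders, and state mux), excluding clock-to-Q and wire delay. That reading is an inference from the numbers, not something the timing report states.

```python
# Segment delays in ns, copied from the delay breakdown table.
SEGMENTS_NS = {
    "clk_to_q": 0.795,
    "parity_xor_chain": 33.888,
    "popcount_adders": 13.634,
    "state_mux": 0.148,
    "wire": 0.149,
}

total_ns = round(sum(SEGMENTS_NS.values()), 3)
# Logic-only portion: everything except the launch flop and interconnect.
combinational_ns = round(
    SEGMENTS_NS["parity_xor_chain"]
    + SEGMENTS_NS["popcount_adders"]
    + SEGMENTS_NS["state_mux"],
    3,
)
wire_pct = round(100.0 * SEGMENTS_NS["wire"] / total_ns, 1)

print(total_ns, combinational_ns, wire_pct)
```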
### Proposed Fix: 2-3 Stage Syndrome Pipeline
**SYNDROME_S1** (cycle 1, ~16 ns): Compute all 224 parity bits in parallel. Each parity is only 2-3 XOR operations deep (one per connected column). Register the 224-bit `parity_vec`.
**SYNDROME_S2** (cycle 2, ~14 ns): Popcount the 224-bit parity vector via balanced adder tree. Register the 8-bit `syndrome_weight` and `syndrome_ok` flag.
**SYNDROME_DONE** (cycle 3): Already exists — reads `syndrome_ok`.
**Estimated post-fix critical path**: ~14-16 ns (comfortably under 20 ns / 50 MHz).
**Latency impact**: +1-2 cycles per iteration (negligible at 30 iterations).
### Secondary Violations
Wishbone address input (`wb_adr_i`) has -2.47 ns setup violation. Fixable by registering the address at the decoder boundary.
## Run 5: `syndrome_pipeline` (Mar 3, 2026) — COMPLETED (timing violations)
- **RTL**: Pipelined CN + syndrome pipeline (SYNDROME_S1 + SYNDROME_S2 with serial popcount)
- **Config**: `CLOCK_PERIOD=20` (50 MHz), `SYNTH_STRATEGY=AREA 0`, `RUN_ANTENNA_REPAIR=false`
- **Die area**: 2800 x 1760 µm (4.93 mm²)
- **Result**: All 75 steps completed. DRC/LVS clean.
- **TT Setup WNS**: **-28.98 ns** — no improvement from Run 4
- **Root cause**: Yosys synthesizes the loop-carried `syndrome_cnt = syndrome_cnt + 1` dependency as a serial chain of ~48 ns
- **Lesson**: Splitting parity and popcount into 2 cycles does not help if the popcount itself is still serial
## Run 6: `balanced_popcount` (Mar 4, 2026) — COMPLETED (TT timing MET!)
- **RTL**: Pipelined CN + syndrome pipeline with balanced 4-wide adder tree popcount
- **Config**: `CLOCK_PERIOD=20` (50 MHz), `SYNTH_STRATEGY=AREA 0`, `RUN_ANTENNA_REPAIR=false`
- **Die area**: 2800 x 1760 µm (4.93 mm²)
- **Result**: All 75 steps completed. DRC/LVS clean. **TT timing met!**
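The difference between the serial popcount of Run 5 and a balanced tree can be illustrated with a small Python model. This is only a sketch: it assumes "4-wide" means each adder node sums four inputs, and it counts adder stages rather than gate levels (a real 4-input adder node is itself several gates deep). The function names are illustrative.

```python
def popcount_serial(bits):
    """Accumulate one bit at a time; returns (count, dependent_add_steps)."""
    count = 0
    for bit in bits:
        count = count + bit  # each step needs the previous count
    return count, len(bits)


def popcount_tree(bits, fan_in=4):
    """Sum groups of `fan_in` values per level; returns (count, adder_levels)."""
    level = list(bits)
    levels = 0
    while len(level) > 1:
        level = [sum(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
        levels += 1
    return level[0], levels


# Arbitrary 224-bit test pattern standing in for the parity vector.
parity_vec = [1 if i % 5 == 0 else 0 for i in range(224)]
serial_count, serial_steps = popcount_serial(parity_vec)
tree_count, tree_levels = popcount_tree(parity_vec)
print(serial_count, serial_steps, tree_count, tree_levels)
```

For 224 inputs the tree shrinks 224, 56, 14, 4, 1, so it needs 4 adder levels where the serial form has 224 dependent additions. Both produce the same count.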
### Physical Results
| Metric | Result |
|--------|--------|
| Magic DRC | **Clean** |
| KLayout DRC | **Clean** |
| LVS | **Clean** (0 errors, 0 unmatched) |
| Antenna violating nets | 1,687 (no repair attempted) |
### Area & Utilization
| Metric | Value |
|--------|-------|
| Die area | 4,928,000 µm² (4.93 mm²) |
| Instance count | 186,915 |
| Instance area | 1,367,580 µm² (1.37 mm²) |
| Core utilization | 28.2% |
| Sequential cells | 18,056 |
| Timing repair buffers | 27,864 |
### Timing (post-route, CLOCK_PERIOD = 20 ns / 50 MHz target)
| Corner | Setup WNS (ns) | Setup TNS (ns) | Hold WNS (ns) | Hold TNS (ns) |
|--------|----------------|-----------------|----------------|----------------|
| nom_tt_025C_1v80 | **0.0** | 0 | -0.45 | -10.5 |
| nom_ss_100C_1v60 | **-9.18** | -12,474.4 | -0.17 | -0.21 |
| nom_ff_n40C_1v95 | **0.0** | 0 | -0.37 | -38.6 |
| max_ss_100C_1v60 | -10.45 | -15,896.8 | -0.44 | -0.87 |
### Estimated Max Frequency
- **TT corner**: **50 MHz — TIMING MET**
- **SS corner**: Critical path ~40 ns → **~25 MHz** (up from ~11 MHz)
- **FF corner**: **50 MHz — TIMING MET**
### New Critical Path (SS corner)
| Item | Value |
|------|-------|
| Startpoint | `u_core.col_idx[0]` (column index register) |
| Endpoint | `u_core.beliefs` registers |
| Slack | -9.18 ns (nom_ss) |
| Data arrival time | 40.15 ns |
| Description | Belief update mux path during LAYER_READ/LAYER_WRITE |
The syndrome path is NO LONGER critical. The new bottleneck is the column-indexed mux/barrel-shifter path used during belief reads and writes.
### Key Observations
1. **Balanced popcount tree eliminated the syndrome bottleneck** — WNS improved from -28.98 ns to 0.0 ns at TT
2. TT and FF corners now fully meet 50 MHz timing
3. SS corner still fails (-9.18 ns) due to a different path: belief update mux indexed by col_idx
4. Hold violations are minor (-0.45 ns) and can be fixed with post-route optimization
5. 1,687 antenna violations need to be addressed (antenna repair was disabled)
## Updated Summary Table
| Run | RTL | Key Change | Antenna | Status | TT Setup WNS | Max Freq (TT) |
|-----|-----|------------|---------|--------|-------------|---------------|
| 1 | Unpipelined | — | Heuristic | **FAILED** | — | — |
| 2 | Unpipelined | — | Iterative | **COMPLETED** | -27.13 ns | ~21 MHz |
| 3 | Pipelined CN | CN pipeline | Iterative | **FAILED** | — | — |
| 4 | Pipelined CN | CN pipeline | None | **COMPLETED** | -28.86 ns | ~20 MHz |
| 5 | + Syndrome pipeline | Serial popcount | None | **COMPLETED** | -28.98 ns | ~20 MHz |
| 6 | + Balanced popcount | Adder tree | None | **COMPLETED** | **0.0 ns** | **50 MHz** |
## Run 7a: `pipelined_layer2` (Mar 9, 2026) — FAILED
- **RTL**: Run 6 + LAYER_WRITE split into LAYER_WRITE_ADDR + LAYER_WRITE_DATA
- **Config**: `CLOCK_PERIOD=20`, `DIODE_ON_PORTS=in`, `HEURISTIC_ANTENNA_THRESHOLD=200`
- **Failure**: `GRT-0118` routing congestion — heuristic diode insertion on input ports added too many cells
- **Lesson**: Any heuristic diode insertion causes GRT failure on this design
## Run 7b: `pipelined_layer3` (Mar 9, 2026) — FAILED
- **RTL**: Same as 7a (LAYER_WRITE_ADDR/DATA split)
- **Config**: `DIODE_ON_PORTS=none`, `RUN_HEURISTIC_DIODE_INSERTION=false`
- **Failure**: Post-CTS resizer diverged — 2.5+ hours at 100% CPU, memory climbing linearly, never converging
- **Lesson**: LAYER_WRITE pipeline split creates too many paths for OpenROAD resizer
## Run 7c: `pre_shift` (Mar 9, 2026) — FAILED
- **RTL**: Run 6 + pre-registered H_BASE shift lookahead (`H_BASE[row_idx][col_idx+1]`)
- **Config**: Same as 7b
- **Failure**: `GPL-0302` placement density overflow — 150K cells at 41.3% exceeded 40% target
- **Root cause**: Yosys cannot fold H_BASE constants through registers → full 256:1 write mux explosion (~2x cell count vs Run 6's 83K)
- **Lesson**: Registering H_BASE shift values prevents Yosys constant folding
## Run 7d: `run6_baseline` (Mar 9, 2026) — FAILED
- **RTL**: Reverted to Run 6 baseline (identical RTL)
- **Config**: `DIODE_ON_PORTS=in` (inadvertently left from earlier runs), `RUN_HEURISTIC_DIODE_INSERTION=false`
- **Cells**: 85,500
- **Failure**: `GRT-0118` routing congestion
- **Root cause**: `DIODE_ON_PORTS=in` inserts diodes on input ports even when heuristic insertion is disabled
## Run 7e: `run6b_nodiode` (Mar 10, 2026) — FAILED
- **RTL**: Run 6 baseline
- **Config**: `DIODE_ON_PORTS=none`, hold margins 0.5/0.3 (from config.json), reused `run6_baseline` synthesis
- **Failure**: Post-CTS resizer diverged (9+ GiB memory, 3+ hours, never converged)
- **Root cause**: Reusing synthesis from a run with different config (`DIODE_ON_PORTS=in`) produces a subtly different netlist that causes PnR divergence
## Run 7f: `run6_clean` (Mar 10, 2026) — FAILED
- **RTL**: Run 6 baseline, clean full run from scratch
- **Config**: `DIODE_ON_PORTS=none`, hold margins 0.5/0.3
- **Cells**: 85,500
- **Hold buffers inserted**: 35,506
- **Failure**: `GRT-0118` routing congestion
- **Root cause**: Higher hold slack margins (0.5/0.3 vs balanced_popcount's 0.4/0.2) caused 13K extra hold buffers (35K vs 22K), pushing routing congestion over GRT threshold
## Run 7g: `run6_fixhold` (Mar 10, 2026) — FAILED
- **RTL**: Run 6 baseline, reused `run6_clean` synthesis
- **Config**: `DIODE_ON_PORTS=none`, hold margins 0.4/0.2 (matching balanced_popcount)
- **Failure**: Post-CTS resizer diverged (14+ GiB, 3.5+ hours)
- **Root cause**: Yosys non-determinism — `run6_clean` synthesis produced a slightly different cell mix that didn't route cleanly despite identical config
## Run 7h: `run6_reuse_bp` (Mar 10, 2026) — COMPLETED (reproduces Run 6!)
- **RTL**: Run 6 baseline, **reused balanced_popcount's actual synthesis netlist**
- **Config**: `DIODE_ON_PORTS=none`, hold margins 0.4/0.2
- **Result**: All stages completed. DRC/LVS clean. TT timing met!
- **Hold buffers**: 22,095 (identical to balanced_popcount)
### Physical Results
| Metric | Result |
|--------|--------|
| Magic DRC | **Clean** |
| KLayout DRC | **Clean** |
| LVS | **Clean** (circuits match uniquely) |
| Antenna violating nets | 1,687 (repair disabled) |
| Antenna violating pins | 3,416 (repair disabled) |
### Area & Utilization
| Metric | Value |
|--------|-------|
| Die area | 4,928,000 µm² (4.93 mm²) |
| Instance count | 186,915 |
| Instance area | 1,367,580 µm² (1.37 mm²) |
| Core utilization | 28.2% |
### Timing (post-route, CLOCK_PERIOD = 20 ns / 50 MHz target)
| Corner | Setup WNS (ns) | Setup TNS (ns) | Hold WNS (ns) | Hold TNS (ns) |
|--------|----------------|-----------------|----------------|----------------|
| nom_tt_025C_1v80 | **+3.28** | 0 | -0.45 | -10.5 |
| nom_ss_100C_1v60 | **-9.18** | -12,474 | -0.17 | -0.21 |
| nom_ff_n40C_1v95 | **+5.93** | 0 | -0.37 | -38.6 |
| max_ss_100C_1v60 | -10.45 | -15,897 | -0.44 | -0.87 |
| min_tt_025C_1v80 | +3.71 | 0 | -0.26 | -1.66 |
| max_tt_025C_1v80 | +2.90 | 0 | -0.62 | -29.5 |
### Key Observations
1. **Results identical to Run 6** — confirms that the balanced_popcount synthesis netlist is the key ingredient
2. Yosys non-determinism is significant: re-synthesizing the same RTL with same config produces netlists that fail PnR
3. Hold violations (1,543 total) are all on input port paths (`wb_dat_i`, `wb_adr_i`), zero reg-to-reg — fixable with input delay constraints
4. Max slew violations (4,112) and max cap violations (655) concentrated in SS corner
## Updated Summary Table
| Run | RTL | Key Change | Antenna | Status | TT Setup WNS | Max Freq (TT) |
|-----|-----|------------|---------|--------|-------------|---------------|
| 1 | Unpipelined | — | Heuristic | **FAILED** | — | — |
| 2 | Unpipelined | — | Iterative | **COMPLETED** | -27.13 ns | ~21 MHz |
| 3 | Pipelined CN | CN pipeline | Iterative | **FAILED** | — | — |
| 4 | Pipelined CN | CN pipeline | None | **COMPLETED** | -28.86 ns | ~20 MHz |
| 5 | + Syndrome pipeline | Serial popcount | None | **COMPLETED** | -28.98 ns | ~20 MHz |
| 6 | + Balanced popcount | Adder tree | None | **COMPLETED** | **0.0 ns** | **50 MHz** |
| 7a | + LAYER_WRITE split | ADDR/DATA pipeline | Heuristic | **FAILED** | — | — |
| 7b | + LAYER_WRITE split | ADDR/DATA pipeline | None | **FAILED** (resizer) | — | — |
| 7c | + pre_shift | H_BASE lookahead | None | **FAILED** (GPL) | — | — |
| 7d | Run 6 baseline | DIODE_ON_PORTS=in | None | **FAILED** (GRT) | — | — |
| 7e | Run 6 baseline | Reuse wrong synth | None | **FAILED** (resizer) | — | — |
| 7f | Run 6 baseline | Hold margins 0.5/0.3 | None | **FAILED** (GRT) | — | — |
| 7g | Run 6 baseline | Reuse run6_clean synth | None | **FAILED** (resizer) | — | — |
| 7h | Run 6 baseline | **Reuse BP synth** | None | **COMPLETED** | **+3.28 ns** | **50 MHz** |
## Key Lessons Learned (Run 7 Series)
1. **LAYER_WRITE pipeline is not viable**: Any register between col_idx and H_BASE causes either cell explosion (Yosys can't fold constants through registers) or PnR divergence (too many paths for resizer)
2. **Heuristic diode insertion always fails**: Both `RUN_HEURISTIC_DIODE_INSERTION=true` and `DIODE_ON_PORTS=in` cause GRT-0118 congestion
3. **Hold slack margins matter**: 0.5/0.3 inserts 35K hold buffers → GRT failure. 0.4/0.2 inserts 22K → passes
4. **Yosys synthesis is non-deterministic**: Re-synthesizing identical RTL+config produces different netlists with different PnR outcomes. The balanced_popcount synthesis netlist is the only one proven to complete
5. **Config must be consistent**: Reusing synthesis from a run with different config settings causes PnR divergence
6. **Run 6's balanced_popcount synthesis netlist is the golden reference** — all future PnR runs should reuse it
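Since PnR outcomes depend on reusing one specific netlist, it may be worth guarding against accidentally picking up a re-synthesized copy. One possible approach, sketched below, is to record a SHA-256 digest of the golden netlist and compare it before each run. The helper names are hypothetical, and the expected digest would have to be recorded from the actual balanced_popcount netlist; none of this is part of the existing flow.

```python
import hashlib
from pathlib import Path


def netlist_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a netlist file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_golden_netlist(path: Path, expected_sha256: str) -> bool:
    """True if the file matches the digest recorded for the golden netlist."""
    return netlist_sha256(path) == expected_sha256.strip().lower()
```

A PnR launcher script could call `is_golden_netlist` on the netlist it is about to reuse and refuse to start on a mismatch.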
## Wrapper Hardening (Mar 12-13, 2026)
### wrapper_v2 — COMPLETED (LVS fail)
- **Config**: `SYNTH_ELABORATE_ONLY=true`, `FP_PDN_ENABLE_RAILS=false`
- **Result**: DRC clean, but LVS fails — 3 standard cells (inv_2 + 2x conb_1) have floating VPWR/VGND
- **Root cause**: Without power rails, wrapper std cells have no power connection
### wrapper_v3 — ABORTED (208 LVS pin-match errors)
- **Config**: `SYNTH_ELABORATE_ONLY=true`, `FP_PDN_ENABLE_RAILS=true`, `ERROR_ON_LVS_ERROR=true`
- **Result**: DRC clean, XOR clean, power pins connected. Flow aborted at LVS check.
- **LVS issue**: 206 constant-tied output pins merged during Magic SPICE extraction
### wrapper_v4 — COMPLETED (golden wrapper)
- **Config**: Same as v3 but `ERROR_ON_LVS_ERROR=false`
- **Result**: All 69 stages completed. DRC clean (Magic + KLayout). XOR clean.
- **LVS**: 208 pin-match errors (cosmetic — device classes equivalent)
- **Pin merging**: Magic SPICE extraction merges io_oeb[37:0], io_out[37:0], la_data_out[127:0], user_irq[2:1] into shared constant nets, losing individual pin labels
## Precheck Results (Mar 13, 2026)
| # | Check | Result |
|---|-------|--------|
| 1 | License | PASSED (SPDX sub-check: 1727 non-compliant venv files) |
| 2 | Makefile | **PASSED** |
| 3 | Default | **PASSED** |
| 4 | Documentation | **PASSED** |
| 5 | Top Cell | **PASSED** |
| 6 | Consistency | **PASSED** |
| 7 | GPIO-Defines | **PASSED** |
| 8 | XOR | **PASSED** |
| 9 | Magic DRC | **PASSED** |
| 10 | KLayout FEOL | FAILED (SIGSEGV crash, NOT real DRC) |
| 11 | KLayout BEOL | **PASSED** |
| 12 | KLayout Offgrid | **PASSED** |
| 13 | KLayout Metal Density | **PASSED** |
| 14 | KLayout Pin Labels | **PASSED** |
| 15 | KLayout ZeroArea | **PASSED** |
| 16 | Spike Check | **PASSED** |
| 17 | Illegal Cellname | **PASSED** |
| 18 | OEB | **PASSED** |
| 19 | LVS | FAILED (3 cosmetic pin mismatches) |
**17 PASSED, 2 FAILED.** Both failures are non-functional:
- KLayout FEOL: Tool crash (signal 11), not a DRC violation
- LVS: "Top level cell failed pin matching", with 3 cosmetic mismatches:
  - `io_oeb[9]` in layout only (Magic kept 1 label for merged constant net)
  - `user_irq[2]` in layout only (same issue)
  - `vssd2` in netlist only (PDN power net not labeled as port)
- CVC: 0 errors. Device classes: equivalent.
## Gate-Level Simulation Results (Mar 13, 2026)
All 5 cocotb tests passed in GL mode (iverilog + caravel_cocotb, no SDF annotation):
| Test | Status | Sim Time (ns) | Wall Time (s) | GPIO[7:0] | Errors |
|------|--------|---------------|----------------|-----------|--------|
| ldpc_basic | **PASS** | 854,225 | 1,814 | 0xAB | 0 |
| ldpc_noisy | **PASS** | 1,011,550 | 2,720 | 0xAB | 0 |
| ldpc_max_iter | **PASS** | 1,104,525 | 3,393 | 0xAB | 0 |
| ldpc_back_to_back | **PASS** | 1,140,375 | 3,371 | 0xAB | 0 |
| ldpc_demo | **PASS** | 1,251,050 | 3,612 | 0xAB | 0 |
- iverilog compilation: ~2h18m per test (1.1GB sim.vvp), 8.2GB RAM
- Simulation: ~30-60 min per test (5-9GB VCD waveform)
- All tests ran on snoke (247GB RAM), 4 tests in parallel
- GPIO[7:0] = 0xAB is the firmware success code for all tests
- No X-propagation or timing race issues observed
## Wrapper Hardening Attempts (May 7-11, 2026) — Failed LVS Cosmetic-Fix Series
After the May 1 `cf_wrapper_v5` golden run landed (commit `74ad20a` to origin / `1fcdc1d` to gitea) with 208 cosmetic LVS pin-match errors, a series of six follow-up runs (v6 through v11) tried to eliminate those errors. **All six failed.** The errors are a Magic SPICE-extraction limitation, not a hardening defect, so no amount of RTL or placement tweaking will change Magic's behavior.
### Timeline
| Run | Date | Strategy | Result |
|-----|------|----------|--------|
| v6 | May 7 | First post-PDN-swap retry (commit `8cc8414` landed config changes); same wrapper RTL | Flow completed but KLayout crashed in final manufacturability step; same 208 LVS errors |
| v7 | May 7 | Same as v6, re-run | Aborted mid-routing on `[DRT-0349]` LEF58_ENCLOSURE warnings — routing never completed |
| v8 | May 8 | `manual_tieoffs.vh` with 206 per-pin `conb_1` cells + `manual_placements.json` placing each cell adjacent to its target pin; mprj moved `[60,15] → [60,200]` to make room | Flow completed; **same 208 LVS errors** — Magic still merged all constant-tied outputs. STA failed on `min_ss_100C_1v60` and `nom_tt_025C_1v80` corners |
| v9 | May 9 | Same as v8 with `ERROR_ON_TR_DRC=false` to push through routing | **1780 routing DRC errors** (deferred). Magic streamout completed but DRC was never clean |
| v10 | May 11 | Same family of placement tweaks | **1362 routing DRC errors** (deferred); same failure mode as v9 |
| v11 | May 11 | One more attempt | Interrupted at step 01 (yosys-jsonheader); no harden process running |
### Why every attempt failed
The 208 LVS errors all come from **Magic SPICE extraction collapsing constant-tied nets**:
- `la_data_out[127:0]` — all 128 bits tied to `1'b0` → Magic extracts as a single GND net → 127 pin labels lost (only one kept arbitrarily, often none)
- `io_out[37:0]` — all 38 bits tied to `1'b0` → same merge
- `io_oeb[37:0]` — all 38 bits tied to `1'b1` → merged into VDD net (Magic keeps the label for `io_oeb[9]` for unknown reasons)
- `user_irq[2:1]` — tied to `2'b0` → merged into GND
The v8 attempt — putting each pin behind its own `sky130_fd_sc_hd__conb_1` cell — does not break the merge because Magic's extractor still resolves each `conb_1` output as the constant `VPWR` or `VGND` and collapses them onto the global power/ground nets at the extracted-SPICE level. Per-pin cells generate distinct logical nets in the Verilog netlist but not distinct extracted nets in the layout. **Netgen itself reports "Device classes equivalent" and "Cell pin lists altered to match"** — the failure is bookkeeping, not electrical.
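The effect described above can be pictured with a toy model. This is not Magic's actual extraction algorithm; it only encodes the behavior reported in this section, namely that any constant-driven output ends up on its supply rail regardless of how many distinct nets the Verilog netlist contains. All names in the sketch are illustrative.

```python
from collections import defaultdict

# (bus, low index, high index, tied constant), as listed in this section.
TIED_OUTPUTS = [
    ("la_data_out", 0, 127, 0),
    ("io_out", 0, 37, 0),
    ("io_oeb", 0, 37, 1),
    ("user_irq", 1, 2, 0),
]


def logical_netlist(per_pin_cells: bool) -> dict:
    """Map each output pin to (net name, constant) in the Verilog view."""
    pins = {}
    for bus, lo, hi, const in TIED_OUTPUTS:
        for i in range(lo, hi + 1):
            pin = f"{bus}[{i}]"
            # With per-pin conb_1 cells every pin gets its own logical net.
            net = f"tie_{pin}" if per_pin_cells else f"const{const}"
            pins[pin] = (net, const)
    return pins


def extract(pins: dict) -> dict:
    """Toy extractor: every constant-driven net resolves to its supply rail."""
    rails = defaultdict(list)
    for pin, (_net, const) in pins.items():
        rails["VPWR" if const else "VGND"].append(pin)
    return dict(rails)


for per_pin in (False, True):
    logical = logical_netlist(per_pin)
    distinct_logical = len({net for net, _ in logical.values()})
    extracted = extract(logical)
    print(per_pin, distinct_logical, {rail: len(p) for rail, p in extracted.items()})
```

In this model the per-pin variant has 206 distinct logical nets, but the extracted view is the same two rails either way, which matches the outcome of v8.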
### Approaches proven non-viable (don't try again)
1. **Per-pin `conb_1` cells in the wrapper Verilog** — v8 disproved this. Magic optimizes them onto the constant nets.
2. **Per-pin manual placement of tieoff cells** — placement doesn't change extraction behavior.
3. **mprj location shifts** to make room for tieoff rows — doesn't help; cosmetic LVS persists.
4. **Pushing routing-DRC tolerance up** (v9, v10) produces broken layouts (roughly 1,300-1,800 routing DRC errors), worse than the starting state.
### Approaches that *could* work but were not attempted (deferred — too risky pre-deadline)
1. **Drive 206 dummy zero outputs from inside `ldpc_decoder_top`**, which would force each wrapper output to come from a distinct extracted macro pin instead of a constant-tied wrapper net. This requires a fresh macro re-harden, which risks breaking Run 6's golden timing on a non-deterministic Yosys run. Estimated cost: 4-6 hours, with high regression risk.
2. **Post-extraction `.mag` editing** to add per-pin port labels — brittle and tool-specific; would not survive a re-harden.
3. **Formal LVS waiver** (the chosen May 12 path): document the cosmetic nature of the errors, cite netgen's own "Device classes equivalent" line, and include the waiver in the submission packet.
### Key lesson
**The 208 LVS pin-match errors are not fixable with wrapper-only hardening.** Magic SPICE-extraction behavior is the root cause. Future sessions should not re-litigate this — either fix it inside the macro (re-harden risk) or formally waive it.
## Next Steps
- Submit with a formal LVS waiver (see `chip_ignite/docs/LVS_WAIVER.md`)
- Confirm `cf precheck` and `cf verify ldpc_basic --sim gl` still pass on the HEAD wrapper state
- `cf push` before 2026-05-13 deadline