# LDPC Decoder Hardening Results

## Run 1: `26_02_25_21_11` (Feb 25, 2026) — FAILED
- **RTL**: Original (unpipelined CN update)
- **Config**: `CLOCK_PERIOD=20` (50 MHz), `RUN_HEURISTIC_DIODE_INSERTION=true`, `HEURISTIC_ANTENNA_THRESHOLD=110`
- **Die area**: 2800 x 1760 µm (4.93 mm²)
- **Failure**: `GRT-0118` routing congestion after heuristic diode insertion (66,016 diodes added)
- **Notes**: Initial global routing passed (0 overflow, 39% routing utilization). Diode insertion nearly doubled the cell count, and the re-route then failed on congestion.

## Run 2: `reuse_synth` (Feb 27, 2026) — COMPLETED (timing violations)
- **RTL**: Original (unpipelined CN update), reusing the synthesis netlist from Run 1
- **Config**: `CLOCK_PERIOD=20` (50 MHz), `RUN_HEURISTIC_DIODE_INSERTION=false`, `RUN_ANTENNA_REPAIR=true`
- **Die area**: 2800 x 1760 µm (4.93 mm²)
- **Result**: All 70 steps completed. GDS generated. Deferred timing errors.

### Physical Results
| Metric | Result |
|--------|--------|
| Magic DRC | **Clean** |
| KLayout DRC | **Clean** |
| LVS | **Clean** (0 errors, 0 unmatched) |
| XOR (Magic vs KLayout) | **Clean** |
| Illegal overlap | **Clean** |
| Power grid violations | **0** |
| Antenna violating nets | 658 |
| Antenna violating pins | 905 |

### Area & Utilization
| Metric | Value |
|--------|-------|
| Die area | 4,928,000 µm² (4.93 mm²) |
| Core area | 4,846,670 µm² |
| Instance count | 184,663 |
| Instance area | 1,303,260 µm² (1.30 mm²) |
| Core utilization | 26.9% |
| Sequential cells | 16,967 |
| Combinational cells | 61,366 |
| Timing repair buffers | 23,709 |
| Fill cells | 415,149 |
| Tap cells | 69,228 |

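
The area figures in the Run 2 table can be cross-checked with a few lines of arithmetic. This is only a sanity-check sketch: the helper name `core_utilization_pct` is made up for illustration, and it assumes core utilization is simply instance area divided by core area, which matches the reported 26.9% but may not be exactly how the flow computes it.

```python
def core_utilization_pct(instance_area_um2: float, core_area_um2: float) -> float:
    """Instance area as a percentage of core area (assumed definition)."""
    return 100.0 * instance_area_um2 / core_area_um2


# Values copied from the Run 2 tables.
die_w_um, die_h_um = 2800, 1760
die_area_um2 = die_w_um * die_h_um
core_area_um2 = 4_846_670
instance_area_um2 = 1_303_260

run2_utilization = core_utilization_pct(instance_area_um2, core_area_um2)

print(f"die area: {die_area_um2:,} um^2 ({die_area_um2 / 1e6:.2f} mm^2)")
print(f"core utilization: {run2_utilization:.1f}%")
```

The same helper can be pointed at later runs to compare utilization, provided the core area is unchanged.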
### Timing (post-route, CLOCK_PERIOD = 20 ns / 50 MHz target)
| Corner | Setup WNS (ns) | Setup TNS (ns) | Hold WNS (ns) | Hold TNS (ns) | Setup Violations |
|--------|----------------|-----------------|----------------|----------------|------------------|
| nom_tt_025C_1v80 | **-27.13** | -234.9 | -0.32 | -3.76 | 9 |
| nom_ss_100C_1v60 | **-70.58** | -29,946.3 | 0.06 | 0 | 5,463 |
| nom_ff_n40C_1v95 | **-10.18** | -86.3 | -0.26 | -12.4 | — |
| **Worst across all** | **-71.40** | -34,329.1 | -0.47 | -26.4 | — |

### Estimated Max Frequency
- **TT corner**: Critical path ~47 ns → **~21 MHz**
- **SS corner**: Critical path ~91 ns → **~11 MHz**
- **FF corner**: Critical path ~30 ns → **~33 MHz**

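
The frequency estimates above are consistent with taking the minimum workable period as the target period minus the setup WNS. A minimal sketch of that arithmetic follows; `fmax_mhz` is a hypothetical helper, not part of the flow, and it ignores clock uncertainty and any other margins the signoff tools may apply.

```python
def fmax_mhz(clock_period_ns: float, setup_wns_ns: float) -> float:
    """Estimate max frequency from the target period and worst setup slack."""
    # A negative WNS means the path needs (period - WNS) ns to settle.
    min_period_ns = clock_period_ns - setup_wns_ns
    return 1000.0 / min_period_ns


# Run 2 post-route setup WNS per corner, copied from the timing table.
run2_wns_ns = {
    "nom_tt_025C_1v80": -27.13,
    "nom_ss_100C_1v60": -70.58,
    "nom_ff_n40C_1v95": -10.18,
}

for corner, wns in run2_wns_ns.items():
    print(f"{corner}: {20.0 - wns:.2f} ns -> {fmax_mhz(20.0, wns):.1f} MHz")
```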
### Power (TT corner)
| Component | Power (W) |
|-----------|-----------|
| Internal | 0.0554 |
| Switching | 0.0273 |
| Leakage | ~0.000002 (~0.002 mW) |
| **Total** | **0.0827** |

### Key Observations
1. Disabling heuristic diode insertion fixed the routing congestion failure from Run 1
2. 658 antenna violations remain; iterative antenna repair was not sufficient. May need to re-enable heuristic insertion with a higher threshold or use `DIODE_ON_PORTS`
3. Setup timing is severely violated: the critical path is ~47 ns at TT, far from the 20 ns target
4. This run used the **unpipelined** RTL (synthesis reused from Run 1, which predated the CN pipeline split)
5. Next run should re-synthesize with pipelined CN update RTL to see if timing improves

## Run 3: `pipelined_pnr` (Mar 1, 2026) — FAILED
- **RTL**: Pipelined CN update (CN_STAGE1 + CN_STAGE2)
- **Config**: `CLOCK_PERIOD=20` (50 MHz), `SYNTH_STRATEGY=AREA 0`, `RUN_HEURISTIC_DIODE_INSERTION=false`, `RUN_ANTENNA_REPAIR=true`
- **Die area**: 2800 x 1760 µm (4.93 mm²)
- **Failure**: `GRT-0118` routing congestion during iterative antenna repair (step 36), after 13+ hours of repair loops
- **Notes**: Iterative antenna repair kept inserting diodes and re-routing until congestion became too high. Same root cause as Run 1, reached through a different mechanism.

## Run 3b: `pipelined_synth` (Feb 28, 2026) — STILL RUNNING
- **RTL**: Pipelined CN update
- **Config**: `SYNTH_STRATEGY=AREA 2` (synthesis only)
- **Status**: ABC pass 2 (tech mapping) running 20+ hours. `AREA 2` is far too aggressive for this design size. **Do not use AREA 2 for this design.**

## Run 4: `pipelined_noantenna` (Mar 2, 2026) — COMPLETED (timing violations)
- **RTL**: Pipelined CN update (CN_STAGE1 + CN_STAGE2)
- **Config**: `CLOCK_PERIOD=20` (50 MHz), `SYNTH_STRATEGY=AREA 0`, `RUN_HEURISTIC_DIODE_INSERTION=false`, `RUN_ANTENNA_REPAIR=false`
- **Die area**: 2800 x 1760 µm (4.93 mm²)
- **Result**: All 69 steps completed. GDS generated. Deferred timing errors. No antenna repair attempted.

### Physical Results
| Metric | Result |
|--------|--------|
| Magic DRC | **Clean** |
| KLayout DRC | **Clean** |
| LVS | **Clean** (0 errors, 0 unmatched) |
| XOR (Magic vs KLayout) | **Clean** |
| Illegal overlap | **Clean** |
| Antenna violating nets | 1,707 (no repair attempted) |
| Antenna violating pins | 3,319 (no repair attempted) |

### Area & Utilization
| Metric | Value |
|--------|-------|
| Die area | 4,928,000 µm² (4.93 mm²) |
| Instance count | 183,774 |
| Instance area | 1,351,790 µm² (1.35 mm²) |
| Core utilization | 27.9% |

### Timing (post-route, CLOCK_PERIOD = 20 ns / 50 MHz target)
| Corner | Setup WNS (ns) | Setup TNS (ns) | Hold WNS (ns) | Hold TNS (ns) |
|--------|----------------|-----------------|----------------|----------------|
| nom_tt_025C_1v80 | **-28.86** | -348.0 | -0.08 | -0.15 |
| nom_ss_100C_1v60 | **-74.22** | -20,536.0 | -0.07 | -0.07 |
| nom_ff_n40C_1v95 | **-11.04** | -93.8 | -0.12 | -2.15 |
| min_tt_025C_1v80 | -28.39 | -251.0 | 0 | 0 |
| max_tt_025C_1v80 | -29.36 | -725.1 | -0.24 | -2.15 |

### Estimated Max Frequency
- **TT corner**: Critical path ~49 ns → **~20 MHz**
- **SS corner**: Critical path ~94 ns → **~11 MHz**
- **FF corner**: Critical path ~31 ns → **~32 MHz**

### Power (TT corner)
| Metric | Value |
|--------|-------|
| **Total** | **0.0858 W** |

### Key Observations
1. Pipelined CN update did NOT improve timing: TT WNS is -28.86 ns vs -27.13 ns (unpipelined Run 2). Slightly worse, possibly due to the AREA 0 vs AREA 2 synth strategy difference.
2. Hold violations are much smaller than Run 2 (-0.08 vs -0.32 ns), nearly clean.
3. Antenna violations increased to 1,707 nets (vs 658 in Run 2) without any repair; AREA 0 produces a less antenna-friendly netlist.
4. The critical path is still ~47-49 ns, suggesting the bottleneck is NOT the CN update pipeline stage but something else (likely the large mux/barrel shifter or belief update logic).
5. `SYNTH_STRATEGY=AREA 2` takes 20+ hours for ABC tech mapping on this design, so **never use it**. `AREA 0` completed in reasonable time.

## Summary Table

| Run | RTL | Synth | Antenna | Status | TT Setup WNS | Max Freq (TT) |
|-----|-----|-------|---------|--------|-------------|---------------|
| 1 | Unpipelined | AREA 2 | Heuristic 110µm | **FAILED** (congestion) | — | — |
| 2 | Unpipelined | AREA 2 | Iterative | **COMPLETED** | -27.13 ns | ~21 MHz |
| 3 | Pipelined | AREA 0 | Iterative | **FAILED** (congestion) | — | — |
| 3b | Pipelined | AREA 2 | — (synth only) | Still running (20+ hrs) | — | — |
| 4 | Pipelined | AREA 0 | None | **COMPLETED** | -28.86 ns | ~20 MHz |

## Critical Path Analysis (from Run 4, pipelined_noantenna)

### Path Summary
| Item | Value |
|------|-------|
| Startpoint | `u_core.beliefs[0][5]` (beliefs register, bit 5 of element 0) |
| Endpoint | `syndrome_weight[7]` (MSB of syndrome weight counter) |
| RTL location | `SYNDROME` state in `ldpc_decoder_core.sv`, lines 363-385 |
| Slack | **-28.859 ns** (VIOLATED) |
| Total combinational delay | **47.67 ns** |
| Logic levels | **222** (171 XOR/XNOR + 51 adder/mux) |
| Logic vs wire delay | 99.7% logic / 0.3% wire |

All 8 worst setup violators fan out from `beliefs[0][5]` to `syndrome_weight[7:0]`.

### What the Critical Path Computes

The `SYNDROME` state computes the full syndrome check in a **single clock cycle**:

1. **Parity computation** (171 XOR levels, 33.9 ns): XOR the sign bits of all beliefs connected to each check node. With 7 rows x 32 z-elements there are 224 parity bits, each XORing up to 3 columns, reading from 256 belief sign bits.
2. **Population count** (51 adder levels, 13.6 ns): Sum all 224 parity results into an 8-bit `syndrome_cnt`.

The `syndrome_cnt = syndrome_cnt + 1` accumulation pattern creates a carry chain dependency that serializes everything.

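
The two steps above (per-check parity, then a population count) can be expressed as a small software reference model. This is a toy sketch, not the RTL: the base matrix `H_BASE_TOY`, the lifting size `Z = 4`, the `beliefs[col][index]` layout with signed integers, and the `(i + shift) % z` indexing convention are all assumptions made for illustration. The real design uses 7 rows, 32 z-elements, and its own `H_BASE` table.

```python
from typing import List, Optional

Z = 4  # toy lifting size; the decoder described here uses 32

# Hypothetical 2x3 base matrix. Each entry is a cyclic shift; None = no connection.
H_BASE_TOY: List[List[Optional[int]]] = [
    [0, 1, None],
    [None, 2, 3],
]


def syndrome_bits(beliefs: List[List[int]],
                  h_base: List[List[Optional[int]]],
                  z: int) -> List[int]:
    """One parity bit per (row, z-index): XOR of the connected belief sign bits."""
    bits = []
    for row in h_base:
        for i in range(z):
            parity = 0
            for col, shift in enumerate(row):
                if shift is None:
                    continue
                sign = 1 if beliefs[col][(i + shift) % z] < 0 else 0
                parity ^= sign
            bits.append(parity)
    return bits


def syndrome_weight(beliefs: List[List[int]],
                    h_base: List[List[Optional[int]]],
                    z: int) -> int:
    """Population count of the parity vector; 0 means every check is satisfied."""
    return sum(syndrome_bits(beliefs, h_base, z))


clean = [[1] * Z for _ in range(3)]          # all sign bits 0
one_error = [list(col) for col in clean]
one_error[1][0] = -1                          # flip one sign bit in column 1

print(syndrome_weight(clean, H_BASE_TOY, Z))      # no failing checks
print(syndrome_weight(one_error, H_BASE_TOY, Z))  # one failing check per connected row
```

In the toy matrix, column 1 participates in both rows, so a single flipped sign bit there fails exactly two checks.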
### Delay Breakdown
| Segment | Delay (ns) | Cells | Description |
|---------|-----------|-------|-------------|
| Source CLK-to-Q | 0.795 | 1 (dfxtp_4) | beliefs[0][5] register output |
| Parity XOR chain | 33.888 | 171 (xor2/xnor2) | XOR reduction across belief sign bits |
| Popcount adder tree | 13.634 | 51 (and/or/aoi/oai) | 224-bit popcount to 8-bit count |
| State MUX | 0.148 | 1 (mux2_1) | FSM output mux |
| Wire (interconnect) | 0.149 | — | 0.3% of total (negligible) |
| **Total** | **48.614** | **222 levels** | |

The 48.614 ns total includes CLK-to-Q and wire delay. The three logic segments alone (XOR chain, popcount, state mux) sum to 47.670 ns, which is the combinational delay quoted in the path summary.

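
As a quick consistency check on the delay breakdown, the segment delays can be summed directly. The dictionary keys below are illustrative labels, not names from any report; the numbers are the ones in the table.

```python
# Segment delays in ns, copied from the delay breakdown table.
segments_ns = {
    "clk_to_q": 0.795,
    "parity_xor_chain": 33.888,
    "popcount_adders": 13.634,
    "state_mux": 0.148,
    "wire": 0.149,
}

total_ns = sum(segments_ns.values())

# Combinational logic only: exclude the launching flop's CLK-to-Q and the wires.
logic_only_ns = (segments_ns["parity_xor_chain"]
                 + segments_ns["popcount_adders"]
                 + segments_ns["state_mux"])

wire_pct = 100.0 * segments_ns["wire"] / total_ns

print(f"total:      {total_ns:.3f} ns")
print(f"logic only: {logic_only_ns:.3f} ns")
print(f"wire share: {wire_pct:.1f}%")
```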
### Proposed Fix: 2-3 Stage Syndrome Pipeline

**SYNDROME_S1** (cycle 1, ~16 ns): Compute all 224 parity bits in parallel. Each parity is only 2-3 XOR operations deep (one per connected column). Register the 224-bit `parity_vec`.

**SYNDROME_S2** (cycle 2, ~14 ns): Popcount the 224-bit parity vector via a balanced adder tree. Register the 8-bit `syndrome_weight` and `syndrome_ok` flag.

**SYNDROME_DONE** (cycle 3): Already exists; reads `syndrome_ok`.

**Estimated post-fix critical path**: ~14-16 ns (comfortably under 20 ns / 50 MHz).

**Latency impact**: +1-2 cycles per iteration (negligible at 30 iterations).

### Secondary Violations

Wishbone address input (`wb_adr_i`) has a -2.47 ns setup violation. Fixable by registering the address at the decoder boundary.

## Run 5: `syndrome_pipeline` (Mar 3, 2026) — COMPLETED (timing violations)
- **RTL**: Pipelined CN + syndrome pipeline (SYNDROME_S1 + SYNDROME_S2 with serial popcount)
- **Config**: `CLOCK_PERIOD=20` (50 MHz), `SYNTH_STRATEGY=AREA 0`, `RUN_ANTENNA_REPAIR=false`
- **Die area**: 2800 x 1760 µm (4.93 mm²)
- **Result**: All 75 steps completed. DRC/LVS clean.
- **TT Setup WNS**: **-28.98 ns** (no improvement from Run 4)
- **Root cause**: Yosys serializes the `syndrome_cnt = syndrome_cnt + 1` loop-carried dependency into a ~48 ns chain
- **Lesson**: Splitting parity + popcount into 2 cycles helps nothing if the popcount itself is still serial

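
The Run 5 lesson comes down to dependency depth: a loop-carried accumulate makes every addition wait for the previous one, while a balanced tree only needs a logarithmic number of stages. The sketch below is an abstract software model, not the decoder RTL, and "depth" counts dependent adder stages rather than synthesized gate levels. The function names and the `fan_in=4` default (chosen to echo a 4-wide tree) are assumptions for illustration.

```python
from typing import List, Tuple


def serial_popcount(bits: List[int]) -> Tuple[int, int]:
    """Accumulate one bit at a time; each add depends on the previous result."""
    count, depth = 0, 0
    for b in bits:
        count += b
        depth += 1
    return count, depth


def tree_popcount(bits: List[int], fan_in: int = 4) -> Tuple[int, int]:
    """Sum groups of fan_in values per stage until one value remains."""
    level = list(bits)
    if not level:
        return 0, 0
    depth = 0
    while len(level) > 1:
        level = [sum(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
        depth += 1
    return level[0], depth


parity_vec = [1 if i % 3 == 0 else 0 for i in range(224)]  # arbitrary test pattern

print(serial_popcount(parity_vec))  # same count, one stage per input bit
print(tree_popcount(parity_vec))    # same count, far fewer dependent stages
```

For 224 inputs the 4-wide tree shrinks 224 → 56 → 14 → 4 → 1, so only four stages depend on one another, against 224 for the serial form.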
## Run 6: `balanced_popcount` (Mar 4, 2026) — COMPLETED (TT timing MET!)
- **RTL**: Pipelined CN + syndrome pipeline with balanced 4-wide adder tree popcount
- **Config**: `CLOCK_PERIOD=20` (50 MHz), `SYNTH_STRATEGY=AREA 0`, `RUN_ANTENNA_REPAIR=false`
- **Die area**: 2800 x 1760 µm (4.93 mm²)
- **Result**: All 75 steps completed. DRC/LVS clean. **TT timing met!**

### Physical Results
| Metric | Result |
|--------|--------|
| Magic DRC | **Clean** |
| KLayout DRC | **Clean** |
| LVS | **Clean** (0 errors, 0 unmatched) |
| Antenna violating nets | 1,687 (no repair attempted) |

### Area & Utilization
| Metric | Value |
|--------|-------|
| Die area | 4,928,000 µm² (4.93 mm²) |
| Instance count | 186,915 |
| Instance area | 1,367,580 µm² (1.37 mm²) |
| Core utilization | 28.2% |
| Sequential cells | 18,056 |
| Timing repair buffers | 27,864 |

### Timing (post-route, CLOCK_PERIOD = 20 ns / 50 MHz target)
| Corner | Setup WNS (ns) | Setup TNS (ns) | Hold WNS (ns) | Hold TNS (ns) |
|--------|----------------|-----------------|----------------|----------------|
| nom_tt_025C_1v80 | **0.0** | 0 | -0.45 | -10.5 |
| nom_ss_100C_1v60 | **-9.18** | -12,474.4 | -0.17 | -0.21 |
| nom_ff_n40C_1v95 | **0.0** | 0 | -0.37 | -38.6 |
| max_ss_100C_1v60 | -10.45 | -15,896.8 | -0.44 | -0.87 |

### Estimated Max Frequency
- **TT corner**: **50 MHz — TIMING MET**
- **SS corner**: Critical path ~40 ns → **~25 MHz** (up from ~11 MHz). The ~40 ns figure is the reported data arrival time (40.15 ns), which may include clock latency; the period-minus-WNS estimate used for earlier runs gives 20 + 9.18 = 29.18 ns, or ~34 MHz.
- **FF corner**: **50 MHz — TIMING MET**

### New Critical Path (SS corner)
| Item | Value |
|------|-------|
| Startpoint | `u_core.col_idx[0]` (column index register) |
| Endpoint | `u_core.beliefs` registers |
| Slack | -9.18 ns (nom_ss) |
| Data arrival time | 40.15 ns |
| Description | Belief update mux path during LAYER_READ/LAYER_WRITE |

The syndrome path is NO LONGER critical. The new bottleneck is the column-indexed mux/barrel-shifter path used during belief reads and writes.

### Key Observations
1. **Balanced popcount tree eliminated the syndrome bottleneck**: WNS improved from -28.98 ns to 0.0 ns at TT
2. TT and FF corners now fully meet 50 MHz timing
3. SS corner still fails (-9.18 ns) due to a different path: belief update mux indexed by `col_idx`
4. Hold violations are minor (-0.45 ns) and can be fixed with post-route optimization
5. 1,687 antenna violations need to be addressed (antenna repair was disabled)

## Updated Summary Table

| Run | RTL | Key Change | Antenna | Status | TT Setup WNS | Max Freq (TT) |
|-----|-----|------------|---------|--------|-------------|---------------|
| 1 | Unpipelined | — | Heuristic | **FAILED** | — | — |
| 2 | Unpipelined | — | Iterative | **COMPLETED** | -27.13 ns | ~21 MHz |
| 3 | Pipelined CN | CN pipeline | Iterative | **FAILED** | — | — |
| 4 | Pipelined CN | CN pipeline | None | **COMPLETED** | -28.86 ns | ~20 MHz |
| 5 | + Syndrome pipeline | Serial popcount | None | **COMPLETED** | -28.98 ns | ~20 MHz |
| 6 | + Balanced popcount | Adder tree | None | **COMPLETED** | **0.0 ns** | **50 MHz** |

## Run 7a: `pipelined_layer2` (Mar 9, 2026) — FAILED
- **RTL**: Run 6 + LAYER_WRITE split into LAYER_WRITE_ADDR + LAYER_WRITE_DATA
- **Config**: `CLOCK_PERIOD=20`, `DIODE_ON_PORTS=in`, `HEURISTIC_ANTENNA_THRESHOLD=200`
- **Failure**: `GRT-0118` routing congestion; heuristic diode insertion on input ports added too many cells
- **Lesson**: Any heuristic diode insertion causes GRT failure on this design

## Run 7b: `pipelined_layer3` (Mar 9, 2026) — FAILED
- **RTL**: Same as 7a (LAYER_WRITE_ADDR/DATA split)
- **Config**: `DIODE_ON_PORTS=none`, `RUN_HEURISTIC_DIODE_INSERTION=false`
- **Failure**: Post-CTS resizer diverged: 2.5+ hours at 100% CPU, memory climbing linearly, never converging
- **Lesson**: LAYER_WRITE pipeline split creates too many paths for the OpenROAD resizer

## Run 7c: `pre_shift` (Mar 9, 2026) — FAILED
- **RTL**: Run 6 + pre-registered H_BASE shift lookahead (`H_BASE[row_idx][col_idx+1]`)
- **Config**: Same as 7b
- **Failure**: `GPL-0302` placement density overflow; 150K cells at 41.3% exceeded the 40% target
- **Root cause**: Yosys cannot fold H_BASE constants through registers → full 256:1 write mux explosion (~2x cell count vs Run 6's 83K)
- **Lesson**: Registering H_BASE shift values prevents Yosys constant folding

## Run 7d: `run6_baseline` (Mar 9, 2026) — FAILED
- **RTL**: Reverted to Run 6 baseline (identical RTL)
- **Config**: `DIODE_ON_PORTS=in` (inadvertently left from earlier runs), `RUN_HEURISTIC_DIODE_INSERTION=false`
- **Cells**: 85,500
- **Failure**: `GRT-0118` routing congestion
- **Root cause**: `DIODE_ON_PORTS=in` inserts diodes on input ports even when heuristic insertion is disabled

## Run 7e: `run6b_nodiode` (Mar 10, 2026) — FAILED
- **RTL**: Run 6 baseline
- **Config**: `DIODE_ON_PORTS=none`, hold margins 0.5/0.3 (from config.json), reused `run6_baseline` synthesis
- **Failure**: Post-CTS resizer diverged (9+ GiB memory, 3+ hours, never converged)
- **Root cause**: Reusing synthesis from a run with different config (`DIODE_ON_PORTS=in`) produces a subtly different netlist that causes PnR divergence

## Run 7f: `run6_clean` (Mar 10, 2026) — FAILED
- **RTL**: Run 6 baseline, clean full run from scratch
- **Config**: `DIODE_ON_PORTS=none`, hold margins 0.5/0.3
- **Cells**: 85,500
- **Hold buffers inserted**: 35,506
- **Failure**: `GRT-0118` routing congestion
- **Root cause**: Higher hold slack margins (0.5/0.3 vs balanced_popcount's 0.4/0.2) caused 13K extra hold buffers (35K vs 22K), pushing routing congestion over the GRT threshold

## Run 7g: `run6_fixhold` (Mar 10, 2026) — FAILED
- **RTL**: Run 6 baseline, reused `run6_clean` synthesis
- **Config**: `DIODE_ON_PORTS=none`, hold margins 0.4/0.2 (matching balanced_popcount)
- **Failure**: Post-CTS resizer diverged (14+ GiB, 3.5+ hours)
- **Root cause**: Yosys non-determinism; the `run6_clean` synthesis produced a slightly different cell mix that didn't route cleanly despite identical config

## Run 7h: `run6_reuse_bp` (Mar 10, 2026) — COMPLETED (reproduces Run 6!)
- **RTL**: Run 6 baseline, **reused balanced_popcount's actual synthesis netlist**
- **Config**: `DIODE_ON_PORTS=none`, hold margins 0.4/0.2
- **Result**: All stages completed. DRC/LVS clean. TT timing met!
- **Hold buffers**: 22,095 (identical to balanced_popcount)

### Physical Results
| Metric | Result |
|--------|--------|
| Magic DRC | **Clean** |
| KLayout DRC | **Clean** |
| LVS | **Clean** (circuits match uniquely) |
| Antenna violating nets | 1,687 (repair disabled) |
| Antenna violating pins | 3,416 (repair disabled) |

### Area & Utilization
| Metric | Value |
|--------|-------|
| Die area | 4,928,000 µm² (4.93 mm²) |
| Instance count | 186,915 |
| Instance area | 1,367,580 µm² (1.37 mm²) |
| Core utilization | 28.2% |

### Timing (post-route, CLOCK_PERIOD = 20 ns / 50 MHz target)
| Corner | Setup WNS (ns) | Setup TNS (ns) | Hold WNS (ns) | Hold TNS (ns) |
|--------|----------------|-----------------|----------------|----------------|
| nom_tt_025C_1v80 | **+3.28** | 0 | -0.45 | -10.5 |
| nom_ss_100C_1v60 | **-9.18** | -12,474 | -0.17 | -0.21 |
| nom_ff_n40C_1v95 | **+5.93** | 0 | -0.37 | -38.6 |
| max_ss_100C_1v60 | -10.45 | -15,897 | -0.44 | -0.87 |
| min_tt_025C_1v80 | +3.71 | 0 | -0.26 | -1.66 |
| max_tt_025C_1v80 | +2.90 | 0 | -0.62 | -29.5 |

### Key Observations
1. **Results identical to Run 6**, confirming that the balanced_popcount synthesis netlist is the key ingredient
2. Yosys non-determinism is significant: re-synthesizing the same RTL with the same config produces netlists that fail PnR
3. Hold violations (1,543 total) are all on input port paths (`wb_dat_i`, `wb_adr_i`), zero reg-to-reg; fixable with input delay constraints
4. Max slew violations (4,112) and max cap violations (655) are concentrated in the SS corner

## Updated Summary Table

| Run | RTL | Key Change | Antenna | Status | TT Setup WNS | Max Freq (TT) |
|-----|-----|------------|---------|--------|-------------|---------------|
| 1 | Unpipelined | — | Heuristic | **FAILED** | — | — |
| 2 | Unpipelined | — | Iterative | **COMPLETED** | -27.13 ns | ~21 MHz |
| 3 | Pipelined CN | CN pipeline | Iterative | **FAILED** | — | — |
| 4 | Pipelined CN | CN pipeline | None | **COMPLETED** | -28.86 ns | ~20 MHz |
| 5 | + Syndrome pipeline | Serial popcount | None | **COMPLETED** | -28.98 ns | ~20 MHz |
| 6 | + Balanced popcount | Adder tree | None | **COMPLETED** | **0.0 ns** | **50 MHz** |
| 7a | + LAYER_WRITE split | ADDR/DATA pipeline | Heuristic | **FAILED** | — | — |
| 7b | + LAYER_WRITE split | ADDR/DATA pipeline | None | **FAILED** (resizer) | — | — |
| 7c | + pre_shift | H_BASE lookahead | None | **FAILED** (GPL) | — | — |
| 7d | Run 6 baseline | DIODE_ON_PORTS=in | None | **FAILED** (GRT) | — | — |
| 7e | Run 6 baseline | Reuse wrong synth | None | **FAILED** (resizer) | — | — |
| 7f | Run 6 baseline | Hold margins 0.5/0.3 | None | **FAILED** (GRT) | — | — |
| 7g | Run 6 baseline | Reuse run6_clean synth | None | **FAILED** (resizer) | — | — |
| 7h | Run 6 baseline | **Reuse BP synth** | None | **COMPLETED** | **+3.28 ns** | **50 MHz** |

## Key Lessons Learned (Run 7 Series)

1. **LAYER_WRITE pipeline is not viable**: Any register between `col_idx` and H_BASE causes either cell explosion (Yosys can't fold constants through registers) or PnR divergence (too many paths for the resizer)
2. **Heuristic diode insertion always fails**: Both `RUN_HEURISTIC_DIODE_INSERTION=true` and `DIODE_ON_PORTS=in` cause GRT-0118 congestion
3. **Hold slack margins matter**: 0.5/0.3 inserts 35K hold buffers → GRT failure. 0.4/0.2 inserts 22K → passes
4. **Yosys synthesis is non-deterministic**: Re-synthesizing identical RTL+config produces different netlists with different PnR outcomes. The balanced_popcount synthesis netlist is the only one proven to complete
5. **Config must be consistent**: Reusing synthesis from a run with different config settings causes PnR divergence
6. **Run 6's balanced_popcount synthesis netlist is the golden reference**: all future PnR runs should reuse it

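
Several of these lessons are mechanical enough to check before launching a run. The sketch below is a hypothetical pre-flight lint, not part of the flow. It assumes the config is a flat dict of the variables named in this log, and it assumes the two hold margins (0.4/0.2) correspond to `PL_RESIZER_HOLD_SLACK_MARGIN` and `GRT_RESIZER_HOLD_SLACK_MARGIN` in that order, which the log does not state.

```python
from typing import Dict, List

# Assumed mapping of the "0.4/0.2" hold margins onto config keys.
MAX_PL_HOLD_MARGIN = 0.4
MAX_GRT_HOLD_MARGIN = 0.2


def lint_config(cfg: Dict[str, object]) -> List[str]:
    """Return warnings for settings that failed in the Run 7 series."""
    warnings = []
    if cfg.get("RUN_HEURISTIC_DIODE_INSERTION", False):
        warnings.append("heuristic diode insertion caused GRT-0118 congestion")
    if cfg.get("DIODE_ON_PORTS", "none") != "none":
        warnings.append("DIODE_ON_PORTS other than 'none' caused GRT-0118 congestion")
    if cfg.get("SYNTH_STRATEGY") == "AREA 2":
        warnings.append("AREA 2 ran 20+ hours in ABC tech mapping")
    if float(cfg.get("PL_RESIZER_HOLD_SLACK_MARGIN", 0.0)) > MAX_PL_HOLD_MARGIN:
        warnings.append("placement hold margin above 0.4 inserted too many hold buffers")
    if float(cfg.get("GRT_RESIZER_HOLD_SLACK_MARGIN", 0.0)) > MAX_GRT_HOLD_MARGIN:
        warnings.append("routing hold margin above 0.2 inserted too many hold buffers")
    return warnings


golden_like = {
    "CLOCK_PERIOD": 20,
    "SYNTH_STRATEGY": "AREA 0",
    "RUN_HEURISTIC_DIODE_INSERTION": False,
    "DIODE_ON_PORTS": "none",
    "PL_RESIZER_HOLD_SLACK_MARGIN": 0.4,
    "GRT_RESIZER_HOLD_SLACK_MARGIN": 0.2,
}
risky = dict(golden_like, DIODE_ON_PORTS="in",
             PL_RESIZER_HOLD_SLACK_MARGIN=0.5,
             GRT_RESIZER_HOLD_SLACK_MARGIN=0.3)

print(lint_config(golden_like))
for line in lint_config(risky):
    print("WARN:", line)
```

A lint like this cannot catch the synthesis non-determinism described in lessons 4 to 6; reusing the known-good netlist remains the only safeguard recorded here.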
## Wrapper Hardening (Mar 12-13, 2026)

### wrapper_v2 — COMPLETED (LVS fail)
- **Config**: `SYNTH_ELABORATE_ONLY=true`, `FP_PDN_ENABLE_RAILS=false`
- **Result**: DRC clean, but LVS fails: 3 standard cells (inv_2 + 2x conb_1) have floating VPWR/VGND
- **Root cause**: Without power rails, wrapper std cells have no power connection

### wrapper_v3 — ABORTED (208 LVS pin-match errors)
- **Config**: `SYNTH_ELABORATE_ONLY=true`, `FP_PDN_ENABLE_RAILS=true`, `ERROR_ON_LVS_ERROR=true`
- **Result**: DRC clean, XOR clean, power pins connected. Flow aborted at LVS check.
- **LVS issue**: 206 constant-tied output pins merged during Magic SPICE extraction

### wrapper_v4 — COMPLETED (golden wrapper)
- **Config**: Same as v3 but `ERROR_ON_LVS_ERROR=false`
- **Result**: All 69 stages completed. DRC clean (Magic + KLayout). XOR clean.
- **LVS**: 208 pin-match errors (cosmetic; device classes equivalent)
- **Pin merging**: Magic SPICE extraction merges `io_oeb[37:0]`, `io_out[37:0]`, `la_data_out[127:0]`, and `user_irq[2:1]` into shared constant nets, losing individual pin labels

## Precheck Results (Mar 13, 2026)

| # | Check | Result |
|---|-------|--------|
| 1 | License | PASSED (SPDX sub-check: 1727 non-compliant venv files) |
| 2 | Makefile | **PASSED** |
| 3 | Default | **PASSED** |
| 4 | Documentation | **PASSED** |
| 5 | Top Cell | **PASSED** |
| 6 | Consistency | **PASSED** |
| 7 | GPIO-Defines | **PASSED** |
| 8 | XOR | **PASSED** |
| 9 | Magic DRC | **PASSED** |
| 10 | KLayout FEOL | FAILED (SIGSEGV crash, NOT real DRC) |
| 11 | KLayout BEOL | **PASSED** |
| 12 | KLayout Offgrid | **PASSED** |
| 13 | KLayout Metal Density | **PASSED** |
| 14 | KLayout Pin Labels | **PASSED** |
| 15 | KLayout ZeroArea | **PASSED** |
| 16 | Spike Check | **PASSED** |
| 17 | Illegal Cellname | **PASSED** |
| 18 | OEB | **PASSED** |
| 19 | LVS | FAILED (3 cosmetic pin mismatches) |

**17 PASSED, 2 FAILED.** Both failures are non-functional:
- KLayout FEOL: Tool crash (signal 11), not a DRC violation
- LVS: "Top level cell failed pin matching", with 3 cosmetic mismatches:
  - `io_oeb[9]` in layout only (Magic kept 1 label for merged constant net)
  - `user_irq[2]` in layout only (same issue)
  - `vssd2` in netlist only (PDN power net not labeled as port)
- CVC: 0 errors. Device classes: equivalent.

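
The 17/2 tally can be reproduced from the table with a short script, which is handy when the precheck is re-run and the summary needs refreshing. The list below is transcribed by hand from the table; the structure and names are illustrative and do not reflect the precheck tool's own output format.

```python
# (check name, passed?) transcribed from the precheck table.
precheck = [
    ("License", True), ("Makefile", True), ("Default", True),
    ("Documentation", True), ("Top Cell", True), ("Consistency", True),
    ("GPIO-Defines", True), ("XOR", True), ("Magic DRC", True),
    ("KLayout FEOL", False), ("KLayout BEOL", True), ("KLayout Offgrid", True),
    ("KLayout Metal Density", True), ("KLayout Pin Labels", True),
    ("KLayout ZeroArea", True), ("Spike Check", True),
    ("Illegal Cellname", True), ("OEB", True), ("LVS", False),
]

passed = [name for name, ok in precheck if ok]
failed = [name for name, ok in precheck if not ok]

print(f"{len(passed)} PASSED, {len(failed)} FAILED: {', '.join(failed)}")
```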
## Gate-Level Simulation Results (Mar 13, 2026)

All 5 cocotb tests passed in GL mode (iverilog + caravel_cocotb, no SDF annotation):

| Test | Status | Sim Time (ns) | Wall Time (s) | GPIO[7:0] | Errors |
|------|--------|---------------|----------------|-----------|--------|
| ldpc_basic | **PASS** | 854,225 | 1,814 | 0xAB | 0 |
| ldpc_noisy | **PASS** | 1,011,550 | 2,720 | 0xAB | 0 |
| ldpc_max_iter | **PASS** | 1,104,525 | 3,393 | 0xAB | 0 |
| ldpc_back_to_back | **PASS** | 1,140,375 | 3,371 | 0xAB | 0 |
| ldpc_demo | **PASS** | 1,251,050 | 3,612 | 0xAB | 0 |

- iverilog compilation: ~2h18m per test (1.1GB sim.vvp), 8.2GB RAM
- Simulation: ~30-60 min per test (5-9GB VCD waveform)
- All tests ran on snoke (247GB RAM), 4 tests in parallel
- GPIO[7:0] = 0xAB is the firmware success code for all tests
- No X-propagation or timing race issues observed

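
The "~30-60 min per test" figure follows from the wall-time column. A small sketch of the conversion, using only numbers from the table (the dictionary layout is an illustrative choice):

```python
# test name -> (simulated time in ns, wall time in s), copied from the table.
gls_runs = {
    "ldpc_basic": (854_225, 1_814),
    "ldpc_noisy": (1_011_550, 2_720),
    "ldpc_max_iter": (1_104_525, 3_393),
    "ldpc_back_to_back": (1_140_375, 3_371),
    "ldpc_demo": (1_251_050, 3_612),
}

wall_minutes = {name: wall_s / 60.0 for name, (_, wall_s) in gls_runs.items()}
total_sim_ns = sum(sim_ns for sim_ns, _ in gls_runs.values())

for name, (sim_ns, wall_s) in gls_runs.items():
    # Simulation throughput: simulated nanoseconds per wall-clock second.
    print(f"{name}: {wall_minutes[name]:.1f} min, {sim_ns / wall_s:.0f} sim-ns/s")
print(f"total simulated time: {total_sim_ns:,} ns")
```

These wall times cover simulation only; the ~2h18m iverilog compilation per test is separate.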
## Wrapper Hardening Attempts (May 7-11, 2026) — Failed LVS Cosmetic-Fix Series

After the May 1 `cf_wrapper_v5` golden run landed (commit `74ad20a` to origin / `1fcdc1d` to gitea) with 208 cosmetic LVS pin-match errors, a series of follow-up runs (v6 through v11) tried to eliminate those errors. **All of them failed.** The errors are a Magic SPICE-extraction limitation, not a hardening defect, so no amount of RTL or placement tweaking will change Magic's behavior.

### Timeline

| Run | Date | Strategy | Result |
|-----|------|----------|--------|
| v6 | May 7 | First post-PDN-swap retry (commit `8cc8414` landed config changes); same wrapper RTL | Flow completed but KLayout crashed in final manufacturability step; same 208 LVS errors |
| v7 | May 7 | Same as v6, re-run | Aborted mid-routing on `[DRT-0349]` LEF58_ENCLOSURE warnings; routing never completed |
| v8 | May 8 | `manual_tieoffs.vh` with 206 per-pin `conb_1` cells + `manual_placements.json` placing each cell adjacent to its target pin; mprj moved `[60,15] → [60,200]` to make room | Flow completed; **same 208 LVS errors** because Magic still merged all constant-tied outputs. STA failed on `min_ss_100C_1v60` and `nom_tt_025C_1v80` corners |
| v9 | May 9 | Same as v8 with `ERROR_ON_TR_DRC=false` to push through routing | **1780 routing DRC errors** (deferred). Magic streamout completed but DRC was never clean |
| v10 | May 11 | Same family of placement tweaks | **1362 routing DRC errors** (deferred); same failure mode as v9 |
| v11 | May 11 | One more attempt | Interrupted at step 01 (yosys-jsonheader); no harden process running |

### Why every attempt failed

The 208 LVS errors all come from **Magic SPICE extraction collapsing constant-tied nets**:

- `la_data_out[127:0]`: all 128 bits tied to `1'b0` → Magic extracts as a single GND net → 127 pin labels lost (only one kept arbitrarily, often none)
- `io_out[37:0]`: all 38 bits tied to `1'b0` → same merge
- `io_oeb[37:0]`: all 38 bits tied to `1'b1` → merged into VDD net (Magic keeps the label for `io_oeb[9]` for unknown reasons)
- `user_irq[2:1]`: tied to `2'b0` → merged into GND

The v8 attempt (putting each pin behind its own `sky130_fd_sc_hd__conb_1` cell) does not break the merge, because Magic's extractor still resolves each `conb_1` output as the constant `VPWR` or `VGND` and collapses them onto the global power/ground nets at the extracted-SPICE level. Per-pin cells generate distinct logical nets in the Verilog netlist but not distinct extracted nets in the layout. **Netgen itself reports "Device classes equivalent" and "Cell pin lists altered to match"**: the failure is bookkeeping, not electrical.

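
The bus widths listed above add up to the 206 constant-tied pins, and the merge can be pictured as grouping every pin by the constant it is tied to. The following is a toy model of that grouping only; it is not Magic's extraction algorithm, and the net names `VPWR`/`VGND` and the function name are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# bus name -> (msb, lsb, tied constant), from the list of tied wrapper outputs.
TIED_OUTPUTS: Dict[str, Tuple[int, int, int]] = {
    "la_data_out": (127, 0, 0),
    "io_out": (37, 0, 0),
    "io_oeb": (37, 0, 1),
    "user_irq": (2, 1, 0),
}


def collapse_constant_nets(tied: Dict[str, Tuple[int, int, int]]) -> Dict[str, List[str]]:
    """Group every tied pin under the single net its constant resolves to."""
    nets = defaultdict(list)
    for bus, (msb, lsb, value) in tied.items():
        net = "VPWR" if value else "VGND"
        for bit in range(lsb, msb + 1):
            nets[net].append(f"{bus}[{bit}]")
    return dict(nets)


nets = collapse_constant_nets(TIED_OUTPUTS)
total_pins = sum(len(pins) for pins in nets.values())

print(f"{total_pins} logical pins collapse onto {len(nets)} extracted nets")
for net, pins in nets.items():
    print(f"  {net}: {len(pins)} pins")
```

In this picture a netlist with 206 distinct pins faces a layout with two nets, which is why adding per-pin cells upstream of the same constants changes nothing.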
### Approaches proven non-viable (don't try again)

1. **Per-pin `conb_1` cells in the wrapper Verilog**: v8 disproved this. Magic optimizes them onto the constant nets.
2. **Per-pin manual placement of tieoff cells**: placement doesn't change extraction behavior.
3. **mprj location shifts** to make room for tieoff rows: doesn't help; cosmetic LVS persists.
4. **Pushing routing-DRC tolerance up** (v9, v10): produces broken layouts (1300–1800 routing DRC errors), worse than the starting state.

### Approaches that *could* work but were not attempted (deferred — too risky pre-deadline)

1. **Drive 206 dummy zero outputs from inside `ldpc_decoder_top`**: would force each wrapper output to come from a distinct extracted macro pin instead of a constant-tied wrapper net. Requires a fresh macro re-harden, which risks breaking Run 6's golden timing on a non-deterministic Yosys run. 4–6 hour cost, high regression risk.
2. **Post-extraction `.mag` editing** to add per-pin port labels: brittle and tool-specific; would not survive a re-harden.
3. **Formal LVS waiver** (the chosen May 12 path): document the cosmetic nature of the errors, cite netgen's own "Device classes equivalent" line, and submit alongside the submission packet.

### Key lesson

**The 208 LVS pin-match errors are not fixable with wrapper-only hardening.** Magic SPICE-extraction behavior is the root cause. Future sessions should not re-litigate this: either fix it inside the macro (re-harden risk) or formally waive it.

## Next Steps
- Submit with a formal LVS waiver (see `chip_ignite/docs/LVS_WAIVER.md`)
- Confirm `cf precheck` and `cf verify ldpc_basic --sim gl` still pass on the HEAD wrapper state
- `cf push` before the 2026-05-13 deadline