Compare commits: `77103f68c6...master` (3 commits)

| SHA1 |
|---|
| f22ee197ab |
| 1f4b62454f |
| 10ddb70fa0 |
@@ -189,9 +189,306 @@ The `syndrome_cnt = syndrome_cnt + 1` accumulation pattern creates a carry chain

Wishbone address input (`wb_adr_i`) has -2.47 ns setup violation. Fixable by registering the address at the decoder boundary.

## Run 5: `syndrome_pipeline` (Mar 3, 2026) — COMPLETED (timing violations)
- **RTL**: Pipelined CN + syndrome pipeline (SYNDROME_S1 + SYNDROME_S2 with serial popcount)
- **Config**: `CLOCK_PERIOD=20` (50 MHz), `SYNTH_STRATEGY=AREA 0`, `RUN_ANTENNA_REPAIR=false`
- **Die area**: 2800 x 1760 µm (4.93 mm²)
- **Result**: All 75 steps completed. DRC/LVS clean.
- **TT Setup WNS**: **-28.98 ns**, no improvement over Run 4 (-28.86 ns)
- **Root cause**: Yosys serializes the loop-carried `syndrome_cnt = syndrome_cnt + 1` dependency into a ~48 ns chain
- **Lesson**: Splitting parity and popcount into two cycles does not help if the popcount itself is still serial

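The Run 5 lesson can be illustrated with a small behavioural model. The Python sketch below is not the RTL and makes no claim about how Yosys maps either structure; it only counts the dependent addition levels on the longest path for a serial accumulation versus a balanced tree. The function names, the `fan_in` parameter, and the test pattern are assumptions made for illustration; the 224-bit width matches the `parity_vec` register.

```python
def serial_popcount(bits):
    """Accumulate one bit at a time; every add depends on the previous one."""
    count = 0
    depth = 0  # number of dependent additions on the longest path
    for b in bits:
        count += b
        depth += 1
    return count, depth


def tree_popcount(bits, fan_in=4):
    """Sum groups of `fan_in` values per level until one value remains."""
    level = list(bits)
    depth = 0  # number of adder levels on the longest path
    while len(level) > 1:
        level = [sum(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
        depth += 1
    return level[0], depth


# Hypothetical 224-bit test pattern: every third bit set.
parity = [1 if i % 3 == 0 else 0 for i in range(224)]
print(serial_popcount(parity))  # (75, 224)
print(tree_popcount(parity))    # (75, 4)
```

Both functions return the same count, but the serial form has 224 dependent additions while the 4-wide tree has 4 levels. The levels of a real adder tree add multi-bit operands, so the delay per level is not the same as one increment; the model only shows why the path length stops scaling with the vector width.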
## Run 6: `balanced_popcount` (Mar 4, 2026) — COMPLETED (TT timing MET!)
- **RTL**: Pipelined CN + syndrome pipeline with balanced 4-wide adder tree popcount
- **Config**: `CLOCK_PERIOD=20` (50 MHz), `SYNTH_STRATEGY=AREA 0`, `RUN_ANTENNA_REPAIR=false`
- **Die area**: 2800 x 1760 µm (4.93 mm²)
- **Result**: All 75 steps completed. DRC/LVS clean. **TT timing met!**

### Physical Results
| Metric | Result |
|--------|--------|
| Magic DRC | **Clean** |
| KLayout DRC | **Clean** |
| LVS | **Clean** (0 errors, 0 unmatched) |
| Antenna violating nets | 1,687 (no repair attempted) |

### Area & Utilization
| Metric | Value |
|--------|-------|
| Die area | 4,928,000 µm² (4.93 mm²) |
| Instance count | 186,915 |
| Instance area | 1,367,580 µm² (1.37 mm²) |
| Core utilization | 28.2% |
| Sequential cells | 18,056 |
| Timing repair buffers | 27,864 |

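As a quick cross-check of the area figures, the sketch below recomputes the die area from the 2800 x 1760 µm dimensions and the ratio of instance area to die area. The constant names are illustrative. The ratio against the full die comes out slightly below the reported 28.2% core utilization; that would be consistent with utilization being measured against a core area smaller than the die, but that is an assumption, since the log does not state the core dimensions.

```python
DIE_WIDTH_UM = 2800
DIE_HEIGHT_UM = 1760
INSTANCE_AREA_UM2 = 1_367_580  # "Instance area" row of the table

die_area_um2 = DIE_WIDTH_UM * DIE_HEIGHT_UM
die_area_mm2 = die_area_um2 / 1e6
# Instance area relative to the whole die (not the core area).
instance_to_die_pct = 100 * INSTANCE_AREA_UM2 / die_area_um2

print(die_area_um2)                    # 4928000
print(round(die_area_mm2, 2))          # 4.93
print(round(instance_to_die_pct, 1))   # 27.8
```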
### Timing (post-route, CLOCK_PERIOD = 20 ns / 50 MHz target)
| Corner | Setup WNS (ns) | Setup TNS (ns) | Hold WNS (ns) | Hold TNS (ns) |
|--------|----------------|-----------------|----------------|----------------|
| nom_tt_025C_1v80 | **0.0** | 0 | -0.45 | -10.5 |
| nom_ss_100C_1v60 | **-9.18** | -12,474.4 | -0.17 | -0.21 |
| nom_ff_n40C_1v95 | **0.0** | 0 | -0.37 | -38.6 |
| max_ss_100C_1v60 | -10.45 | -15,896.8 | -0.44 | -0.87 |

### Estimated Max Frequency
- **TT corner**: **50 MHz — TIMING MET**
- **SS corner**: Critical path ~40 ns → **~25 MHz** (up from ~11 MHz)
- **FF corner**: **50 MHz — TIMING MET**

### New Critical Path (SS corner)
| Item | Value |
|------|-------|
| Startpoint | `u_core.col_idx[0]` (column index register) |
| Endpoint | `u_core.beliefs` registers |
| Slack | -9.18 ns (nom_ss) |
| Data arrival time | 40.15 ns |
| Description | Belief update mux path during LAYER_READ/LAYER_WRITE |

The syndrome path is no longer critical. The new bottleneck is the column-indexed mux/barrel-shifter path used during belief reads and writes.

### Key Observations
1. **Balanced popcount tree eliminated the syndrome bottleneck**: TT setup WNS improved from -28.98 ns to 0.0 ns
2. TT and FF corners now fully meet 50 MHz timing
3. SS corner still fails (-9.18 ns) due to a different path: the belief update mux indexed by `col_idx`
4. Hold violations are minor (worst -0.45 ns) and can be fixed with post-route optimization
5. The 1,687 antenna-violating nets still need to be addressed (antenna repair was disabled)

## Updated Summary Table

| Run | RTL | Key Change | Antenna | Status | TT Setup WNS | Max Freq (TT) |
|-----|-----|------------|---------|--------|-------------|---------------|
| 1 | Unpipelined | — | Heuristic | **FAILED** | — | — |
| 2 | Unpipelined | — | Iterative | **COMPLETED** | -27.13 ns | ~21 MHz |
| 3 | Pipelined CN | CN pipeline | Iterative | **FAILED** | — | — |
| 4 | Pipelined CN | CN pipeline | None | **COMPLETED** | -28.86 ns | ~20 MHz |
| 5 | + Syndrome pipeline | Serial popcount | None | **COMPLETED** | -28.98 ns | ~20 MHz |
| 6 | + Balanced popcount | Adder tree | None | **COMPLETED** | **0.0 ns** | **50 MHz** |

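The "Max Freq (TT)" column is consistent with estimating the minimum achievable period as the target period minus the setup WNS. The log does not state the formula, so the sketch below is an assumption about how the figures were derived; the function name is illustrative.

```python
def estimated_fmax_mhz(clock_period_ns, setup_wns_ns):
    """Estimate max frequency: a negative WNS lengthens the achievable period."""
    min_period_ns = clock_period_ns - setup_wns_ns
    return 1000.0 / min_period_ns


CLOCK_PERIOD_NS = 20.0  # CLOCK_PERIOD=20 (50 MHz target)
tt_setup_wns_ns = {"Run 2": -27.13, "Run 4": -28.86, "Run 5": -28.98, "Run 6": 0.0}

for run, wns in tt_setup_wns_ns.items():
    print(run, round(estimated_fmax_mhz(CLOCK_PERIOD_NS, wns), 1))
# Run 2 21.2
# Run 4 20.5
# Run 5 20.4
# Run 6 50.0
```

These values round to the ~21 MHz, ~20 MHz, ~20 MHz, and 50 MHz entries in the table. For a run with positive slack the same expression gives a figure above the target, which is headroom rather than a verified operating point.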
## Run 7a: `pipelined_layer2` (Mar 9, 2026) — FAILED
- **RTL**: Run 6 + LAYER_WRITE split into LAYER_WRITE_ADDR + LAYER_WRITE_DATA
- **Config**: `CLOCK_PERIOD=20`, `DIODE_ON_PORTS=in`, `HEURISTIC_ANTENNA_THRESHOLD=200`
- **Failure**: `GRT-0118` routing congestion: heuristic diode insertion on input ports added too many cells
- **Lesson**: Any heuristic diode insertion causes GRT failure on this design

## Run 7b: `pipelined_layer3` (Mar 9, 2026) — FAILED
- **RTL**: Same as 7a (LAYER_WRITE_ADDR/DATA split)
- **Config**: `DIODE_ON_PORTS=none`, `RUN_HEURISTIC_DIODE_INSERTION=false`
- **Failure**: Post-CTS resizer diverged: 2.5+ hours at 100% CPU, memory climbing linearly, never converging
- **Lesson**: The LAYER_WRITE pipeline split creates too many paths for the OpenROAD resizer

## Run 7c: `pre_shift` (Mar 9, 2026) — FAILED
- **RTL**: Run 6 + pre-registered H_BASE shift lookahead (`H_BASE[row_idx][col_idx+1]`)
- **Config**: Same as 7b
- **Failure**: `GPL-0302` placement density overflow: 150K cells at 41.3% exceeded the 40% target
- **Root cause**: Yosys cannot fold H_BASE constants through registers → full 256:1 write mux explosion (~2x cell count vs Run 6's 83K)
- **Lesson**: Registering H_BASE shift values prevents Yosys constant folding

## Run 7d: `run6_baseline` (Mar 9, 2026) — FAILED
- **RTL**: Reverted to Run 6 baseline (identical RTL)
- **Config**: `DIODE_ON_PORTS=in` (inadvertently left over from earlier runs), `RUN_HEURISTIC_DIODE_INSERTION=false`
- **Cells**: 85,500
- **Failure**: `GRT-0118` routing congestion
- **Root cause**: `DIODE_ON_PORTS=in` inserts diodes on input ports even when heuristic insertion is disabled

## Run 7e: `run6b_nodiode` (Mar 10, 2026) — FAILED
- **RTL**: Run 6 baseline
- **Config**: `DIODE_ON_PORTS=none`, hold margins 0.5/0.3 (from config.json), reused `run6_baseline` synthesis
- **Failure**: Post-CTS resizer diverged (9+ GiB memory, 3+ hours, never converged)
- **Root cause**: Reusing synthesis from a run with a different config (`DIODE_ON_PORTS=in`) produces a subtly different netlist that causes PnR divergence

## Run 7f: `run6_clean` (Mar 10, 2026) — FAILED
- **RTL**: Run 6 baseline, clean full run from scratch
- **Config**: `DIODE_ON_PORTS=none`, hold margins 0.5/0.3
- **Cells**: 85,500
- **Hold buffers inserted**: 35,506
- **Failure**: `GRT-0118` routing congestion
- **Root cause**: Higher hold slack margins (0.5/0.3 vs balanced_popcount's 0.4/0.2) caused 13K extra hold buffers (35K vs 22K), pushing routing congestion over the GRT threshold

## Run 7g: `run6_fixhold` (Mar 10, 2026) — FAILED
- **RTL**: Run 6 baseline, reused `run6_clean` synthesis
- **Config**: `DIODE_ON_PORTS=none`, hold margins 0.4/0.2 (matching balanced_popcount)
- **Failure**: Post-CTS resizer diverged (14+ GiB, 3.5+ hours)
- **Root cause**: Yosys non-determinism. The `run6_clean` synthesis produced a slightly different cell mix that did not get through PnR despite identical config

## Run 7h: `run6_reuse_bp` (Mar 10, 2026) — COMPLETED (reproduces Run 6!)
- **RTL**: Run 6 baseline, **reused balanced_popcount's actual synthesis netlist**
- **Config**: `DIODE_ON_PORTS=none`, hold margins 0.4/0.2
- **Result**: All stages completed. DRC/LVS clean. TT timing met!
- **Hold buffers**: 22,095 (identical to balanced_popcount)

### Physical Results
| Metric | Result |
|--------|--------|
| Magic DRC | **Clean** |
| KLayout DRC | **Clean** |
| LVS | **Clean** (circuits match uniquely) |
| Antenna violating nets | 1,687 (repair disabled) |
| Antenna violating pins | 3,416 (repair disabled) |

### Area & Utilization
| Metric | Value |
|--------|-------|
| Die area | 4,928,000 µm² (4.93 mm²) |
| Instance count | 186,915 |
| Instance area | 1,367,580 µm² (1.37 mm²) |
| Core utilization | 28.2% |

### Timing (post-route, CLOCK_PERIOD = 20 ns / 50 MHz target)
| Corner | Setup WNS (ns) | Setup TNS (ns) | Hold WNS (ns) | Hold TNS (ns) |
|--------|----------------|-----------------|----------------|----------------|
| nom_tt_025C_1v80 | **+3.28** | 0 | -0.45 | -10.5 |
| nom_ss_100C_1v60 | **-9.18** | -12,474 | -0.17 | -0.21 |
| nom_ff_n40C_1v95 | **+5.93** | 0 | -0.37 | -38.6 |
| max_ss_100C_1v60 | -10.45 | -15,897 | -0.44 | -0.87 |
| min_tt_025C_1v80 | +3.71 | 0 | -0.26 | -1.66 |
| max_tt_025C_1v80 | +2.90 | 0 | -0.62 | -29.5 |

### Key Observations
1. **Results identical to Run 6**: this confirms that the balanced_popcount synthesis netlist is the key ingredient
2. Yosys non-determinism is significant: re-synthesizing the same RTL with the same config produces netlists that fail PnR
3. Hold violations (1,543 total) are all on input port paths (`wb_dat_i`, `wb_adr_i`), with none on reg-to-reg paths; they are fixable with input delay constraints
4. Max slew violations (4,112) and max cap violations (655) are concentrated in the SS corner

## Updated Summary Table

| Run | RTL | Key Change | Antenna | Status | TT Setup WNS | Max Freq (TT) |
|-----|-----|------------|---------|--------|-------------|---------------|
| 1 | Unpipelined | — | Heuristic | **FAILED** | — | — |
| 2 | Unpipelined | — | Iterative | **COMPLETED** | -27.13 ns | ~21 MHz |
| 3 | Pipelined CN | CN pipeline | Iterative | **FAILED** | — | — |
| 4 | Pipelined CN | CN pipeline | None | **COMPLETED** | -28.86 ns | ~20 MHz |
| 5 | + Syndrome pipeline | Serial popcount | None | **COMPLETED** | -28.98 ns | ~20 MHz |
| 6 | + Balanced popcount | Adder tree | None | **COMPLETED** | **0.0 ns** | **50 MHz** |
| 7a | + LAYER_WRITE split | ADDR/DATA pipeline | Heuristic | **FAILED** | — | — |
| 7b | + LAYER_WRITE split | ADDR/DATA pipeline | None | **FAILED** (resizer) | — | — |
| 7c | + pre_shift | H_BASE lookahead | None | **FAILED** (GPL) | — | — |
| 7d | Run 6 baseline | DIODE_ON_PORTS=in | None | **FAILED** (GRT) | — | — |
| 7e | Run 6 baseline | Reuse wrong synth | None | **FAILED** (resizer) | — | — |
| 7f | Run 6 baseline | Hold margins 0.5/0.3 | None | **FAILED** (GRT) | — | — |
| 7g | Run 6 baseline | Reuse run6_clean synth | None | **FAILED** (resizer) | — | — |
| 7h | Run 6 baseline | **Reuse BP synth** | None | **COMPLETED** | **+3.28 ns** | **50 MHz** |

## Key Lessons Learned (Run 7 Series)

1. **LAYER_WRITE pipeline is not viable**: Any register between `col_idx` and H_BASE causes either cell explosion (Yosys can't fold constants through registers) or PnR divergence (too many paths for the resizer)
2. **Heuristic diode insertion always fails**: Both `RUN_HEURISTIC_DIODE_INSERTION=true` and `DIODE_ON_PORTS=in` cause GRT-0118 congestion
3. **Hold slack margins matter**: 0.5/0.3 inserts 35K hold buffers → GRT failure. 0.4/0.2 inserts 22K → passes
4. **Yosys synthesis is non-deterministic**: Re-synthesizing identical RTL+config produces different netlists with different PnR outcomes. The balanced_popcount synthesis netlist is the only one proven to complete
5. **Config must be consistent**: Reusing synthesis from a run with different config settings causes PnR divergence
6. **Run 6's balanced_popcount synthesis netlist is the golden reference**: all future PnR runs should reuse it

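Lessons 4 and 6 suggest guarding PnR runs against an accidentally re-synthesized netlist. One way to do that, sketched below under stated assumptions, is to record a SHA-256 digest of the golden netlist and compare candidates against it before starting PnR. The function names and file names are hypothetical, and a byte-for-byte check is stricter than functional equivalence; it is not part of the flow described in this log.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_golden(candidate_path, golden_sha256):
    """True only if the candidate netlist is byte-identical to the golden one."""
    return sha256_of(candidate_path) == golden_sha256


# Demo with throwaway files standing in for real netlists (hypothetical contents).
with tempfile.TemporaryDirectory() as tmp:
    golden = Path(tmp) / "golden.nl.v"
    resynth = Path(tmp) / "resynth.nl.v"
    golden.write_text("module top; sky130_fd_sc_hd__inv_2 u0(); endmodule\n")
    resynth.write_text("module top; sky130_fd_sc_hd__inv_1 u0(); endmodule\n")
    golden_hash = sha256_of(golden)
    print(matches_golden(golden, golden_hash))   # True
    print(matches_golden(resynth, golden_hash))  # False
```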
## Wrapper Hardening (Mar 12-13, 2026)

### wrapper_v2 — COMPLETED (LVS fail)
- **Config**: `SYNTH_ELABORATE_ONLY=true`, `FP_PDN_ENABLE_RAILS=false`
- **Result**: DRC clean, but LVS fails: 3 standard cells (inv_2 + 2x conb_1) have floating VPWR/VGND
- **Root cause**: Without power rails, wrapper std cells have no power connection

### wrapper_v3 — ABORTED (208 LVS pin-match errors)
- **Config**: `SYNTH_ELABORATE_ONLY=true`, `FP_PDN_ENABLE_RAILS=true`, `ERROR_ON_LVS_ERROR=true`
- **Result**: DRC clean, XOR clean, power pins connected. Flow aborted at LVS check.
- **LVS issue**: 206 constant-tied output pins merged during Magic SPICE extraction

### wrapper_v4 — COMPLETED (golden wrapper)
- **Config**: Same as v3 but `ERROR_ON_LVS_ERROR=false`
- **Result**: All 69 stages completed. DRC clean (Magic + KLayout). XOR clean.
- **LVS**: 208 pin-match errors (cosmetic; device classes equivalent)
- **Pin merging**: Magic SPICE extraction merges `io_oeb[37:0]`, `io_out[37:0]`, `la_data_out[127:0]`, `user_irq[2:1]` into shared constant nets, losing individual pin labels

## Precheck Results (Mar 13, 2026)

| # | Check | Result |
|---|-------|--------|
| 1 | License | PASSED (SPDX sub-check: 1727 non-compliant venv files) |
| 2 | Makefile | **PASSED** |
| 3 | Default | **PASSED** |
| 4 | Documentation | **PASSED** |
| 5 | Top Cell | **PASSED** |
| 6 | Consistency | **PASSED** |
| 7 | GPIO-Defines | **PASSED** |
| 8 | XOR | **PASSED** |
| 9 | Magic DRC | **PASSED** |
| 10 | KLayout FEOL | FAILED (SIGSEGV crash, NOT real DRC) |
| 11 | KLayout BEOL | **PASSED** |
| 12 | KLayout Offgrid | **PASSED** |
| 13 | KLayout Metal Density | **PASSED** |
| 14 | KLayout Pin Labels | **PASSED** |
| 15 | KLayout ZeroArea | **PASSED** |
| 16 | Spike Check | **PASSED** |
| 17 | Illegal Cellname | **PASSED** |
| 18 | OEB | **PASSED** |
| 19 | LVS | FAILED (3 cosmetic pin mismatches) |

**17 PASSED, 2 FAILED.** Both failures are non-functional:
- KLayout FEOL: Tool crash (signal 11), not a DRC violation
- LVS: "Top level cell failed pin matching", with 3 cosmetic mismatches:
  - `io_oeb[9]` in layout only (Magic kept 1 label for merged constant net)
  - `user_irq[2]` in layout only (same issue)
  - `vssd2` in netlist only (PDN power net not labeled as port)
  - CVC: 0 errors. Device classes: equivalent.

## Gate-Level Simulation Results (Mar 13, 2026)

All 5 cocotb tests passed in GL mode (iverilog + caravel_cocotb, no SDF annotation):

| Test | Status | Sim Time (ns) | Wall Time (s) | GPIO[7:0] | Errors |
|------|--------|---------------|----------------|-----------|--------|
| ldpc_basic | **PASS** | 854,225 | 1,814 | 0xAB | 0 |
| ldpc_noisy | **PASS** | 1,011,550 | 2,720 | 0xAB | 0 |
| ldpc_max_iter | **PASS** | 1,104,525 | 3,393 | 0xAB | 0 |
| ldpc_back_to_back | **PASS** | 1,140,375 | 3,371 | 0xAB | 0 |
| ldpc_demo | **PASS** | 1,251,050 | 3,612 | 0xAB | 0 |

- iverilog compilation: ~2h18m per test (1.1 GB sim.vvp), 8.2 GB RAM
- Simulation: ~30-60 min per test (5-9 GB VCD waveform)
- All tests ran on snoke (247 GB RAM), 4 tests in parallel
- GPIO[7:0] = 0xAB is the firmware success code for all tests
- No X-propagation or timing race issues observed

## Wrapper Hardening Attempts (May 7-11, 2026) — Failed LVS Cosmetic-Fix Series

After the May 1 `cf_wrapper_v5` golden run landed (commit `74ad20a` to origin / `1fcdc1d` to gitea) with 208 cosmetic LVS pin-match errors, a series of follow-up runs (v6 through v11) tried to eliminate those errors. **All of them failed.** The errors are a Magic SPICE-extraction limitation, not a hardening defect; no amount of wrapper-level RTL or placement tweaking will change Magic's behavior.

### Timeline

| Run | Date | Strategy | Result |
|-----|------|----------|--------|
| v6 | May 7 | First post-PDN-swap retry (commit `8cc8414` landed config changes); same wrapper RTL | Flow completed but KLayout crashed in final manufacturability step; same 208 LVS errors |
| v7 | May 7 | Same as v6, re-run | Aborted mid-routing on `[DRT-0349]` LEF58_ENCLOSURE warnings — routing never completed |
| v8 | May 8 | `manual_tieoffs.vh` with 206 per-pin `conb_1` cells + `manual_placements.json` placing each cell adjacent to its target pin; mprj moved `[60,15] → [60,200]` to make room | Flow completed; **same 208 LVS errors** — Magic still merged all constant-tied outputs. STA failed on `min_ss_100C_1v60` and `nom_tt_025C_1v80` corners |
| v9 | May 9 | Same as v8 with `ERROR_ON_TR_DRC=false` to push through routing | **1780 routing DRC errors** (deferred). Magic streamout completed but DRC was never clean |
| v10 | May 11 | Same family of placement tweaks | **1362 routing DRC errors** (deferred); same failure mode as v9 |
| v11 | May 11 | One more attempt | Interrupted at step 01 (yosys-jsonheader); no harden process running |

### Why every attempt failed

The 208 LVS errors all come from **Magic SPICE extraction collapsing constant-tied nets**:

- `la_data_out[127:0]`: all 128 bits tied to `1'b0` → Magic extracts them as a single GND net → at least 127 pin labels lost (at most one label is kept, arbitrarily, and often none)
- `io_out[37:0]`: all 38 bits tied to `1'b0` → same merge
- `io_oeb[37:0]`: all 38 bits tied to `1'b1` → merged into the VDD net (Magic keeps the label for `io_oeb[9]` for unknown reasons)
- `user_irq[2:1]`: tied to `2'b0` → merged into GND

The v8 attempt, which put each pin behind its own `sky130_fd_sc_hd__conb_1` cell, does not break the merge because Magic's extractor still resolves each `conb_1` output as the constant `VPWR` or `VGND` and collapses them onto the global power/ground nets at the extracted-SPICE level. Per-pin cells generate distinct logical nets in the Verilog netlist but not distinct extracted nets in the layout. **Netgen itself reports "Device classes equivalent" and "Cell pin lists altered to match"**: the failure is bookkeeping, not electrical.

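The merge can be illustrated with a toy model that groups the constant-tied wrapper outputs by the net they collapse onto. This is a bookkeeping sketch only, not a model of Magic's extractor; the `TIED_PORTS` table and the function name are illustrative, with widths and tie values taken from the list above.

```python
from collections import defaultdict

# Constant-tied wrapper outputs: (port, width in bits, net it collapses onto).
TIED_PORTS = [
    ("la_data_out", 128, "GND"),  # la_data_out[127:0] tied to 1'b0
    ("io_out", 38, "GND"),        # io_out[37:0] tied to 1'b0
    ("io_oeb", 38, "VDD"),        # io_oeb[37:0] tied to 1'b1
    ("user_irq", 2, "GND"),       # user_irq[2:1] tied to 2'b0
]


def pins_per_extracted_net(ports):
    """Count how many wrapper pins end up on each merged constant net."""
    merged = defaultdict(int)
    for _name, width, net in ports:
        merged[net] += width
    return dict(merged)


merged = pins_per_extracted_net(TIED_PORTS)
total_pins = sum(merged.values())
print(total_pins)  # 206
print(merged)      # {'GND': 168, 'VDD': 38}
```

The total of 206 matches the number of constant-tied output pins reported for wrapper_v3 and the 206 per-pin `conb_1` cells tried in v8. All 206 pins land on just two extracted nets, so distinct per-pin labels cannot survive extraction regardless of how the tie cells are placed.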
### Approaches proven non-viable (don't try again)

1. **Per-pin `conb_1` cells in the wrapper Verilog**: v8 disproved this. Magic's extraction collapses them onto the constant nets.
2. **Per-pin manual placement of tieoff cells**: placement doesn't change extraction behavior.
3. **mprj location shifts** to make room for tieoff rows: doesn't help; the cosmetic LVS errors persist.
4. **Pushing routing-DRC tolerance up** (v9, v10): produces broken layouts (1,780 and 1,362 routing DRC errors), worse than the starting state.

### Approaches that *could* work but were not attempted (deferred — too risky pre-deadline)

1. **Drive 206 dummy zero outputs from inside `ldpc_decoder_top`**: would force each wrapper output to come from a distinct extracted macro pin instead of a constant-tied wrapper net. Requires a fresh macro re-harden, which risks breaking Run 6's golden timing on a non-deterministic Yosys run. 4–6 hour cost, high regression risk.
2. **Post-extraction `.mag` editing** to add per-pin port labels: brittle and tool-specific; would not survive a re-harden.
3. **Formal LVS waiver** (the chosen May 12 path): document the cosmetic nature of the errors, cite netgen's own "Device classes equivalent" line, and submit alongside the submission packet.

### Key lesson

**The 208 LVS pin-match errors are not fixable with wrapper-only hardening.** Magic SPICE-extraction behavior is the root cause. Future sessions should not re-litigate this: either fix it inside the macro (re-harden risk) or formally waive it.

## Next Steps

Previous list (removed by this change):

- Implement syndrome pipeline (SYNDROME_S1 + SYNDROME_S2) to cut critical path from ~49 ns to ~16 ns
- Register Wishbone address input to fix secondary violation
- Re-synthesize with AREA 0 and run PnR to verify timing improvement
- Consider increasing die area for antenna repair headroom
- Consider `SYNTH_STRATEGY=AREA 1` as middle ground between AREA 0 and AREA 2

Current list (added by this change):

- Submit with a formal LVS waiver (see `chip_ignite/docs/LVS_WAIVER.md`)
- Confirm `cf precheck` and `cf verify ldpc_basic --sim gl` still pass on the HEAD wrapper state
- `cf push` before 2026-05-13 deadline

@@ -2,14 +2,11 @@
 //
 // Layered scheduling processes one base-matrix row at a time.
 // For each row, we:
-// 1. Read VN beliefs for all Z columns connected to this row
-// 2. Subtract old CN->VN messages to get VN->CN messages
-// 3. Run CN min-sum update
-// 4. Add new CN->VN messages back to VN beliefs
-// 5. Write updated beliefs back
-//
-// This converges ~2x faster than flooding and needs only one message memory
-// (CN->VN messages for current layer, overwritten each layer).
+// 1. LAYER_READ (8 cycles): Read beliefs, subtract old messages → vn_to_cn
+// 2. CN_STAGE1 (1 cycle): Sign/mag extract, min-find (registered)
+// 3. CN_STAGE2 (1 cycle): Extrinsic output generation
+// 4. LAYER_WRITE (8 cycles): Write beliefs + update CN->VN messages
+// Total: 18 cycles/layer × 7 layers + 3 (syndrome) = 129 cycles/iteration

 module ldpc_decoder_core #(
     parameter N_BASE = 8,
@@ -116,8 +113,9 @@ module ldpc_decoder_core #(
         IDLE,
         INIT, // Initialize beliefs from channel LLRs, zero messages
         LAYER_READ, // Read Z beliefs for each of DC columns in current row
-        CN_UPDATE, // Run min-sum CN update on gathered messages
-        LAYER_WRITE, // Write updated beliefs and new CN->VN messages
+        CN_STAGE1, // Pipeline stage 1: sign/mag extract, min-find
+        CN_STAGE2, // Pipeline stage 2: extrinsic output generation
+        LAYER_WRITE, // Write beliefs + update CN->VN messages
         SYNDROME_S1, // Syndrome pipeline stage 1: compute parity bits
         SYNDROME_S2, // Syndrome pipeline stage 2: popcount parity vector
         SYNDROME_DONE, // Read registered syndrome result
@@ -131,9 +129,16 @@ module ldpc_decoder_core #(
     logic [2:0] col_idx; // current column being read/written (0..N_BASE-1)
     logic [4:0] effective_max_iter;

-    // Working registers for current layer CN update
-    logic signed [Q-1:0] vn_to_cn [DC][Z]; // VN->CN messages for current row
-    logic signed [Q-1:0] cn_to_vn [DC][Z]; // new CN->VN messages (output of min-sum)
+    // Working registers for current layer
+    logic signed [Q-1:0] vn_to_cn [DC][Z];
+    logic signed [Q-1:0] cn_to_vn [DC][Z];

+    // CN pipeline stage 1 intermediate registers
+    logic [DC-1:0] s1_signs [Z];
+    logic s1_sign_xor [Z];
+    logic [Q-2:0] s1_min1 [Z];
+    logic [Q-2:0] s1_min2 [Z];
+    logic [2:0] s1_min1_idx [Z];
+
     // Syndrome pipeline registers
     logic [M_BASE*Z-1:0] parity_vec; // 224-bit registered parity results
@@ -165,14 +170,15 @@ module ldpc_decoder_core #(
         case (state)
             IDLE: if (start) state_next = INIT;
             INIT: state_next = LAYER_READ;
-            LAYER_READ: if (col_idx == N_BASE - 1) state_next = CN_UPDATE;
-            CN_UPDATE: state_next = LAYER_WRITE;
+            LAYER_READ: if (col_idx == N_BASE - 1) state_next = CN_STAGE1;
+            CN_STAGE1: state_next = CN_STAGE2;
+            CN_STAGE2: state_next = LAYER_WRITE;
             LAYER_WRITE: begin
                 if (col_idx == N_BASE - 1) begin
                     if (row_idx == M_BASE - 1)
                         state_next = SYNDROME_S1;
                     else
-                        state_next = LAYER_READ; // next row
+                        state_next = LAYER_READ;
                 end
             end
             SYNDROME_S1: state_next = SYNDROME_S2;
@@ -183,7 +189,7 @@ module ldpc_decoder_core #(
                 else if (iter_cnt >= effective_max_iter)
                     state_next = DONE;
                 else
-                    state_next = LAYER_READ; // next iteration
+                    state_next = LAYER_READ;
             end
             DONE: if (!start) state_next = IDLE;
             default: state_next = IDLE;
@@ -269,43 +275,86 @@ module ldpc_decoder_core #(
                     col_idx <= col_idx + 1;
                 end

-                CN_UPDATE: begin
-                    // Min-sum update for all Z check nodes in current row
-                    // Each CN has DC=8 incoming messages (one per column)
-                    for (int z = 0; z < Z; z++) begin
-                        // Min-sum: pass individual VN->CN messages directly
-                        cn_min_sum(vn_to_cn[0][z], vn_to_cn[1][z],
-                                   vn_to_cn[2][z], vn_to_cn[3][z],
-                                   vn_to_cn[4][z], vn_to_cn[5][z],
-                                   vn_to_cn[6][z], vn_to_cn[7][z],
-                                   cn_to_vn[0][z], cn_to_vn[1][z],
-                                   cn_to_vn[2][z], cn_to_vn[3][z],
-                                   cn_to_vn[4][z], cn_to_vn[5][z],
-                                   cn_to_vn[6][z], cn_to_vn[7][z]);
-                    end
-                    col_idx <= '0; // prepare for LAYER_WRITE
-                end
+                // =============================================================
+                // CN Pipeline Stage 1: Extract signs/mags, find min1/min2
+                // =============================================================
+                CN_STAGE1: begin
+                    for (int z = 0; z < Z; z++) begin
+                        logic [DC-1:0] signs_w;
+                        logic sign_xor_w;
+                        logic [Q-2:0] mags_w [DC];
+                        logic [Q-2:0] min1_w, min2_w;
+                        int min1_idx_w;
+
+                        sign_xor_w = 1'b0;
+                        for (int i = 0; i < DC; i++) begin
+                            logic [Q-1:0] abs_val;
+                            signs_w[i] = vn_to_cn[i][z][Q-1];
+                            if (vn_to_cn[i][z][Q-1]) begin
+                                abs_val = ~vn_to_cn[i][z] + 1'b1;
+                                mags_w[i] = (abs_val[Q-1]) ? {(Q-1){1'b1}} : abs_val[Q-2:0];
+                            end else begin
+                                mags_w[i] = vn_to_cn[i][z][Q-2:0];
+                            end
+                            sign_xor_w = sign_xor_w ^ signs_w[i];
+                        end
+
+                        min1_w = {(Q-1){1'b1}};
+                        min2_w = {(Q-1){1'b1}};
+                        min1_idx_w = 0;
+                        for (int i = 0; i < DC; i++) begin
+                            if (mags_w[i] < min1_w) begin
+                                min2_w = min1_w;
+                                min1_w = mags_w[i];
+                                min1_idx_w = i;
+                            end else if (mags_w[i] < min2_w) begin
+                                min2_w = mags_w[i];
+                            end
+                        end
+
+                        s1_signs[z] = signs_w;
+                        s1_sign_xor[z] = sign_xor_w;
+                        s1_min1[z] = min1_w;
+                        s1_min2[z] = min2_w;
+                        s1_min1_idx[z] = min1_idx_w[2:0];
+                    end
+                end
+
+                // =============================================================
+                // CN Pipeline Stage 2: Compute extrinsic outputs + pre-register
+                // first LAYER_WRITE shift value
+                // =============================================================
+                CN_STAGE2: begin
+                    for (int z = 0; z < Z; z++) begin
+                        for (int j = 0; j < DC; j++) begin
+                            logic [Q-2:0] mag_out;
+                            logic sign_out;
+
+                            mag_out = (j[2:0] == s1_min1_idx[z]) ? s1_min2[z] : s1_min1[z];
+                            mag_out = (mag_out > 5'd1) ? (mag_out - 5'd1) : 5'd0;
+                            sign_out = s1_sign_xor[z] ^ s1_signs[z][j];
+
+                            cn_to_vn[j][z] <= sign_out ? (~{1'b0, mag_out} + 1'b1) : {1'b0, mag_out};
+                        end
+                    end
+                    col_idx <= '0;
+                end

+                // =============================================================
+                // LAYER_WRITE: Write beliefs and update CN->VN messages
+                // =============================================================
                 LAYER_WRITE: begin
-                    // Write back: update beliefs and store new CN->VN messages
-                    // Skip unconnected columns (H_BASE == -1)
                     if (H_BASE[row_idx][col_idx] >= 0) begin
                         for (int z = 0; z < Z; z++) begin
-                            int bit_idx;
                             int shifted_z;
-                            logic signed [Q-1:0] new_msg;
-                            logic signed [Q-1:0] old_extrinsic;
+                            int bit_idx;

                             shifted_z = (z + H_BASE[row_idx][col_idx]) % Z;
                             bit_idx = int'(col_idx) * Z + shifted_z;
-                            new_msg = cn_to_vn[col_idx][z];
-                            old_extrinsic = vn_to_cn[col_idx][z];

-                            // belief = extrinsic (VN->CN) + new CN->VN message
-                            beliefs[bit_idx] <= sat_add(old_extrinsic, new_msg);
-
-                            // Store new message for next iteration
-                            msg_cn2vn[row_idx][col_idx][z] <= new_msg;
+                            beliefs[bit_idx] <= sat_add(vn_to_cn[col_idx][z],
+                                                        cn_to_vn[col_idx][z]);
+                            msg_cn2vn[row_idx][col_idx][z] <= cn_to_vn[col_idx][z];
                         end
                     end

@@ -386,78 +435,7 @@ module ldpc_decoder_core #(
     end

     // =========================================================================
-    // Min-sum CN update function
-    // =========================================================================
-
-    // Offset min-sum for DC=8 inputs (individual ports for iverilog compatibility)
-    // For each output j: sign = XOR of all other signs, magnitude = min of all other magnitudes - offset
-    task automatic cn_min_sum(
-        input logic signed [Q-1:0] in0, in1, in2, in3,
-                                   in4, in5, in6, in7,
-        output logic signed [Q-1:0] out0, out1, out2, out3,
-                                    out4, out5, out6, out7
-    );
-        logic signed [Q-1:0] ins [DC];
-        logic [DC-1:0] signs;
-        logic [Q-2:0] mags [DC];
-        logic sign_xor;
-        logic [Q-2:0] min1, min2;
-        int min1_idx;
-        logic signed [Q-1:0] outs [DC];
-
-        ins[0] = in0; ins[1] = in1; ins[2] = in2; ins[3] = in3;
-        ins[4] = in4; ins[5] = in5; ins[6] = in6; ins[7] = in7;
-
-        // Extract signs and magnitudes
-        // Note: -32 (100000) has magnitude 32 which overflows 5-bit field to 0.
-        // Clamp to 31 (max representable magnitude) to avoid corruption.
-        sign_xor = 1'b0;
-        for (int i = 0; i < DC; i++) begin
-            logic [Q-1:0] abs_val;
-            signs[i] = ins[i][Q-1];
-            if (ins[i][Q-1]) begin
-                abs_val = ~ins[i] + 1'b1;
-                // If abs_val overflowed (input was most negative), clamp
-                mags[i] = (abs_val[Q-1]) ? {(Q-1){1'b1}} : abs_val[Q-2:0];
-            end else begin
-                mags[i] = ins[i][Q-2:0];
-            end
-            sign_xor = sign_xor ^ signs[i];
-        end
-
-        // Find two smallest magnitudes
-        min1 = {(Q-1){1'b1}};
-        min2 = {(Q-1){1'b1}};
-        min1_idx = 0;
-        for (int i = 0; i < DC; i++) begin
-            if (mags[i] < min1) begin
-                min2 = min1;
-                min1 = mags[i];
-                min1_idx = i;
-            end else if (mags[i] < min2) begin
-                min2 = mags[i];
-            end
-        end
-
-        // Compute extrinsic outputs with offset correction
-        for (int j = 0; j < DC; j++) begin
-            logic [Q-2:0] mag_out;
-            logic sign_out;
-
-            mag_out = (j == min1_idx) ? min2 : min1;
-            // Offset correction (subtract 1 in integer representation)
-            mag_out = (mag_out > 1) ? (mag_out - 1) : {(Q-1){1'b0}};
-            sign_out = sign_xor ^ signs[j];
-
-            outs[j] = sign_out ? (~{1'b0, mag_out} + 1) : {1'b0, mag_out};
-        end
-
-        out0 = outs[0]; out1 = outs[1]; out2 = outs[2]; out3 = outs[3];
-        out4 = outs[4]; out5 = outs[5]; out6 = outs[6]; out7 = outs[7];
-    endtask
-
-    // =========================================================================
-    // Saturating arithmetic helpers (Yosys-compatible: no return, no complex concat)
+    // Saturating arithmetic (Yosys-compatible)
     // =========================================================================

     function automatic logic signed [Q-1:0] sat_add(