Skip to content

Instantly share code, notes, and snippets.

@pdp7
Last active May 15, 2026 13:40
Show Gist options
  • Select an option

  • Save pdp7/d21b7b796b075b4ba3b95441cf86a3d8 to your computer and use it in GitHub Desktop.

Select an option

Save pdp7/d21b7b796b075b4ba3b95441cf86a3d8 to your computer and use it in GitHub Desktop.
sashiko-v4-findings-handoff.md

Sashiko v4 review findings: verification and fix plan

Handoff document for working on fixes to v4 of the Ssqosid/CBQRI/RQSC patch series. Self-contained, no external memory required.

Current status (2026-05-14)

Fixes folded into b4/ssqosid-cbqri-rqsc (the v5-prep branch in the main worktree). Cover commit carries the v4-to-v5 changelog; b4 sees revision 5 with base ef5f46b63023. Series stays at 18 patches, builds clean under ARCH=riscv LLVM=1 W=1, every modified commit passes checkpatch.pl --strict (cover-letter commit trips the known b4-tracking-marker false positive, which is excluded per repo convention).

Reviews covered 10 of 18 patches: P03, P07, P08, P09, P11, P12, P13, P15, P16, P18. Reply drafts written for all 10 in /home/pdp7/dev/linux/v5-replies/. P16 and P18 arrived later than the initial batch; their findings are folded into v5 the same way.

sashiko.dev exposes reviews for all 18 patches (the mailing list only saw the 10 with findings). The other 8: P01/P02/P17 are clean; P04/ P05/P06/P10 had intermediate concerns dismissed in the multi-stage pipeline; P14 failed all four review attempts with a tool timeout and has no LLM coverage at all. The P10/P14 manual follow-up below covers the gap.

All real-and-meaningful findings have been addressed except the two documented limitations:

  • P12 #4 / P15 #3 monitor-only CCs and BW gated on L3 OCCUP, blocked on the same cbqri_l3_mon_dom decoupling refactor.
  • P13 #1 group-removal MB_MIN reset, blocked on a new resctrl_arch_* callback in fs/resctrl.

P10/P14 manual follow-up (2026-05-14)

The mailing list mbox is missing reviews on P01/P02/P04/P05/P06/P10/ P14/P17 (sashiko.dev marked their email as Skipped or Failed). Manual review of the two interesting patches:

  • P10 (BC mon device ops) sashiko raised three internal concerns dismissed in stage gating: (a) UAF on bc->mbm_total_states early publish, already closed by the P15 #1 ordering fix; (b) lockless list traversal in cbqri_find_only_mon_bc(), already closed by P15 #2 plus the function being attach-time only and called under cbqri_domain_list_lock; (c) "counter snapshot during RMID reset causes massive bandwidth spike" — false positive. Per CBQRI Table 10, CONFIG_EVENT with a non-zero EVT_ID resets the hardware counter to 0, which matches the driver's prev_ctr=0 on the software side. The reset_rmid path at cbqri_resctrl.c:281-287 already documents this.

  • P14 (MB_WGHT) real bug found: the P09 #2 cache-staging refactor kcalloc-zeroes mweight_cache, and cbqri_apply_bc_field() reads both halves of bc_bw_alloc on every CONFIG_LIMIT. The first MB_MIN domain init therefore commits Mweight=0 to every RCID. Per CBQRI 4.5 a weight of 0 is a hard cap, no opportunistic bandwidth allowed — so every RCID is starved at its Rbwb reservation (typically 1 bwblk for non-zero RCIDs) for the ~32ms window until MB_WGHT domain init catches up. Fix: pre-seed mweight_cache[*] = FIELD_MAX(MWEIGHT_MASK) in cbqri_probe_bc(). Folded into the P09 commit ("riscv_cbqri: Add bandwidth controller probe and allocation device ops"), cbqri_devices.c:774.

Series context

  • Cover msg-id (v4 send): 20260510-ssqosid-cbqri-rqsc-v7-0-v4-0-eb53831ef683@kernel.org
  • Local mbox (v4 send): v4-mbox/20260510-...mbx
  • Branch (v4 send, archived): tt-fustini/ssqosid-cbqri-rqsc-v4
  • Branch (v5 prep, current): b4/ssqosid-cbqri-rqsc
  • HEAD as of update: f3ba981889e6 (P18 "riscv: enable resctrl filesystem for Ssqosid")
  • Pre-fixes backup: refs/backup/b4-ssqosid-cbqri-rqsc-pre-fixes (bb7fce761c9d)
  • Patch count: 18 (cover + 18 patches)

Series structure:

  1. dt-bindings: riscv: Add Ssqosid extension
  2. riscv: detect the Ssqosid extension
  3. riscv: add support for srmcfg CSR from Ssqosid
  4. fs/resctrl: Add resctrl_is_membw() helper
  5. fs/resctrl: Add RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT
  6. fs/resctrl: Let bandwidth resources default to min_bw at reset
  7. riscv_cbqri: Add capacity controller probe and allocation device ops
  8. riscv_cbqri: Add capacity controller monitoring device ops
  9. riscv_cbqri: Add bandwidth controller probe and allocation device ops
  10. riscv_cbqri: Add bandwidth controller monitoring device ops
  11. riscv_cbqri: resctrl: Add cache allocation via capacity block mask
  12. riscv_cbqri: resctrl: Add L3 cache occupancy monitoring
  13. riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb
  14. riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight
  15. riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring
  16. ACPI: RISC-V: Parse RISC-V Quality of Service Controller (RQSC) table
  17. ACPI: RISC-V: Add support for RISC-V Quality of Service Controller (RQSC)
  18. riscv: enable resctrl filesystem for Ssqosid

Sashiko-bot review coverage

Sashiko reviewed 8 of 18 patches: P03, P07, P08, P09, P11, P12, P13, P15. P15 review arrived later (2026-05-12 21:26 UTC) than the initial batch and is captured separately in the local mbox file sash15.

No coverage on P01, P02, P04, P05, P06, P10 (BC mon), P14 (MB_WGHT), P16, P17, P18.

Coverage gap to note:

  • P10 shares cbqri_probe_feature / EVT_ID issue surface with P09, and the FIELD_MODIFY/GENMASK hazard with P08.
  • P14 shares cache-vs-HW divergence shape with P09/P13.

Worth eyeballing P10/P14 manually before responding to the list.

Verification verdicts (all 38 findings)

Format: [Severity per sashiko] short title -> verdict | file:line references | suggested fix.

P03: riscv: add support for srmcfg CSR from Ssqosid

  1. [High] cpu_srmcfg early-boot bypass -> REAL. arch/riscv/kernel/qos.c:12 zero-inits cpu_srmcfg in .bss. riscv_srmcfg_init() at line 30 (arch_initcall) seeds U32_MAX, but any context switches before that run with cached 0 matching thread.srmcfg=0, skipping the CSR write at arch/riscv/include/asm/qos.h:32. Hardware CSR retains implementation-defined boot value. Fix: DEFINE_PER_CPU(u32, cpu_srmcfg) = U32_MAX; at qos.c:12.

  2. [High] RCID/MCID conflation -> REAL. arch/riscv/include/asm/qos.h:29 tests the full thread.srmcfg word for zero. A task in default RCID with specific MCID has thread.srmcfg != 0 and bypasses cpu_srmcfg_default override. x86 evaluates tsk->closid and tsk->rmid independently against per-CPU defaults at arch/x86/include/asm/resctrl.h:100-128. Fix: extract RCID and MCID fields via FIELD_GET(SRMCFG_RCID_MASK) and FIELD_GET(SRMCFG_MCID_MASK), compare each against RESCTRL_RESERVED_CLOSID = 0 / RESCTRL_RESERVED_RMID = 0, and substitute from cpu_srmcfg_default per field.

  3. [High] Boot CPU stale across suspend -> REAL. riscv_srmcfg_online() cpuhp callback at qos.c:24 only fires on CPU online. Boot CPU never offlines during suspend, so its cpu_srmcfg cache retains pre-suspend value. Hardware CSR may have reset on resume (Ssqosid v1.0 leaves it implementation-defined). Next __switch_to_srmcfg sees cache match, skips write. Fix: register a CPU PM notifier to invalidate cpu_srmcfg to U32_MAX on CPU_PM_ENTER / resume.

  4. [Medium] Redundant seed loop -> REAL minor. qos.c:39 for_each_online_cpu seeds U32_MAX, then line 42 cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, ...) invokes the same callback on every already-online CPU. Race claim is stretched (both writes set the same sentinel) but the loop is dead code. Fix: drop the loop.

  5. [Medium] Initcall returns positive hotplug state ID -> FALSE POSITIVE. do_one_initcall() in init/main.c ignores positive returns. No warning is logged. Cleanup: explicit return 0 is nicer.

P07: riscv_cbqri: Add capacity controller probe and allocation device ops

  1. [High] controller_destroy() leaks I/O mapping -> REAL. drivers/resctrl/cbqri_devices.c:897-902: only kfree(ctrl->...), missing iounmap(ctrl->base) and release_mem_region(ctrl->addr, ctrl->size). riscv_cbqri_unregister_last() at line 914 invokes destroy on a successfully-probed controller. Fix: add iounmap + release_mem_region in destroy.

  2. [High] Stale RCID in AT=CODE secondary probe -> REAL. cbqri_probe_feature() at cbqri_devices.c:501: first probe at line 527 clears RCID, but the AT secondary at line 548 restores saved_reg without re-clearing RCID before writing. Fix: add reg &= ~CBQRI_CONTROL_REGISTERS_RCID_MASK; after the FIELD_MODIFY of AT (line 551).

  3. [High] BLOCK_MASK no pre-clear before READ_LIMIT capture -> REAL. cbqri_apply_cache_config() at cbqri_devices.c:249-255: captures saved_cbm via READ_LIMIT without pre-clearing the staging register. Silent firmware no-op leaves stale data in saved_cbm, later revert (line 287-290) writes garbage. cbqri_read_cache_config() at line 335 does the pre-clear dance; the writer path doesn't. Fix: cbqri_set_cbm(ctrl, 0); before line 250.

  4. [High] cache_size from offline CPUs during early boot -> REAL mild. cbqri_probe_cc() at cbqri_devices.c:618-630 resolves cache_size via cpu_online_mask. ACPI parse runs at subsys_initcall so SMP is up by then; defensive code at line 647-651 declines mon_capable when cache_size=0. Real edge case is maxcpus= boot param restricting initial bringup; cache_size stays 0 even after CPUs come online later. Fix: add CPU online callback to retry cache_size resolution and late-enable mon_capable if it succeeds.

  5. [Medium] rcid_count/mcid_count bounds check -> FALSE POSITIVE. Already capped at drivers/acpi/riscv/rqsc.c:71-79. Future DT path would need its own check. Defensive add at register_controller() would prevent regression from future discovery layers.

P08: riscv_cbqri: Add capacity controller monitoring device ops

  1. [High] RV32 GENMASK 64-bit corruption -> FALSE POSITIVE. RISCV_ISA_SSQOSID requires 64BIT per arch/riscv/Kconfig:596, so RV32 is unreachable. Macro hygiene cleanup worth doing for portability. Optional fix: change CBQRI_MON_CTL_OP_MASK etc. at drivers/resctrl/cbqri_internal.h:79-82 to GENMASK_ULL.

  2. [High] cbqri_probe_feature misapplied to MON_CTL -> REAL (partial). The EVT_ID concern is real: probe writes operation without clearing EVT_ID at cbqri_devices.c:526-528. Bootloader junk in EVT_ID can cause false "monitoring not supported" detection. The reserved-bits concern (AT field write into a no-AT register) is cosmetic WPRI per RISC-V conventions. Fix: clear EVT_ID in addition to RCID at line 527 (cross-cuts to P09 #4).

  3. [Medium] Secondary AT=CODE doesn't re-clear RCID -> REAL. Same root cause as P07 #2, applied to MON_CTL (which uses the field-position as MCID). Fix is the same.

P09: riscv_cbqri: Add bandwidth controller probe and allocation device ops

  1. *[High] verify-fail leaves committed=true -> REAL (defensible). cbqri_apply_bc_field() at cbqri_devices.c:384-385 sets *committed = true BEFORE verify; if verify fails at line 400-404 returning -EIO, caller at line 451-452 still caches the rejected value. Comment at line 380-383 defends this as "hardware probably has val even if our verify failed". Pessimistic case: cache out of sync. Fix: move *committed = true to after verify succeeds, or change the comment to reflect the optimistic assumption explicitly.

  2. [High] Initial READ_LIMIT no sentinel pre-clear -> REAL. cbqri_apply_bc_field() at line 370: READ_LIMIT loads staging from closid's state. Silent firmware no-op leaves stale RCID fields. set() only modifies one field; CONFIG_LIMIT commits the stale unmodified field, cross-RCID corruption. Verify-side at line 393 uses set(ctrl, ~val) sentinel; read-side should do equivalent before line 370. Fix: pre-write a sentinel for the other field before READ_LIMIT. Tricky because we don't know which field; might be easier to pin the other field to its known cached value via set_other(cached).

  3. [High] Latched faulted permanently bricks controller -> REAL. cbqri_wait_busy_flag() at cbqri_devices.c:85 latches ctrl->faulted = true. Line 93 clears on success but runtime ops at line 112/160/199 early-return BEFORE the wait, so the wait that could clear the fault is never reached. Fix: drop the early-return on ctrl->faulted. Or, give it an escape valve: allow one retry on faulted controllers, clearing faulted only if the retry succeeds.

  4. [Medium] BC monitor probe missing EVT_ID clear -> REAL. Same as P08 #2 against BC. Fixed by the same cbqri_probe_feature refactor.

  5. [Medium] mweight bounds check -> REAL. cbqri_apply_rbwb() at line 421-422 rejects rbwb > U16_MAX. cbqri_apply_mweight_config() at line 458-468 has no bound. The 8-bit MWEIGHT_MASK = GENMASK(27, 20) silently truncates values

    255, then verify mismatches and returns -EIO instead of the expected -EINVAL. Fix: if (mweight > 0xFF) return -EINVAL; at line 461.

P11: riscv_cbqri: resctrl: Add cache allocation via capacity block mask

  1. [High] CDP_NUM_TYPES iteration on non-CDP -> DOWNGRADE (Low). cbqri_resctrl.c:738-746: comment claims "CDP_NUM_TYPES is 1 on non-CDP controllers" but CDP_NUM_TYPES is a compile-time constant = 3 (per include/linux/resctrl.h:76). Loop writes 3x per RCID. cbqri_cc_alloc_op() at cbqri_devices.c:127-129 masks AT to 0 on non-AT controllers, so no hardware fault. The waste is real (~3x MMIO per RCID at domain init) and the comment is wrong. Fix: rewrite the loop to iterate exactly the number of distinct slots needed (1 for non-CDP, 2 for CDP), or drop the iteration and let the AT-mirror in cbqri_apply_cache_config handle CODE/DATA. Update or delete the misleading comment.

  2. [High] rcid_count vs 4096 truncation -> FALSE POSITIVE. CBQRI_MAX_RCID = 1024 per include/linux/riscv_cbqri.h:25 is capped at drivers/acpi/riscv/rqsc.c:71-73, well below SRMCFG_RCID_MASK = GENMASK(11,0) (4095).

  3. [Medium] exposed_alloc_capable cleanup on init failure -> REAL minor. cbqri_resctrl_setup() at cbqri_resctrl.c:1253 sets cbqri_resctrl_inited = true only AFTER resctrl_init() succeeds. On failure path, cbqri_resctrl_teardown() early-returns at line 1234 because inited=false, leaving exposed_alloc_capable = true. Practically unobservable (resctrl FS not mounted on failure). Fix: set inited=true earlier, or reset exposed_alloc_capable directly in the failure paths.

  4. [Medium] wait_event between initcalls -> FALSE POSITIVE. device_initcall_sync(__cacheinfo_ready) runs at level 6 strictly before late_initcall(cbqri_arch_late_init) at level 7. By the time wait_event(wait_cacheinfo_ready, cacheinfo_ready) runs at cbqri_resctrl.c:1259, cacheinfo_ready is already true. Cleanup: drop the wait_event and the producer initcall entirely.

  5. [Medium] synchronize_rcu inside hotplug callback -> FALSE POSITIVE. list_del_rcu + synchronize_rcu is the same pattern used in arch/x86/kernel/cpu/resctrl/core.c for domain teardown. Upstream-blessed.

  6. [Medium] cbqri_domain_list_lock redundant -> REAL but defensible. cpuhp serializes online/offline, but the mutex provides lockdep_assert_held in helpers at line 1032 and 1193. Keep for the lockdep value.

  7. [Low] Kconfig linker hazard -> FALSE POSITIVE. arch/riscv/Kconfig:599 select RISCV_CBQRI_DRIVER if RESCTRL_FS under RISCV_ISA_SSQOSID makes (SSQOSID=y, RESCTRL_FS=y, CBQRI_DRIVER=n) unreachable.

  8. [Low] kzalloc_obj doesn't exist -> FALSE POSITIVE. Macro defined at include/linux/slab.h:1039, widely used in kunit/iucv/net.

P12: riscv_cbqri: resctrl: Add L3 cache occupancy monitoring

  1. [High] AB-BA deadlock on offline -> REAL High. cbqri_detach_cpu_from_l3_mon() at cbqri_resctrl.c:1199 runs under cpus_write_lock (cpuhp) and calls cancel_delayed_work_sync(&mon_dom->cqm_limbo). cqm_handle_limbo() worker at fs/resctrl/monitor.c:797 starts with cpus_read_lock(). Worker blocks waiting for hotplug, hotplug blocks waiting for worker. fs/resctrl itself uses non-_sync cancel_delayed_work at rdtgroup.c:4353 to dodge this. Fix: switch to cancel_delayed_work (non-_sync).

  2. [High] cancel_delayed_work_sync on zeroed mbm_over -> REAL High. INIT_DELAYED_WORK(&d->mbm_over, ...) at rdtgroup.c:4454-4458 only fires when resctrl_is_mbm_enabled(). CBQRI L3 OCCUP without a paired BC leaves mbm_over zero-initialized. Sync-cancel on a zeroed timer/work struct can panic in del_timer_sync. Fix: guard the cancel on resctrl_is_mbm_enabled() or check mon_dom->mbm_over.work.func is non-NULL.

  3. [High] Hotplug rollback leak -> REAL High. cbqri_attach_cpu_to_cap_ctrl() at cbqri_resctrl.c:1135 does cpumask_set_cpu BEFORE cbqri_attach_cpu_to_l3_mon() at line 1143. If l3_mon attach fails, error returns at line 1145 without removing the cpu from the mask or freeing the newly-allocated ctrl_domain. cpuhp doesn't invoke this state's offline on a failed online. Fix: undo the cpumask_set (and possibly destroy a freshly-created domain) on the error path.

  4. [High] Monitoring-only controllers ignored -> REAL High. cbqri_resctrl_online_cpu() at cbqri_resctrl.c:1307-1308 skips on !ctrl->alloc_capable. Mon-only controllers never get attached and no mon domain is created. Fix: also attempt attach when ctrl->mon_capable is set, even if alloc_capable is false. Split or extend the conditional.

  5. [High] resctrl_arch_reset_rmid never called for L3 OCCUP -> REAL mild. mon_event_read(..., first=true) at fs/resctrl/rdtgroup.c:3335 is gated on resctrl_is_mbm_event(). L3 OCCUP always gets first=false. The commit message claim that resctrl_arch_reset_rmid() re-arms on recycle is wrong, but functional impact is mild because cbqri_init_mon_counters() pre-configures the slot and OCCUP is a current-state read (not accumulator). Fix: rewrite the commit message to be accurate; optionally remove resctrl_arch_reset_rmid() impl for OCCUP path since it's dead.

  6. [Medium] resctrl_arch_rmid_read -EIO on nohz_full -> REAL High. fs/resctrl/ctrlmondata.c dispatches via smp_call_function_any(..., 1) in hardirq context for nohz_full CPUs. CBQRI's if (irqs_disabled()) return -EIO; at cbqri_resctrl.c:312 rejects unconditionally. Reason CBQRI rejects: cbqri_mon_op() takes a sleeping mutex. Fix options: (a) document that CBQRI doesn't support nohz_full monitoring; (b) provide a non-sleeping path with bounded busy-wait (CBQRI ops complete in ~1ms); (c) defer to a workqueue and return -EAGAIN with a hint.

  7. [Medium] Duplicate CONFIG_EVENT MMIO -> REAL minor. resctrl_online_mon_domain() internally calls resctrl_arch_reset_rmid_all() (at cbqri_resctrl.c:284-294) which writes CONFIG_EVENT for OCCUP+MBM_TOTAL for every MCID. cbqri_init_mon_counters() at cbqri_devices.c:811-820 writes the same CONFIG_EVENT/OCCUP for every MCID immediately after. Each op is ~1ms; ~2x the latency at attach. Fix: drop cbqri_init_mon_counters() (the reset_rmid_all path covers it), OR drop reset_rmid_all's per-MCID work and rely on init_mon_counters. The former is simpler.

  8. [Low] cbqri_resctrl_counters[] dead -> FALSE POSITIVE. Array IS read at cbqri_resctrl.c:849 to decide whether to enable QOS_L3_MBM_TOTAL_EVENT_ID.

P13: riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb

  1. [High] Group-removal leaks MB_MIN allocation -> REAL High. closid_free() at fs/resctrl/rdtgroup.c:209 just frees the bitmap bit. No arch hook to reset state. rbwb_cache[freed_closid] retains its old value and contaminates the sum-check in cbqri_apply_rbwb() at cbqri_devices.c:426-440 (note: that loop skips its OWN closid but includes all others, so the stale value inflates the sum for others). Fix: hook into group destroy. Either:

    • Add a resctrl_arch_reset_default(closid, rid) hook that core calls on group destroy, and implement it for CBQRI.
    • Or, walk-and-reset rbwb_cache for unallocated closids before each sum check.
    • Or, on group destroy, have fs/resctrl call resctrl_arch_update_one with the default before freeing.
  2. [High] Multi-BC per NUMA silently ignored -> REAL. struct cbqri_resctrl_dom at cbqri_resctrl.c: carries a single hw_ctrl pointer. cbqri_attach_cpu_to_one_bw_res() at line 1151 finds the existing domain by prox_dom and just adds the cpu; the second BC sharing the same prox_dom is dropped. Unusual hardware, but silent. Fix options: (a) reject the second BC at register time with a pr_warn; (b) extend struct cbqri_resctrl_dom to track multiple hw_ctrl pointers and broadcast applies.

  3. [High] MRBWB underflow to 0 -> FALSE POSITIVE. cbqri_probe_bc() at cbqri_devices.c:700-704 rejects mrbwb < rcid_count, so mrbwb - (rcid_count - 1) >= 1 is guaranteed.

  4. [Low] Reset-loop comment -> REAL minor. Comment at cbqri_resctrl.c:631-635 says "intermediate sum may exceed MRBWB". Actually the hardware sum strictly decreases during the walk. The reason check_sum=false is needed is that the cache-based sum check at cbqri_devices.c:426-440 may transiently fail when neighbor cache values are still inflated. Fix: reword the comment.

P15: riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring

(Late review, arrived 2026-05-12 21:26 UTC. Local mbox at sash15.)

  1. [High] Use-After-Free / NULL pointer dereference due to premature domain exposure and TOCTOU race -> REAL High. cbqri_attach_cpu_to_l3_mon() calls resctrl_online_mon_domain() before cbqri_init_bc_mon_counters(). If BC init fails partway through, it frees bc->mbm_total_states and NULLs it while a concurrent sysfs reader has already passed the WARN_ON_ONCE(!bc->mbm_total_states) check and is waiting on bc->lock. When the reader gets the lock it dereferences NULL. Fix: reorder so BC init runs before domain online.

  2. [High] List traversal race leading to Use-After-Free during CPU hotplug -> REAL High. cbqri_find_ctrl_domain() walks ctrl_domains with plain list_for_each_entry. The offline path uses list_del_rcu + synchronize_rcu but the readers don't hold rcu_read_lock, so synchronize_rcu doesn't actually wait for them. fs/resctrl protects both sides with rdtgroup_mutex; CBQRI's offline runs from cpuhp without it. Fix: take cbqri_domain_list_lock around the reader walks, drop the RCU on the writer side.

  3. [Medium] Bandwidth monitoring is silently disabled if the L3 capacity controller lacks occupancy monitoring -> REAL but coupled with P12 #4. The resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID) block sits inside if (ctrl->mon_capable) where ctrl is the L3 CC. A system with a mon-capable BC but mon-incapable L3 CC loses mbm_total_bytes entirely. Same underlying coupling as P12 #4 (mon-only-CC requires decoupling mon_domain from L3 CC's hw_ctrl). Deferred to the same follow-up.

  4. [Medium] Severe bandwidth overcounting on platforms with multiple L3 domains due to lack of topology matching -> REAL. The single paired BC pairs with every L3 domain, so userspace tools summing across L3 domains see Nx the real bandwidth. Cover letter acknowledges the assumption but doesn't gate on actual L3 domain count. Fix: count distinct L3 cache_ids in cbqri_resctrl_pick_counters() and refuse to expose mbm_total_bytes when > 1.

P16: ACPI: RISC-V: Parse RISC-V Quality of Service Controller (RQSC) table

(Late review, arrived 2026-05-12 21:46 UTC. Local mbox at sash16. Spec at riscv-specs/riscv-rqsc.pdf, RQSC v0.9.2 stable.)

  1. [High] Parser ignores rqsc->num field, may fail fatally on trailing padding -> Defensive / discard. Loop bound by node < end, not rqsc->num. Spec does not document trailing padding allowance. Current strict-abort behaviour is defensible; no current firmware ships padded tables.

  2. [High] Unvalidated resource descriptor types -> REAL, addressed in v5. RQSC v0.9.2 Tables 4-6 define Resource Type (Cache/Memory) and Resource ID Type (Processor Cache / Memory Range / etc.) fields. v4 consumed res0->id1 based only on controller type, blindly trusting res[0] was the matching primary resource. v5 validates res0->type and res0->id_type per controller type, warning and skipping mismatches.

  3. [High] node->flags ignored -> FALSE POSITIVE. Sashiko assumed standard ACPI bit 0 = enabled convention. RQSC v0.9.2 Table 3 defines bit 0 as "supports allocating zero capacity blocks or zero bandwidth blocks to an RCID" — a capability hint, not an enabled bit. RQSC defines no enabled flag at all. Current parser correctly ignores flags.

  4. [High] Hardcoded sizeof(*node) resource offset / no revision check -> REAL, addressed in v5. v4 had no revision check; future RQSC revisions extending acpi_rqsc_node would invalidate hardcoded sizeof(*node) offsets. v5 rejects tables with revision != 1 (the value from RQSC v0.9.2 Table 1) with -ENOTSUPP at function entry.

  5. [Medium] Missing validation of node->length against table boundary -> Defensive / discard. Malformed node->length overshooting end is silently terminated by the loop. Every byte the parser reads is bound-checked, so no UAF or OOB read. ACPI tables come from signed firmware. Per the false-positive guide, defensive against trusted firmware does not warrant a check.

P18: riscv: enable resctrl filesystem for Ssqosid

(Late review, arrived 2026-05-12 22:56 UTC. Local mbox at sash18.)

  1. [Medium] Commit message inaccuracy and unnecessary conditional select -> REAL, addressed in v5. The v4 commit message said "RISCV_CBQRI_DRIVER ... depends on RESCTRL_FS" which was wrong (the driver depends only on SSQOSID). The select RISCV_CBQRI_DRIVER if RESCTRL_FS conditional did not match ARM64 MPAM's unconditional pattern. v5 drops the if RESCTRL_FS and rewrites the commit message to describe the new wiring. The resctrl filesystem integration stays gated by RISCV_CBQRI_RESCTRL_FS (silent, default y when both prereqs are enabled).

Final tally

  • False positive (12): P03 #5, P07 #5, P08 #1, P11 #2, P11 #4, P11 #5, P11 #7, P11 #8, P12 #8, P13 #3, P16 #1, P16 #3, P16 #5.
  • Real but mild / defensible / cleanup (10): P03 #4, P07 #4, P09 #1, P11 #1, P11 #3, P11 #6, P12 #5, P12 #7, P13 #4, P08 #2 (reserved-bits part only).
  • Real meaningful (24): P03 #1, P03 #2, P03 #3, P07 #1, P07 #2, P07 #3, P08 #2 (EVT_ID part), P08 #3, P09 #2, P09 #3, P09 #4, P09 #5, P12 #1, P12 #2, P12 #3, P12 #4, P12 #6, P13 #1, P13 #2, P15 #1, P15 #2, P15 #3, P15 #4, P16 #2, P16 #4, P18 #1.

(P16 #1 and P16 #5 are defensive against trusted firmware and classified as false positives by the strict false-positive guide.)

Cross-cutting refactor candidates

cbqri_probe_feature() hygiene

P07 #2, P08 #2 (EVT_ID), P08 #3, P09 #4 all stem from one function: cbqri_probe_feature() at cbqri_devices.c:501. The probe was designed for ALLOC_CTL (which has an AT field) and reused for MON_CTL (which doesn't). It also fails to clear EVT_ID on every iteration and fails to re-clear RCID on the AT=CODE secondary iteration.

Refactor goal: make cbqri_probe_feature defensively clear both RCID and EVT_ID on every iteration. Optionally, split the AT-detection into a separate function that's only called for ALLOC_CTL, leaving MON_CTL probes as a simple "is the operation supported?" query.

This single refactor closes 4 findings.

cancel_delayed_work usage

P12 #1 and P12 #2 both stem from sync-cancel on delayed_work structs that may be uninitialized or may deadlock against worker functions taking cpus_read_lock.

Fix: switch to cancel_delayed_work (non-sync) following fs/resctrl's pattern at rdtgroup.c:4353. Adds explicit handling for the "work may still be running after cancel" case (the worker takes domain_list_lock in CBQRI's case, so the post-cancel race is bounded by our own lock).

Final disposition

All fixes are folded into their originating patches on branch b4/ssqosid-cbqri-rqsc (the v5-prep branch) via git commit --fixup= plus git rebase --autosquash. Tip: 18 patches, same shape as v4 send.

Landed (real-meaningful findings closed)

ID Fix
P03 #1 DEFINE_PER_CPU(u32, cpu_srmcfg) = U32_MAX; in arch/riscv/kernel/qos.c
P03 #2 __switch_to_srmcfg() evaluates RCID and MCID independently against cpu_srmcfg_default
P03 #3 CPU PM notifier in riscv_srmcfg_init() invalidates cpu_srmcfg on CPU_PM_EXIT and CPU_PM_ENTER_FAILED
P03 #4 Dropped redundant for_each_online_cpu seed loop
P07 #1 cbqri_controller_destroy() now does iounmap() and release_mem_region()
P07 #2 cbqri_probe_feature() refactor clears OP, AT, RCID, and EVT_ID on every iteration
P07 #3 cbqri_apply_cache_config() pre-clears cc_block_mask before READ_LIMIT capture
P08 #2 EVT_ID part: same cbqri_probe_feature() refactor
P08 #3 Same cbqri_probe_feature() refactor
P09 #1 *committed=true moved after verify in cbqri_apply_bc_field()
P09 #2 mweight_cache added; cbqri_apply_bc_field() now seeds both fields of bc_bw_alloc from authoritative software caches, eliminating the silent-READ_LIMIT-noop corruption window
P09 #3 Removed early-return on ctrl->faulted from all three runtime ops; flag is now informational only
P09 #4 Same cbqri_probe_feature() refactor
P09 #5 if (mweight > FIELD_MAX(...)) return -EINVAL; in cbqri_apply_mweight_config()
P12 #1 cancel_delayed_work_sync -> cancel_delayed_work to break AB-BA against cqm_handle_limbo
P12 #2 cancel_delayed_work(&mon_dom->mbm_over) guarded on resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID)
P12 #3 Hotplug rollback in cbqri_attach_cpu_to_cap_ctrl(): undo cpumask_set_cpu and tear down a freshly-created ctrl_domain when L3 mon attach fails
P12 #6 Comments at the irqs_disabled() rejects in resctrl_arch_rmid_read() and resctrl_arch_reset_rmid() reference MPAM parity
P13 #2 BC register-time rejection of duplicate prox_dom with pr_warn
P15 #1 Pair the BC and initialise its per-MCID accumulators before resctrl_online_mon_domain() exposes the L3 mon_domain to userspace
P15 #2 Take cbqri_domain_list_lock around the list walks in resctrl_arch_rmid_read() and resctrl_arch_reset_rmid(); drop list_del_rcu/synchronize_rcu from the writer-side detach paths
P15 #4 cbqri_resctrl_pick_counters() counts distinct L3 cache_ids and disables mbm_total_bytes when more than one L3 domain is present
P16 #2 acpi_parse_rqsc() validates res0->type and res0->id_type per controller type before consuming res0->id1 (CACHE/PROCESSOR_CACHE for capacity, MEMORY/MEMORY_RANGE for bandwidth)
P16 #4 acpi_parse_rqsc() rejects tables whose revision is not 1 (RQSC v0.9.2 Table 1) with -ENOTSUPP at function entry
P18 #1 arch/riscv/Kconfig: select RISCV_CBQRI_DRIVER unconditionally (drop if RESCTRL_FS), matching ARM64 MPAM pattern; commit message rewritten to describe the new wiring

Landed (minor / cleanup findings closed)

ID Fix
P11 #1 Reworded the misleading CDP_NUM_TYPES is 1 on non-CDP comment in cbqri_init_domain_ctrlval()
P12 #5 Rewrote P12 commit message: dropped the inaccurate paragraph claiming resctrl_arch_reset_rmid() re-arms CONFIG_EVENT on RMID recycle. The OCCUP branch is reached only via resctrl_arch_reset_rmid_all() on mon-event config writes, not from a core RMID-recycle path
P13 #4 Reworded the reset-loop comment to describe cache-based sum check

Deferred / acknowledged limitations

P12 #4 monitor-only CCs / P15 #3 BW gated on L3 OCCUP. Both stem from the same coupling: mon paths read hw_ctrl via the ctrl_domain wrapper, so a CC that advertises mon_capable without alloc_capable cannot back a mon_domain, and the L3 MBM_TOTAL hookup is gated on the L3 CC being mon_capable. Decoupling needs a separate cbqri_l3_mon_dom wrapper holding hw_ctrl and paired_bc directly. Estimated ~115 lines net, confined to drivers/resctrl/. CBQRI spec permits these configurations but no shipping RISC-V silicon today exposes a mon-only CC. Decision: leave a pr_warn_once in cbqri_resctrl_pick_caches() to surface the limitation, address in a follow-up if/when hardware demands it.

P13 #1 group-removal MB_MIN reset. The leak (rbwb_cache[freed_closid] stays inflated and contaminates the sum check) cannot be closed in this series because fs/resctrl/rdtgroup.c::closid_free() has no arch-side hook. Needs a new resctrl_arch_* callback in fs/resctrl/ that core invokes on group destroy, plus a CBQRI implementation that resets rbwb_cache[closid] = 0. That's a core-side change, out of scope for an arch-driver series.

False positives (no action needed)

P03 #5, P07 #5, P08 #1, P11 #2, P11 #4, P11 #5, P11 #7, P11 #8, P12 #8, P13 #3, P16 #1, P16 #3, P16 #5 are all false positives or already-prevented by existing code. The reply to sashiko-bot (sashiko@lists.linux.dev) explains each briefly. Notable:

  • P16 #3 (node->flags) was disproved by reading RQSC v0.9.2 Table 3: bit 0 is a "supports zero allocation" capability, not an enabled bit. RQSC defines no enabled flag.
  • P16 #1 and P16 #5 are defensive against malformed firmware tables; the false-positive guide treats trusted-firmware defence as non-reportable.

Real-but-mild items addressed in passing

P03 #4 (delete redundant loop), P07 #4 (defensive comment kept), P09 #1 (timing fix done as part of cache work), P11 #3 (kept; rare failure path), P11 #6 (kept for lockdep value), P12 #5 (commit-msg fix at v5 send), P12 #7 (left as-is; ~1ms duplicate MMIO at attach is acceptable), P13 #4 (comment fix landed).

Cross-cutting refactors actually applied

  • cbqri_probe_feature() hygiene (P07 #2, P08 #2 EVT_ID, P08 #3, P09 #4): the function now builds the OP write from saved_reg masked clean of OP | AT | RCID | EVT_ID, then ORs in only the intended fields. Both the initial probe and the AT-CODE secondary probe go through the same builder. Single refactor, four findings closed.
  • cbqri_apply_bc_field() cache-driven staging (P09 #2): replaces the function-pointer set/get API with a (field, val) enum API, reads both rbwb_cache[closid] and mweight_cache[closid], stages both via cbqri_set_bc_bw_alloc(), then CONFIG_LIMIT. No initial READ_LIMIT round trip needed.
  • cancel_delayed_work_sync -> cancel_delayed_work (P12 #1, P12 #2): follows fs/resctrl/rdtgroup.c:4353 pattern.

Reply to sashiko-bot

sashiko@lists.linux.dev. The reply should:

  • Acknowledge each landed finding with a short note on the fix.
  • For false positives, cite the file:line that already prevents the scenario sashiko worried about.
  • For P12 #4 and P13 #1, explain the deferral rationale.
  • For real-but-mild items, either note the fix or explain why it stays as-is.

Workflow notes

  • Use a worktree (per repo convention): branch off b4/ssqosid-cbqri-rqsc at f3ba981889e6.
  • Build with ARCH=riscv LLVM=1 W=1 (via ./llvm.sh).
  • Per-commit verification: checkpatch.pl --strict, ./build-matrix.sh.
  • Don't bundle fixes across logical patches; each fix lands in the patch that introduced the affected code, via interactive rebase.
  • v3 (#45) findings history in ~/.claude/projects/-home-pdp7-dev-linux/memory/project_sashiko_45_pending_findings.md for reference but most items there are already closed.
@pdp7
Copy link
Copy Markdown
Author

pdp7 commented May 14, 2026

From: Drew Fustini <fustini@kernel.org>
To: sashiko@lists.linux.dev
Cc: linux-riscv@lists.infradead.org,
 linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC v4 18/18] riscv: enable resctrl filesystem for Ssqosid
In-Reply-To: <20260512225649.4A949C2BCB0@smtp.kernel.org>
References: <20260512225649.4A949C2BCB0@smtp.kernel.org>

On Tue, May 12, 2026 at 10:56:48PM +0000, sashiko-bot@kernel.org wrote:
> - [Medium] The commit message incorrectly claims that
>   `RISCV_CBQRI_DRIVER` depends on `RESCTRL_FS`, leading to an
>   unnecessary conditional select.

Thanks for the review.

> Is this description accurate?
>
> The driver configuration shows that RISCV_CBQRI_DRIVER only depends on
> RISCV && RISCV_ISA_SSQOSID. The filesystem integration is decoupled
> into a separate silent config, so the driver does not actually depend
> on the filesystem.
>
> [...]
>
> This differs from platforms like ARM64, where ARM64_MPAM
> unconditionally selects ARM64_MPAM_DRIVER regardless of the
> filesystem configuration.

You are right on both counts.  v5 drops the conditional and matches
the ARM64 MPAM pattern:

  -	select RISCV_CBQRI_DRIVER if RESCTRL_FS
  +	select RISCV_CBQRI_DRIVER

The driver is now compiled whenever SSQOSID is enabled, regardless
of RESCTRL_FS.  The actual resctrl filesystem integration code
(cbqri_resctrl.c) stays gated by RISCV_CBQRI_RESCTRL_FS, a silent
option that defaults to y when both RISCV_CBQRI_DRIVER and
RESCTRL_FS are enabled, so a user without RESCTRL_FS still gets
controller probing without the FS hookup.

Commit message rewritten in v5 to describe the new wiring
accurately.

Thanks,
Drew

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment