✎ Midia Reshadi's Blog

  • June 2026

    Safe Rust on the GPU, at the PTX level: what cuda-oxide actually emits

    A PTX/SASS-level comparison of cuda-oxide (safe Rust to PTX) against clang and nvcc on three GPU kernels — where the overhead comes from, whether it amortizes, and what it costs at runtime on an L4. 3 figures.

    cuda-oxide Rust PTX GPU
  • June 2026

    Attention kernels on MI300X: a utilization story

    Benchmarking FlashAttention kernels on AMD MI300X — AITER vs Triton vs PyTorch SDPA, with hardware counters explaining the throughput ranking. 60 data points, 3 figures.

    MI300X FlashAttention AITER Triton
  • May 2026

    Why Outer-Product Beats Vendor Sparse Libraries by up to 198x for LLM Attention Sparsity

    Benchmarking dense dataflow strategies and sparse SpMSpM algorithms on NVIDIA T4 and AMD MI300X. 222 data points, 7 figures.

    Triton SpMSpM MI300X LLM