

The following publications are related to Ookami:


  1. VPIC 2.0: Next Generation Particle-in-Cell Simulations; Bird, Tan, Luedtke, Harrell, Taufer, Albright; 2021
  2. Ookami: Deployment and Initial Experiences; Burford, Calder, Carlson, Chapman, Coskun, Curtis, Feldman, Harrison, Kang, Michalowicz, Raut, Siegmann, Wood, Deleon, Jones, Simakov, White, Oryspayev; PEARC '21

  3. Comparing the behavior of OpenMP Implementations with various Applications on two different Fujitsu A64FX platforms; Michalowicz, Raut, Kang, Curtis, Oryspayev, Chapman; PEARC '21
  4. MoB2 under Pressure: Superconducting Mo Enhanced by Boron; Quan, Lee, Pickett; 2021
  5. A64FX performance: experience on Ookami; Shahneous Bari, Chapman, Curtis, Harrison, Siegmann, Simakov, Jones; 2021
  6. Porting and Evaluation of a Distributed Task-driven Stencil-based Application; Raut, Anderson, Araya-Polo, Meng; PMAM 2021
  7. Comparing OpenMP Implementations with Applications Across A64FX Platforms; Michalowicz, Raut, Kang, Curtis, Chapman, Oryspayev; IWOMP 2021
  8. Educating HPC users in the use of advanced computing technology; Siegmann, Calder, Feldman, Harrison; SC'21 EduHPC
  9. Experiences with Porting the FLASH Code to Ookami, an HPE Apollo 80 A64FX Platform; Feldman, Michalowicz, Siegmann, Curtis, Calder, Harrison; HPC Asia 2022
  10. OpenSHMEM Active Message Extension for Task-Based Programming; Lu, Curtis, Chapman; 2022
  11. Analysis of Vector Particle-In-Cell (VPIC) memory usage optimizations on cutting-edge computer architectures; Tan, Bird, Chen, Luedtke, Albright, Taufer; Journal of Computational Science; 2022
  12. Dirac lines and loop at the Fermi level in the time-reversal symmetry breaking superconductor LaNiGa2; Badger, Quan, Staab, Sumita, Rossi, Devlin, Neubauer, Shulman, Fettinger, Klavins, Kauzlarich, Aoki, Vishik, Pickett, Taufour; Communications Physics; 2022
  13. Parthenon – a performance portable block-structured adaptive mesh refinement framework; Grete, Dolence, Miller, Brown, Ryan, Gaspar, Glines, Swaminarayan, Lippuner, Solomon, Shipman, Junghans, Holladay, Stone; 2022
  14. Towards Architecture-aware Hierarchical Communication Trees on Modern HPC Systems; Ramesh, Hashmi, Xu, Shafi, Ghazimirsaeed, Bayatpour, Subramoni, Panda; IEEE 28th International Conference on High Performance Computing, Data, and Analytics (HiPC) 2021
  15. Quantum calcium-ion affective influences measured by EEG; Ingber; 2022
  16. Hybrid Classical-Quantum Computing: Applications to Statistical Mechanics of Neocortical Interactions; Ingber; 2021
  17. Exploring Source-to-Source Compiler Transformation of OpenMP SIMD Constructs for Intel AVX and Arm SVE Vector Architectures; Flynn, Yi, Yan; The 13th International Workshop on Programming Models and Applications for Multicores and Manycores, held in conjunction with PPoPP 2022
  18. FOURST: A code generator for FFT-based fast stencil computations; Ahmad, Javanmard, Croisdale, Gregory, Ganapathi, Pouchet, Chowdhury; 2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 99-108
  19. Friends and foes: Sinophobia was viral in Chinese language communities on Twitter during the early COVID-19 pandemic; Zhang, Lin, Wang, Fan; 2022
  20. Developing Accurate Slurm Simulator; Simakov, Deleon, Lin, Hoffmann, Mathias; PEARC'22
  21. On Using Linux Kernel Huge Pages with FLASH, an Astrophysical Simulation Code; Calder, Feldman, Siegmann, Dey, Curtis, Chheda, Harrison; IEEE Cluster - EAHPC Workshop 2022
  22. Performance of an Astrophysical Radiation Hydrodynamics Code under Scalable Vector Extension Optimization; Smolarski, Swesty, Calder; IEEE Cluster, EAHPC Workshop 2022
  23. Bring the BitCODE - Moving Compute and Data in Distributed Heterogeneous Systems; Lu, Pena, Shamis, Churavy, Chapman, Poole; 2022
  24. From Merging Frameworks to Merging Stars: Experiences using HPX, Kokkos and SIMD Types; Daiß, Singanaboina, Diehl, Kaiser, Pflüger; 2022
  25. Assessing the State of Autovectorization Support based on SVE; Brank, Pleiter; IEEE Cluster, EAHPC Workshop 2022
  26. Improved Distributed-memory Triangle Counting by Exploiting the Graph Structure; Ghosh; IEEE 2022
  27. OpenMP Advisor: A Compiler Tool for Heterogeneous Architectures; Mishra, Malik, Lin, Chapman; 2023
  28. Modern server ARM processors for supercomputers: A64FX and others. Initial data of benchmarks; Kuzminsky; 2022
  29. Examining the Connectivity of Antarctic Krill on the West Antarctic Peninsula: Implications for Pygoscelis Penguin Biogeography and Population Dynamics; Gallagher, Dinniman, Lynch; 2023
  30. Are we ready for broader adoption of ARM in the HPC community: Performance and Energy Efficiency Analysis of Benchmarks and Applications Executed on High-End ARM Systems; Simakov, Deleon, White, Jones, Furlani, Siegmann, Harrison; HPC Asia 2023
  31. Performance Study on CPU-based Machine Learning with PyTorch; Chheda, Curtis, Siegmann, Chapman; HPC Asia 2023
  32. Shared memory parallelism in Modern C++ and HPX; Diehl, Brandt, Kaiser; 2023
  33. Interoperable PGAS Programming Models for Exascale Supercomputing; Lu; 2023
  34. Program Transformation for Automatic GPU-Offloading using OpenMP; Mishra; 2023
  35. HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs; Zhang, Smith, Sun, Tian, Soifer, Yu, Song, He, Tao; 2023
  36. Simulating Stellar Merger using HPX/Kokkos on A64FX on Supercomputer Fugaku; Diehl, Daiß, Huck, Marcello, Shiber, Kaiser, Pflüger; 2023
  37. Asynchronous Many-Task Systems and Applications: First International Workshop; Diehl, Thoman, Kaiser, Kale; 2023
  38. CPU Architecture Modelling and Design; Brank, Pleiter; 2023
  39. Cyberinfrastructure for Sustainability Sciences; Song, Merwade, Wang, Witt, Kumar, Irwin, Zhao, Walton; 2023
  40. Human mobility patterns are associated with experienced partisan segregation in US metropolitan areas; Zhang, Cheng, Li, Jiang; 2023
  41. Quantifying Antarctic krill connectivity across the West Antarctic Peninsula and its role in large-scale Pygoscelis penguin population dynamics; Gallagher, Dinniman, Lynch; 2023
  42. Sinophobia was popular in Chinese language communities on Twitter during the early COVID-19 pandemic; Zhang, Lin, Wang, Fan; 2023
  43. Efficient Auto-Vectorization for Control-flow Dependent Loops through Data Permutation; Paktinatkeleshteri; 2023
  44. From Molecular Dynamics to Oceanography - Ookami Graduate Students Porting and Tuning Science Codes for A64FX; Kaushik, Wang, Ma, Carlson, Curtis, Harrison, Siegmann; 2023
  45. A Further Study of Linux Kernel Hugepages on A64FX with FLASH, an Astrophysical Simulation Code; Feldman, Chheda, Dey, Siegmann, Curtis, Harrison; 2023
  46. LM4HPC: Towards Effective Language Model Application in High-Performance Computing; Emani, de Supinski; 2023
  47. Evaluating HPX and Kokkos on RISC-V using an Astrophysics Application Octo-Tiger; Diehl, Daiß, Brandt, Kheirkhahan, Kaiser, Taylor, Leidel; 2023
  48. The General Atomic and Molecular Electronic Structure System (GAMESS): Novel Methods on Novel Architectures; Zahariev, Xu, Westheimer, Webb, Vallejo, Tiwari, Sundriyal, Sosonkina, Shen, Schoendorff, Schlinsog, Sattasathuchana, Ruedenberg, Roskop, Rendell, Poole, Piecuch, Pham, Mironov, Mato, Leonard, Leang, Ivanic, Hayes, Harville, Gururangan, Guidez, Gerasimov, Friedl, Ferreras, Elliott, Datta, Cruz, Carrington, Bertoni, Barca, Alkan, Gordon; 2023
  49. Efficient Auto-Vectorization for Control-flow Dependent Loops through Data Permutation; Paktinatkeleshteri, de Carvalho, Amiri, Amaral; 2023
  50. Parameterization of Quantum Interactions; Ingber; 2023
  51. Ookami: An A64FX Computing Resource; Calder, Siegmann, Feldman, Chheda, Smolarski, Swesty, Curtis, Dey, Carlson, Michalowicz, Harrison; 2023
  52. Cross-Feature Transfer Learning For Efficient Tensor Program Generation; Verma, Raskar, Emani, Chapman; 2024
  53. Impact of Write-Allocate Elimination on Fujitsu A64FX; Kang, Ghosh, Kandemir, Marquez; 2024
  54. First Impressions of the NVIDIA Grace CPU Superchip and NVIDIA Grace Hopper Superchip for Scientific Workloads; Simakov, Jones, Furlani, Siegmann, Harrison; 2024
  55. Parallel C++ Efficient and Scalable High-Performance Parallel Programming Using HPX; Diehl, Brandt, Kaiser; 2024
  56. Explore as a Storm, Exploit as a Droplet: A Unified Search Technique for the Ansor Optimizer; Canesche, Verma, Quintao Pereira; 2024
  57. Benchmarking the Parallel 1D Heat Equation Solver in Chapel, Charm++, C++, HPX, Go, Julia, Python, Rust, Swift, and Java; Diehl, Brandt, Morris, Gupta, Kaiser; 2023
  58. Quantifying potential marine debris sources and potential threats to penguins on the West Antarctic Peninsula; Gallagher, Cimino, Dinniman, Lynch; 2024
  59. Anti-Coulomb ion-ion interactions: A theoretical and computational study; Wills, Mannino, Losada, Mayo, Soler, Fernandez-Serra; 2024
  60. Parallel assembly of finite element matrices on multicore computers; Krysl; 2024
  61. First Impressions of the Sapphire Rapids Processor with HBM for Scientific Workloads; Siegmann, Harrison, Carlson, Chheda, Curtis, Coskun, Gonzalez, Wood, Simakov; 2024
  62. Performance-Portable Tensor Transpositions in MLIR; Lakshminarasimhan, Hall, Sadayappan; 2024
  63. A64FX Enables Engine Decarbonization Using Deep Learning; Ristow Hadlich, Verma, Curtis, Siegmann, Assanis; 2024
  64. From array expressions to predictable portable high-performance: foundations for no-code HPC on arrays; Mullin, Hains; 2024
  65. Explore as a Storm, Exploit as a Droplet: A Unified Search Technique for the MetaSchedule; Canesche, Verma, Quintao Pereira; 2024
  66. Accelerating LULESH using HPX – the C++ Standard Library for Parallelism and Concurrency; Singanaboina, Wei, Seiras, Syskakis, Richardson, Cook, Kaiser; 2024
  67. Hardware-Software Co-design of Efficient and Scalable Deep Learning; Zhang; 2024
  68. Dynamics of Jet Expansion and Impingement Across a Spectrum of Nozzle Pressure Ratios; Martinus, Tumuklu; 2024
  69. Benchmarking with Supernovae: A Performance Study of the FLASH Code; Martin, Feldman, Calder, Curtis, Siegmann, Carlson, Gonzalez, Wood, Harrison, Coskun; 2024
  70. Exploring Processor Micro-architectures Optimised for BLAS3 Micro-kernels; Nassyr, Pleiter; 2024
  71. Enhancing Code Portability, Problem Scale, and Storage Efficiency in Exascale Applications; Tan; 2024
  72. Towards a Scalable and Efficient PGAS-based Distributed OpenMP; Shan, Araya-Polo, Chapman; 2024
  73. On the Scalability of Computing Genomic Diversity Using SparkLeBLAST: A Feasibility Study; Prabhu, Moussad, Youssef, Vatai, Feng; 2024



  1. Pure Deflagrations of Hybrid CONe White Dwarf Progenitors; C. Feldman, D. Willcox, D. Townsley, A. Calder; AAS; 2021


Other publications

  1. Kernel module for the A64FX hardware barrier
  2. PEARC 2022 - Birds of a feather session: NSF innovative computing technology testbed community exchange