Publications
Publication | Type | Year |
---|---|---|
ALBADross: Active Learning Based Anomaly Diagnosis for Production HPC Systems. Cluster 2022 | Conference Proceeding | 2022 |
Concise ML Explanations. Nsard | Presentation (non-conference) | 2022 |
Concise ML Explanations. Nsard | Display or Poster (non-conference) | 2022 |
Using Monitoring Data to Improve HPC Performance via Network-Data-Driven Allocation. IEEE HPEC | Conference Presentation | 2021 |
Using Monitoring Data to Improve HPC Performance via Network-Data-Driven Allocation. IEEE HPEC | Conference Proceeding | 2021 |
E2EWatch: End-to-end Anomaly Diagnosis Framework for Production HPC Systems. Euro-Par 2021 | Conference Presentation | 2021 |
Strategies for Matching Process Models to Observational Data. 89th MORS Symposium | Conference Presentation | 2021 |
E2EWatch: End-to-end Anomaly Diagnosis Framework for Production HPC Systems. Euro-Par 2021 | Conference Paper | 2021 |
CoMTE: Counterfactual Explanations for Multivariate Time Series. ICAPAI 2021 International Conference on Applied Artificial Intelligence | Conference Presentation | 2021 |
Proctor: A Semi-Supervised Performance Anomaly Diagnosis Framework for Production HPC Systems. ISC High Performance | Conference Proceeding | 2021 |
Counterfactual Explanations for Multivariate Time Series. ICAPAI 2021 International Conference on Applied Artificial Intelligence | Conference Proceeding | 2021 |
Explainable Machine Learning Frameworks for Managing HPC Systems. MLCS 2020 at SC 2020 | Conference Presentation | 2020 |
Explainable Machine Learning Frameworks for Managing HPC Systems. 2nd Workshop on Machine Learning for Computing Systems | Conference Paper | 2020 |
A Machine Learning Approach to Understanding HPC Application Performance Variation. Supercomputing Conference | Conference Paper | 2019 |
AD for Machine Learning Approach to Understanding HPC Application Performance Variation Poster. Supercomputing Conference | Conference Paper | 2019 |
Machine Learning for System Software: Can Computers Manage Computers? Sandia Machine Learning Deep Learning | Presentation (non-conference) | 2019 |
Taxonomist: Application Detection through Rich Monitoring Data. Machine Learning Course | Presentation (non-conference) | 2019 |
Level-Spread: A New Job Allocation Policy for Dragonfly Networks. PADAL | Conference Paper | 2019 |
Statistical Models of Dengue Fever. AusDM 2018, The 16th Australasian Data Mining Conference | Conference Paper | 2018 |
Statistical Models of Dengue Fever. The 16th Australasian Data Mining Conference (AusDM 2018) | Conference Paper | 2018 |
Adverse Event Prediction Using Graph-Augmented Temporal Analysis: Final Report | SAND Report | 2018 |
Online Diagnosis of Performance Variation in HPC Systems Using Machine Learning. IEEE Transactions on Parallel and Distributed Systems | Journal Article | 2018 |
Taxonomist: Application Detection through Rich Monitoring Data. Euro-Par | Conference Paper | 2018 |
Level-Spread: A New Job Allocation Policy for Dragonfly Networks. IPDPS - IEEE International Parallel & Distributed Processing Symposium | Conference Paper | 2018 |
Event Prediction Using Graph-Augmented Temporal Analysis. CoDA 2018, Conference on Data Analytics | Conference Paper | 2018 |