Characterizing the Performance of Executing Many-tasks on Summit

Turilli, Matteo; Merzky, Andre; Naughton, Thomas; Elwasif, Wael; Jha, Shantenu

arXiv:1909.03057 (cs)

[Submitted on 8 Sep 2019]

Abstract: Many scientific workloads comprise many tasks, where each task is an independent simulation or analysis of data. The execution of millions of tasks on heterogeneous HPC platforms requires scalable dynamic resource management and multi-level scheduling. RADICAL-Pilot (RP), an implementation of the Pilot abstraction, addresses these challenges and serves as an effective runtime system for executing workloads comprised of many tasks. In this paper, we characterize the performance of executing many tasks using RP when interfaced with JSM and PRRTE on Summit: RP is responsible for resource management and task scheduling on the acquired resources; JSM or PRRTE enacts the placement and launching of scheduled tasks. Our experiments provide lower bounds on the performance of RP when integrated with JSM and PRRTE. Specifically, for workloads comprised of homogeneous single-core, 15-minute-long tasks, we find that: PRRTE scales better than JSM for > O(1000) tasks; PRRTE overheads are negligible; and PRRTE supports optimizations that lower the impact of overheads and enable resource utilization of 63% when executing O(16K) 1-core tasks over 404 compute nodes.
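
The abstract reports resource utilization without defining it. One common reading, assumed here and not confirmed by the abstract, is the fraction of the allocated core-time that is spent running tasks. The sketch below illustrates that reading only; the function name `resource_utilization` and all numbers except the 15-minute (900 s) task duration are hypothetical and are not taken from the paper's experiments.

```python
def resource_utilization(n_tasks, cores_per_task, task_runtime_s,
                         total_cores, walltime_s):
    """Fraction of allocated core-time spent executing tasks.

    Assumed definition (not stated in the abstract):
        busy core-seconds / allocated core-seconds
    """
    if total_cores <= 0 or walltime_s <= 0:
        raise ValueError("total_cores and walltime_s must be positive")
    busy_core_seconds = n_tasks * cores_per_task * task_runtime_s
    allocated_core_seconds = total_cores * walltime_s
    return busy_core_seconds / allocated_core_seconds


# Toy illustration: 100 single-core tasks of 900 s each on 100 cores.
# If scheduling and launching overheads stretch the allocation to 1200 s,
# a quarter of the allocated core-time is lost to overheads.
toy_utilization = resource_utilization(
    n_tasks=100, cores_per_task=1, task_runtime_s=900,
    total_cores=100, walltime_s=1200,
)
print(f"utilization: {toy_utilization:.0%}")
```

Under this reading, utilization drops whenever cores sit idle while tasks are being scheduled, placed, or launched, which is why lower launcher overheads translate into higher utilization.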

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:1909.03057 [cs.DC]
	(or arXiv:1909.03057v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.1909.03057

Submission history

From: Matteo Turilli
[v1] Sun, 8 Sep 2019 20:58:19 UTC (1,020 KB)
