research-article

VARQ: virtual advance reservations for queues

Authors:
Daniel Charles Nurmi
University of California Santa Barbara, Santa Barbara, USA

Rich Wolski
University of California Santa Barbara, Santa Barbara, USA

John Brevik
California State University Long Beach, Long Beach, USA

HPDC '08: Proceedings of the 17th international symposium on High performance distributed computing, June 2008, Pages 75–86, https://doi.org/10.1145/1383422.1383433

Published: 23 June 2008

ABSTRACT

In high-performance computing (HPC) settings, in which multiprocessor machines are shared among users with potentially competing resource demands, processors are allocated to user workload using space sharing. Typically, users interact with a given machine by submitting their jobs to a centralized batch scheduler that implements a site-specific policy designed to maximize machine utilization while providing tolerable turn-around times. To these users, the functioning of the batch scheduler and the policies it implements are both critical operating system components since they control how each job is serviced. In practice, while most HPC systems experience good utilization levels, the amount of time experienced by individual jobs waiting to begin execution has been shown to be highly variable and difficult to predict, leading to user confusion and/or frustration.

One proposed method for dealing with this uncertainty is to allow users who are willing to plan ahead to make "advance reservations" for processor resources. To date, however, few HPC centers provide an advance reservation capability to their general user populations, since previous research indicates that introducing advance reservations diminishes machine utilization.

In this work, we describe VARQ, a new method for job scheduling that provides users with probabilistic "virtual" advance reservations using only existing best-effort batch schedulers. VARQ functions as an overlay, submitting jobs that are indistinguishable from the normal workload serviced by a scheduler. We describe the statistical methods we use to implement VARQ, detail an empirical evaluation of its effectiveness in a number of HPC settings, and explore the potential future impact of VARQ should it become widely used. We find that VARQ can implement a reservation capability probabilistically, without requiring HPC sites to support advance reservations, and that this probabilistic approach is unlikely to negatively affect resource utilization.
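
The idea of a probabilistic reservation built on a best-effort queue can be illustrated with a small sketch. This is not the statistical method used by VARQ, which the abstract does not specify; it assumes only that a history of observed queue waits is available and uses a plain empirical quantile as a stand-in for a queue-delay bound. The function names `empirical_quantile` and `latest_submit_time`, and all numbers in the usage note, are hypothetical.

```python
import math


def empirical_quantile(waits, q):
    """Smallest observed wait w such that at least a fraction q
    of the observations are <= w (assumed stand-in for a delay bound)."""
    if not waits:
        raise ValueError("need at least one observed wait")
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    ordered = sorted(waits)
    rank = math.ceil(q * len(ordered))  # 1-based rank into the sorted sample
    return ordered[rank - 1]


def latest_submit_time(desired_start, waits, confidence):
    """Latest submission time (same units as the waits) such that, judging
    only by the historical sample, the job would have started by
    desired_start in at least `confidence` of past cases."""
    return desired_start - empirical_quantile(waits, confidence)


if __name__ == "__main__":
    # Hypothetical history of queue waits, in seconds.
    history = [300, 60, 120, 600]
    print(latest_submit_time(1000, history, 0.75))
```

With the hypothetical history above, a job that should start at time 1000 with 75% historical coverage would be submitted no later than time 700. A real overlay would also need to handle the case where the job starts earlier than desired, which this sketch leaves out.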

Index Terms

VARQ: virtual advance reservations for queues
1. Computer systems organization
  1. Architectures
    1. Distributed architectures


Published in
HPDC '08: Proceedings of the 17th international symposium on High performance distributed computing
June 2008
252 pages
ISBN:9781595939975
DOI:10.1145/1383422
General Chairs:
Manish Parashar, Rutgers University, USA
Karsten Schwan, Georgia Institute of Technology, USA
Program Chairs:
Jon Weissman, National e-Science Center, Edinburgh, University of Minnesota, USA
Domenico Laforenza, Information Science and Technology Institute, CNR, Italy
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
advance reservation
batch queue scheduling
super-computing
Acceptance Rates
Overall Acceptance Rate: 166 of 966 submissions, 17%
