Abstract
Many approaches have been described for the parallel loop scheduling problem on shared-memory systems, but little work has addressed the data-dependent loop scheduling problem (nested loops with loop-carried dependencies). In this paper, we propose a general model for the data-dependent loop scheduling problem on both distributed- and shared-memory systems. To achieve load balancing with low runtime scheduling and communication overhead, our model is based on a loop task graph and the notion of a critical path. In addition, we develop a heuristic algorithm, based on our model and on genetic algorithms, to test the reliability of the model. We evaluate our approach on different scenarios and benchmarks. The results are very encouraging and suggest a future parallel compiler implementation based on our model.
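The approach the abstract describes combines two ingredients: a loop task graph (iterations as nodes, loop-carried dependencies as edges with communication costs) and a genetic algorithm that searches over iteration-to-processor assignments, scoring each candidate by the schedule length implied by the graph's critical path. The sketch below illustrates that combination under assumed data; the task graph, costs, GA parameters, and all identifiers are illustrative inventions, not the paper's actual algorithm or benchmarks.

```python
import random

# Hypothetical loop task graph: iterations with loop-carried dependencies.
# TASKS maps task id -> compute cost; DEPS lists (src, dst, comm_cost) edges,
# where comm_cost is paid only if src and dst land on different processors.
TASKS = {0: 2, 1: 3, 2: 2, 3: 4, 4: 3, 5: 2}
DEPS = [(0, 2, 1), (1, 2, 1), (2, 3, 2), (2, 4, 2), (3, 5, 1), (4, 5, 1)]
N_PROCS = 2

def makespan(assign):
    """Simulate a dependency-respecting schedule; return its length."""
    proc_free = [0.0] * N_PROCS          # next free time on each processor
    finish = {}                          # task -> finish time
    preds = {t: [] for t in TASKS}
    for u, v, c in DEPS:
        preds[v].append((u, c))
    for t in sorted(TASKS):              # ids here happen to be topological
        p = assign[t]
        # ready once all predecessors finish (+ comm cost across processors)
        ready = max((finish[u] + (c if assign[u] != p else 0)
                     for u, c in preds[t]), default=0.0)
        start = max(ready, proc_free[p])
        finish[t] = start + TASKS[t]
        proc_free[p] = finish[t]
    return max(finish.values())

def evolve(pop_size=30, generations=100, seed=0):
    """GA over processor assignments, minimizing the makespan."""
    rng = random.Random(seed)
    pop = [[rng.randrange(N_PROCS) for _ in TASKS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=makespan)                        # fitness = makespan
        survivors = pop[: pop_size // 2]              # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(TASKS))
            child = a[:cut] + b[cut:]                 # one-point crossover
            if rng.random() < 0.2:                    # point mutation
                child[rng.randrange(len(TASKS))] = rng.randrange(N_PROCS)
            children.append(child)
        pop = survivors + children
    return min(pop, key=makespan)

best = evolve()
print("assignment:", best, "makespan:", makespan(best))
```

Note the two bounds that frame the search: assigning every iteration to one processor yields the serial time (sum of compute costs, 16 here), while no schedule can beat the graph's critical path (the longest compute-cost path, 11 here); the GA explores the space between them, trading parallelism against cross-processor communication.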