In this paper, we propose a domain decomposition scheme that seeks to minimize total parallel execution time by considering the relative importance of two competing concerns, balancing the load and minimizing communication, for a particular application and architecture. A simulated annealing approach is used to optimize an objective function whose components measure both load balance and communication requirements. We develop an analytical model of execution time based on a finite element code executed on the Intel Paragon. This model is used to compare partitions with varying degrees of load imbalance. Most of the literature on decomposition methods emphasizes load balancing far more heavily than the minimization of communication. Our results indicate that this restrictive approach to load balancing can be relaxed without performance degradation. Further, our results indicate that the degree of relaxation possible depends on both the target machine and the application; neither can be neglected.
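To make the idea of a two-component objective concrete, the following is a minimal sketch of simulated annealing applied to graph partitioning. It is not the objective function or execution-time model developed in this paper. The names `cost` and `anneal`, the weight `alpha` (standing in for the relative importance of communication on a given machine and application), the use of cut edges as a communication measure, and the geometric cooling schedule are all assumptions chosen for illustration.

```python
import math
import random


def cost(assign, weights, edges, nparts, alpha):
    """Hypothetical objective: load imbalance plus alpha times the edge cut."""
    loads = [0.0] * nparts
    for node, part in enumerate(assign):
        loads[part] += weights[node]
    mean = sum(loads) / nparts
    imbalance = max(loads) - mean  # excess load on the busiest part
    cut = sum(1 for u, v in edges if assign[u] != assign[v])  # communication proxy
    return imbalance + alpha * cut


def anneal(weights, edges, nparts, alpha, steps=20000, t0=1.0,
           cooling=0.9995, seed=0):
    """Simulated annealing over node-to-part assignments (illustrative only)."""
    rng = random.Random(seed)
    n = len(weights)
    assign = [i % nparts for i in range(n)]  # round-robin starting partition
    cur = cost(assign, weights, edges, nparts, alpha)
    best, best_cost = assign[:], cur
    temp = t0
    for _ in range(steps):
        node = rng.randrange(n)
        old = assign[node]
        new = rng.randrange(nparts)
        if new == old:
            continue
        assign[node] = new  # tentatively move one node to another part
        cand = cost(assign, weights, edges, nparts, alpha)
        delta = cand - cur
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cur = cand
            if cur < best_cost:
                best, best_cost = assign[:], cur
        else:
            assign[node] = old  # reject the move
        temp *= cooling  # assumed geometric cooling schedule
    return best, best_cost


if __name__ == "__main__":
    # Small 4x4 grid mesh with unit node weights, split into 2 parts.
    side = 4
    edges = [(r * side + c, r * side + c + 1)
             for r in range(side) for c in range(side - 1)]
    edges += [(r * side + c, (r + 1) * side + c)
              for r in range(side - 1) for c in range(side)]
    part, value = anneal([1.0] * (side * side), edges, nparts=2, alpha=0.5)
    print(part, value)
```

In this sketch, raising `alpha` makes the search tolerate more load imbalance in exchange for fewer cut edges, and lowering it does the opposite. That is one simple way to express the machine- and application-dependent trade-off described above.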