The input-queued (IQ) switch architecture has received much attention in the research community and with implementers because it scales well with the line speed and the switch size. The main reason for this is that the memory bandwidth requirement for an input-queued switch is minimal, making it less expensive to implement compared to an output-queued or a shared-memory switch. To get a good delay and throughput performance from an IQ switch requires the use of efficient packet scheduling algorithms for matching input and output ports. For example, the maximum weight matching (MWM) algorithm is known to deliver a throughput of up to 100% and to provide low delays. But MWM is complex to implement at high line rates and scales poorly with the switch size. Many algorithms which approximate the performance of MWM have been proposed; but, they are still too complex to implement and/or do not provide a good performance compared to MWM. This paper proposes an innovative algorithm, called APSARA, which aims to bridge the gap between good performance and ease of implementation. The main idea is to use limited parallelism to find a matching in a single iteration, as compared to the O(N/sup 3/) iterations needed by MWM in the worst case for an N/spl times/N switch. We prove that APSARA achieves a throughput of up to 100% and extensive simulations show that its delay performance is nearly as good as that of MWM.