I need a bitsliced DES encryption implementation in Nvidia CUDA that sustains at least 50,000,000 encryptions per second on an Nvidia GeForce 9500 GT. In other words, the implementation must be able to encrypt at least 50,000,000 64-bit blocks per second on that card. A straightforward DES implementation cannot reach this rate. Bitslicing is probably the right approach, but the only hard requirement is speed.
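
To make clear what is meant by the bit slice method, here is a minimal host-side sketch in plain C. It is not a DES implementation and not CUDA code. It only shows the data layout: 32 independent 64-bit blocks are transposed so that each 32-bit word holds one bit position from all 32 blocks, and a single logic operation then acts on all 32 blocks at once. The names `to_bitslice`, `from_bitslice`, `xor_key_bitsliced` and `bitslice_selfcheck` are made up for illustration, and the lane count of 32 is an assumption chosen to match a 32-bit register.

```c
#include <stdint.h>

#define LANES 32  /* assumed: blocks processed in parallel, one per bit of a uint32_t */
#define BITS  64  /* width of a DES block in bits */

/* Transpose: bit j of slice[b] becomes bit b of blocks[j]. */
static void to_bitslice(const uint64_t blocks[LANES], uint32_t slice[BITS]) {
    for (int b = 0; b < BITS; ++b) {
        uint32_t w = 0;
        for (int j = 0; j < LANES; ++j)
            w |= (uint32_t)((blocks[j] >> b) & 1u) << j;
        slice[b] = w;
    }
}

/* Inverse transpose: rebuild the 32 ordinary 64-bit blocks. */
static void from_bitslice(const uint32_t slice[BITS], uint64_t blocks[LANES]) {
    for (int j = 0; j < LANES; ++j) {
        uint64_t v = 0;
        for (int b = 0; b < BITS; ++b)
            v |= (uint64_t)((slice[b] >> j) & 1u) << b;
        blocks[j] = v;
    }
}

/* XOR the same 64-bit value into every lane.
 * In bitsliced form a set key bit flips the whole slice (all 32 blocks),
 * and a clear key bit leaves the slice unchanged. */
static void xor_key_bitsliced(uint32_t slice[BITS], uint64_t key) {
    for (int b = 0; b < BITS; ++b)
        if ((key >> b) & 1u)
            slice[b] = ~slice[b];
}

/* Hypothetical self-check: returns the number of lanes (out of LANES)
 * where the bitsliced XOR matches the ordinary per-block XOR. */
static int bitslice_selfcheck(void) {
    const uint64_t key = 0xF0F0F0F00F0F0F0FULL;  /* arbitrary test value */
    uint64_t in[LANES], out[LANES];
    uint32_t slice[BITS];
    int ok = 0;

    for (int j = 0; j < LANES; ++j)
        in[j] = 0x0123456789ABCDEFULL * (uint64_t)(j + 1);

    to_bitslice(in, slice);
    xor_key_bitsliced(slice, key);
    from_bitslice(slice, out);

    for (int j = 0; j < LANES; ++j)
        if (out[j] == (in[j] ^ key))
            ++ok;
    return ok;
}
```

In a real bitsliced DES the permutations and the expansion become plain renaming of slice indices, and the S-boxes are written as sequences of AND/OR/XOR/NOT gates over the slices. With 32 blocks per pass, as assumed above, 50,000,000 blocks per second works out to 50,000,000 / 32 = 1,562,500 bitsliced passes per second across the whole card.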