Toward performance portability for CPUs and GPUs through algorithmic compositions