We use the GPU in our option pricing module. Per calculation it runs at only about 20% of the CPU's speed, but its parallelism lets us reduce overall latency when the underlying price changes.
The latency cost of memory copies between the GPU and the CPU is a bit high, about 30 to 50 microseconds.
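To make the trade-off concrete, here is a rough back-of-the-envelope model of when offloading a repricing batch pays off. It is only a sketch. The 5x per-calculation slowdown (GPU at about 20% of CPU speed) and the 30 to 50 microsecond copy cost come from the figures above. The 1 microsecond CPU time per option, the 1024 parallel lanes, and the function names are made-up assumptions for illustration, not measurements from our system.

```python
import math

def cpu_latency_us(n_options, t_cpu_us=1.0):
    # Single CPU thread pricing the options one after another.
    # t_cpu_us is a hypothetical per-option cost, not a measured value.
    return n_options * t_cpu_us

def gpu_latency_us(n_options, t_cpu_us=1.0, gpu_slowdown=5.0,
                   lanes=1024, copy_us=40.0):
    # gpu_slowdown=5.0 models "about 20% of CPU speed" per calculation.
    # lanes is an assumed number of options priced at the same time.
    # copy_us is the CPU<->GPU copy overhead (30 to 50 us in the text).
    waves = math.ceil(n_options / lanes)
    return copy_us + waves * t_cpu_us * gpu_slowdown

def break_even_batch(max_n=1_000_000, **gpu_params):
    # Smallest batch size at which the GPU model beats the CPU model,
    # or None if it never does within max_n options.
    for n in range(1, max_n + 1):
        if gpu_latency_us(n, **gpu_params) < cpu_latency_us(n):
            return n
    return None

if __name__ == "__main__":
    for copy_us in (30.0, 40.0, 50.0):
        n = break_even_batch(copy_us=copy_us)
        print(f"copy overhead {copy_us:.0f} us -> break-even at {n} options")
```

Under these assumed inputs the GPU only wins once a price change triggers repricing of a few dozen options (36, 46, and 56 for copy costs of 30, 40, and 50 microseconds). Below that, the copy overhead dominates and the CPU is faster.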
I feel an FPGA would most likely achieve the same performance if we used it the same way as the GPU. In the FPGA case, it might be more efficient in power consumption.
An FPGA should be able to achieve better performance if it bypasses unnecessary OS communication and exploits parallel calculation via physical logic blocks, which a CPU cannot do.