C - SIMD SSE2 instructions in assembly
I'm rewriting a program that used 64-bit words so that it uses 128-bit words, and I'm trying to use Intel's SIMD SSE2 intrinsics to do it. The new program, which uses the SIMD intrinsics, is 60% slower than the original, when I had expected it to be around twice as fast. When I looked at the assembly code for each of them, they were similar and about the same length. However, the object code (the compiled file) is 60% longer.
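To make the kind of rewrite concrete, here is a minimal sketch, not the actual program. The post does not say which operation is performed, so XOR over arrays of 64-bit words is an assumption, and the names `xor_u64`, `xor_sse2`, and `xor_versions_agree` are hypothetical. The intrinsics used (`_mm_loadu_si128`, `_mm_xor_si128`, `_mm_storeu_si128`) are standard SSE2 intrinsics from `<emmintrin.h>`.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Scalar baseline: processes one 64-bit word per iteration. */
static void xor_u64(uint64_t *dst, const uint64_t *a, const uint64_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] ^ b[i];
}

/* SSE2 version: processes two 64-bit words (128 bits) per iteration.
 * n is still the number of 64-bit words. Unaligned loads and stores are
 * used so the pointers need no special alignment. */
static void xor_sse2(uint64_t *dst, const uint64_t *a, const uint64_t *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_xor_si128(va, vb));
    }
    /* Scalar tail for an odd number of words. */
    for (; i < n; i++)
        dst[i] = a[i] ^ b[i];
}

/* Returns 1 if both versions produce identical output on a small
 * odd-length input (so the scalar tail is exercised too), else 0. */
static int xor_versions_agree(void)
{
    enum { N = 7 };
    uint64_t a[N], b[N], r1[N], r2[N];
    for (size_t i = 0; i < N; i++) {
        a[i] = 0x0123456789abcdefULL * (i + 1);
        b[i] = ~(uint64_t)i;
    }
    xor_u64(r1, a, b, N);
    xor_sse2(r2, a, b, N);
    for (size_t i = 0; i < N; i++)
        if (r1[i] != r2[i])
            return 0;
    return 1;
}
```

Calling `xor_versions_agree()` from a `main` is a quick way to confirm the two versions compute the same result before comparing their speed.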
I ran callgrind on the two programs, which told me how many instruction reads there were per line. I found that the SIMD version of the program had fewer instruction reads for the same action than the original version. Ideally, that is what should happen, but it doesn't make sense because the SIMD version takes longer to run.
My question: Do SSE2 intrinsics convert to more assembly instructions? Do SSE2 instructions take longer to run? Or is there some other reason the new program is so slow?
Additional notes: I'm programming in C, on Linux Mint, and compiling with gcc -O3 -march=native.
c assembly simd sse2