MSVC ARM64 optimizations in Visual Studio 2022 17.8
Jiong Wang (ARM Ltd), Hongyon Suauthai (ARM)
January 9th, 2024

Visual Studio 2022 17.8 has been released recently. The blog post “Visual Studio 17.8 now available!” already covers the new features and improvements. In this post we would like to share more about what is new in the MSVC ARM64 backend. Over the last couple of months, we have improved code generation in the auto-vectorizer so that it emits Neon instructions in more cases. We have also optimized instruction selection for several scalar code-generation scenarios, for example short-circuit evaluation, comparison against an immediate, and smarter splitting of immediates for logical instructions.

Auto-Vectorizer supports conversions between floating-point and integer

The following conversions between floating-point and integer types are common in real-world code. All of them are now enabled in the ARM64 backend and hooked up to the auto-vectorizer.

From        To          Instruction
double      float       fcvtn
double      int64_t     fcvtzs
double      uint64_t    fcvtzu
float       double      fcvtl
float       int32_t     fcvtzs
float       uint32_t    fcvtzu
int64_t     double      scvtf
uint64_t    double      ucvtf
int32_t     float       scvtf
uint32_t    float       ucvtf

For example:

```c
void test (double * __restrict a, unsigned long long * __restrict b)
{
    for (int i = 0; i < 2; i++) {
        a[i] = (double)b[i];
    }
}
```

In Visual Studio 2022 17.7, the generated code was the following. Both compute throughput and load/store bandwidth utilization were suboptimal because scalar instructions were used:

```
ldp   x9, x8, [x1]
ucvtf d17, x9
ucvtf d16, x8
stp   d17, d16, [x0]
```

In Visual Studio 2022 17.8.2, the code generation has been optimized into:

```
ldr   q16, [x1]
ucvtf v16.2d, v16.2d
str   q16, [x0]
```

Now a single Q-register load, a single Q-register store, and one SIMD conversion instruction handle both elements. The above example converts between double and a 64-bit integer, so both types are the same size.
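To experiment with more rows of the conversion table, the loops below follow the same pattern as the test function above: restrict-qualified pointers, a simple counted loop, and one cast per element. This is a minimal sketch: the function names are hypothetical, and the instruction noted in each comment is simply the one the table lists for that conversion. Whether a given compiler version actually vectorizes each loop is an assumption to confirm by inspecting the generated assembly, for example with MSVC's /FA listing option.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers; each loop converts n elements from src to dst. */

/* uint64_t -> double (table: ucvtf) */
void u64_to_f64(double *__restrict dst, const uint64_t *__restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (double)src[i];
}

/* double -> int64_t (table: fcvtzs); the cast truncates toward zero */
void f64_to_i64(int64_t *__restrict dst, const double *__restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (int64_t)src[i];
}

/* float -> uint32_t (table: fcvtzu); the cast truncates toward zero */
void f32_to_u32(uint32_t *__restrict dst, const float *__restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint32_t)src[i];
}

/* int32_t -> float (table: scvtf) */
void i32_to_f32(float *__restrict dst, const int32_t *__restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (float)src[i];
}

/* double -> float, a narrowing conversion (table: fcvtn) */
void f64_to_f32(float *__restrict dst, const double *__restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (float)src[i];
}

/* float -> double, a widening conversion (table: fcvtl) */
void f32_to_f64(double *__restrict dst, const float *__restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (double)src[i];
}
```

In C, a floating-point to integer cast truncates toward zero, which matches the round-toward-zero behavior of fcvtzs and fcvtzu. Converting a value that is out of range for the destination integer type is undefined behavior in C, so callers should keep inputs in range.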
Another issue in the ARM64 backend prevented auto-vectorization of conversions between types of different sizes, and it has been fixed as well. MSVC now also auto-vectorizes the following example:
void test_df_to_sf (float * __restrict a, double * __restrict b, int * __restrict c) { for (int i = 0; i < 4; i++) { a[i] = (float) b[i]; c[i] = ((int)a[i])
