Remove -O3: reduces code size.
Remove -DTF_LITE_STATIC_MEMORY: it causes bugs on some cores.
Add -DTFLITE_EMULATE_FLOAT: more robust, emulates floating-point calculation with fixed-point arithmetic.
Signed-off-by: jihandong <jihandong@xiaomi.com>
The second argument of vgetq_lane_s32(__a, __b) must be a compile-time constant, so unroll the for loop. Also correct the passed parameters.
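
A minimal sketch of the unrolling, for illustration only. The helper name copy_lanes_s32 is hypothetical and not taken from this patch; the scalar fallback exists only so the sketch builds on hosts without NEON.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not from the patch): copy four int32 lanes to out[]. */
static void copy_lanes_s32(const int32_t in[4], int32_t out[4])
{
    assert(in != NULL && out != NULL);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    int32x4_t v = vld1q_s32(in);
    /* A loop such as out[i] = vgetq_lane_s32(v, i) is generally rejected
     * because i is not a constant expression, so each lane is read
     * with a literal index instead. */
    out[0] = vgetq_lane_s32(v, 0);
    out[1] = vgetq_lane_s32(v, 1);
    out[2] = vgetq_lane_s32(v, 2);
    out[3] = vgetq_lane_s32(v, 3);
#else
    /* Scalar fallback so the sketch also builds without NEON. */
    for (int i = 0; i < 4; i++) {
        out[i] = in[i];
    }
#endif
}
```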
Signed-off-by: xinhaiteng <xinhaiteng@xiaomi.com>
The complete implementation now lives separately in mLearning/tflite-micro/operators/neon, so delete this part.
Signed-off-by: xinhaiteng <xinhaiteng@xiaomi.com>
Add Cortex-A compilation options to tflite-micro and cmsis-nn, and configure the build environment for the new operators.
Signed-off-by: xinhaiteng <xinhaiteng@xiaomi.com>
VELAPLATFO-25411
Building on CMSIS-NN, optimize the Add operator with NEON: each loop iteration applies the offsets and performs the addition for eight input and output elements.
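
A simplified sketch of the eight-elements-per-iteration idea, not the code in this patch. The name add_s8_sketch and its parameter list are assumptions, and the requantization step (multipliers and shifts) that a real quantized Add performs is omitted: the sketch only adds the offsets, sums, and saturates to int8.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical, simplified int8 add:
 * out[i] = saturate_s8(in1[i] + in1_offset + in2[i] + in2_offset + out_offset).
 * Requantization is intentionally left out. */
static void add_s8_sketch(const int8_t *in1, const int8_t *in2,
                          int32_t in1_offset, int32_t in2_offset,
                          int32_t out_offset, int8_t *out, int32_t size)
{
    /* Assumed offset range, so the 16-bit vector math cannot overflow. */
    assert(in1_offset >= -128 && in1_offset <= 128);
    assert(in2_offset >= -128 && in2_offset <= 128);
    assert(out_offset >= -128 && out_offset <= 128);

    int32_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    const int16x8_t off1 = vdupq_n_s16((int16_t)in1_offset);
    const int16x8_t off2 = vdupq_n_s16((int16_t)in2_offset);
    const int16x8_t offo = vdupq_n_s16((int16_t)out_offset);
    for (; i + 8 <= size; i += 8) {
        /* Load 8 int8 values per input and widen them to int16. */
        int16x8_t a = vaddq_s16(vmovl_s8(vld1_s8(in1 + i)), off1);
        int16x8_t b = vaddq_s16(vmovl_s8(vld1_s8(in2 + i)), off2);
        int16x8_t sum = vaddq_s16(vaddq_s16(a, b), offo);
        /* Saturating narrow back to int8 clamps to [-128, 127]. */
        vst1_s8(out + i, vqmovn_s16(sum));
    }
#endif
    /* Scalar tail; handles every element when NEON is unavailable. */
    for (; i < size; i++) {
        int32_t sum = in1[i] + in1_offset + in2[i] + in2_offset + out_offset;
        if (sum > 127) sum = 127;
        if (sum < -128) sum = -128;
        out[i] = (int8_t)sum;
    }
}
```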
Signed-off-by: xinhaiteng <xinhaiteng@xiaomi.com>
Based on CMSIS-NN, optimize the Conv operator. With NEON acceleration, multiply 8 input values by 8 filter values per inner-loop iteration. With im2col, rearrange the input data into a matrix; each outer-loop iteration processes 2 rows of input data and 4 rows of filter data, producing a 2x4 block of output data.
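
A rough sketch of the 2x4 blocking, assuming the input has already been rearranged by im2col into rows of length depth. The name matmul_block_2x4 and the row-major layout are assumptions, not the patch's actual kernel; input offsets, bias, and requantization are omitted, so the result is only the raw int32 dot products.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical kernel: in holds 2 im2col rows, filt holds 4 filter rows,
 * each of length depth. acc[r][c] receives dot(in row r, filt row c). */
static void matmul_block_2x4(const int8_t *in, const int8_t *filt,
                             int depth, int32_t acc[2][4])
{
    assert(in != NULL && filt != NULL && depth >= 0);
    int k = 0;
    for (int r = 0; r < 2; r++)
        for (int c = 0; c < 4; c++)
            acc[r][c] = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    int32x4_t vacc[2][4];
    for (int r = 0; r < 2; r++)
        for (int c = 0; c < 4; c++)
            vacc[r][c] = vdupq_n_s32(0);

    for (; k + 8 <= depth; k += 8) {
        int16x8_t a[2], w[4];
        /* Load 8 input and 8 filter values per row, widened to int16. */
        for (int r = 0; r < 2; r++)
            a[r] = vmovl_s8(vld1_s8(in + r * depth + k));
        for (int c = 0; c < 4; c++)
            w[c] = vmovl_s8(vld1_s8(filt + c * depth + k));
        /* Multiply-accumulate all 2x4 row pairs into int32 lanes. */
        for (int r = 0; r < 2; r++) {
            for (int c = 0; c < 4; c++) {
                vacc[r][c] = vmlal_s16(vacc[r][c], vget_low_s16(a[r]),
                                       vget_low_s16(w[c]));
                vacc[r][c] = vmlal_s16(vacc[r][c], vget_high_s16(a[r]),
                                       vget_high_s16(w[c]));
            }
        }
    }
    /* Horizontal sum of each accumulator; lane indices are constants. */
    for (int r = 0; r < 2; r++) {
        for (int c = 0; c < 4; c++) {
            int32x4_t s = vacc[r][c];
            acc[r][c] = vgetq_lane_s32(s, 0) + vgetq_lane_s32(s, 1) +
                        vgetq_lane_s32(s, 2) + vgetq_lane_s32(s, 3);
        }
    }
#endif
    /* Scalar tail; covers the whole depth when NEON is unavailable. */
    for (; k < depth; k++)
        for (int r = 0; r < 2; r++)
            for (int c = 0; c < 4; c++)
                acc[r][c] += in[r * depth + k] * filt[c * depth + k];
}
```

A depth that is not a multiple of 8 is handled by the scalar tail, so the same function works for any patch length.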
Signed-off-by: xinhaiteng <xinhaiteng@xiaomi.com>