Both the derivative and thedetermination of the roots can be calculated in parallel. The CS algorithm requires two synchronizations, with a short sequential part between them and a final sequential calculation afterwards. The runtime in the original configuration with 75data points for the curve is rather short (about 21 μs), but can be expanded up to 7500 data points (about 800 μs).In our measurements core A0 runs the master task and A1, A2 and A3 the slave tasks. In order to compensate the slowermemory access from the slave tasks, we additionally implemented an optimized version where the master task calculates abigger share of the data than the slave tasks. All tasks have the highest priority as required by the supercore pattern.