No doubt the RT kernel sounds better, and the cyclictest numbers back it up. Both runs below were taken with cpuset enabled (user tasks running on the last 2 CPUs) and mpd playing music.
Generic Kernel
sudo cyclictest -t4 -p 90 -N -s -i 10000 -l 10000 -q
# /dev/cpu_dma_latency set to 0us
T: 0 ( 2247) P:90 I:10000 C: 10000 Min: 11528 Act: 28286 Avg: 30429 Max: 878427
T: 1 ( 2248) P:90 I:10500 C: 9533 Min: 11171 Act: 24541 Avg: 24396 Max: 67930
T: 2 ( 2249) P:90 I:11000 C: 9094 Min: 11630 Act: 30500 Avg: 31576 Max: 493441
T: 3 ( 2250) P:90 I:11500 C: 8706 Min: 10571 Act: 26922 Avg: 24326 Max: 63533
RT Kernel
sudo cyclictest -t4 -p 90 -N -s -i 10000 -l 10000 -q
# /dev/cpu_dma_latency set to 0us
T: 0 ( 1797) P:90 I:10000 C: 10000 Min: 9734 Act: 25065 Avg: 24552 Max: 63429
T: 1 ( 1798) P:90 I:10500 C: 9529 Min: 9631 Act: 24415 Avg: 22432 Max: 48408
T: 2 ( 1799) P:90 I:11000 C: 9095 Min: 9978 Act: 27283 Avg: 24627 Max: 52668
T: 3 ( 1800) P:90 I:11500 C: 8702 Min: 9763 Act: 26262 Avg: 22468 Max: 60311
For people not familiar with cyclictest, the only important number is the max latency. Because of the -N option, all times here are in nanoseconds (without -N, cyclictest reports microseconds). The lower the max latency, the better for SQ. With the generic kernel, it is all over the place, ranging from about 64 microseconds on one thread to about 878 microseconds (almost 1 millisecond) on another. With the RT kernel, the times are more consistent, with max latency around 48 to 63 microseconds. A good result.
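If you want to compare runs without squinting at the raw columns, here is a minimal Python sketch that pulls the Max value out of each thread line and converts it to microseconds. It assumes the values are nanoseconds, which holds for the runs above because they were started with -N. The helper names (max_latencies_us, worst_case_us) are my own and not part of cyclictest, and the regex is only matched against the summary format shown in this post.

```python
import re

# Matches a cyclictest per-thread summary line, for example:
# T: 0 ( 2247) P:90 I:10000 C: 10000 Min: 11528 Act: 28286 Avg: 30429 Max: 878427
LINE_RE = re.compile(
    r"T:\s*(\d+)\s*\(\s*\d+\)\s*P:\s*\d+\s*I:\s*\d+\s*C:\s*\d+\s*"
    r"Min:\s*(\d+)\s*Act:\s*(\d+)\s*Avg:\s*(\d+)\s*Max:\s*(\d+)"
)


def max_latencies_us(output):
    """Return {thread number: max latency in microseconds}.

    Assumption: the values in `output` are nanoseconds (cyclictest run with -N).
    """
    result = {}
    for line in output.splitlines():
        match = LINE_RE.search(line)
        if match:
            thread = int(match.group(1))
            max_ns = int(match.group(5))
            result[thread] = max_ns / 1000.0  # ns -> us
    return result


def worst_case_us(output):
    """Return the largest per-thread max latency in microseconds."""
    return max(max_latencies_us(output).values())


# Output copied from the two runs in this post.
GENERIC = """\
T: 0 ( 2247) P:90 I:10000 C: 10000 Min: 11528 Act: 28286 Avg: 30429 Max: 878427
T: 1 ( 2248) P:90 I:10500 C: 9533 Min: 11171 Act: 24541 Avg: 24396 Max: 67930
T: 2 ( 2249) P:90 I:11000 C: 9094 Min: 11630 Act: 30500 Avg: 31576 Max: 493441
T: 3 ( 2250) P:90 I:11500 C: 8706 Min: 10571 Act: 26922 Avg: 24326 Max: 63533
"""

RT = """\
T: 0 ( 1797) P:90 I:10000 C: 10000 Min: 9734 Act: 25065 Avg: 24552 Max: 63429
T: 1 ( 1798) P:90 I:10500 C: 9529 Min: 9631 Act: 24415 Avg: 22432 Max: 48408
T: 2 ( 1799) P:90 I:11000 C: 9095 Min: 9978 Act: 27283 Avg: 24627 Max: 52668
T: 3 ( 1800) P:90 I:11500 C: 8702 Min: 9763 Act: 26262 Avg: 22468 Max: 60311
"""

if __name__ == "__main__":
    for name, text in (("Generic", GENERIC), ("RT", RT)):
        print(f"{name}: worst max latency {worst_case_us(text):.1f} us")
```

To use it on your own run, paste the cyclictest output into a string and pass it to worst_case_us. If you ran cyclictest without -N, the values are already in microseconds and the division by 1000 should be dropped.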