Use Extra Global Buffers
Do you have high fan-out clock enables or IOB tri-state controls?
Drive them through an unused BUFG for lower skew and higher performance
BUFGs have less than 1 ns of skew to clock and CE inputs
For non-clock signals, the BUFG must be instantiated explicitly in the HDL
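The explicit instantiation mentioned above could look roughly like the following Verilog. This is a minimal sketch, not a reference design: the module name `ce_on_bufg` and all signal names are hypothetical, `clk` is assumed to be on a global clock net already, and `BUFG` (ports `I` and `O`) is assumed to come from the vendor primitive library.

```verilog
// Sketch: route a high fan-out clock enable through a spare global buffer.
// Module and signal names are hypothetical; BUFG is the vendor primitive.
module ce_on_bufg (
    input  wire       clk,    // assumed to be on a global clock net already
    input  wire       ce_in,  // high fan-out clock enable from user logic
    input  wire [7:0] d,
    output reg  [7:0] q
);
    wire ce_global;           // low-skew copy of ce_in on the global net

    // Explicit instantiation, since this is not a clock signal
    BUFG ce_bufg (
        .I (ce_in),
        .O (ce_global)
    );

    // Registers use the buffered enable instead of the raw signal
    always @(posedge clk)
        if (ce_global)
            q <= d;
endmodule
```

Every register that shares the enable should use `ce_global` rather than `ce_in`, so the whole fan-out rides on the global net. Whether a spare BUFG and routing to the CE or tri-state pins are available depends on the target device, so check the implementation report.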