A cheap method is to use a compressor with a long attack time and a very high compression ratio - an attack of 100 ms is too long to take out short dynamic peaks, but short enough to ensure you don't endure sustained, damaging levels. (You'll still get peaks - just not as high, and therefore less damaging to your ears and to your speakers.)
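To make the idea concrete, here is a minimal sketch of a feed-forward compressor with a slow attack, written in plain Python. The function name `slow_limiter` and the threshold, ratio and release values are illustrative assumptions rather than recommended settings; only the 100 ms attack and the "very high ratio" come from the description above. A real hardware or plugin compressor will differ in its detector and gain curve.

```python
import math

def slow_limiter(samples, sample_rate, threshold=0.5, ratio=20.0,
                 attack_ms=100.0, release_ms=500.0):
    """Hypothetical slow-attack compressor; all defaults are assumptions."""
    # One-pole smoothing coefficients for the level detector.
    attack = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    release = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    envelope = 0.0
    out = []
    for x in samples:
        level = abs(x)
        # Rise slowly (attack) when the signal gets louder, fall slowly (release) otherwise.
        coeff = attack if level > envelope else release
        envelope = coeff * envelope + (1.0 - coeff) * level
        if envelope > threshold:
            # Above the threshold, the level only grows at 1/ratio of the input rate.
            target = threshold + (envelope - threshold) / ratio
            gain = target / envelope
        else:
            gain = 1.0
        out.append(x * gain)
    return out

# Usage: a 5 ms full-scale burst versus a one-second full-scale signal at 1 kHz sample rate.
burst = slow_limiter([1.0] * 5, 1000)
sustained = slow_limiter([1.0] * 1000, 1000)
```

In this sketch the short burst passes through unchanged because the detector has not had time to rise above the threshold, while the sustained signal is pulled down to just above the threshold once the envelope catches up.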
There cannot be a perfect system for doing what you want, because no piece of equipment can reliably tell the difference between unwanted spikes and dynamics that you actually want to keep.