FREQUENCY-DOMAIN PROCESSES

REPRESENTING TIME DOMAIN DATA IN THE FREQUENCY DOMAIN

The Fast Fourier Transform

The Fourier transform is the workhorse of frequency-domain representation. Its sole purpose is to transform time-domain samples into amplitude and phase data for a set of frequency channels. Time-stretching by FFT is one of the best methods available for stretching a sound while keeping artifact levels low. Essentially, all it requires is that the synthesis frames are spaced further apart than the analysis frames by some arbitrary stretch factor. Because we are in the spectral domain, we can keep the frequency information fixed, and this makes the relationship between the stretched sound and the original samples much closer than a simple tape-speed variation. However, at larger stretch factors (say 10x and upwards), the artifacts inherent in a sampled representation of sound become quite noticeable. The problem is that, because we are constantly treading a fine line between frequency resolution and transient (time) resolution, when we stretch a variant morphology out the exact changes in amplitude are hard to pin down. We therefore get an effect where these changes fall slightly out of synchronisation with each other, causing a reverberatory percept. Furthermore, when noise-based sounds are stretched out, their previously rapidly-changing spectral energies are 'smeared' to the point where the ear distinguishes them as a series of inharmonic spectra. While this may be a nuisance, it can be exploited as a means of traversing the noise-to-note continuum of the spectral typologies.
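As an illustration of the analysis/synthesis relationship just described, the following sketch (assuming NumPy, a mono float signal x, and illustrative function and parameter names) spaces the synthesis frames further apart than the analysis frames by the stretch factor, advancing the phases to suit:

    import numpy as np

    def pv_time_stretch(x, stretch, n_fft=2048, hop_a=512):
        # Synthesis frames are spaced hop_s apart, wider than the analysis hop by 'stretch'.
        hop_s = int(round(hop_a * stretch))
        win = np.hanning(n_fft)
        starts = np.arange(0, len(x) - n_fft, hop_a)
        out = np.zeros(len(starts) * hop_s + n_fft)
        bin_step = 2 * np.pi * np.arange(n_fft // 2 + 1) / n_fft  # nominal phase advance per sample
        prev_phase = np.zeros(n_fft // 2 + 1)
        for i, s in enumerate(starts):
            spec = np.fft.rfft(win * x[s:s + n_fft])
            mag, phase = np.abs(spec), np.angle(spec)
            if i == 0:
                acc_phase = phase.copy()                      # keep the first frame's phases as-is
            else:
                # deviation of the measured phase increment from the bin's nominal increment
                dev = phase - prev_phase - bin_step * hop_a
                dev = (dev + np.pi) % (2.0 * np.pi) - np.pi   # wrap to [-pi, pi)
                inst_freq = bin_step + dev / hop_a            # per-bin instantaneous frequency
                acc_phase += inst_freq * hop_s                # advance by the longer synthesis hop
            prev_phase = phase
            frame = np.fft.irfft(mag * np.exp(1j * acc_phase), n_fft)
            out[i * hop_s:i * hop_s + n_fft] += win * frame   # overlap-add at the synthesis hop
        return out

Larger values of stretch exaggerate the phase-desynchronisation and 'smearing' artifacts discussed above.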

The first example of this is one where a rapidly changing noise-like morphology, namely a handful of leaves, is stretched out to 60 times its original length, creating not only a reduction in gestural energy, but also a change in the spectral typology field to that of an inharmonic spectrum, and therefore a concomitant change in the object/substance field (Sound Example 11). Of course, if a sound already has an inharmonic typology, then the time-stretch will tend to keep its substance field intact, although the dimensions may have changed somewhat. The next example uses the metallic stroke of a tam-tam as the basis of a sixteen-times time-stretch (Sound Example 12). Finally, we can hear the effect it has on the utterance field, which, if the listener is unable to grasp the slow changes of vowel formants, may reduce the communication to that of paralanguage. Notice how the diphthong in 'die' changes to 'dah', and 'we are' changes to 'weir' (Sound Example 13).

LINEAR PREDICTION VS FAST FOURIER TRANSFORM

FFT channels do not correctly represent the underlying spectrum of a sound. By the use of linear prediction, we can create a filter which closely matches the true underlying spectrum; it does this by modelling the resonant regions of the spectrum. We must realise that in order to find this filter, we are actually trying to predict each sample of a signal from the samples that precede it. In speech waveforms, where spectral changes occur in a smooth, interpolative fashion, the linear predictor is ideal.
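Stated concretely, a p-th order linear predictor estimates each sample from the p samples before it; the sketch below (NumPy assumed, a float signal x, names illustrative) computes that prediction and the error it leaves behind:

    import numpy as np

    def predict(x, a):
        # a[0..p-1] are the predictor coefficients: x_hat[n] = sum_k a[k] * x[n-1-k]
        p = len(a)
        x_hat = np.zeros_like(x)
        for n in range(p, len(x)):
            x_hat[n] = np.dot(a, x[n - p:n][::-1])   # the p previous samples, most recent first
        return x_hat, x - x_hat                      # prediction and prediction error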

In this prediction method there will of course be some error, due to the micro-fluctuant aspect of recorded signals. Mathematically, however, we can reduce this error to a minimum, or in other words, we can create a 'best fit' filter. This is achieved by setting the derivative of the squared error to zero. These error values, interestingly enough, closely resemble the 'excitation signal' of an acoustic source. In speech signals, this excitation signal is the vibration of the glottis (for voiced speech) or noise-like breath sounds (for unvoiced speech).
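Setting the derivative of the squared error to zero yields a set of normal equations in the autocorrelation of the signal. A minimal sketch of this 'best fit' solution (the autocorrelation method, NumPy assumed, names illustrative) might look like:

    import numpy as np

    def lpc_autocorrelation(x, p):
        # Autocorrelation of the (ideally windowed) frame, lags 0..p
        r = np.correlate(x, x, mode='full')[len(x) - 1:len(x) + p]
        # Setting d(error^2)/da = 0 gives the normal equations R a = r,
        # where R is the Toeplitz matrix of autocorrelations.
        R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
        a = np.linalg.solve(R, r[1:])
        return a   # the 'best fit' predictor coefficients

In practice the Toeplitz structure is exploited (the Levinson-Durbin recursion), but the direct solution shows the idea; the residual is then simply the signal minus its prediction, as above.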

The all-pole filter is a good model for speech, and in fact the filter coefficients that are derived closely describe the resonance characteristics of the vocal tract. If we run the inverse of this all-pole filter over the input, the result is a spectral flattening, leaving us with the residual (the excitation signal). The power of linear predictive coding lies in the ability to manipulate the residual or the formant filters. For example, by replacing the residual of voiced speech with white noise, we can effect a change from talking to whispering. This also allows us to practise source-filter cross-synthesis, where the formant data from one sound is applied to the spectrum of another sound. This obviously initiates change in the object/substance field, and also in the gesture field, to the point where the listener may believe that the formant changes are an attempt at communication. Indeed, if speech formants are used as the input sound, we can end up with an interesting hybrid of speech and some other percept. The next example, from Chrysalis, demonstrates the application of the strong filtering inherent in the sound of a rolling metal ball to the spectrally flat sound of a piece of wood being scraped (Sound Example 14).
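A minimal sketch of the whispering effect described above, reusing the illustrative lpc_autocorrelation function from the previous sketch and SciPy's lfilter: the inverse (FIR) filter flattens the spectrum to expose the residual, and the all-pole formant filter is then driven by white noise instead of the voiced excitation.

    import numpy as np
    from scipy.signal import lfilter

    def whisper(frame, p=16):
        a = lpc_autocorrelation(frame, p)              # all-pole model of the vocal-tract resonances
        inverse = np.concatenate(([1.0], -a))          # FIR inverse filter: 1 - sum_k a[k] z^-k
        residual = lfilter(inverse, [1.0], frame)      # spectral flattening leaves the excitation
        noise = np.random.randn(len(frame))
        noise *= np.std(residual) / (np.std(noise) + 1e-12)   # roughly match excitation energy
        # Drive the all-pole (formant) filter with noise in place of the voiced excitation
        return lfilter([1.0], inverse, noise)

Source-filter cross-synthesis follows the same pattern: the residual (or whitened spectrum) of one sound is passed through the formant filter derived from another.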

SPECTRAL BLURRING

Spectral blurring 'smears' spectral information over a given time-frame. More specifically, analyses of the sound are taken at a given time spacing, and the synthesis channels in between are created by linear interpolation between the original analysis channels. Several notes on the perception of this effect: firstly, temporal detail (e.g. transients, micro-fluctuations) tends to be lost, much as small detail is lost in a blurred photograph. More specifically, however, sounds that were originally noise-based tend to end up sounding more 'metallic', owing to the way that the rapidly fluctuating analysis channels become 'frozen' in time (or, at least, change slowly enough for the ear to perceive the inherently inharmonic makeup of the spectrum). This next example, from Ombrisages, demonstrates the blurring of a noise-based sound to give a slightly pitched, more continuous percept (Sound Example 15).
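A minimal sketch of the interpolation just described, operating on STFT magnitude frames (shape: frames x channels); phase handling and resynthesis are omitted, and all names are illustrative:

    import numpy as np

    def blur(mags, keep_every):
        # mags: STFT magnitude frames, shape (num_frames, num_bins)
        kept = np.arange(0, len(mags), keep_every)     # the analysis frames we keep
        idx = np.arange(len(mags))
        out = np.empty_like(mags)
        for b in range(mags.shape[1]):                 # interpolate each channel's trajectory in time
            out[:, b] = np.interp(idx, kept, mags[kept, b])
        return out

The larger keep_every is, the longer each retained spectrum persists, and the more clearly the ear hears the 'frozen' inharmonic makeup of what was originally noise.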