EURASIP Journal on Applied Signal Processing 2003:10, 941–952. © 2003 Hindawi Publishing Corporation. Physically Informed Signal Processing Methods for Piano Sound Synthesis: A Research Overview

This paper reviews recent developments in physics-based synthesis of the piano. The paper considers the main components of the instrument, that is, the hammer, the string, and the soundboard. Modeling techniques are discussed for each of these elements, together with implementation strategies. Attention is focused on numerical issues, and each implementation technique is described in light of its efficiency and accuracy properties. As the structured audio coding approach is gaining popularity, the authors argue that the physical modeling approach will have relevant applications in the field of multimedia communication.


1. INTRODUCTION
Sounds produced by acoustic musical instruments can be described at the signal level, where only the time evolution of the acoustic pressure is considered and no assumptions on the generation mechanism are made. Alternatively, source models, which are based on a physical description of the sound production processes [1,2], can be developed.
Physics-based synthesis algorithms provide semantic sound representations since the control parameters have a straightforward physical interpretation in terms of masses, springs, dimensions, and so on. Consequently, modification of the parameters leads in general to meaningful results and allows more intuitive interaction between the user and the virtual instrument. The importance of sound as a primary vehicle of information is being more and more recognized in the multimedia community. Particularly, source models of sounding objects (not necessarily musical instruments) are being explored due to their high degree of interactivity and the ease in synchronizing audio and visual synthesis [3].
The physical modeling approach also has potential applications in structured audio coding [4,5], a coding scheme where, in addition to the parameters, the decoding algorithm is transmitted to the user as well. The structured audio orchestral language (SAOL) became a part of the MPEG-4 standard, thus it is widely available for multimedia applications. Known problems in using physical models for coding purposes are primarily concerned with parameter estimation. Since physical models describe specific classes of instruments, automatic estimation of the model parameters from an audio signal is not a straightforward task: the model structure which is best suited for the audio signal has to be chosen before actual parameter estimation. On the other hand, once the model structure is determined, a small set of parameters can describe a specific sound. Casey [6] and Serafin et al. [7] address these issues.
In this paper, we review some of the strategies and algorithms of physical modeling, and their applications to piano simulation. The piano is a particularly interesting instrument, both for its prominence in western music and for its complex structure [8]. Also, its control mechanism is simple (it basically reduces to key velocity), and physical control devices (MIDI keyboards) are widely available, which is not the case for other instruments. The source-based approach can be useful not only for synthesis purposes but also for gaining a better insight into the behavior of the instruments. However, as we are interested in efficient algorithms, the features modeled are only those considered to have audible effects. In general, there is a trade-off between the accuracy and the simplicity of the description. The optimal solution may vary depending on the needs of the user.
The models described here are all based on digital waveguides. The waveguide paradigm has been found to be the most appropriate for real-time synthesis of a wide range of musical instruments [9,10,11]. As early as in 1987, Garnett [12] presented a physical waveguide piano model. In his model, a semiphysical lumped hammer is connected to a digital waveguide string and the soundboard is modeled by a set of waveguides, all connected to the same termination.
In 1995, Smith and Van Duyne [13,14] presented a model based on commuted synthesis. In their approach, the soundboard response is stored in an excitation table and fed into a digital waveguide string model. The hammer is modeled as a linear filter whose parameters depend on the hammer-string collision velocity. The hammer filter parameters have to be precalculated and stored for all notes and hammer velocities. This precalculation can be avoided by running an auxiliary string model connected to a nonlinear hammer model in parallel, and, based on the force response of the auxiliary model, designing the hammer filters in real time [15].
The original motivation for commuted synthesis was to avoid the high-order filter which is needed for high-quality soundboard modeling. As low-complexity methods have been developed for soundboard modeling (see Section 5), the advantages of the commuted piano with respect to the direct modeling approach described here are reduced. Also, due to the lack of a physical description, some effects, such as the restrike (ribattuto) of the same string, cannot be precisely modeled with the commuted approach. Describing commuted synthesis in detail is beyond the scope of this paper, although we would like to mention that it is a comparable alternative to the techniques described here.
As part of a collaboration between the University of Padova and Generalmusic, Borin et al. [16] presented a complete real-time piano model in 1997. The hammer was treated as a lumped model, with a mass connected in parallel to a nonlinear spring, and the strings were simulated using digital waveguides, all connected to a single lumped load. Bank [17] introduced in 2000 a similar physical model, based on the same functional blocks, but with a slightly different implementation. An alternative approach was used for the solution of the hammer differential equation. Independent string models were used without any coupling, and the influence of the soundboard on decay times was taken into account by using high-order loss filters. The use of feedback delay networks was suggested for modeling the radiation of the soundboard.
This paper addresses the design of each component of a piano model (i.e., hammer, string, and soundboard). Discussion is carried on with particular emphasis on real-time applications, where the time complexity of algorithms plays a key role. Perceptual issues are also addressed since a precise knowledge of what is relevant to the human ear can drive the accuracy level of the design. Section 2 deals with general aspects of piano acoustics. In Section 3, the hammer is discussed and numerical techniques are presented to overcome the computability problems in the nonlinear discretized system. Section 4 is devoted to string modeling, where the problems of parameter estimation are also addressed. Finally, Section 5 deals with the soundboard, where various alternative techniques are described and the use of the multirate approach is proposed.

2. ACOUSTICS AND MODEL STRUCTURE
Piano sounds are the final product of a complex synthesis process which involves the entire instrument body. As a result of this complexity, each piano note exhibits its unique sound features and nuances, especially in high quality instruments. Moreover, just varying the impact force on a single key allows the player to explore a rich dynamic space. Accounting for such dynamic variations in a wavetable-based synthesizer is not trivial: dynamic postprocessing filters which shape the spectrum according to key velocity can be designed, but finding a satisfactory mapping from velocity to filter response is far from being an easy task. Alternatively, a physical model, which mimics as closely as possible the acoustics of the instrument, can be developed.
The general structure of the piano is displayed in Figure 1a: an iron frame is attached to the upper part of the wooden case and the strings are stretched over it in a direction nearly perpendicular to the keyboard. The keyboard-side end of the string is connected to the tuning pins on the pin block, while the other end, passing over the bridge, is attached to the hitch-pin rail of the frame. The bridge is a thin wooden bar that transmits the string vibration to the soundboard, which is located under the frame. Since the physical modeling approach tries to simulate the structure of the instrument rather than the sound itself, the blocks in the piano model resemble the parts of a real piano. The structure is displayed in Figure 1b. The first model block is the excitation, the hammer strike. Its output propagates to the string, which determines the fundamental frequency of the tone. The quasiperiodic output signal is filtered through a postprocessing block, covering the radiation effects of the soundboard. Figure 1b shows that the hammer-string interaction is bidirectional since the hammer force depends on the string displacement [8]. On the other hand, there is no feedback from the radiator to the string. Feedback and coupling effects on the bridge and the soundboard are taken into account in the string block. The model differs from a real piano in the fact that the two functions of the soundboard, namely, to provide a terminating impedance to the strings and to radiate sound, are located in separate parts of the model. As a result, it is possible to treat radiation as a linear filtering operation.

3. THE HAMMER
We will first discuss the physical aspects of the hammer-string interaction, then concentrate on various modeling approaches and implementation issues.

3.1. Hammer-string interaction
As a first approximation, the piano hammer can be considered a lumped mass connected to a nonlinear spring, described by the equation m_h d²y_h(t)/dt² = −F(t) (1), where F(t) is the interaction force and y_h(t) is the hammer displacement. The hammer mass is represented by m_h. Experiments on real instruments have shown (see, e.g., [18,19,20]) that the hammer-string contact can be described by the following formula: F(t) = k[Δy(t)]^p if Δy(t) > 0, and F(t) = 0 if Δy(t) ≤ 0 (2), where Δy(t) = y_h(t) − y_s(t) is the compression of the hammer felt, y_s(t) is the string position, k is the hammer stiffness coefficient, and p is the stiffness exponent. The condition Δy(t) > 0 corresponds to the hammer-string contact, while the condition Δy(t) ≤ 0 indicates that the hammer is not touching the string. Equations (1) and (2) result in a nonlinear differential system of equations for y_h(t). Due to the nonlinearity, the tone spectrum varies dynamically with hammer velocity. Typical values of hammer parameters can be found in [19,20]. Example values are listed in Table 1 (typical hammer parameters from [19,20]; the hammer mass m_h is given in kg). However, (2) is not fully satisfactory in that real piano hammers exhibit hysteretic behavior. That is, contact forces during compression and during decompression are different, and a one-to-one law between compression and force does not correspond to reality. A general description of the hysteresis effect of piano felts was provided by Stulov [21]. The idea, coming from the general theory of mechanics of solids, is that the stiffness k of the spring in (2) has to be replaced by a time-dependent operator which introduces memory in the nonlinear interaction. Thus, the first part of (2) (when Δy(t) > 0) is replaced by F(t) = k[1 − h_r(t) ∗][Δy(t)]^p (3), where h_r(t) = (ε/τ)e^(−t/τ) is a relaxation function that accounts for the "memory" of the material, ε and τ are parameters of the felt, and the ∗ operator represents convolution. Previous studies [22] have shown that a good fit to real data can be obtained by implementing h_r as a first-order lowpass filter. It has to be noted that informal listening tests indicate that taking into account the hysteresis in the hammer model does not improve the sound quality significantly.
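For illustration, the following Python sketch evaluates the static law (2) and a hysteretic variant in the spirit of (3), realizing the relaxation kernel h_r as a first-order lowpass filter as suggested by [22]. The numerical values of k, p, ε, and τ are illustrative placeholders, not measured hammer data.

```python
import numpy as np

def hammer_force_static(dy, k=4.0e9, p=2.5):
    """Static force law (2): F = k*dy^p for dy > 0, zero otherwise.
    k and p are illustrative values, not measured data."""
    return k * np.maximum(np.asarray(dy, dtype=float), 0.0) ** p

def hammer_force_hysteretic(dy, fs, k=4.0e9, p=2.5, eps=0.9, tau=1.0e-5):
    """Hysteretic variant in the spirit of (3): F = k*[1 - eps*h_r(t)*] dy^p,
    with the relaxation kernel h_r realised as a unity-DC-gain one-pole
    lowpass acting on dy^p.  eps and tau are placeholder felt parameters."""
    x = np.maximum(np.asarray(dy, dtype=float), 0.0) ** p
    a = np.exp(-1.0 / (fs * tau))         # one-pole coefficient for the kernel
    mem = 0.0
    F = np.empty_like(x)
    for n, xn in enumerate(x):
        mem = (1.0 - a) * xn + a * mem    # lowpass-filtered compression term
        F[n] = k * (xn - eps * mem)       # hysteresis: memory term subtracted
    return F
```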

3.2. Implementation approaches
The hammer models described in Section 3.1 can be discretized and coupled to the string in order to provide a full physical description. However, there is a mutual dependence between (2) and (1); that is, the hammer position y_h(n) at discrete time instant n should be known for computing the force F(n), and vice versa. The same problem arises when (3) is used instead of (2). This implicit relationship can be made explicit by assuming that F(n) ≈ F(n − 1), thus inserting a fictitious delay element in a delay-free path. Although this approximation has been extensively used in the literature (see, e.g., [19,20]), it is a potential source of instability.
The theory of wave digital filters addresses the problem of noncomputable loops in terms of wave variables. Every component of a circuit is described as a scattering element with a reference impedance, and delay-free loops between components are treated by "adapting" reference impedances. Van Duyne et al. [23] presented a "wave digital hammer" model, where wave variables are used. More severe computability problems can arise when simulating nonlinear dynamic exciters since the linear equations used to describe the system dynamics are tightly coupled with a nonlinear map. Borin et al. [24] have recently proposed a general strategy named "K method" for solving noncomputable loops in a wide class of nonlinear systems. The method is fully described in [24] along with some application examples. Here, only the basic principles are outlined.
Whichever discretization method is used, the hammer compression Δy(n) at time n can be written as Δy(n) = p(n) + K F(n), where p(n) is a linear combination of past values of the variables (namely, y_h, y_s, and F) and K is a coefficient whose value depends on the numerical method in use. The interaction force F(n) at discrete time instant n, computed either by (2) or (3), is therefore described by the implicit relation F(n) = f(p(n) + KF(n)). The K method uses the implicit function theorem to solve this implicit relation into an explicit one of the form F(n) = h(p(n)). The new nonlinear map h defines F as a function of p; hence, instantaneous dependencies across the nonlinearity are dropped. The function h can be precomputed and stored in a lookup table for efficient implementation. Bank [25] presented a simpler but less general method for avoiding the artifacts caused by fictitious delay insertion. The idea is that the stability of the discretized hammer model with a fictitious delay can always be maintained by choosing a sufficiently large sampling rate f_s if the corresponding continuous-time system is stable. As f_s → ∞, the discrete-time system behaves as the original differential equation. Doubling the sampling rate of the whole string model would double the computation time as well. However, if only the hammer model operates at the double rate, the computational complexity is raised only by a negligible amount. Therefore, in the proposed solution, the hammer operates at twice the sampling rate of the string. Data is downsampled using simple averaging and upsampled using linear interpolation. The multirate hammer has been found to result in well-behaving force signals at a low computational cost. As the hammer model is a nonlinear dynamic system, the stability bounds are not trivial to derive in closed form. In practice, stability is maintained up to an impact velocity ten times higher than the point where the straightforward approach (e.g., used in [19,20]) turns unstable. Figure 2 shows a typical force signal in a hammer-string contact. The overall contact duration is around 2 ms, and the pulses in the signal are produced by reflections of force waves at the string terminations. The K method and the multirate hammer produce very similar force signals. On the other hand, inserting a fictitious delay element drives the system towards instability (the spikes are progressively amplified). In general, the multirate method provides results comparable to the K method for hammer parameters realistic for pianos, while it does not require that precomputed lookup tables be stored. On the other hand, when low sampling rates (e.g., f_s = 11.025 kHz) or extreme hammer parameters are used (i.e., k is ten times the value listed in Table 1), the system stability cannot be maintained by upsampling by a factor of 2. In such cases, the K method is the appropriate solution.
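As an illustration of the K method, the following hedged sketch precomputes the explicit map F = h(p) for the static force law (2) by solving F = f(p + KF) numerically on a grid and interpolating at run time. The value and sign of K (assumed negative here, since the force reduces the compression) and the grid limits are placeholders that would follow from the actual discretization.

```python
import numpy as np
from scipy.optimize import brentq

def build_k_method_table(k=4.0e9, p=2.5, K=-1.0e-7,
                         p_min=-1.0e-3, p_max=1.0e-3, n_points=2048):
    """Precompute the explicit map F = h(p) by solving F = f(p + K*F),
    with f(x) = k*max(x, 0)^p as in (2).  K is assumed negative here;
    all numerical values are placeholders."""
    def f(x):
        return k * max(x, 0.0) ** p

    p_grid = np.linspace(p_min, p_max, n_points)
    F_grid = np.zeros_like(p_grid)
    for i, pv in enumerate(p_grid):
        if pv <= 0.0:
            continue                          # no contact, zero force
        g = lambda F: F - f(pv + K * F)       # increasing in F when K < 0
        F_grid[i] = brentq(g, 0.0, f(pv))     # g(0) < 0 and g(f(pv)) > 0
    return p_grid, F_grid

def k_method_force(pv, p_grid, F_grid):
    """Run-time evaluation of F(n) = h(p(n)) by table interpolation."""
    return np.interp(pv, p_grid, F_grid)
```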
The computational approaches presented in this section are applicable to a wide class of mechanical interactions between physical objects [26].

4. THE STRING
Many different approaches have been presented in the literature for string modeling. Since we are considering techniques suitable for real-time applications, only the digital waveguide [9,10,11] is described here in detail. This method is based on the time-domain solution of the one-dimensional wave equation. The velocity distribution of the string v(x, t) can be seen as the sum of two traveling waves, v(x, t) = v^+(x − ct) + v^−(x + ct) (6), where x denotes the spatial coordinate, t is time, c is the propagation speed, and v^+ and v^− are the traveling wave components. Spatial and time-domain sampling of (6) results in a simple delay-line representation. Nonideal, lossy, and stiff strings can also be modeled by the method. If linearity and time invariance of the string are assumed, all the distributed losses and dispersion can be consolidated at one end of the digital waveguide [9,10,11]. In the case of one polarization of a piano string, the system takes the form shown in Figure 3, where M represents the length of the string in spatial sampling intervals, M_in denotes the position of the force input, and H_r(z) refers to the reflection filter. This structure is capable of generating a set of quasiharmonic, exponentially decaying sinusoids. Note that the four delay lines of Figure 3 can be simplified to a two-delay-line structure for more efficient implementation [13]. Accurate design of the reflection filter plays a key role in creating realistic sounds. To simplify the design, H_r(z) is usually split into three separate parts, H_r(z) = H_l(z) H_d(z) H_fd(z) (7), where H_l(z) accounts for the losses, H_d(z) for the dispersion due to stiffness, and H_fd(z) for fine-tuning the fundamental frequency. Using allpass filters H_d(z) for simulating dispersion ensures that the decay times of the partials are controlled by the loss filter H_l(z) only. The slight phase difference caused by the loss filter is negligible compared to the phase response of the dispersion filter. In this way, the loss filter and the dispersion filter can be treated as orthogonal with respect to design.
The string needs to be fine tuned because delay lines can implement only an integer phase delay, and this provides too low a resolution for the fundamental frequencies. Fine tuning can be incorporated in the dispersion filter design or, alternatively, a separate fractional delay filter H_fd(z) can be used in series with the delay line. Smith and Jaffe [9,27] suggested using a first-order allpass filter for this purpose. Välimäki et al. [28] proposed an implementation based on low-order Lagrange interpolation filters. Laakso et al. [29] provided an exhaustive overview of this topic.
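A minimal single-delay-loop realization of the waveguide string described above may help fix ideas. The sketch below uses only an integer delay of N ≈ f_s/f_0 samples closed by the one-pole loss filter of the next subsection; the dispersion filter H_d(z) and the fractional delay H_fd(z) are omitted, and the loop coefficients are illustrative rather than designed values.

```python
import numpy as np

def waveguide_string(excitation, f0, fs, g=0.998, a1=-0.1):
    """Single-delay-loop digital waveguide: an N-sample delay line (N ~ fs/f0)
    closed through the one-pole loss filter H_l(z) = g(1+a1)/(1+a1*z^-1).
    Dispersion and fractional-delay tuning are omitted; g and a1 are
    illustrative, not designed, values."""
    N = int(round(fs / f0))
    buf = np.zeros(N)                 # circular buffer implementing z^-N
    ptr = 0
    lp = 0.0                          # loss filter state (previous output)
    b0 = g * (1.0 + a1)
    out = np.zeros(len(excitation))
    for n, x in enumerate(excitation):
        fb = buf[ptr]                 # sample leaving the N-sample delay
        lp = b0 * fb - a1 * lp        # one-pole loss filter
        y = x + lp                    # inject the excitation into the loop
        buf[ptr] = y                  # write back into the delay line
        ptr = (ptr + 1) % N
        out[n] = y
    return out

# usage sketch: excite the loop with a short pulse standing in for the
# hammer force signal (placeholder values)
# tone = waveguide_string(np.r_[np.hanning(30), np.zeros(44100)],
#                         f0=262.0, fs=44100.0)
```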

4.1. Loss filter design
First, the partial envelopes of the recorded note have to be calculated. This can be done by sinusoidal peak tracking with a short-time Fourier transform implementation [28] or by heterodyne filtering [30]. A robust way of calculating decay times is fitting a line by linear regression to the logarithm of the amplitude envelopes [28]. The magnitude specification g_k for the loss filter can be computed as g_k = e^(−1/(f_0 τ_k)), prescribed at the digital frequency ϑ_k = 2π f_k / f_s, where f_k and τ_k are the frequency and the decay time of the kth partial, f_0 is the fundamental frequency, and f_s is the sampling rate. Fitting a filter to the g_k coefficients is not trivial, since the error in the decay times is a nonlinear function of the filter magnitude error. If the magnitude response exceeds unity, the digital waveguide loop becomes unstable. To overcome this problem, Välimäki et al. [28,30] suggested the use of a one-pole loop filter whose transfer function is H_l(z) = g(1 + a_1)/(1 + a_1 z^(−1)) (8). The advantage of this filter is that the stability constraints for the waveguide loop, namely, −1 < a_1 < 0 and 0 < g < 1, are relatively simple. As for the design, Välimäki et al. [28,30] used a simple algorithm for minimizing the magnitude error in the mean-squares sense. However, the overall decay time of the synthesized tone did not always coincide with the original one.
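The analysis step can be sketched as follows: the decay time of each partial is obtained by a linear fit to the logarithm of its amplitude envelope, and the loss filter specification is formed with the reconstructed expression g_k = e^(−1/(f_0 τ_k)). Both functions are generic illustrations rather than the exact procedures of [28, 30].

```python
import numpy as np

def decay_time(envelope, fs, hop=1):
    """Decay time (seconds) of one partial from its amplitude envelope,
    via a linear fit to the log-envelope (as suggested in [28])."""
    t = np.arange(len(envelope)) * hop / fs
    slope, _ = np.polyfit(t, np.log(np.maximum(envelope, 1e-12)), 1)
    return -1.0 / slope               # envelope ~ exp(-t/tau)

def loss_filter_spec(f0, tau_k):
    """Magnitude specification g_k = exp(-1/(f0*tau_k)) for the loss filter
    (reconstructed form of the per-period loop gain)."""
    return np.exp(-1.0 / (f0 * np.asarray(tau_k, dtype=float)))
```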
As a general solution for loss filter design, Smith [9] suggested to minimize the error of decay times of the partials rather than the error of the filter magnitude response. This assures that the overall decay time of the note is preserved and the stability of the feedback loop is maintained. Moreover, optimization with respect to decay times is perceptually more meaningful. The methods described hereafter are all based on this idea.
Bank [17] developed a simple and robust method for one-pole loop filter design. The approximate analytical formula for the decay times τ_k of a digital waveguide with a one-pole loss filter is τ_k ≈ 1/(c_1 + c_3 ϑ_k²) (9), where c_1 and c_3 are computed from the parameters of the one-pole filter of (8) as c_1 = (1 − g) f_0 and c_3 = −a_1 f_0 / (2(1 + a_1)²) (10), where f_0 is the fundamental frequency and ϑ_k = 2π f_k / f_s is the digital frequency of the kth partial in radians. Equation (9) shows that the decay rate σ_k = 1/τ_k is a second-order polynomial of the frequency ϑ_k with even-order terms only. This simplifies the filter design, since c_1 and c_3 are easily determined by polynomial regression from the prescribed decay times. A weighting function w_k = τ_k⁴ has to be used to minimize the error with respect to τ_k. The parameters g and a_1 of the one-pole loop filter are then easily computed via the inverse of (10) from the coefficients c_1 and c_3.
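Assuming the reconstructed relations (9) and (10) above, the one-pole design reduces to a weighted polynomial regression followed by the inversion of (10). A possible sketch is given below; the quadratic in a_1 follows from rearranging c_3 = −a_1 f_0 / (2(1 + a_1)²).

```python
import numpy as np

def design_one_pole_loss(f0, fs, f_k, tau_k):
    """One-pole loss filter design by weighted polynomial regression on the
    decay rates, assuming the reconstructed relations (9) and (10):
    1/tau_k = c1 + c3*theta_k^2, c1 = (1-g)*f0, c3 = -a1*f0/(2*(1+a1)^2)."""
    theta = 2.0 * np.pi * np.asarray(f_k, dtype=float) / fs
    tau = np.asarray(tau_k, dtype=float)
    sigma = 1.0 / tau                           # prescribed decay rates
    w = tau ** 4                                # weighting suggested in the text
    X = np.column_stack([np.ones_like(theta), theta ** 2])
    WX = X * w[:, None]
    c1, c3 = np.linalg.solve(X.T @ WX, X.T @ (w * sigma))
    g = 1.0 - c1 / f0                           # invert c1 = (1-g)*f0
    # invert c3 = -a1*f0/(2*(1+a1)^2): 2*c3*a1^2 + (4*c3 + f0)*a1 + 2*c3 = 0
    # (the roots are real since the discriminant f0^2 + 8*c3*f0 > 0)
    roots = np.real(np.roots([2.0 * c3, 4.0 * c3 + f0, 2.0 * c3]))
    a1 = float(roots[(roots > -1.0) & (roots < 0.0)][0])
    return g, a1
```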
In most cases, the one-pole loss filter yields good results. Nevertheless, when precise rendering of the partial envelopes is required, higher-order filters have to be used. However, computing analytical formulas for the decay times with high-order filters is a difficult task. A two-step procedure was suggested by Erkut [31]: first, a high-order polynomial containing only even-order terms is fitted to the decay rates σ_k = 1/τ_k; then, a magnitude specification is calculated from the decay-rate curve defined by the polynomial, and this magnitude response is used as a specification for minimum-phase filter design.
Another approach was proposed by Bank [17], who suggested a transformation of the specification. As the goal is to match decay times, the magnitude specification g_k is transformed into the form g_k,tr = 1/(1 − g_k), which approximates τ_k, and a transformed filter H_tr(z) is designed for the new specification by least-squares filter design. The loss filter H_l(z) is then computed by the inverse transform H_l(z) = 1 − 1/H_tr(z).
Bank and Välimäki [32] presented a simpler method for high-order filter design based on a special weighting function. The resulting decay times of the digital waveguide are computed from the magnitude response ĝ_k = |H(e^(jϑ_k))| of the loss filter by τ̂_k = d(ĝ_k) = −1/(f_0 ln ĝ_k). This function is approximated by its first-order Taylor series around the specified magnitude g_k. Accordingly, the error with respect to decay times can be approximated by the weighted mean square error e_WLS = Σ_k w_k (ĝ_k − g_k)², with weights w_k = [d′(g_k)]². The weighted error e_WLS can be easily minimized by standard filter design algorithms, and leads to a good match with respect to decay times. All of these techniques for high-order loss filter design have been found to be robust in practice. Comparing them is left for future work.
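A generic weighted least-squares magnitude fit with the decay-time weighting implied by the Taylor argument could look as follows. This sketch fits a symmetric (linear-phase) FIR amplitude response and is meant only to illustrate the role of the weighting w_k, not to reproduce the design of [32].

```python
import numpy as np

def design_fir_loss_filter(theta_k, g_k, order=16):
    """Weighted least-squares fit of a symmetric (linear-phase) FIR amplitude
    response to the loss specification g_k at digital frequencies theta_k,
    with the decay-time weighting w_k = (d'(g_k))^2 (up to a constant).
    Generic illustration, not the exact procedure of [32]."""
    theta_k = np.asarray(theta_k, dtype=float)
    g_k = np.asarray(g_k, dtype=float)
    lg = np.log(np.minimum(g_k, 1.0 - 1e-9))       # guard against g_k == 1
    w = 1.0 / (g_k * lg ** 2) ** 2                 # ~ (dtau/dg)^2 ~ tau_k^4
    M = order // 2
    # amplitude of a type-I FIR: A(theta) = b0 + sum_m b_m*cos(m*theta)
    C = np.column_stack([np.cos(m * theta_k) for m in range(M + 1)])
    sw = np.sqrt(w)
    b, *_ = np.linalg.lstsq(C * sw[:, None], g_k * sw, rcond=None)
    h = np.concatenate([b[:0:-1] / 2.0, [b[0]], b[1:] / 2.0])
    return h                                       # symmetric impulse response
```

In a waveguide loop one would, in practice, use a minimum-phase version of the resulting filter and account for its phase delay in the string tuning.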
Borin et al. [16] have used a different approach for modeling the decay time variations of the partials. In their implementation, second-order FIR filters are used as loss filters that are responsible for the general decay of the note. Small variations of the decay times are modeled by connecting all the string models to a common termination which is implemented as a filter with a high number of resonances. This also enables the simulation of the pedal effect since now all the strings are coupled to each other (see Section 4.3). An advantage of this method compared to high-order loop filters is the smaller computational complexity. On the other hand, the partial envelopes of the different notes cannot be controlled independently.
Although optimizing the loss filter with respect to decay times has been found to give perceptually adequate results, we remark that loss filter design can benefit from perceptual studies. The audibility of decay-time variations for the one-pole loss filter was studied by Tolonen and Järveläinen [33]. The study states that relatively large deviations (between −25% and +40%) in the overall decay time of the note are not perceived by listeners. Unfortunately, these results are not directly applicable to the design of high-order loss filters, as the tolerance for decay-time variations of single partials is not known.

4.2. Dispersion simulation
Dispersion is due to stiffness, which causes piano strings to deviate from ideal behavior. If the dispersive correction term in the wave equation is small, its first-order effect is to increase the wave propagation speed c(f) with frequency. This phenomenon causes string partials to become inharmonic. If the string parameters are known, then the frequency of the kth stretched partial can be computed as f_k = k f_0 √(1 + Bk²), where the value of the inharmonicity coefficient B depends on the parameters of the string (see, e.g., [34]). The phase delay specification D_d(f_k) for the dispersion filter H_d(z) can be computed from the partial frequencies as D_d(f_k) = k f_s/f_k − N − D_l(f_k), where N is the total length of the waveguide delay line and D_l(f_k) is the phase delay of the loss filter H_l(z). The phase specification of the dispersion filter becomes φ_pre(f_k) = −2π f_k D_d(f_k)/f_s. Van Duyne and Smith [35] proposed an efficient method for simulating dispersion by cascading equal first-order allpass filters in the waveguide loop; however, the constraint of using equal first-order sections is too severe and does not allow accurate tuning of inharmonicity.
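Using the reconstructed expressions above, the dispersion filter specification can be assembled as in the following sketch; D_loss stands for any routine returning the loss filter phase delay in samples, and the number of partials and the placeholder values in the usage line are arbitrary choices.

```python
import numpy as np

def dispersion_spec(f0, B, fs, N, D_loss, n_partials=50):
    """Phase-delay and phase specification for the dispersion filter, using
    f_k = k*f0*sqrt(1 + B*k^2) and D_d(f_k) = k*fs/f_k - N - D_l(f_k).
    D_loss is any callable giving the loss filter phase delay in samples."""
    k = np.arange(1, n_partials + 1)
    f_k = k * f0 * np.sqrt(1.0 + B * k ** 2)             # stretched partials
    D_d = k * fs / f_k - N - np.array([D_loss(f) for f in f_k])
    phi_pre = -2.0 * np.pi * f_k * D_d / fs              # prescribed allpass phase
    return f_k, D_d, phi_pre

# usage sketch (all values are placeholders):
# f_k, D_d, phi = dispersion_spec(f0=65.4, B=1.2e-4, fs=44100.0, N=640,
#                                 D_loss=lambda f: 0.5)
```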
Rocchesso and Scalcon [36] proposed a design method based on [37]. Starting from a target phase response, l points {f_k}, k = 1, ..., l, are chosen on the frequency axis corresponding to the points where the string partials should be located. The filter order is chosen to be n < l. For each partial k, the method computes the quantities β_k = −(1/2)[φ_pre(f_k) + 2πn f_k], where φ_pre(f) is the prescribed allpass phase response and the frequencies f_k are normalized to the sampling rate. The filter coefficients a_j are then computed by solving the system Σ_{j=1}^{n} a_j sin(β_k + 2πj f_k) = −sin β_k, k = 1, . . . , l (15). A least-squared equation error (LSEE) criterion is used to solve the overdetermined system (15). It was shown in [36] that several tens of partials can be correctly positioned for any piano key, with the allpass filter order not exceeding 20. Moreover, the fine tuning of the string is automatically taken into account in the design. Figure 4 plots results obtained using a filter order of 18. Note that the pure-tone frequency JND (just noticeable difference) has been used in Figure 4b as a reference, as no accurate studies of partial JNDs for piano tones are available to our knowledge. Since the computational load for H_d(z) is heavy, it is important to find criteria for accuracy and order optimization with respect to human perception. Rocchesso and Scalcon [38] studied the dependence of the bandwidth of perceived inharmonicity (i.e., the frequency range in which misplacement of partials is audible) on the fundamental frequency by performing listening tests with decaying piano tones. The bandwidth has been found to increase almost linearly with the fundamental frequency. Järveläinen et al. [39] also found that inharmonicity is more easily perceived at low frequencies, even when the coefficient B for bass tones is lower than for treble tones. This is probably due to the fact that beats are used by listeners as cues for inharmonicity, and even low B's produce enough mistuning in the higher partials of low tones. These findings can help in the allpass filter design procedure, although a number of issues still need further investigation.
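A direct transcription of system (15), solved in the least-squares sense, is sketched below; the definition of β_k is the reconstruction given above, and frequencies are assumed to be normalized to the sampling rate.

```python
import numpy as np

def design_dispersion_allpass(f_k, phi_pre, fs, n=18):
    """LSEE design of the dispersion allpass: solve system (15),
    sum_j a_j*sin(beta_k + 2*pi*j*nu_k) = -sin(beta_k), in the least-squares
    sense, with nu_k = f_k/fs and the reconstructed
    beta_k = -(phi_pre_k + 2*pi*n*nu_k)/2."""
    nu = np.asarray(f_k, dtype=float) / fs
    beta = -0.5 * (np.asarray(phi_pre, dtype=float) + 2.0 * np.pi * n * nu)
    j = np.arange(1, n + 1)
    A = np.sin(beta[:, None] + 2.0 * np.pi * nu[:, None] * j[None, :])
    a, *_ = np.linalg.lstsq(A, -np.sin(beta), rcond=None)
    den = np.concatenate([[1.0], a])     # 1 + a_1 z^-1 + ... + a_n z^-n
    num = den[::-1]                      # allpass numerator: reversed denominator
    return num, den
```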
As high-order dispersion filters are needed for modeling the low notes, the computational complexity increases significantly. Bank [17] proposed a multirate approach to overcome this problem. Since the lowest tones do not contain significant energy in the high-frequency region anyway, it is worthwhile to run the lowest two or three octaves of the piano at half the sampling rate of the model. The outputs of the low notes are summed before upsampling, therefore only one interpolation filter is required.

4.3. Coupled piano strings
String coupling occurs at two different levels. First of all, two or three slightly mistuned strings are sounded together when a single piano key is pressed (except for the lowest octave), and complicated modulation of the amplitudes is brought about. This results in beating and two-stage decay: the former refers to an amplitude modulation overlaid on the exponential decay, and the latter means that the tone decays faster in its early part than later. These phenomena were studied by Weinreich as early as 1977 [40]. At the second level, the presence of the bridge and the action of the soundboard are known to give rise to important coupling effects even between different tones. In fact, the bridge-soundboard system connects the strings together and acts as a distributed driving-point impedance for the string terminations.
The simplest way of modeling beating and two-stage decay is to use two digital waveguides in parallel for a single note. Depending on the type of coupling used, many different solutions have been presented in the literature; see, for example, [14,41].
Bank [17] presented a different approach for modeling beating and two-stage decay, based on a parallel resonator bank. In a subsequent study, the computational complexity of the method was decreased by an order of magnitude by applying multirate techniques, making the approach suitable for real-time implementations [42]. In this approach, second-order resonators R_1(z) · · · R_k(z) are connected in parallel to the basic string model S_v(z), rather than using a second waveguide. The structure (the multirate resonator bank) is depicted in Figure 5. The idea comes from the observation that the behavior of two coupled strings can be described by a pair of exponentially damped sinusoids [40]. In this model, one sinusoid of the mode pair is simulated by one partial of the digital waveguide and the other by one of the resonators R_k(z). The transfer functions of the resonators are R_k(z) = [Re(a_k) − Re(a_k p̄_k) z^(−1)] / [1 − 2 Re(p_k) z^(−1) + |p_k|² z^(−2)], with a_k = A_k e^(jφ_k) and p_k = e^(j(2πf_k/f_s) − 1/(f_s τ_k)), where A_k, φ_k, f_k, and τ_k refer to the initial amplitude, initial phase, frequency, and decay-time parameters of the kth resonator, respectively. The overline stands for complex conjugation, Re indicates the real part of a complex variable, and f_s is the sampling frequency. An advantage of the structure is that the resonators R_k(z) are implemented only for those partials whose beating and two-stage decay are prominent. The others have a simple exponential decay, determined by the digital waveguide model S_v(z). Five to ten resonators have been found to be enough for high-quality sound synthesis. The resonator bank is implemented with the multirate approach, running the resonators at a much lower sampling rate, for example, 1/8 or 1/16 of the original sampling frequency.
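The resonator bank can be sketched as follows: each entry of params carries the (A_k, φ_k, f_k, τ_k) of one mode pair, and the biquad coefficients follow from the reconstructed transfer function above; the multirate implementation of [42] is omitted here for clarity.

```python
import numpy as np
from scipy.signal import lfilter

def resonator_bank(excitation, fs, params):
    """Parallel second-order resonators added to the waveguide output.
    Each entry of params is (A_k, phi_k, f_k, tau_k); the coefficients follow
    from a_k = A_k*exp(j*phi_k) and p_k = exp(j*2*pi*f_k/fs - 1/(fs*tau_k))."""
    y = np.zeros(len(excitation))
    for A, phi, f, tau in params:
        a_k = A * np.exp(1j * phi)
        p_k = np.exp(1j * 2.0 * np.pi * f / fs - 1.0 / (fs * tau))
        b = [a_k.real, -(a_k * np.conj(p_k)).real]          # numerator
        a = [1.0, -2.0 * p_k.real, abs(p_k) ** 2]           # denominator
        y = y + lfilter(b, a, excitation)                   # damped sinusoid
    return y

# usage sketch: one extra mode pair at 263 Hz beating against a waveguide
# partial near 262 Hz, given an excitation array excite (placeholder values)
# extra = resonator_bank(excite, 44100.0, [(0.01, 0.0, 263.0, 0.8)])
```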
It is shown in [42] that when only half of the downsampled frequency band is used for the resonators, no lowpass filtering is needed before downsampling. This is due to the fact that the excitation signal has a lowpass character, leading to aliasing components below −20 dB. As the role of the excitation signal is to set the initial amplitudes and phases of the resonators, the result of this aliasing is a less than 1 dB change in the resonator amplitudes, which has been found to be inaudible. On the other hand, the interpolation filters after upsampling cannot be neglected. However, they are not implemented for all notes separately; the lower sampling rate signals of the different strings are summed before interpolation filtering (this is not depicted in Figure 5). Their specification is relatively simple (e.g., 5 dB passband ripple) since their passband errors can be easily corrected by changing the initial amplitudes and phases of the resonators. This results in a significantly lower computational cost compared to methods which use coupled waveguides.
Generally, the average computational cost of the method for one note is less than five multiplications per sample. Moreover, the parameter estimation gets simpler since only the parameters of the mode pairs have to be found by, for example, the methods presented in [17,41], and there is no need for coupling filter design. Stability problems of a coupled system are also avoided. The method presented here shows that combining physical and signal-based approaches can be useful in reducing computational complexity.
Modeling the coupling between strings of different tones is essential when the sustain pedal effect has to be simulated. Garnett [12] and Borin et al. [16] suggested connecting the strings to the same lumped terminating impedance. The impedance is modeled by a filter with a high number of peaks. For that, the use of feedback delay networks [43,44] is a good alternative. Although in real pianos the bridge connects to the string as a distributed termination, thus coupling different strings in different ways, the simple model of Borin et al. was able to produce a realistic sustain pedal effect [45].

5. RADIATION MODELING
The soundboard radiates and filters the string waves that reach the bridge, and radiation patterns are essential for describing the "presence" of a piano in a musical context. However, now we are concentrating on describing the sound pressure generated by the piano at a certain locus in the listening space, that is, the directional properties of radiation are not taken into account. Modeling the soundboard as a linear postprocessing stage is an intrinsically weak approach since on a real piano it also accounts for coupling between strings and affects the decay times of the partials. However, as already stated in Section 2, our modeling strategy keeps the radiation properties of the soundboard separated from its impedance properties. The latter are incorporated in the string model, and have already been addressed in Sections 4.1 and 4.3; here we will concentrate on radiation.
A simple and efficient radiation model was presented by Garnett [12]. The waveguide strings were connected to the same termination, and the soundboard was simulated by connecting six additional waveguides to the common termination. This can be seen as a predecessor of using feedback delay networks for soundboard simulation. Feedback delay networks have been proven to be efficient in simulating room reverberation since they are able to produce high modal density at a low computational cost [43]. For an overview, see the work of Rocchesso and Smith [44]. Bank [17] applied feedback delay networks with shaping filters to the simulation of piano soundboards. The shaping filters were parametrized in such a way that the system matched the overall magnitude response of a real piano soundboard. A drawback of the method is that the modal density and the quality factors of the modes are not fully controllable. The method has proven to yield good results for high piano notes, where simulating the attack noise (the knock) of the tone is the most important issue.
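For orientation, a minimal feedback delay network is sketched below: four delay lines of mutually prime lengths, an orthogonal (scaled Hadamard) feedback matrix, and a single attenuation coefficient. The shaping filters and parametrization used in [17] are omitted, and the delay lengths and gain are illustrative.

```python
import numpy as np

def fdn(excitation, delays=(229, 379, 541, 683), g=0.995):
    """Minimal feedback delay network: four delay lines with mutually prime
    lengths, an orthogonal (scaled Hadamard) feedback matrix, and a single
    attenuation g < 1 ensuring stability.  Shaping filters are omitted and
    all values are illustrative."""
    A = 0.5 * np.array([[1.0,  1.0,  1.0,  1.0],
                        [1.0, -1.0,  1.0, -1.0],
                        [1.0,  1.0, -1.0, -1.0],
                        [1.0, -1.0, -1.0,  1.0]])
    bufs = [np.zeros(d) for d in delays]
    ptrs = [0, 0, 0, 0]
    out = np.zeros(len(excitation))
    for n, x in enumerate(excitation):
        taps = np.array([bufs[i][ptrs[i]] for i in range(4)])   # delay outputs
        out[n] = taps.sum()
        fb = g * (A @ taps)                                     # lossless mix + damping
        for i in range(4):
            bufs[i][ptrs[i]] = x + fb[i]                        # new delay inputs
            ptrs[i] = (ptrs[i] + 1) % delays[i]
    return out
```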
The problem of soundboard radiation can also be addressed from the point of view of filter design. However, as the soundboard exhibits high modal density, a high-order filter has to be used. For f_s = 44.1 kHz, a 2000-tap FIR filter was necessary to achieve good results. The filter order did not decrease significantly when IIR filters were used.
To reduce this high computational complexity, a multirate soundboard model was proposed by Bank et al. [46]. The structure of the model is depicted in Figure 6. The string signal is split into two parts. The part below 2.2 kHz is downsampled by a factor of 8 and filtered by a high-order filter H_low(z), precisely synthesizing the amplitude and phase response of the soundboard at low frequencies. The part above 2.2 kHz is filtered by a low-order filter H_high(z), modeling the overall magnitude response of the soundboard at high frequencies. The signal of the high-frequency chain is delayed by N samples to compensate for the latency of the decimation and interpolation filters of the low-frequency chain.
The filters H_low(z) and H_high(z) are computed as follows. First, a target impulse response H_t(z) is calculated by measuring the force-pressure transfer function of a real piano soundboard. Then, this is lowpass filtered and downsampled by a factor of 8 to produce an FIR filter H_low(z). The impulse response of the low-frequency chain is then subtracted from the target response H_t(z), providing a residual response containing energy above 2.2 kHz. This residual response is made minimum phase and windowed to a short length (50 taps). The multirate soundboard model outlined here consumes 100 operations per cycle and produces a spectral character similar to that of a 2000-tap FIR filter. The only difference is that the attack of high notes sounds sharper, since above 2.2 kHz the energy of the soundboard response is concentrated into a short time period. This could be overcome by using feedback delay networks for H_high(z), which is left for future research.
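The derivation of the two filters can be sketched as follows. The cepstral minimum-phase construction, the FFT size, and the use of the decimator's own anti-aliasing filter (which places the split near f_s/16 rather than exactly at 2.2 kHz) are illustrative simplifications of the procedure described above.

```python
import numpy as np
from scipy.signal import resample_poly

def minimum_phase_from_magnitude(mag, n_fft):
    """Homomorphic (cepstral) construction of a minimum-phase impulse
    response with the prescribed magnitude spectrum (length n_fft//2 + 1)."""
    cep = np.fft.irfft(np.log(np.maximum(mag, 1e-10)), n_fft)
    fold = np.zeros(n_fft)
    fold[0] = cep[0]
    fold[1:n_fft // 2] = 2.0 * cep[1:n_fft // 2]
    fold[n_fft // 2] = cep[n_fft // 2]
    return np.fft.irfft(np.exp(np.fft.rfft(fold)), n_fft)

def split_soundboard_response(h_target, down=8, n_high=50, n_fft=4096):
    """Split a measured soundboard response into a downsampled low-band
    filter H_low and a short minimum-phase high-band filter H_high.
    The decimator's own anti-alias filter sets the split (about fs/16)."""
    h_low = resample_poly(h_target, 1, down)          # anti-aliased decimation
    low_chain = resample_poly(h_low, down, 1)         # back at the full rate
    residual = h_target - low_chain[:len(h_target)]   # high-frequency residue
    mag = np.abs(np.fft.rfft(residual, n_fft))
    h_high = minimum_phase_from_magnitude(mag, n_fft)[:n_high]  # truncated short
    return h_low, h_high
```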
The parameters of the multirate soundboard model cannot be interpreted physically. However, this does not lead to any drawbacks, since the parameters of the soundboard cannot be changed by the player in a real piano either. Having a purely physical model, for example, based on finite differences [47], would lead to unacceptably high computational costs. Therefore, implementing a black-box model block as a part of a physical instrument model seems to be a good compromise.

6. CONCLUSIONS
This paper has reviewed the main stages of the development of a physical model for the piano, addressing computational aspects and discussing problems that not only are related to piano synthesis but arise in a broad class of physical models of sounding objects.
Various approaches have been discussed for dealing with nonlinear equations in the excitation block. We have pointed out that inaccuracies at this stage can lead to severe instability problems. Analogous problems arise in other mechanical and acoustical models, such as impact and friction between two sounding objects, or reed-bore interaction. The two alternative solutions presented for the piano hammer can be used in a wide range of applications.
Several filter design techniques have been reviewed for the accurate tuning of the resonating waveguide block. It has been shown that high-order dispersion filters are needed for accurate simulation of inharmonicity. Therefore, perceptual issues have been addressed since they are helpful in optimizing the design and reducing computational loads. The requirement of physicality can always be weakened when the effect caused by a specific feature is considered to be inaudible.
A filter-based approach was presented for the soundboard model. As such, it cannot be interpreted as physical, but this does not influence the functionality of the model. In general, only those parameters which are involved in block interaction or are influenced by control messages need to have a clear physical interpretation. Therefore, we recommend synthesis structures that are based on building blocks with physical input and output parameters, but whose inner structure does not necessarily follow a physical model. In other words, the basic building blocks are black-box models with the most efficient implementations available, and they form the physical structure of the instrument model at a higher level. The use of multirate techniques was suggested for modeling beating and two-stage decay as well as the soundboard. The model can run at different sampling rates (e.g., 44.1, 22.05, and 11.025 kHz) and/or with different filter orders implemented in the digital waveguide model. Since the stability of the numerical structures is maintained in all cases, the user has the option of choosing between quality and efficiency. This remark is also relevant for potential applications in structured audio coding. In cases when instrument models are to be encoded and transmitted without precise knowledge of the computational power of the decoder, it is essential that stability is guaranteed even at low sampling rates in order to allow graceful degradation.

ACKNOWLEDGMENTS
Work at CSC-DEI, University of Padova, was developed under a research contract with Generalmusic. Partial funding was provided by the EU project "MOSART" (Improving Human Potential) and the Hungarian National Scientific Research Fund OTKA F035060. The authors are thankful to P. Hussami and to the anonymous reviewers for their helpful comments, which have contributed to the improvement of the paper.

Federico Avanzini received the Laurea degree in physics from the University of Milano in 1997, with full marks and a thesis on nonlinear dynamical systems. From November 1998 to November 2001, he pursued a Ph.D. degree in computer science at the University of Padova, with a research project on "Computational issues in physically-based sound models." Within his doctoral activities (January to June 2001), he worked as a visiting researcher at the Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, where he was involved in the "Sound Source Modeling" project. He is currently a Postdoctoral Researcher at the University of Padova. His research interests include sound synthesis models in human-computer interaction, computational issues, models for voice synthesis and analysis, and multimodal interfaces. Recent research experience includes participation in both national ("Sound models in human-computer and human-environment interaction") and European ("SOb-The Sounding Object" and "MEGA-Multisensory Expressive Gesture Applications") projects, where he has been working on sound synthesis algorithms based on physical models.

Gianpaolo Borin received the Laurea degree in electronic engineering from the University of Padova, Italy, in 1990, with a thesis on sound synthesis by physical modeling. Since then, he has been doing research at the Center of Computational Sonology (CSC), University of Padova. He has also been working both as a Unix professional developer and as a consultant researcher for Generalmusic. He is a coauthor of a Generalmusic US patent for digital piano postprocessing methods. His current research interests include algorithms and methods for the efficient implementation of physical models of musical instruments and tools for real-time sound synthesis.

His main research interests are in algorithms for sound synthesis and analysis, models for expressiveness in music, multimedia systems and human-computer interaction, and preservation and restoration of audio documents. He is an author of several international scientific publications. He has also served on the scientific committees of international conferences. He is involved in the COST G6, MEGA, and MOSART-IHP European projects as a local coordinator. He is an owner of several patents on digital music instruments.
