Sonic Watermarking

Audio watermarking has been used mainly for digital sound. In this paper, we extend the range of its applications to live performances with a new composition method for real-time audio watermarking. Sonic watermarking mixes the sound of the watermark signal and the host sound in the air to detect illegal music recordings recorded from auditoriums. We propose an audio watermarking algorithm for sonic watermarking that increases the magnitudes of the host signal only in segmented areas pseudorandomly chosen in the time-frequency plane. The result of a MUSHRA subjective listening test assesses the acoustic quality of the method in the range of “excellent quality.” The robustness is dependent on the type of music samples. For popular and orchestral music, a watermark can be stably detected from music samples that have been sonic-watermarked and then once compressed in an MPEG 1 layer 3 file.


INTRODUCTION
A digital audio watermark has been proposed as a means to identify the owner or distributor of digital audio data [1,2,3,4]. Proposed applications of audio watermarks are copyright management, annotation, authentication, broadcast monitoring, and tamper proofing. For these purposes, the transparency, data payload, reliability, and robustness of audio watermarking technologies have been improved by a number of researchers. Recently, several audio watermarking techniques that work by modifying magnitudes in the frequency domain were proposed to achieve robustness against distortions such as time scale modification and pitch shifting [5,6,7].
Of the various applications, the primary driving forces for audio watermarking research have been the copy control of digital music and searching for illegally copied digital music, as can be seen in The Secure Digital Music Initiative (http://www.sdmi.org/) and the Japanese Society for the Rights of Authors, Composers and Publishers (Final selection of technology toward the global spread of digital audio watermarks, http://www.jasrac.or.jp/ejhp/release/2000/1006.html, October 2001). In these usages, it is natural to consider that an original music sample, which is the target of watermark embedding, exists as a file stored digitally on a computer. However, music is performed, created, stored, and listened to in many different ways, and it is much more common that music is not stored as a digital file on a computer.
Earlier research [8] proposed various composition methods for real-time watermark embedding and showed how they can extend the range of applications of audio watermarks. In a proposed composition method named "analog watermarking," a trusted conventional analog mixer is used to mix the host signal (HS) and the watermark signal (WS) after the WS is generated by a computer and converted to an analog signal. This composition method makes it unnecessary to convert the analog HS to a digital signal, since the conversion results in a risk of interrupting and delaying the playback of the HS.
At the same time, another composition method named "sonic watermarking" was proposed. This composition method mixes the sound of the WS and the host sound in the air so that the watermark can be detected from a recording of the mixed sound. The method will allow searching for bootleg recordings on the Internet, that is, illegal music files that have been recorded in auditoriums by untrustworthy audience members using portable recording devices. The recordings are sometimes burned on audio CDs and even sold at shops, or distributed via the Internet. Countermeasures, such as examining the audience members' personal belongings at auditorium entrances, have been used for decades to cope with this problem. The ease of distribution in the broadband Internet has increased the problem of bootleg recordings. For movies, applications of video watermarking to digital cinema have been gathering increasing attention recently [9,10]. One of the purposes is to prevent a handy cam attack, which is a recording of the movie made at a theatre. However, neither digital watermarking, encryption, nor streaming can be used in live performances, so there has been no efficient means to protect the copyrights of live performances in the Internet era.
In this paper, we carefully consider the application model and the possible problems of sonic watermarking, which was briefly proposed in [8], and report the results of intensive robustness tests and a multiple stimulus with hidden reference and anchors [11] (MUSHRA) subjective listening test which we performed to investigate the effects of critical factors of sonic watermarking, such as the delay and the distance between the sound sources of the HS and the WS.
The paper is organized as follows. In Section 2, we describe the usage scenario of sonic watermarking. Some possible problems limiting the use of sonic watermarking are listed in Section 3. In Section 4, we describe a watermarking algorithm that is designed to solve some of the problems. The acoustic quality of the algorithm is assessed by a subjective listening test described in Section 5. The robustness of the algorithm is shown by experimental results in Section 6. In Section 7, we present some concluding remarks.

SONIC WATERMARKING
In sonic watermarking, the watermark sound generated by a watermark generator is mixed with the host sound in the air (Figure 1). A watermark generator is a device that is equipped with a microphone, a speaker, and a computer. The host sound is captured using the microphone, the computer calculates the WS, and the WS enters the air from the speaker. The computer needs the host sound as input in order to calculate the frequency masking effect [12] of the host sound. The lifecycle of a bootleg recording containing sonic watermarks is illustrated in Figure 2. While broken lines with arrowheads indicate sonic propagation, the solid lines indicate wired analog transmissions or digital file transfers. For example, the untrustworthy audience member may compress the bootleg recording as an MP3 file and upload it to the Internet. They may also attack the sonic watermark before compression. The recording device may be an analog cassette tape recorder, an MP3 recorder, a minidisc recorder, and so forth.
Note that sonic watermarking is not necessary in live performances where the sound of the musical instruments and the performers is mixed and amplified using analog electronic devices. Analog watermarking [8] can be used instead.

PROBLEMS
In this section, we classify the possible problems that may limit the use of sonic watermarking into three major categories: (1) real-time embedding, (2) robustness, and (3) acoustic quality. Though all of the other problems of digital audio watermarking are also problems of sonic watermarking, they are not listed here.

Problems related to real-time embedding
The major problems related to real-time embedding are the performance of the watermark embedding process and the delay of the WS.
(1) Performance. Watermark embedding faster than real time is the minimum condition for sonic watermarking. The computational load of the watermark generator must be kept low enough for stable real-time production of the WS. A watermark embedding algorithm faster than real time was also reported in [14].
(2) Delay. Even when the watermark generator works in real time, the watermark sound will be delayed relative to the host sound. We will discuss the problems of robustness and acoustic quality caused by the delay in later sections.
The delay consists of a prerecording delay and a delay inside the watermark generator. The prerecording delay is the time required for the sound to propagate from the source of the host sound to the microphone of the watermark generator. For example, when the distance is 5 m, the prerecording delay will be approximately 15 milliseconds.
The delay inside the watermark generator is caused by the recording buffers, playback buffers, and WS calculations (Figure 3). Though the length of the playback buffers and the recording buffers can be reduced using technologies such as ASIO software and hardware, it is impossible to reduce them to zero. The WS calculation causes two kinds of delay. The first is that it is necessary to store a discrete Fourier transform (DFT) frame of the HS to calculate its power spectrum. The second is the elapsed time for the WS calculation.
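The prerecording delay quoted above follows directly from the speed of sound. The following minimal sketch reproduces the arithmetic; the speed of 343 m/s (dry air at roughly 20 degrees Celsius) and the function name are assumptions made for illustration, not values taken from the paper.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: dry air at about 20 degrees Celsius


def prerecording_delay_ms(distance_m: float,
                          speed_m_per_s: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Time for sound to travel distance_m, in milliseconds."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return 1000.0 * distance_m / speed_m_per_s


# A source 5 m from the microphone gives a delay close to the 15 ms quoted in the text.
print(f"{prerecording_delay_ms(5.0):.1f} ms")
```

Under this assumption every additional metre between the sound source and the microphone adds roughly 2.9 milliseconds of delay.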

Robustness
Possible causes interfering with successful detection can be roughly categorized into (1) deteriorations after recording and (2) deteriorations before and during recording by the untrustworthy audience member. After recording, the untrustworthy audience member may try to delete the watermark from the bootleg recording. The possible attacks include compression, analog conversion, trimming, pitch shifting, random sample cropping, and so forth. As for deteriorations before and during recording, the following items have to be considered.
(1) Delay of the watermark signal. When the WS is delayed, the phase of the HS drastically changes during the delay, so the phases of the HS and the WS become almost independent. Watermarking algorithms assuming perfect synchronization of the phases suffer serious damage from the delay.
(2) Reverberations. Reverberations of the auditorium are inevitably mixed into the host sound and the watermark sound.
(3) Noises made by audience. Noises made by sources other than the musical instruments become disturbing factors for watermark detection. Such sounds include voices and applause from audience members and rustling noises made by hands touching the recording device. If microphones directed towards the audience record the loud noise of the audience, and if the watermark generator utilizes the masking effect of the audience noise as well, detection of the watermark will be easier. However, since it is impossible to record noises that are made near widely scattered portable recording devices, the noise inevitably interferes with watermark detection.
(4) Multiple watermark generators. In some cases, arrangements using multiple watermark generators would be better to reflect the actual masking effects of each audience member. When using multiple watermark generators, it would also be necessary to consider their mutual interference.

Acoustic quality
There are several factors that may make the acoustic quality of sonic watermarking worse than that of digital audio watermarking.
(1) Strength of the watermark signal. Because the efficiency of watermark embedding is worse and more severe deterioration is expected in the sound than for digital audio watermarking, the WS must be relatively louder than a digital audio watermark. This results in lower acoustic quality.
(2) Delay of the watermark signal. For example, when the host sound includes a drumbeat that abruptly diminishes, the delayed watermark sound stands out from the host sound, resulting in worse acoustic quality. There is a "postmasking effect" that occurs after the masker diminishes [12]. For the first 5 milliseconds after the masker diminishes, the amount of the postmasking effect is as high as simultaneous masking. After the 5 milliseconds, it starts an almost exponential decay with a time constant of 10 milliseconds. Therefore, if the delay of the watermark sound is short enough, the postmasking effect is expected to mask the watermark sound. However, the longer the delay, the more the host sound changes, and the weaker the masking from the postmasking effect.
(3) Differences of the masker. The HS captured by the microphone of the watermark generator is different from the host sound that the audience listens to. Hence, the masking effect calculated by the generator will also be different from the actual masking effect as heard by the audience.
(4) Different locations of the sound sources. While the sources of the host sound may be spread around the auditorium stage, the sources of the watermark sound must be limited to a few locations, even if multiple watermark generators are used. The difference in the direction and the distance of the sources of the watermark sound and the host sound from each audience member will have a negative effect on the acoustic quality.
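The postmasking behaviour described in item (2) can be written as a simple piecewise model: full masking for the first 5 milliseconds, followed by an exponential decay with a 10 millisecond time constant. The sketch below is only illustrative. Treating the "amount" of postmasking as a dimensionless fraction of simultaneous masking is an assumption, and the function name is hypothetical.

```python
import math


def postmasking_fraction(t_ms: float, hold_ms: float = 5.0, tau_ms: float = 10.0) -> float:
    """Fraction of simultaneous masking remaining t_ms after the masker stops.

    Assumed model: constant for hold_ms, then exp(-(t - hold_ms) / tau_ms).
    """
    if t_ms < 0:
        raise ValueError("time after masker offset must be non-negative")
    if t_ms <= hold_ms:
        return 1.0
    return math.exp(-(t_ms - hold_ms) / tau_ms)


# Compare the three delays used later in the listening test.
for delay_ms in (10.0, 20.0, 40.0):
    print(delay_ms, round(postmasking_fraction(delay_ms), 3))
```

Under this model the remaining masking falls quickly between 10 and 40 milliseconds, which is in line with the argument that the delay of the watermark sound should be kept short.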

ALGORITHMS
A modified spread spectrum audio watermarking algorithm that has an advantage in its robustness against audio processing operations such as geometric distortions of the audio signal was proposed in [6,15]. Since that algorithm is not applicable to sonic watermarking because of the delay of the WS, we altered the embedding algorithm. If the same parameter values are used, the previous detection algorithm can detect the watermark from the content regardless of whether the previous algorithm or the modified algorithm was used for watermarking. However, because these are the first intensive experiments on sonic watermarking, more priority was given to the basic robustness against sonic propagation and noise addition than to the robustness against geometric distortion. Therefore, different parameter values from [15] were used in the experiments, and robustness against geometric distortions was not tested.

Basic concepts
The method can be summarized as follows. The method embeds a multiple-bit message in the content by dividing it into short messages and embedding each of them together with a synchronization signal in a pattern block. The synchronization signal is an additional bit whose value is always 1. The pattern block is defined as a two-dimensional segmented area in the time-frequency plane of the content (Figure 4a), which is constructed from the sequence of power spectrums calculated using short-term DFTs. A pattern block is further divided into tiles. We call the tiles in a row a subband. A tile consists of four consecutive overlapping DFT frames. A pseudorandom number is selected corresponding to each tile (Figure 4b). We denote the value of the pseudorandom number assigned to the tile at the $b$th subband in the $t$th frame by $\omega_{t,b}$, which is +1 or −1. The previous algorithm decreased the magnitudes of the HS in the tiles assigned −1 (Figure 5b). However, because it is impossible to decrease the magnitudes of the HS in the case of sonic watermarking, the proposed algorithm makes the WS zero in those tiles (Figure 5d). For the tiles with a positive sign, the magnitudes and the phases of the WS are given as in the previous method. However, because of the delay, giving the WS the same phases as the HS at the computer has almost the same effect as giving the WS a random phase (Figure 5c). We denote the value of the bit assigned to the tile by $B_{t,b}$, which is 1 or 0. The values of the pseudorandom numbers and the tile assignments of the bits are determined by a symmetric key shared by the embedder and the detector.
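To make the pattern block concrete, the following sketch derives the pseudorandom signs and the tile-to-bit assignment from a shared key. It is an illustration under stated assumptions, not the authors' key schedule: NumPy's `default_rng` stands in for whatever keyed generator is actually used, the tiles are split evenly between the synchronization signal and the message bits, and the name `make_pattern_block` is hypothetical.

```python
import numpy as np


def make_pattern_block(key: int, n_subbands: int = 24, n_tile_cols: int = 8,
                       n_bits: int = 3):
    """Return (omega, slot) arrays of shape (n_tile_cols, n_subbands).

    omega holds the pseudorandom signs (+1 or -1) of the tiles.
    slot holds 0 for the synchronization signal and 1..n_bits for message bits.
    """
    rng = np.random.default_rng(key)  # assumption: stand-in for a keyed generator
    omega = rng.choice(np.array([-1, 1]), size=(n_tile_cols, n_subbands))
    n_tiles = n_tile_cols * n_subbands
    slots = np.arange(n_tiles) % (n_bits + 1)  # even split between sync and bits
    rng.shuffle(slots)                          # key-dependent tile assignment
    return omega, slots.reshape(n_tile_cols, n_subbands)


omega, slot = make_pattern_block(key=1234)
print(omega.shape, int((slot == 0).sum()))
```

Because both arrays depend only on the key, an embedder and a detector that share the key reconstruct the same signs and the same assignment.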

Watermark generation
The watermark generation algorithm calculates the complex spectrum, $c_{t,f}$, of the $f$th frequency bin in the $t$th frame of a pattern block of the content by using the DFT analysis of a frame of the content. We denote the magnitude and the phase of the bin by $a_{t,f}$ and $\theta_{t,f}$, respectively. Then the algorithm calculates the inaudible level of the magnitude modification by using a psychoacoustic model based on the complex spectrum. We indicate this amount of the $f$th frequency of the $t$th frame in a pattern block by $p_{t,f}$. We use this amount for the magnitude in the $f$th frequency bin of the WS.
A sign, $s_{t,b}$, which determines whether to increase or leave unchanged the magnitudes of the HS in a tile, is calculated from the pseudorandom value, $\omega_{t,b}$, the bit value, $B_{t,b}$, and the location, $t$, of the frame in the block. If the frame is in the first two frames of a row of tiles, that is, if the remainder of dividing $t$ by 4 is less than 2, then $s_{t,b} = \omega_{t,b}(2B_{t,b} - 1)$. Otherwise, $s_{t,b} = -\omega_{t,b}(2B_{t,b} - 1)$. This is because, by embedding opposite signs in the first and last two frames of a tile and by detecting the watermark using the difference of the magnitudes, cancellation of the HS can make the detection robust. In the tiles where the calculated sign, $s_{t,b}$, is positive, the phase of the HS, $\theta_{t,f}$, is used for the phase, $\phi_{t,f}$, in the $f$th frequency bin of the WS, where we assume the $f$th frequency is in the $b$th subband. In the tiles with a negative sign, the magnitude $p_{t,f}$ and the phase $\phi_{t,f}$ are set to zero. At this point in the procedure, the magnitude $p_{t,f}$ and the phase $\phi_{t,f}$ of the WS have been calculated. The WS is converted to the time domain using inverse DFTs.
This procedure increases the magnitudes of the HS by $p_{t,f}$ only in the tiles with a positive sign. This change makes the power distribution of the content nonuniform, and hence makes detection possible. However, because the efficiency of magnitude modification is much worse than in the previous algorithm, a decrease of the detected watermark strength is inevitable. It is necessary to use a stronger WS than the previous method uses.
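The generation step for a single frame can be sketched as follows. This is a minimal illustration rather than the authors' implementation. The sign rule in `tile_sign` (the pseudorandom value times $2B - 1$, negated in the last two frames of a tile) is an assumption consistent with the description above, the masking level `p` is taken as a given input instead of being computed by a psychoacoustic model, the sine window is assumed, and overlap-add of successive frames is omitted.

```python
import numpy as np

FRAME = 512  # DFT frame length in samples, as in the implementation section


def tile_sign(omega: int, bit: int, t: int) -> int:
    """Assumed sign rule: omega * (2B - 1) in the first two frames of a tile,
    and the opposite sign in the last two frames."""
    s = omega * (2 * bit - 1)
    return s if (t % 4) < 2 else -s


def watermark_frame(hs_frame: np.ndarray, p: np.ndarray,
                    signs_per_bin: np.ndarray) -> np.ndarray:
    """Return one time-domain WS frame.

    hs_frame: FRAME samples of the host signal.
    p: assumed inaudible magnitude per frequency bin (FRAME // 2 + 1 values).
    signs_per_bin: tile sign expanded to every frequency bin.
    """
    window = np.sin(np.pi * (np.arange(FRAME) + 0.5) / FRAME)  # assumed sine window
    theta = np.angle(np.fft.rfft(hs_frame * window))           # phase of the HS
    # Magnitude p with the HS phase where the sign is positive, zero elsewhere.
    ws_spectrum = np.where(signs_per_bin > 0, p * np.exp(1j * theta), 0.0)
    return np.fft.irfft(ws_spectrum, n=FRAME)


rng = np.random.default_rng(0)
hs = rng.standard_normal(FRAME)
ws = watermark_frame(hs, np.full(FRAME // 2 + 1, 0.01), np.ones(FRAME // 2 + 1))
print(ws.shape)
```

Note that the WS is only ever added to the host sound, so tiles with a negative sign simply receive no watermark energy.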

Psychoacoustic model
The ISO-MPEG 1 audio psychoacoustic model 2 for layer 3 [13] is used as the basis of the psychoacoustic calculations for the experiments, with some alterations: (i) An absolute threshold was not used for these experiments. We believe this is not suitable for practical watermarking because it depends on the listening volume and is too small in the frequencies used for watermarking. (ii) A local minimum of masking values within each frequency subband was used for all frequency bins in the subband. Excessive changes to the WS magnitudes do not contribute to the watermark strength, and they also lower the acoustic quality by increasing the WS. (iii) A 512-sample frame, a 256-sample IBLEN (footnote 3), and a sine window were used for the DFT for the psychoacoustic analysis to reduce the computational cost.
Due to the postmasking effect, a shorter DFT frame is expected to result in better acoustic quality, because of the shorter delay. However, the poor frequency resolution caused by a DFT frame that is too short reduces the detected watermark strength. This is the reason a 512-sample DFT frame was selected for the implementation.
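Alteration (ii) amounts to replacing the masking values of all bins in a subband by the smallest value in that subband. A minimal sketch is given below. The index of the lowest watermarked bin (`first_bin`) is an assumption, since only the bandwidth of 6 bins and the number of subbands are stated, and the function name is hypothetical.

```python
import numpy as np


def flatten_to_subband_minimum(masking: np.ndarray, first_bin: int,
                               bins_per_subband: int = 6,
                               n_subbands: int = 24) -> np.ndarray:
    """Use the local minimum of the masking values for every bin of each subband.

    Bins outside the watermarked subbands are set to zero (no WS energy there).
    """
    out = np.zeros_like(masking, dtype=float)
    for b in range(n_subbands):
        lo = first_bin + b * bins_per_subband
        hi = lo + bins_per_subband
        out[lo:hi] = masking[lo:hi].min()
    return out


masking = np.linspace(1.0, 2.0, 257)                       # hypothetical masking values
print(flatten_to_subband_minimum(masking, first_bin=4)[4:10])
```

Taking the minimum keeps the WS at or below the masking level in every bin of the subband, at the cost of some watermark energy.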

Watermark detection
The detection algorithm calculates the magnitudes of the content for all tiles and correlates these magnitudes with the pseudorandom array. The magnitude $a_{t,f}$ of the $f$th frequency in the $t$th frame of a pattern block of the content is calculated by the DFT analysis of a frame of the content. A frame overlaps the adjacent frames by a half window. The magnitudes are then normalized by the average of the magnitudes in the frame. We denote a normalized magnitude by $\hat{a}_{t,f}$. The difference between the logarithmic magnitudes of a frame and the next nonoverlapping frame is taken as $P_{t,f} = \log \hat{a}_{t,f} - \log \hat{a}_{t+2,f}$. The magnitude $Q_{t,b}$ of a tile located at the $b$th subband of the $t$th frame in the block is calculated by averaging the values of $P_{t,f}$ in the tile. The detected watermark strength for the $j$th bit in the tile is calculated as the cross-correlation of the pseudorandom numbers and the normalized magnitudes of the tiles, where $\bar{Q}$ is the average of $Q_{t,b}$, and the summations are calculated for the tiles assigned for the bit. Similarly, the synchronization strength is calculated for the synchronization signal.
The watermark strength for a bit is calculated after synchronizing to the first frame of the block. The synchronization process consists of a global synchronization and a local adjustment. In the global synchronization, assuming that correct synchronization positions of several consecutive blocks are separated by the same number of frames, the synchronization strengths detected from blocks that are separated by the same number of frames are summed up, and the frame that gives the maximum summed synchronization strength is chosen. In the local adjustment, the frame with the locally maximum synchronization strength is chosen from a few neighboring frames. In [15], the synchronization process is described in more detail.
Footnote 3: IBLEN is a length parameter used by the MPEG 1 psychoacoustic model [13]. The analysis window for the psychoacoustic calculation process is shifted by IBLEN for each FFT.
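The detection steps can be sketched for one already synchronized pattern block as follows. This is a hedged illustration only. The normalized correlation in `bit_strength` is an assumed form of the cross-correlation, the average of the tile values is taken over the tiles assigned to the bit, and `first_bin` and both function names are hypothetical.

```python
import numpy as np


def tile_values(mag: np.ndarray, first_bin: int, bins_per_subband: int = 6,
                n_subbands: int = 24, frames_per_tile: int = 4) -> np.ndarray:
    """Compute Q for every tile of one block.

    mag: magnitudes of half-overlapping frames, shape (n_frames, n_bins).
    Returns an array of shape (n_frames // frames_per_tile, n_subbands).
    """
    norm = mag / mag.mean(axis=1, keepdims=True)   # normalize by the frame average
    log_mag = np.log(norm + 1e-12)                 # small offset avoids log(0)
    n_tiles = mag.shape[0] // frames_per_tile
    q = np.empty((n_tiles, n_subbands))
    for k in range(n_tiles):
        t0 = k * frames_per_tile
        # P: each of the first two frames minus the next nonoverlapping frame.
        p = log_mag[t0:t0 + 2] - log_mag[t0 + 2:t0 + 4]
        for b in range(n_subbands):
            lo = first_bin + b * bins_per_subband
            q[k, b] = p[:, lo:lo + bins_per_subband].mean()
    return q


def bit_strength(q: np.ndarray, omega: np.ndarray, mask: np.ndarray) -> float:
    """Assumed normalized correlation of omega with Q over the tiles in mask."""
    d = q[mask] - q[mask].mean()
    w = omega[mask]
    denom = np.sqrt(np.sum((w * d) ** 2))
    return float(np.sum(w * d) / denom) if denom > 0 else 0.0


omega = np.tile(np.array([1, -1]), (8, 12))        # hypothetical balanced signs
mask = np.ones((8, 24), dtype=bool)
print(round(bit_strength(0.5 * omega, omega, mask), 3))
```

With this normalization, tile values that follow the pseudorandom signs give a large positive strength, and tile values with the opposite pattern give a negative one.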

Implementation
We implemented a watermark generator that can generate sonic watermarks in real time and a detector that can detect 64-bit messages in 30-second pieces of music. A Pentium IV 2.2 GHz Windows XP PC equipped with a Sound Blaster Audigy Platinum sound card by Creative Technology, Ltd. was used for the platform. The message is encoded in 448 bits by adding 8 cyclic redundancy check (CRC) parity bits, using turbo coding, and repeating it twice. Each pattern block has 3 bits and a synchronization signal embedded, and the block has 24 columns and 8 rows of tiles. Each of the 24 frequency subbands is given an equal bandwidth of 6 frequency bins. The frequency of the highest bin used is 12.7 kHz. The length of a DFT frame is 512 samples to shorten the delay. Based on the psychoacoustic model, the root mean square power of the WS is 23.0 dB lower than that of the HS on average. Examples of watermark signals generated for a popular song and a trumpet solo are shown in Figure 6.
At the time of detection, while 48 tiles out of the 192 tiles are dedicated to the local adjustment of the pattern block synchronization, the tiles assigned for the bits are also used for the global synchronization. For the global synchronization, it is assumed that 16 consecutive blocks have consistent synchronization positions. The false alarm error ratio is theoretically under $10^{-5}$, based on the threshold of the square means of the detected bit strengths. Another threshold on the estimated watermark SNR is set to keep the code word error ratio under $10^{-5}$. The reasons to use both thresholds are described in [16].
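As a rough consistency check, the payload of a 30-second detection window can be estimated from the block geometry. The sketch below assumes that a block spans 8 tiles along the time axis, that consecutive frames advance by half a frame (256 samples), and that blocks follow each other without gaps. These are assumptions made for the arithmetic, not statements from the paper.

```python
FS = 44_100                 # sampling rate in Hz
FRAME = 512                 # DFT frame length in samples
HOP = FRAME // 2            # assumed frame advance: frames overlap by half a window
FRAMES_PER_TILE = 4
TILES_IN_TIME = 8           # assumed: 8 tiles along time, 24 subbands in frequency
BITS_PER_BLOCK = 3

block_samples = TILES_IN_TIME * FRAMES_PER_TILE * HOP
block_seconds = block_samples / FS
blocks_in_window = (30 * FS) // block_samples
embedded_bits = blocks_in_window * BITS_PER_BLOCK

print(block_samples, round(block_seconds, 3), blocks_in_window, embedded_bits)
```

Under these assumptions a 30-second window carries slightly more coded bits than the 448 required by the encoded message, which is consistent with the stated design.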

Delay
The delay of the WS was approximately 17.8 milliseconds in total. The details are as follows. The playback buffer and the recording buffer each required 128 samples for stable real-time watermark generation. The length of a DFT frame was 512 samples. The watermark calculation process took approximately 3.1% of the playback time. Since the length of a DFT frame was 512 samples, the elapsed time for the WS calculation corresponds approximately to the playback time for 16 samples. Hence, the total delay was 128 + 128 + 512 + 16 = 784 samples, which was about 17.8 milliseconds for 44.1 kHz sampling.
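The delay budget above can be reproduced with a few lines of arithmetic. The function and parameter names in this sketch are hypothetical. The numbers are the ones given in the text.

```python
FS = 44_100  # sampling rate in Hz


def generator_delay(recording_buffer: int = 128, playback_buffer: int = 128,
                    frame: int = 512, calc_fraction: float = 0.031):
    """Return the delay inside the watermark generator as (samples, milliseconds)."""
    calc_samples = round(frame * calc_fraction)   # 3.1% of a 512-sample frame
    total = recording_buffer + playback_buffer + frame + calc_samples
    return total, 1000.0 * total / FS


samples, ms = generator_delay()
print(samples, round(ms, 1))
```

The DFT frame accounts for about two thirds of the total, so the frame length is the main lever for reducing the delay of the generator.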

ACOUSTIC QUALITY
The subjective audio quality of the algorithm was evaluated by a MUSHRA [11] listening test. The effects of two factors that can be considered particularly important for the use of sonic watermarking were also investigated: (1) the delay of the WS and (2) the angle between the sound sources of the WS and the HS (as measured from the listener's location). The test samples were monaural excerpts from popular music, orchestral music, and instrumental solos as described in Table 1. The mean duration of the samples was 12.3 seconds. All of the test signals were sampled at a frequency of 44.1 kHz and with a bit resolution of 16 bits. All of them were upsampled to 48 kHz before the test to match the listening equipment. Though most of the 18 subjects were inexperienced listeners, there were training sessions in advance of the test in which they were exposed to the full range and nature of all of the test signals. To give anchors for comparison, the subjects were also required to assess the audio quality of hidden references (hr), 7 kHz lowpass filtered samples (al7), and samples which had been compressed in MP3 files with a bit rate of 48 kbps (am48) or 64 kbps (am64) for a monaural channel using the Fraunhofer codec of Musicmatch Jukebox 7.20. Though the test signals of the hidden references were identical to the reference signals, the subjects were required to assess their quality without knowing which were which. The references (r), the hidden references, and the anchors were played by the speaker SP1 (Figure 7). The other test signals (Table 2) were as described below.
(i) sd10: sonic watermark with a delay of 10 milliseconds. While the HS completely identical to the reference was played from SP1, a WS that had been computed in advance based on the HS was simultaneously played from another speaker, SP2, with a delay of 10 milliseconds. SP2 was offset from the direction of SP1 by 4.3°. The subjects listened to the mixed sound of the HS and the WS.
(ii) sd20: sonic watermark with a delay of 20 milliseconds. The same WS used for sd10 was played from SP2 with a delay of 20 milliseconds, which is close to the delay of our implementation.
(iii) sd40: sonic watermark with a delay of 40 milliseconds. The WS was played from SP2 with a delay of 40 milliseconds.
(iv) sa15: sonic watermark with an angle of 15°. The WS was played from another speaker, SP3, with a delay of 20 milliseconds. SP3 was offset 15° from SP1.
(v) sa30: sonic watermark with an angle of 30°. The WS was played from another speaker, SP4, with a delay of 20 milliseconds. SP4 was offset 30° from SP1.

Results
The mean and 95% confidence interval of the subjective acoustic quality of the test signals are shown in Figure 8. The quality of sonic watermarks with a delay equal to or less than 20 milliseconds was assessed in the range of "excellent" quality. Though the WSs were not inaudible, the acoustic quality for most of the test samples can be considered to be good enough for realistic use.
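The summary statistics reported for the listening test (mean and 95% confidence interval per test signal) can be computed as in the sketch below. The scores are hypothetical values invented for illustration, not data from the experiment. The quality bands follow the usual five-interval MUSHRA scale, and the t quantile of 2.110 corresponds to 17 degrees of freedom, that is, 18 subjects.

```python
import math
import statistics

T_975_DF17 = 2.110  # two-sided 95% Student t quantile for 18 subjects


def mean_ci95(scores, t_quantile: float = T_975_DF17):
    """Return (mean, lower, upper) of a 95% confidence interval for the mean."""
    m = statistics.mean(scores)
    half = t_quantile * statistics.stdev(scores) / math.sqrt(len(scores))
    return m, m - half, m + half


def mushra_band(score: float) -> str:
    """Map a 0-100 MUSHRA score to its quality band."""
    for lower, name in ((80, "excellent"), (60, "good"), (40, "fair"), (20, "poor")):
        if score >= lower:
            return name
    return "bad"


hypothetical_scores = [80] * 9 + [90] * 9  # illustration only, not measured data
m, lo, hi = mean_ci95(hypothetical_scores)
print(round(m, 1), round(lo, 1), round(hi, 1), mushra_band(m))
```

A test signal is described as falling in a band when its mean score lies in the corresponding interval of the scale.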

Effect of the delay
The relationship of the quality and the delay is shown in Figure 9. Most subjects could notice acoustic impairments in sd40 and reduced its score to "good" quality. Especially in the case of castanets (Figure 10), the watermark sound with a large delay could be heard as additional small castanets. A similar effect also occurred for drumbeats and cymbals in the popular music (Figure 11). In those cases, the subjects perceived increased noisiness at the higher frequencies. For the test samples in which long notes were held for some seconds (Figure 12), the effect of the delay was low. In general, the quality difference between sd10 and sd20 was assessed to be small, and subjects sometimes gave sd20 better evaluations than sd10.

Effect of the sound source direction
The relationship of the quality and the sound source direction is shown in Figure 13. The effect was so large that sa30 was assessed in the range of "fair." When the WS was played from SP4, the subjects noticed the difference by perceiving a weak stereo effect. However, in the case of sd20, even though the WS was played from SP2 in addition to the HS from SP1, the subjects perceived the mixed sound as a monaural sound. The effect was particularly prominent for the test samples for which the effect of the delay was distinguishable. Although the situation would be more complicated with multiple sources of the host sound in realistic use of sonic watermarking, the experimental results suggest the sound source of the WS should be placed as close to the source of the host sound as possible.

ROBUSTNESS
We tested the robustness of the algorithm against transformations that are important for the lifecycle of sonic

Results
We measured the correct detection rates (CDRs) at which the correct 64-bit messages were detected. The error correction and detection algorithm successfully avoided the detection of an incorrect message.

Robustness against MP3 compression
Table 5 shows the results for sonic watermarking and MP3 compression. "Digital WM" means that the WS was digitally added to the HS with a delay of 20 milliseconds. "Sonic WM" means that the sound of the WS was mixed with the host sound in the air and recorded by a microphone. We used the same experimental equipment as used for sd20 of the listening test. For the "original watermark," the watermark was detected immediately after watermark embedding as described above. For "MP3," the watermarked signal was compressed in an MP3 file with the specified bit rate for a monaural channel and then decompressed before watermark detection. For popular music and orchestral music, correct watermarks were detected from over 95% of detection windows after sonic watermarking and MP3 compression. The reason the CDRs for instrumental solos were low is that the test samples included many sections that are almost silent or at a quite low volume, and the watermarks in those sections were easily destroyed by the background noise of the room and by the MP3 compression. We observed a 28 dB(A) background noise in the soundproof room when nothing was played by the speakers.

Robustness against echo addition
Figure 14 shows the CDRs after sonic WM and echo addition. Echoing was done digitally on a computer with a feedback coefficient of 0.5. The horizontal axis of the figure is the value of the maximum delay used for echo addition. Though the CDRs for the instrumental solos were low because of sonic WM, it can be seen that echo addition interferes very little with watermark detection.
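One simple way to add an echo with a feedback coefficient is a feedback comb filter, sketched below. The exact echo structure used in the experiment is not specified beyond the feedback coefficient and the maximum delay, so the single-tap form and the function name here are assumptions.

```python
import numpy as np


def add_echo(x, delay_samples: int, feedback: float = 0.5) -> np.ndarray:
    """Feedback comb filter: y[n] = x[n] + feedback * y[n - delay_samples]."""
    if delay_samples < 1:
        raise ValueError("delay must be at least one sample")
    y = np.asarray(x, dtype=float).copy()
    for n in range(delay_samples, len(y)):
        y[n] += feedback * y[n - delay_samples]
    return y


impulse = np.zeros(10)
impulse[0] = 1.0
print(add_echo(impulse, delay_samples=3))
```

With a feedback coefficient of 0.5, each repetition of the echo is 6 dB weaker than the previous one.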

Robustness against noise addition
Figure 15 shows the CDRs after sonic WM and noise addition. White Gaussian noise with the average noise-to-signal ratio shown on the horizontal axis of the figure was digitally added to the recordings. For popular music, the CDRs remained high up to −20 dB of noise addition. In contrast, the CDRs for orchestral music dropped after noise addition above −35 dB. This is because orchestral music has wider dynamic ranges than popular music does, and contains more low volume sections. Those quiet sections degrade more quickly than loud sections do when the additive noise has a comparable signal level. Though it has been shown in [8] that the CDR for quiet sections can be improved, at the sacrifice of transparency, by utilizing the masking effect of the background noise, the robustness against noise when the masking effect is not used by the watermark generator is still an open problem.
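Noise addition at a given noise-to-signal ratio can be sketched as follows. The ratio is interpreted here relative to the average power of the whole recording, which is an assumption, and the function name and the seed parameter are hypothetical.

```python
import numpy as np


def add_white_noise(x, nsr_db: float, seed: int = 0) -> np.ndarray:
    """Add white Gaussian noise whose power is nsr_db relative to the signal power."""
    x = np.asarray(x, dtype=float)
    signal_power = np.mean(x ** 2)
    noise_power = signal_power * 10.0 ** (nsr_db / 10.0)
    rng = np.random.default_rng(seed)
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)


t = np.arange(44_100) / 44_100.0
tone = np.sin(2 * np.pi * 440.0 * t)           # one second of a 440 Hz test tone
noisy = add_white_noise(tone, nsr_db=-20.0)
measured = 10 * np.log10(np.mean((noisy - tone) ** 2) / np.mean(tone ** 2))
print(round(float(measured), 1))
```

Because the noise level is fixed by the average power, quiet passages of a recording with a wide dynamic range see a much worse local noise-to-signal ratio than loud passages, which matches the explanation given for orchestral music.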

SUMMARY
In this paper, we introduced the idea of sonic watermarking that mixes the sound of the watermark signal and the host sound in the air to detect bootleg recordings. The possible problems that may limit the use of sonic watermarking were classified. We proposed an audio watermarking algorithm suitable for sonic watermarking. The subjective acoustic quality of the algorithm was assessed in the range of "excellent" quality by the MUSHRA listening test. We assessed the effect of the delay of the watermark signal on the quality, and found that a delay of 20 milliseconds was short enough to sustain excellent quality. The effect of the direction of the sound sources of the watermark signal and the host signal was so large that special attention should be paid to the placement of the sound sources when using sonic watermarking. The experimental results of robustness were dependent on the type of the music samples. For popular music, the watermark was quite robust so that correct messages were detected from over 90% of the detection windows even when noise addition, echo addition, or MP3 compression was performed after sonic watermarking. However, in the case of instrumental solos, since the watermarks for low volume sections were easily degraded by the background noise, the CDR after sonic watermarking was only 60%.
Because this is the first attempt of this kind, there are still large problems to solve with sonic watermarking. The robustness of low volume sections and the acoustic transparency certainly have room for improvement. Some other audio watermarking algorithms might also be suitable for sonic watermarking. We need to theoretically and experimentally compare those algorithms. To evaluate the effects of the critical factors, we performed the experiments and analyzed the results by decomposing the factors into pieces in this paper. An experiment in a more natural situation has to be performed in the future. Other possible research items include cancellation of the watermark generation delay by placing the watermark generator closer to the audience, localization of the bootleg recorder based on detected watermark strengths corresponding to multiple watermark generators, and stably robust and transparent watermark generation by a watermark generator for the exclusive use of musical instruments whose volumes are stably high.

Figure 1: Sonic watermarking to detect bootleg recordings on the Internet. The watermark sound and the host sound are mixed in the air.

Figure 2: The lifecycle of a bootleg recording with sonic watermarks. While broken lines with arrowheads indicate sonic propagation, solid lines indicate wired analog transmissions or digital file transfers.

Figure 4: (b) is an enlargement of a part of (a). A pattern block consists of tiles. The embedding algorithm modifies magnitudes in the tiles according to pseudorandom numbers. The numbers in the figure are examples of the pseudorandom values.

Figure 5: The host signal and the watermark signal (a) and (b) for the previous method and (c) and (d) for the proposed method.

Figure 6: Examples of the watermark signal and the corresponding host signal for (a) a popular song and (b) a trumpet solo.

Figure 8: The mean and 95% confidence interval of the subjective acoustic quality of the test signals for all subjects. The test signals are described in Table 2.

Figure 14: The CDRs after sonic watermarking and echo addition. The leftmost points are the rates immediately after sonic watermarking.

Figure 15: The CDRs after sonic watermarking and noise addition. The leftmost points are the rates immediately after sonic watermarking.

Table 1: The test samples for the listening tests.

Table 2: The test signals for the listening tests. SP1, SP2, SP3, and SP4 are the speakers illustrated in Figure 7. Monaural signals simultaneously played from the speakers are listed in this table. The abbreviations are explained in Table 3.

Table 3: Description of the abbreviations used in Table 2.

Table 4: The number and the durations of the test samples used for the robustness tests.

Table 5: The CDRs at which the correct 64-bit messages were de-