
A knowledge-engineered autonomous mixing system

Audio Engineering Society
Convention Paper
Presented at the 135th Convention
2013 October 17–20, New York, USA

This Convention paper was selected based on a submitted abstract and 750-word precis that have been peer reviewed by at least two qualified anonymous reviewers. The complete manuscript was not peer reviewed. This convention paper has been reproduced from the author's advance manuscript without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.

Brecht De Man¹, Joshua D. Reiss¹

¹ Centre for Digital Music, Queen Mary University of London, Mile End Road, London E1 4NS, United Kingdom

Correspondence should be addressed to Brecht De Man.

ABSTRACT

In this paper a knowledge-engineered mixing engine is introduced that uses semantic mixing rules and bases mixing decisions on instrument tags as well as elementary, low-level signal features. Mixing rules are derived from practical mixing engineering textbooks. The performance of the system is compared to existing automatic mixing tools as well as human engineers by means of a listening test, and future directions are established.

1. INTRODUCTION

Since the first automatic microphone mixer [1], many systems have been proposed to automate various mixing engineering tasks, such as balancing levels, panning signals between channels, dynamic range compression and equalisation [2–13]. However, these systems generally lack instrument-specific processing.
Mixing decisions are based solely on the extracted, low-level features of the signals, and no high-level semantic information, such as which instruments the incoming tracks contain or the genre of the song, is provided by the user or extracted by the system.

In this paper, we investigate a system that mixes raw audio tracks into a stereo track using balance, pan, compression and equalisation rules derived from practical audio engineering literature [14–21]. Additionally, equaliser and compression presets included with the digital audio workstation (DAW) Logic Pro 9 are added to the rule base.

These sources stress that mixing is highly non-linear [19] and unpredictable [21], and that there are no hard and fast rules to follow [19], “magic” settings [20] or even effective equaliser presets [21]. It should be noted that spectral and dynamic processing of tracks does indeed depend very much on the characteristics of the input signal. This paper is by no means aiming to disprove that. Rather, it seeks to investigate to what extent semantic information about a project and its individual tracks, in combination with elementary low-level features, allows a system to make suitable mixing decisions.

To this end, we developed a framework that includes modules to read these rules, modules to measure elementary, low-level features of audio signals, and modules to carry out elementary mixing tasks (dynamic range compression, equalising, fading, panning) based on the rules.

Section 2 presents the system and a rule base derived from practical mixing engineering literature. We conduct a listening test to assess the performance of this system and compare it to another automatic mixing system (not knowledge-based and without track labels) as well as human mixing engineers, as described in Section 3. The results of this test are then discussed in Section 4.
Section 5 covers the conclusions we drew from this experiment and outlines future directions.

2. SYSTEM

Figure 1 shows a block diagram of the proposed system.

Fig. 1: Block diagram of the system. Solid arrows represent audio input or output; dashed arrows represent textual information such as instrument names and other metadata, and rules.

The system's input consists of raw, multitrack audio (typically a mixture of mono and stereo tracks), and a text file specifying the instrument corresponding with every audio file (e.g. Kick D112.wav, kickdrum). Elementary features of every track are extracted at the measurement stage (see Section 2.2). For easy access within the system, the track number is automatically stored as an integer or integer array named after the instrument (e.g. if channel 1 is a kick drum: kickdrum = 1, if channels 3 through 5 are toms: tom = [3, 4, 5]). The different track indices are also stored in subgroup arrays (e.g. drums_g = [1, 2, 3, 4, 5, 7, 12]) to be able to access all guitars, vocals, ... at once. Then, rules are read from the rule base and, if applicable, applied to the respective input tracks. Each rule specifies one out of five processors: high pass filtering (‘HPF’), dynamic range compression (‘DRC’), equalisation (‘EQ’), balance/level (‘fader’) and panning (‘pan pot’). The order of the application of the rules is determined by the chosen order of the processors, i.e. first the knowledge base is scanned for rules related to processor 1, then processor 2 and so on.

After processing the individual tracks, the drum instruments (members of subgroup drums_g) are mixed down using the respective fader and panning constants, and equalised and compressed if there are rules related to the drum bus.
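
As a rough illustration of the bookkeeping described above, the following Python sketch builds instrument-named track indices and a drum subgroup from an instrument list, then visits rules processor by processor. It is not the authors' implementation (the system runs its rules as MATLAB code). The names `index_tracks`, `subgroup`, `scan_order` and `DRUM_INSTRUMENTS`, the `snare` and `bass` entries, and two of the three rule comments are hypothetical and chosen only for this example. Unlike the system, which stores a single track as a plain integer, the sketch always stores a list.

```python
# Hypothetical sketch: names and data layout are assumptions, not the paper's code.

# Processor order used for the assessment of the system (Section 2.3).
PROCESSOR_ORDER = ["HPF", "DRC", "EQ", "fader", "pan pot"]

# Assumed membership of the drum subgroup (illustrative only).
DRUM_INSTRUMENTS = {"kickdrum", "snare", "tom"}


def index_tracks(instruments):
    """Map each instrument name to the 1-based track numbers carrying it."""
    by_instrument = {}
    for number, name in enumerate(instruments, start=1):
        by_instrument.setdefault(name, []).append(number)
    return by_instrument


def subgroup(by_instrument, members):
    """Collect the track numbers of all instruments in a subgroup, sorted."""
    return sorted(n for name in members for n in by_instrument.get(name, []))


def scan_order(rules, processor_order=PROCESSOR_ORDER):
    """Return rules in visiting order: processor 1 first, then processor 2, ..."""
    ordered = []
    for processor in processor_order:
        ordered.extend(rule for rule in rules if rule["processor"] == processor)
    return ordered


# One instrument label per input track, in track order.
instruments = ["kickdrum", "snare", "tom", "tom", "tom", "bass"]
tracks = index_tracks(instruments)
drums_g = subgroup(tracks, DRUM_INSTRUMENTS)

# Made-up rules, listed in arbitrary order as they might appear in a rule base.
rules = [
    {"processor": "pan pot", "comment": "centre the kick"},
    {"processor": "DRC", "comment": "punchy kick drum compression"},
    {"processor": "HPF", "comment": "remove rumble"},
]
visit = [rule["processor"] for rule in scan_order(rules)]
```

Here `tracks["tom"]` holds the three tom channels, `drums_g` gathers every drum track, and `visit` lists the high-pass rule before the compressor and pan rules regardless of their position in the rule base.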
Finally, the stereo drum bus is mixed down together with the remaining tracks, again with their respective fader and panning constants. The resulting mix is equalised and compressed if there are rules acting on the mix bus.

At this point, both the extracted features and the mixing parameters are constant over the whole of the audio track (in this experiment only short, four-bar audio fragments are used). If longer audio tracks are to be processed, it would be advisable to calculate these measures per song section (if sections are marked by the user or automatically), or to have measures and settings that vary continuously over time.

2.1. Rule list

Each rule in the rule list consists of three parts:

• tags: comma-separated words denoting the source of the rule (sources can be included or excluded for comparison purposes), the instrument(s) it should be applied on (or ‘generic’), the genre(s) it is applicable in (or ‘all’), and the processor it concerns. Based on these tags, the inference engine determines if the rule should be applied, and on which track. The order and number of tags are not fixed.
• rules: The ‘insert’ processors (high-pass filter, compressor and equaliser) replace the audio of the track specified in the tags part with a processed version, based on the parameters specified in the rules part. This is done immediately upon reading the rule. The level and pan metadata manipulated by the rules, on the other hand, are not applied until the mixdown stage (see Section 2.3.5), after all the rules have been read. The rule can also contain other MATLAB code, like conditional statements, loops, or calculations. Audio and metadata corresponding to the processed track, as well as other tracks, can be accessed from within the rule.
• comments: These are printed in the console to show which rules have been applied, and to facilitate debugging.

An example of a rule is as follows:

tags: authorX, kick drum, pop, rock, compressor
rules: ratio = 4.6; knee = 0; atime = 50; rtime = 1000; threshold = ch{track}.peak - 12.5;
comments: punchy kick drum compression

In future work, conversion of the rules to a formal data model and use of the Audio Effects Ontology [22] will facilitate exchanging, editing and expanding the rule base, and enable use in description logic contexts.

2.2. Measurement modules

For every incoming track, the following quantities are measured and added to the track metadata: the number of channels (mono or stereo), RMS level (1a), peak level (1b), crest factor (1c) and loudness (following the definition from [26]).

$$L_{\mathrm{rms}} = \sqrt{\frac{1}{N}\sum_{n=1}^{N} |x(n)|^2} \tag{1a}$$

$$L_{\mathrm{peak}} = \max(x) \tag{1b}$$

$$C = L_{\mathrm{peak}} / L_{\mathrm{rms}} \tag{1c}$$

with $x$ the amplitude vector representing the mono audio file associated with the track. For a stereo track $x = [x_L \; x_R]$, these equations become:

$$L_{\mathrm{rms}} = \frac{\sqrt{\frac{1}{N}\sum_{n=1}^{N} |x_L(n)|^2} + \sqrt{\frac{1}{N}\sum_{n=1}^{N} |x_R(n)|^2}}{2} = \frac{L_{\mathrm{rms},L} + L_{\mathrm{rms},R}}{2} \tag{2a}$$

$$L_{\mathrm{peak}} = \max(\max(x_L), \max(x_R)) = \max(L_{\mathrm{peak},L}, L_{\mathrm{peak},R}) \tag{2b}$$

$$C = L_{\mathrm{peak}} / L_{\mathrm{rms}} \tag{2c}$$

Additionally, a hysteresis gate determines which parts of the track are active (Figure 2):

$$a(n) = \begin{cases} 0, & \text{if } a(n-1) = 1 \text{ and } \tilde{x}(n) \le T_1 \\ 1, & \text{if } a(n-1) = 0 \text{ and } \tilde{x}(n) > T_2 \\ a(n-1), & \text{otherwise} \end{cases} \tag{3}$$

where $a$ is the binary vector indicating whether the track is active, $\tilde{x}$ a smoothed version of the track's audio, $T_1$ is the level threshold when the gate is off (audio is active), $T_2$ is the threshold when the gate is on (audio is inactive), and $T_1 \le T_2$. For stereo tracks, $x$ is summed to mono and divided by two.

Fig. 2: Activity as a function of audio level (hysteresis gate) following equation (3).

Fig. 3: Active audio regions highlighted as defined by the hysteresis gate.

Based on this definition, the following extra quantities are also included as metadata: the percentage of time the track is active, and the RMS level, peak level, crest factor and loudness when active.

Note that at this point no spectral information is extracted.

2.3. Processing modules

Research about the suggested order of processing is ongoing, and most practical literature bases the preferred order on workflow considerations [14,15]. In some cases, at least one EQ stage is desired before the compressor, because an undesirably heavy low end or a salient frequency triggers the compressor in a way different from the desired effect [14]. For our purposes, we assume and ensure that the signal has no such spectral anomalies that significantly affect the working of the compressor (as confirmed by a short test). Instead, we place a high-pass filter before the compressor (preventing the compressor from being triggered by unwanted low frequency noise), and an equaliser after the compressor.

It is widely accepted that the faders and pan pots should manipulate the signal after the insert processors such as compressor and equaliser, and we place the pan pots after the faders as this is how mixing consoles are generally wired. Furthermore, because of the linear nature of these processes and their independence in this system, the order is of no importance in this context.
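
The claim that the order of fader and pan pot does not matter can be checked numerically. The Python sketch below applies a fader gain and the sine/cosine pan gains of Section 2.3.3 in both orders and compares the results. Python is used only for illustration; the dB-to-linear conversion $10^{L/20}$ and all function names are assumptions of this example, not details taken from the system.

```python
import math


def fader_gain(level_db):
    """Convert a fader level in dB to a linear gain (assumed 20*log10 convention)."""
    return 10.0 ** (level_db / 20.0)


def pan_gains(p):
    """Left/right gains for pan value p in [-1, 1], sine/cosine law of Section 2.3.3."""
    angle = math.pi * (p + 1.0) / 4.0
    return math.cos(angle), math.sin(angle)


def fader_then_pan(samples, level_db, p):
    """Scale a mono track by the fader gain, then split it into left/right."""
    g = fader_gain(level_db)
    g_left, g_right = pan_gains(p)
    faded = [g * s for s in samples]
    return [g_left * s for s in faded], [g_right * s for s in faded]


def pan_then_fader(samples, level_db, p):
    """Split a mono track into left/right first, then apply the fader gain."""
    g = fader_gain(level_db)
    g_left, g_right = pan_gains(p)
    left = [g_left * s for s in samples]
    right = [g_right * s for s in samples]
    return [g * s for s in left], [g * s for s in right]


def same_signal(a, b, tol=1e-12):
    """Compare two channels sample by sample within a small tolerance."""
    return len(a) == len(b) and all(
        math.isclose(u, v, abs_tol=tol) for u, v in zip(a, b)
    )


# Arbitrary test samples, fader level and pan position.
track = [0.0, 0.5, -0.25, 1.0]
left_a, right_a = fader_then_pan(track, level_db=-6.0, p=0.3)
left_b, right_b = pan_then_fader(track, level_db=-6.0, p=0.3)
order_is_irrelevant = same_signal(left_a, left_b) and same_signal(right_a, right_b)
```

Both stages are multiplications by constants, so they commute. The same would not hold if a level-dependent processor such as the compressor sat between them.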
Note, however, that the system allows for any order of processors.

Based on these considerations, the following order of processors is used for the assessment of this system: high-pass filter, dynamic range compressor, equaliser, fader and panner.

At this point, time-based effects are not incorporated in the system.

2.3.1. Dynamic range compression

We include a very generic compressor model, with a variable threshold layout (as opposed to, for example, a fixed threshold, variable input gain design), a quadratic knee and the following standard parameters: threshold, ratio, attack and release (‘ballistics’), and knee width [23].

Make-up gain is not used in this work since the levels are set at a later stage by the ‘fader’ module, which makes manipulating the gain at the compressor stage redundant. For now, there is also no side-chain filter, no side-chain input from channels other than the processed one, and no lookahead functionality. The compressor processes the incoming audio sample by sample. Stereo files (such as an overhead microphone pair) are compressed in ‘stereo link’ mode, i.e. the levels of both channels are reduced by an equal amount.

Fig. 4: Dynamic range compressor transfer function (with quadratic knee), plotted as output level (dB) against input level (dB). Settings used here are: an 8:1 ratio, a -6 dB threshold and a knee width of 6 dB.

Practical literature lists a fair amount of suggested compressor settings for various instruments and various desired effects.

2.3.2. Equalising and filtering

A second essential processing step is the equalisation and filtering of the different tracks, or groups of tracks.
Two tools take care of this task in this system: a high pass filter (implementing rules such as a high pass filter with a cutoff frequency of 100 Hz on every track but the bass guitar and kick drum) and a parametric equaliser (with high shelving, low shelving and peak modes). The parameters for the latter are frequency, gain, and Q (quality factor). We use a simple biquadratic implementation for both the high-pass filter (12 dB/octave, as suggested by [21]) and the equaliser (second order filter per stage, i.e. one for every frequency/Q/gain triplet) [24].

Most rules found in practical literature are stated so that a great deal of interpretation can be given to them. Usually, an approximate frequency around which the track should be boosted or cut is given, but exact gain and quality factor values are absent. In this case, we try to estimate the gain (±3 dB is a generic gain value that seemed to work well during pilot tests, unless it is explicitly specified that the cut/boost should be modest or excessive) and the quality factor (sources often suggest cutting or boosting a frequency region, such as 1-2 kHz, in which case the quality factor is chosen so that the width of the peak corresponds loosely with the width of this region).

When attempting to translate vague equalising suggestions into quantifiable mix actions, it helps to translate terms like ‘airy’, ‘muddy’ and ‘bottom’ into frequency ranges. This is possible because many sources provide tables or graphs that define these words in terms of frequencies [14–17].

2.3.3. Panning

The panning value is stored in the metadata of every track and initially set to zero.
The value ranges from −1 (panned completely to the left) to +1 (panned completely to the right), and determines the relative gain of the track during mixdown in the left versus the right channel.

Although we provide the option to choose from a variety of panning laws, for our purposes we use the -3 dB, equal power, sine/cosine panning law (different names can be found in literature), as it is the one that is most commonly used according to the practical audio engineering literature [14].

The gain of the left ($g_{Li}$) and right channel ($g_{Ri}$) for track $i$ is then calculated as follows, with panning value $p$:

$$g_{Li} = \cos\left(\frac{\pi (p + 1)}{4}\right) \tag{4}$$

$$g_{Ri} = \sin\left(\frac{\pi (p + 1)}{4}\right) \tag{5}$$

Note that constant power is in fact obtained, regardless of the value of $p$, as $g_{Li}^2 + g_{Ri}^2 = 1$ (see Figure 5).

There is a lot of information available in practical literature on ‘standard’ panning values for every common instrument, both exact panning values as well as rules of thumb (e.g. describing the spread of harmony instruments over the stereo panorama).

2.3.4. Level

Like with panning, the ‘level’ variable per instrument is stored as metadata with the track. Its initial value being 0 dB, it can then be manipulated