The Virtual DJ -

The Design and Implementation

of a Virtual Reality Interface

to a Music System

 

Morgan Brickley

B.A. (Mod.) Computer Science

Final Year Project April 2000

Supervisor:  Hugh McCabe


 

Abstract

 

 

This report details the design and implementation of an application which allows users to create sample-based music using the Virtual Reality equipment in Trinity College.  The purpose of designing the application is to explore the possibilities of using a Virtual Reality System to make music.   

 

Key issues in the design and implementation of the application include:

 

·        Polyphonic Sample Playback using DirectSound (a component of Microsoft’s DirectX suite)

·        Real-time Sample Processing (manipulating the samples in real-time)

·        Motion Tracking (using the sensors to determine hand positions)

 

The report discusses the aims of the project, the challenges, and the outcome – the finished application.

 



 


Acknowledgements

 

I would firstly like to thank Hugh McCabe for taking on the project, and giving me the chance to work on something I really enjoyed.  Otherwise you’d be reading about the fascinating world of Linux clustering.

 

To my parents, I apologise for spending too much time hibernating in the Lab.

 

To Philip McKenzie for some invaluable advice on Microsoft Foundation Classes and Threads and all things complicated.

 

To the guys in the VR lab – Matthew Ling and John Horan for introducing me to West Coast coffee and Frisbee respectively.  And especially to Gareth Bradshaw for helping us out whenever things went awry.

 

To Dave Gargan, for writing the UltraTrak DLL.

 

To Ben Phalen, for proofreading the report and teaching me how to spell.

 

To everyone who ever made me smile.

 

 


 

Contents

Contents
List of Figures
Chapter 1
Introduction
1.1  Introduction to the Report
1.2  Motivation behind the Project
1.3  The Aim of the Project
1.4  The Application Developed
1.5  Intended Audience
1.6  Report Outline
Chapter 2
Background Information & Chosen Technologies
2.1  Introduction
2.2  Analogue Instruments
2.2.1  Vibrations in a Compressible Medium
2.2.2  Psychoacoustics – Human Sound Perception
2.2.3  Complex tones
2.3  Synthesis
2.4  Key terms and Definitions
2.4.1  Digital Audio Terminology
2.4.2  Terminology coined for the project
2.4.3  File types created for the project
2.5  Introduction to DirectSound
2.5.1  Secondary Buffers
2.5.2  The Primary Buffer
2.6  Previous Work in Interactive Music
2.6.1  The Theremin
2.6.2  The Hypercello
2.6.3  The Sensor Chair
2.6.4  The Digital Baton
2.7  Trinity College’s Virtual Reality Lab
2.7.1  Polhemus Motion Tracking System
2.7.2  5th Dimensions Data Glove
2.7.3  Virtual Research V8 Headset
2.8  Summary
Chapter 3
Design
3.1  Introduction
3.2  Three Phase Design
3.3  Generating the Sounds
3.3.1  Chosen Technology - DirectSound
3.3.2  The Design of the SoundEngine
3.4  Possible methods of Control
3.4.1  The Squeezebox
3.4.2  The Table
3.4.3  Musical Beams
3.4.4  The Conjurer
3.4.5  Music Boxes
3.5  The Graphics
3.5.1  The MusicRoom
3.6  The Development Environment
3.7  System Overview
3.8  Summary
Chapter 4
Implementation
4.1  Introduction
4.2  The Implementation Underlying the Dialogue Boxes
4.2.1  The Opening Dialogue Box
4.2.3  The Menu Bar
4.2.4  The Music Box Creator Dialogue
4.5  The mainline
4.7  High Level System Overview
4.8  Summary
Chapter 5
Analysis & Applications of the System
5.1  Introduction
5.2  Testing Stage
5.2.1  The Beatmaster and the Sample-Queue
5.2.2  Labeled boxes
5.2.3  The OpenGL menu
5.2.4  Nuke control
5.3  Analysis
5.3.1  Teaching Music Theory
5.3.2  Teaching Spatial Representation
5.3.3  DJ mode
5.4  Suggested Further Work
5.4.1  3D Sound
5.4.2  The Design and Implementation of a MIDI Interface for the VR System
5.4.3  Using the Virtual Reality System to Interpret Dance
5.4.4  The Virtual Theremin
5.5  Summary
Chapter 6
Conclusion
6.1  Introduction
6.2  Results
6.3  Skills Acquired
6.4  Credits
6.5  Conclusion
Code Sample – The Sample Queue
Code Sample – The Arpeggiator

 


List of Figures:

Figure 2.1    A typical ADSR envelope
Figure 2.2    A grid-style MusicBox with nine samples, sample 4 active
Figure 2.3    How to play a Theremin
Figure 2.4    The European and American co-ordinate systems
Figure 2.5    A typical DataGlove

Figure 3.1    A 2D MusicBox with four samples
Figure 3.2    A stack-style MusicBox with three samples, sample 1 active
Figure 3.3    A grid-style MusicBox with nine samples, sample 4 active
Figure 3.4    System architecture overview

Figure 4.1    The Opening Dialogue Box
Figure 4.2    The working of the Tremolo effect
Figure 4.3    The ADSR envelope controller Dialogue Box
Figure 4.4    The mouse controller
Figure 4.5    An early version of the Arpeggiator Dialogue Box
Figure 4.6    A later version of the Arpeggiator Dialogue Box with the ballad preset selected
Figure 4.7    The combined Groove Maker / Arpeggiator with a user-programmed drum beat
Figure 4.8    The Music Box Creator Dialogue Box
Figure 4.9    Configuring the display settings
Figure 4.10  High Level system overview revisited

Figure 5.1    Adding a sample into the Sample-Queue
Figure 5.2    Removing a sample from the Sample-Queue
Figure 5.3    A Music Box set up for learning minor 2nd and major 2nd intervals
Figure 5.4    Intervals and semitones
Figure 5.5    A Music Box set up for learning minor 2nd, major 2nd, and major 3rd intervals
Figure 5.6    A typical setup for DJ mode


 

Chapter 1

Introduction

 

1.1  Introduction to the Report

This report describes the design and implementation of an application which allows the user to create sample-based music using the college’s Virtual Reality system.  By ‘sample-based music’ the author means music which is generated by repeating samples – also known as loop-based music.  This chapter will discuss the motivation for the project, its aims, and a brief outline of the report.  

 

1.2  Motivation behind the Project

The computer is a potentially brilliant musical instrument, if only we could control it.  The limiting factor of the computer as a musical instrument is its interface – the keyboard and mouse do not offer enough control to give the user any feeling of truly interacting with it.  On the other hand the computer can store a vast number of sound samples, and synthesize a wide range of sounds.  A 400MHz PC running a good Physical Modeling algorithm can produce some very convincing sounds in real-time, but as yet the only controller is a MIDI device, typically a keyboard, which is somewhat limited when it comes to controlling instruments with a continuous range, or bowed instruments such as violins.  We now have a device which can generate the sounds of many different instruments but only offers a single way of controlling them.  A musical interface developed for a Virtual Reality system could allow a person to get as close as possible to the creation of the sounds.  The sounds could be triggered and controlled by simple movements of the arms and hands, or gestures.  The characteristics of the note could be dependent on the position or direction of the hands, or the degree to which the fist is clenched.  The advantage of using a Virtual Reality environment to control the music is that many different methods can be employed to control the sounds.  The various methods considered for this project are discussed in Chapter 3 – Design.

 

 

 

1.3  The Aim of the Project

 

“A musical instrument is a device to translate body movements into sound.”
-Sawada, Ohkura, and Hashimoto [Sawada 95]

 

The aim of the project was to explore the possibilities of using Virtual Reality equipment to make music.  Thus the goals of the project were deliberately broad and the specifications deliberately flexible.  This allowed for a degree of experimentation when it came to implementing the method of control for triggering and shaping the sounds.  The project could be developed further as either a serious music tool for DJs or as a learning aid for developing a concept of sounds and spatial representation.  Other possible uses, such as Music Theory education, are discussed in Chapter 5 – Analysis.

 

1.4  The Application Developed

The application designed and implemented for the project can be divided into four distinct parts:

·        The Sound Engine

·        The Virtual Reality Interface

·        The OpenGL Graphics Engine

·        The Graphical User Interface

 

The first phase of the project was to develop a ‘Sound Engine’ which would allow multiple samples to be started, stopped and affected in real-time.  The second phase was to implement the ‘Virtual Reality Interface’ to interpret the input from the various sensors and the data glove, thus allowing the user to trigger and control the sound samples.  The third phase was to represent the world in a graphical context and to output this in stereo to the Headset.  Finally, a Graphical User Interface is supplied so that the user can customise the 3D world, the sounds, the controls and the graphics.

 

The virtual reality interface interprets the glove and sensor data, and triggers events and actions in the sound-engine as a result.  The sensor data is also fed into the graphics engine so that a 3D representation of the current world status can be rendered to the headset for the user.

 

1.5  Intended Audience

The report is aimed at students of Computer Science at Junior Sophister level or above, and as such some familiarity with developing a Windows application in C++ is assumed.  The report also alludes to various areas of digital music, but any terms that may be unfamiliar to the reader are defined in Chapter 2 – Background Information.

 

1.6  Report Outline

The report is divided into six chapters, and is laid out as follows:

 

Chapter One – Introduction

This chapter serves as an introduction to both the report and the project.

 

Chapter Two – Background Information & Chosen Technologies

Chapter 2 discusses some of the background information relevant to the project and introduces the chosen technologies, i.e. Trinity College’s Virtual Reality Lab.  The background information consists of a brief description of how analogue instruments work and how they can be imitated digitally.  Previous work, such as that of MIT’s MediaLab, is discussed here.  Any terms coined specifically for the project, and also terms the reader may not be familiar with, are defined here.

 

Chapter Three – Design

Chapter 3 tackles the issues of design and chosen technologies under three sections:

·        Sound Engine Design

·        Control System Design

·        3D Graphics Design

 

Chapter Four – Implementation

Chapter 4 is a run through of what the application does and how it does it.  Each dialogue box is taken in turn and its purpose and workings are discussed.  Implementation issues are dealt with as they arise.

 

Chapter Five – Analysis

The analysis chapter is concerned with possible applications of the project.

 

Chapter Six - Conclusion

The final chapter reflects on the aims, development and results of the project.


 

Chapter 2

Background Information & Chosen Technologies

 

2.1  Introduction

Before designing an instrument it is necessary to understand something of how analogue (i.e. non-digital) instruments generate their characteristic tones and also the various methods by which computers can attempt to imitate these sounds (i.e. synthesis).  This chapter also discusses previous work in the field of interactive music and introduces the hardware that this project was based on. 

 

2.2  Analogue Instruments

The first decision to be made regarding the project was whether to try to simulate a traditional instrument, use synthesis algorithms to generate new sounds, or use pre-recorded sound samples.  There follows an introduction to some of the issues that would have needed to be addressed had the first route been followed.

 

2.2.1    Vibrations in a Compressible Medium

In order to gain an insight into the workings of analogue instruments, let us consider the effect of a single note being plucked on a guitar string.  The string begins to vibrate at a frequency which is inversely proportional to the length of the vibrating part of the string.  This vibration is then picked up by the saddle and the bridge, which in turn start the front of the guitar or ‘soundboard’ vibrating back and forth.  This vibration alternates between squashing and stretching the air inside the guitar.  As the soundboard becomes more concave it presses in against the air molecules inside the body, creating an area of high pressure underneath the soundboard.  The molecules repel from the denser areas in an effort to maintain an equal distance from all their neighbours, creating a moving wave of pressure which is amplified inside the guitar before being released through the sound-hole.  Our ears can sense these changes in air pressure from a rate of twenty per second up to a rate of 15,000 changes per second [Kientzle 98].  The sound pressure or amplitude of these waves determines the perceived volume of the note, and the frequency of the wave determines the perceived pitch.  If the note being plucked were concert A, our eardrum would beat 440 times a second.  In fact it may also beat at various multiples of this, as the guitar does not simply emit a pure tone of 440Hz, which would sound quite dull, but instead blends in harmonics of that note.  Harmonics are multiples of the note’s frequency, so for 440Hz they would include 880Hz, 1760Hz and so on.  From this example we can view sound as waves of pressure moving through a compressible medium, such as air.

 

In order to synthesize digitally what happens acoustically it is not enough to implement simple laws of physics as we must also take into account the way in which we perceive sound.

 

2.2.2    Psychoacoustics – Human Sound Perception

Psychoacoustics is the study of human sound perception, and it reveals quite a few remarkable nuances in the workings of the human ear.  In the paragraph above, it was mentioned that the amplitude of the sound wave determines the volume we perceive, and similarly that the frequency of the wave determines the pitch we perceive.  The Greeks believed that pitch and frequency were in fact the same, but that is not quite the case.  The assumption holds only for the middle range of notes, as people tend to hear high notes as slightly flatter than they actually are, and low notes as slightly sharper.  Not only is the relationship between pitch and frequency non-linear, pitch is also dependent on amplitude: the louder the sound, the lower its perceived pitch.  A convincing synthesizer must also be able to produce complex tones.

 

2.2.3    Complex tones

So far we have presumed for simplicity that the sound generated was a pure tone, which would rarely be the case as far as analogue instruments are concerned.  Most instruments, as mentioned above, produce a complex sound: a combination of the fundamental frequency and a number of harmonics.  Generally the harmonics are produced at precise multiples of the fundamental frequency, although this is not always the case; some instruments such as the piano produce ‘stretched partials’, where the harmonics are not precise multiples.  Furthermore, the fundamental frequency need not be louder than the harmonics in order for that pitch to be perceived; in fact, in some cases it can be entirely absent.  The relative interplay between the harmonics that arise from playing a note is what gives a particular instrument its richness.  Indeed, one of the fundamental differences between instruments is the range and amplitude of the harmonics they produce.

 

From the preceding discussions on analogue instruments, psychoacoustics, and complex tones it is hoped that the reader will appreciate that modeling the natural processes we take for granted when an analogue instrument is played is quite difficult.  The process of generating digital sounds, which may or may not sound like traditional instruments, is called synthesis.

 

2.3  Synthesis

Digital instruments rely on one of four basic methods for creating sounds:

 

·        Additive Synthesis

·        FM (Frequency Modulation) Synthesis

·        Wavetable Synthesis

·        Physical Modeling Synthesis

 

The design phase of the project discusses the suitability of each technique for this particular project, and as such the reader should be familiar with these techniques and with some basic principles of digital audio, which are defined in this section.

 

Additive Synthesis is one of the earliest techniques developed to make electronic sounds and involves adding multiple sine waves together to generate tones.
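As a minimal sketch of the idea (illustrative C++, not code from the application), a tone with a fundamental and two harmonics can be generated by summing sine waves into a 16-bit mono buffer at the 22,050Hz rate used elsewhere in this report:

#include <cmath>
#include <vector>

// Additive synthesis: sum a fundamental and two harmonics into one
// second of 16-bit mono audio.  Frequencies and amplitudes are
// illustrative; the weights sum to 1.0 so the result cannot clip.
std::vector<short> additiveTone(double fundamental)
{
    const int    rate = 22050;
    const double pi   = 3.14159265358979;
    std::vector<short> buffer(rate);
    for (int i = 0; i < rate; ++i) {
        double t = static_cast<double>(i) / rate;
        double v = 0.6 * std::sin(2 * pi * fundamental * t)      // fundamental
                 + 0.3 * std::sin(2 * pi * 2 * fundamental * t)  // 2nd harmonic
                 + 0.1 * std::sin(2 * pi * 3 * fundamental * t); // 3rd harmonic
        buffer[i] = static_cast<short>(v * 32767.0);             // scale to 16-bit
    }
    return buffer;
}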

 

Frequency Modulation is a more complex method which feeds the output of a modulator cell back into its own input, creating a rich set of harmonics, and combines this with the output of a carrier cell to produce a tone.  Most SoundBlaster cards available use FM synthesis to produce sounds.
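The principle can be sketched with a minimal two-operator example (the feedback path described above is omitted for clarity, and the parameter names are illustrative): the modulator’s output is added to the carrier’s phase, and the modulation index controls how many audible harmonics appear.

#include <cmath>

// One output value of a two-operator FM tone at time t (seconds).
double fmSample(double t, double carrierHz, double modHz, double index)
{
    const double pi = 3.14159265358979;
    double modulator = std::sin(2 * pi * modHz * t);              // modulator cell
    return std::sin(2 * pi * carrierHz * t + index * modulator);  // carrier cell
}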

 

Wavetable Synthesis is the technique of recording a sample of a real instrument and converting it into a digital form such as a wave file.  The disadvantage is that there must be a sample for every note in the range of the instrument being imitated.

 

Physical Modeling is a technique that has only recently become feasible to run in real-time, due to the amount of processing power it requires to generate sounds.  The sounds a well-designed model produces, however, are considerably more realistic than those of other synthesis techniques.  Each sound is created in real-time from input parameters corresponding to variables which arise in playing the actual instrument.  For modeling a note on an electric guitar, for example, you could specify the position where the plectrum hit the string, as well as the fret at which the string is depressed.  When designing the model for the electric guitar the modeler would have to take into consideration the position of the pickups (the coils used to detect the vibrations in the string), the thickness of the strings and also the density of the wood of the guitar’s body.  Physical Modeling can be implemented in hardware or software.  In fact many high-end synthesizer manufacturers release free software versions of their synthesizers so that users can test-drive them on their computers.  Programs such as Audio Simulation’s [Dream Station] allow users to connect a MIDI device to their ‘Virtual Analogue Synth’, which is essentially a physical modeling engine.  Physical modeling is also used in guitar processors, one example being Roland’s COSM chip.

 

2.4  Key terms and Definitions

Here follows a description of some of the terms used in the report which the reader may or may not be familiar with.  They are divided into three sections:

·        Digital Audio Terminology.

·        Terms adapted for use in the project.

·        File types created for the project.

 

2.4.1  Digital Audio Terminology

 

Polyphony:

The number of notes that can be played simultaneously.  Both hardware and software synthesizers are limited in the number of notes they can play at once.

 

Sample:

A sample can be taken to mean many things, but for the purposes of the report a Sample should be taken to mean a wave file of a particular instrument or sound.

 

Sample Rate:

The number of values captured per second when recording to a digital medium.  The sample rate is generally used as an indication of the quality of the sample, but there is a trade-off between higher sampling rates and storage capacity, as the higher the sample rate, the more space is required to store the sound.  The following examples should give an idea of the level of quality of the three most popular sampling rates (a worked storage example follows the list):

 

·        11025Hz is suitable for voice audio

·        22050Hz is suitable for low quality music audio

·        44100Hz is suitable for high quality music audio 
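As a rough worked example (simple arithmetic rather than figures from the report): at 16 bits (2 bytes) per value, one minute of mono audio at 22,050Hz, the format used for this project’s samples, occupies 22,050 × 2 × 60 ≈ 2.6MB, while the same minute at 44,100Hz requires roughly 5.3MB.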

 

The ADSR Envelope:

In order to simulate the dynamics (i.e. changing amplitude) of a real note played on an analogue instrument, a graph which describes the wave’s amplitude as a function of time can be plotted.  As an approximation to this graph we can represent the dynamics in four distinct phases:

-         The Attack phase (The time from the start of the note until it reaches its peak amplitude)

-         The Decay phase (The subsequent fall-off until the note stabilises or ends)

-         The Sustain phase (The time for which the note is held)

-         The Release phase (The time it takes the note to diminish into silence when it is released – the release itself will not be instantaneous and depends on the quality of the player’s aftertouch)

 

A typical ADSR envelope for a note played on a piano may look something like this:

 


[Figure: volume (vertical axis) plotted against time (horizontal axis).  The curve rises from the base volume to the peak volume during the Attack phase, falls to the level volume during the Decay phase, holds through the Sustain phase, and fades back to the base volume during the Release phase.]

Figure 2.1    A typical ADSR envelope
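As an illustration (a sketch with hypothetical parameters, not the application’s code), the envelope of Figure 2.1 can be computed as a piecewise-linear gain as a function of time, where the phase lengths are in seconds and the gain runs from 0 (base) to 1:

// Piecewise-linear ADSR gain: ramp from base to peak over the Attack
// time, down to the sustain level over the Decay time, hold, then fade
// back to base over the Release time.
double adsrGain(double t, double attack, double decay,
                double sustain, double release,
                double peak, double level)
{
    if (t < attack)                  // Attack: base -> peak
        return peak * (t / attack);
    t -= attack;
    if (t < decay)                   // Decay: peak -> level
        return peak + (level - peak) * (t / decay);
    t -= decay;
    if (t < sustain)                 // Sustain: hold the level
        return level;
    t -= sustain;
    if (t < release)                 // Release: level -> base
        return level * (1.0 - t / release);
    return 0.0;                      // the note has ended
}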

 

MIDI:

Musical Instrument Digital Interface is a set of protocols specifying a language which describes a piece of music as a series of Events.  Most keyboards available are MIDI compatible and can be connected to a PC via a MIDI port on a suitable soundcard.  Whenever a key is pressed or released, the keyboard sends a MIDI Event down the MIDI cable with parameters such as the name of the key and the velocity at which it was struck.  It also sends the MIDI code of the device the Event is destined for, which means that a substantial number of MIDI devices can be daisy-chained together.  The quality of the sounds produced by the system depends on the quality of the synthesizer.  Soundcards often have various MIDI instruments built in, but these have a tendency to sound quite unnatural and plinky.
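For concreteness, a standard MIDI Note On event is three bytes; the values below are illustrative:

// Status byte: event type (0x90 = Note On) combined with the channel,
// followed by the key number and the velocity at which it was struck.
unsigned char noteOn[3] = {
    0x90 | 0x00,   // Note On, channel 1
    60,            // key number: middle C
    100            // velocity (0-127)
};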

 

 

Quantization:

When converting a signal from analog to digital, the signal becomes quantized, i.e. approximated.  If a sample is created using 8 bits to store each instantaneous value, the instantaneous signal value must be represented as one of 2^8 (256) values, which means the sample will sound quite low quality.  Typically sampling at 16 bits per value allows for high quality samples, as this allows for 2^16 (65,536) levels.

 

WAVE files:

Wave files are the native sound file format for audio on the Windows Platform.  The Wave file format is an example of the Windows Resource Interchange File Format (RIFF) common to both .wav and .avi files. RIFF is descended from the IFF file format developed by Electronic Arts for the Amiga.
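A sketch of the canonical PCM wave file header may make the format concrete.  Real files can contain additional chunks between ‘fmt ’ and ‘data’, so a robust loader should walk the chunks rather than assume this exact layout:

#include <cstdint>

#pragma pack(push, 1)
struct WaveHeader {
    char     riff[4];        // "RIFF"
    uint32_t riffSize;       // file size minus 8
    char     wave[4];        // "WAVE"
    char     fmt[4];         // "fmt "
    uint32_t fmtSize;        // 16 for PCM
    uint16_t format;         // 1 = PCM
    uint16_t channels;       // 1 = mono, 2 = stereo
    uint32_t sampleRate;     // e.g. 22050
    uint32_t byteRate;       // sampleRate * channels * bitsPerSample / 8
    uint16_t blockAlign;     // channels * bitsPerSample / 8
    uint16_t bitsPerSample;  // e.g. 16
    char     data[4];        // "data"
    uint32_t dataSize;       // length of the sample data in bytes
};
#pragma pack(pop)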

 

2.4.2  Terminology coined for the project

A number of terms were invented or borrowed to define various objects and concepts for the project.

 

MusicBox:

A MusicBox is technically a C++ struct with various attributes, but can be thought of as a collection of samples in a designated volume of space.  Various types of MusicBoxes are depicted in Chapter 3.

 

MiniBox:

A MiniBox represents one single sample within a greater MusicBox.  It is defined as a C++ struct specifying a volume of space.

 

Partition:

A partition is a ‘slice’ of the MusicBox, whether in the x, y, or z direction.

 

 

 

 

 

[Figure: a MusicBox divided into a 3×3 grid of MiniBoxes by X partitions 0–2 and Y partitions 0–2, with X partition 1 marked active.]

Figure 2.2    A grid-style MusicBox with nine samples, sample 4 active.
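The actual struct definitions are not reproduced here; as a hypothetical sketch (field names invented for illustration), they might look like this:

// A MiniBox ties one sample to a volume of space; a MusicBox is the
// overall volume, partitioned into MiniBoxes along the x, y and z axes.
struct MiniBox {
    float xMin, xMax, yMin, yMax, zMin, zMax;  // the volume of space
    int   sampleId;                            // sample triggered here
};

struct MusicBox {
    float   xMin, xMax, yMin, yMax, zMin, zMax;  // overall bounds
    int     xParts, yParts, zParts;              // partitions per axis
    MiniBox boxes[27];                           // e.g. up to a 3x3x3 grid
    int     numBoxes;
};

// A sensor at (x, y, z) triggers the sample whose MiniBox contains it.
bool contains(const MiniBox& b, float x, float y, float z)
{
    return x >= b.xMin && x <= b.xMax &&
           y >= b.yMin && y <= b.yMax &&
           z >= b.zMin && z <= b.zMax;
}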

 

2.4.3  File types created for the project

Three file types were created for saving instruments, MusicBoxes and Arpeggio styles for repeated use; they are defined below:

 

Grand Central Instrument ( *.GCI )

A Grand Central Instrument file is simply a text file containing a list of wave files preceded by a number indicating how many wave files are to be included in the instrument.
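For illustration, a hypothetical instrument file might read ‘3’ on its first line followed by three wave file names; a minimal reader (a sketch, not the application’s parser, and assuming one name per line with no spaces) is:

#include <fstream>
#include <string>
#include <vector>

// Read a *.GCI file: a count followed by that many wave file names.
std::vector<std::string> loadInstrument(const char* path)
{
    std::ifstream in(path);
    int count = 0;
    in >> count;
    std::vector<std::string> waves;
    std::string name;
    for (int i = 0; i < count && in >> name; ++i)
        waves.push_back(name);
    return waves;
}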

 

Grand Central MusicBox ( *.GCB )

A Grand Central MusicBox is a binary file that allows the user to store the MusicBox struct to a persistent storage medium, for recalling again when using the application at a later date.

 

Arpeggio file ( *.ARP )

An Arpeggio file is a file containing an arpeggio style, such as a Ballad setting, that describes the speed, length and intervals used to recreate a particular arpeggio.

 

 

2.5  Introduction to DirectSound      

DirectSound 7.0 is a component of Microsoft’s DirectX API which allows hardware-accelerated audio capture and playback.  DirectX was developed to ease the development of games and multimedia applications on the Windows platform, and as such offers a number of benefits.  One of these is that DirectX is device independent and should therefore work on a vast range of soundcards.  It also allows mixing of a (theoretically) unlimited number of sounds and, due to its automatic use of hardware acceleration when available, can provide extremely low-latency playback (< 20ms).  The latency is the time between calling play() on a secondary buffer and the time at which the sound is rendered by the speakers.  DirectSound is essentially based on DirectSoundBuffers, which come in two flavours: Primary and Secondary.

 

2.5.1    Secondary Buffers:

Each sound source has an associated Secondary buffer.  For the purposes of this project, a separate Secondary Buffer is created for each wave file the user loads, and is destroyed when it is no longer required.  Secondary Buffers can be either static or streamed, both of which contain audio data in PCM format.

 

Static Secondary Buffers:

A static buffer resides in a block of memory and remains there until destroyed.  Typically static buffers hold short sounds that need to be played many times.  Because of their low latency and minimal overhead, all the buffers in the project are created as static buffers.  In order to keep the space required to store the samples to a minimum, all samples were recorded at 22,050Hz, 16-bit mono.

 

Streamed Secondary Buffers:

A streamed buffer is generally used for sounds that are too large to fit into a single block of memory and must therefore be loaded in section by section as they play.  To implement this the developer must conceptualise the streaming buffer as circular, and place triggers in the buffer (for example at the mid and end points) so that when the play-cursor reaches the half-way mark, the first half of the buffer is overwritten with fresh data, and similarly when it reaches the end-point.  Because of this constant over-writing, streamed buffers require quite a bit more overhead than the static type.

 

2.5.2    The Primary Buffer:

All the secondary buffers that are playing are mixed into a single buffer, the Primary Buffer.  The Primary Buffer represents the final mix that gets rendered directly to the computer’s speakers.  It is created by the developer at the start of the application and, for optimum mixing, should be created with the same properties and capabilities as the secondary buffers, i.e. 22,050Hz, 16-bit mono.
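The setup this entails can be sketched as follows (a sketch of the required DirectSound calls with error handling trimmed, not the project’s actual code; hwnd is the application’s main window):

#include <windows.h>
#include <dsound.h>

bool initPrimaryBuffer(HWND hwnd, LPDIRECTSOUND* ppDS,
                       LPDIRECTSOUNDBUFFER* ppPrimary)
{
    if (FAILED(DirectSoundCreate(NULL, ppDS, NULL)))
        return false;
    // The PRIORITY co-operative level lets us set the primary format.
    (*ppDS)->SetCooperativeLevel(hwnd, DSSCL_PRIORITY);

    DSBUFFERDESC desc = {0};
    desc.dwSize  = sizeof(desc);
    desc.dwFlags = DSBCAPS_PRIMARYBUFFER;  // no size or format given here
    if (FAILED((*ppDS)->CreateSoundBuffer(&desc, ppPrimary, NULL)))
        return false;

    // Match the secondary buffers: 22,050Hz, 16-bit, mono.
    WAVEFORMATEX wfx = {0};
    wfx.wFormatTag      = WAVE_FORMAT_PCM;
    wfx.nChannels       = 1;
    wfx.nSamplesPerSec  = 22050;
    wfx.wBitsPerSample  = 16;
    wfx.nBlockAlign     = wfx.nChannels * wfx.wBitsPerSample / 8;
    wfx.nAvgBytesPerSec = wfx.nSamplesPerSec * wfx.nBlockAlign;
    return SUCCEEDED((*ppPrimary)->SetFormat(&wfx));
}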

           

2.6  Previous Work in Interactive Music

There follows a brief analysis of some work done in relevant areas, most of which occurred at the MIT MediaLab.

 


2.6.1    The Theremin

Figure 2.3 How to play a Theremin

 

One of the particular inspirations behind the project was the Theremin, an unusual instrument named after its inventor, Lev Sergeivitch Termen.  The Thereminvox, as it was originally called, was unique in that it was the first instrument that did not have to be touched in order to produce a sound.  Instead, the distance of the player’s hand from the antenna determines the pitch produced.  The instrument operates on a continuous range and the sound it produces is quite eerie.  The design of the Theremin suggests a control-style which shall be discussed in Chapter Three of this report.

 

2.6.2    The Hypercello

One of the most interesting instruments to emerge from MIT’s innovative MediaLab is Tod Machover’s range of hyper-instruments.  Specifically the [HyperCello], designed for cellist Yo-Yo Ma, allowed a great deal more control than the conventional instrument.  The hypercello allows a synthesized cello sound to be controlled in parallel with computer-generated music.  In a sense the cellist must use the bow both to conduct the piece as a whole and to play the cello as the solo instrument.  Various types of sensors measure the position and pressure of the bow, the vibrations under the top plate of the cello, as well as the cellist’s finger positions.  The collated data is used to respond to every nuance of Yo-Yo Ma’s gestures and playing style to create sounds unobtainable on a regular cello.  For example, by lifting the bow off the strings after playing, the note is held and varies corresponding to the gestures of the bow [Gershenfeld 98].

 

2.6.3    The Sensor Chair

Another project born at MIT and developed by Tod Machover is the [Sensor Chair], an instrument consisting of a chair with a plate-shaped antenna embedded in the seat, which causes whoever sits in it to become an extension of that antenna.  Two parallel poles on either side of the front of the chair act as receiving antennae which determine the position of the person’s hands in the space between the poles.  This data can be used in various ways by the computer connected to the chair, but for the purposes of this report we shall focus on the ‘Wild Drums’ mode of operation, where the space between the poles is divided into a twenty by twenty grid, each square of which is mapped to a MIDI sound.  The playing of the sounds is quantized to 1/8 of a second so that the playing does not seem too chaotic.  Two principles from this design are incorporated and extended in this project: both the representation of sounds in a grid and the quantization of samples to a particular beat are features of the Sensor Chair and of this project.

 

2.6.4    The Digital Baton

The Digital Baton was a project undertaken by Teresa Anne Marrin while studying under Tod Machover at MIT.  The intention of the project was to map expressive intention using a digital baton, allowing a user to conduct a computer-generated musical score.  Once again sensors of various types were employed to capture the position, orientation and acceleration of the baton, as well as the surface pressure.  The Baton is an interesting interface in that it is a fairly simple device but can be used to convey quite a large range of distinct meanings.  At a classical concert, the conductor with this simple stick conveys the tempo, dynamics and mood of a piece.  As an interface the baton is quite intuitive; the jerky movements made to convey staccato are just as obvious as the smooth movements during a legato piece.  Although none of the techniques used in Marrin’s project are used here, her discussion of interpreting gestures and the need for new instruments provides some interesting insights into the possibilities that lie ahead [Marrin 96].

 

2.7  Trinity College’s Virtual Reality Lab

The Virtual Reality facilities at Trinity College include a Motion Tracking System, a Motion Capture System, a DataGlove and a Headset.  There follows a description of the technologies chosen for the project.

 

2.7.1    Polhemus Motion Tracking System

The Polhemus Motion Tracking System consists of 12 sensors connected directly to a PC named UltraTrak, each of which is capable of sensing its position in an electro-magnetic field created by the ‘Long-ranger’ globe.  The sensors can be set to constantly feed data to the UltraTrak machine, which interprets this data as three positional values corresponding to an offset in the X, Y, Z co-ordinate planes.  The center of the globe is the origin, and UltraTrak presents the values as they would appear in the American co-ordinate system.  The difference between the American and European co-ordinate systems is best illustrated with a diagram, shown below:

 

 

 

 

 

 

 

 

 


[Figure: two axis diagrams side by side, labelled ‘European Co-ordinate System’ and ‘American Co-ordinate System’.]

Figure 2.4 The European and American co-ordinate systems
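If, as is common, the two conventions differ in which axis points ‘up’, converting a sensor reading is a matter of exchanging components.  The sketch below is an assumption for illustration; the precise mapping is whatever Figure 2.4 depicts:

struct Vec3 { float x, y, z; };

// Hypothetical conversion: swap the Y and Z axes so that the 'up'
// axis of one convention becomes the 'up' axis of the other.
Vec3 americanToEuropean(const Vec3& a)
{
    Vec3 e;
    e.x = a.x;
    e.y = a.z;
    e.z = a.y;
    return e;
}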

 

 

2.7.2    5th Dimensions Data Glove

The DataGlove uses five flexible fiber-optic wires, one running down the length of each finger, to determine the degree of flexing.  For each finger it returns a single byte value between 0 and 255, based on the amount of light that makes the round-trip down the fiber-optic circuit.  It also returns further data corresponding to either the tilt, roll, or azimuth of the glove, depending on where the glove’s sensor is attached.  The glove’s sensor does not return any positional information, and it was therefore decided to attach one of the sensors from the Polhemus system; not only do these sensors give positional and orientation data, they are also more accurate in the data they supply.


 


      

Figure 2.5 A typical DataGlove

 

2.7.3    Virtual Research V8 Headset

The Headset is a head-mounted stereo display unit.  The computer to which it was attached was equipped with two graphics cards, which simultaneously render the screen data to two separate outputs.  A sensor can be screwed into the top of the Headset, allowing for a truly immersive environment.

 

2.8  Summary

This chapter began by discussing how ‘traditional’ analogue instruments function, and mentioned some of the issues concerning human perception of sound, before discussing how digital instruments use synthesis to generate sounds.  It also took a look at some relevant previous work, and ended by introducing the technology on which this project was implemented.

Chapter 3

Design

 

3.1 Introduction

This chapter describes the design phase of the project and addresses the major decisions that influenced its direction.  The aim of the project was to explore the possibilities of virtual music, and as such the actual specifications were subject to change, and in fact did change on a weekly basis.  Although the specifications were constantly in a state of flux, a design plan involving three separate stages was formulated and is detailed below.

 

3.2 Three Phase Design

The implementation of the project falls naturally into three modules.

·        The Sound Engine

·        The Virtual Reality Interface

·        The Graphics Engine

 

The first phase of the project was to develop a ‘Sound Engine’ which would allow multiple samples to be started, stopped and manipulated in real-time.  The second phase was to implement the ‘Virtual Reality Interface’ to interpret the input from the various sensors and the data glove, and to assign meaningful actions to the user’s gestures.  The third and final phase was to represent the world in a graphical context and to output this in stereo for the Headset.  The three-phase design lent itself well to developing in an object-oriented manner; for example, the sound engine is completely separate from the rest of the system, which facilitates software reuse.  The design phase of the project, however, was by no means a disjoint step in an application development lifecycle, but an ongoing process.

 

3.3  Generating the Sounds

The major question in the initial design of the Sound Engine was how to generate the sounds.  The three options under consideration were:

·        Synthesize the sounds using algorithms taking sensor values as parameters.

·        Use pre-recorded samples of instruments/sounds.

·        Employ an external synthesizer (software or hardware) to generate the sounds and incorporate a MIDI interface thus enabling the Virtual Reality system to act as a MIDI controller. 

 

All the options were viable, and the author was very tempted to incorporate a MIDI interface following the release of the software synthesizer ‘Dream Station’.  The attraction of Dream Station is that not only does it allow control over which notes are played, it also allows control over a wide selection of envelope filters and various digital signal processors, creating some unique sounds.  After initial testing, however, it did not seem possible to connect to Dream Station’s virtual port, which would mean that additional hardware would be required to implement the MIDI control.

 

Another point to consider was the type of sounds each method lends itself to creating.  Using pre-recorded samples works well for discrete sounds, but does not lend itself well to simulating instruments with a continuous range; the Theremin, for example, operates on a continuous range over three octaves.  DirectSound does in fact allow the frequency of a sample to be altered in real-time, but this alters the speed, and therefore the length, of the sample too.
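For example (a small sketch against DirectSound’s buffer interface; the buffer must have been created with the DSBCAPS_CTRLFREQUENCY flag), doubling the playback frequency of a 22,050Hz sample raises it an octave but also makes it play twice as fast:

// Pitch and speed are coupled: the sample now lasts half as long.
void pitchUpOneOctave(LPDIRECTSOUNDBUFFER buffer)
{
    buffer->SetFrequency(44100);  // sample originally recorded at 22,050Hz
}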

 

Writing algorithms to generate the sounds on the other hand would most likely lead to creating unusual sounds which would bear little resemblance to the instruments we know.

 

The method of choice was a compromise between the first and second options.  The sound-engine would use wave-table synthesis (i.e. play back pre-recorded samples) but it would also run algorithms in parallel with the playback to allow control over the dynamics of the sample.  The implementation of this is discussed in the next chapter.

 

3.3.1    Chosen Technology - DirectSound      

The next design issue to be addressed was what would play back the sample files.  One method would be to program the SoundBlaster directly, but this would reduce the application’s portability and did not offer any considerable advantages.  After considering a selection of different APIs, such as Aureal’s 3D sound engine and shareware engines such as Rex, the author decided on Microsoft’s DirectSound, a component of the greater DirectX suite.  The reasons for choosing DirectX were plentiful: firstly, the latest version (DirectX 7.0) offers hardware-accelerated playback of samples.  It also allows you to specify whether a sample should be stored in memory or streamed in from disk.  In addition it allows a high degree of polyphonic playback and real-time control over the pan, frequency and volume of a sample.  The actual implementation of this is quite complicated and is described in the following chapter.  DirectX 7.0 also features methods for simulating the nuances of 3D sound, such as Doppler effects and changes in tone and volume due to the position and orientation of the listener object (i.e. the head).

 

3.3.2    The Design of the SoundEngine

DirectSound provides device-independent control over the soundcard, but also requires quite a large amount of code to actually get a sample playing.  In order to develop an adaptable system it was necessary to design a SoundEngine to sit on top of DirectSound and provide methods at a high level of abstraction such as play, stop and so on.  One of the requirements of the SoundEngine was that it be capable of polyphonic playback with individual control over samples, and as such the SoundEngine comprises two classes: class SoundStation and class Sample.  Class Sample has methods for controlling a sample’s own volume, pan and frequency.  Each sample also has duplicate buffers associated with it for delay and echo effects (DirectSound provides for creating duplicate buffers at very low overhead).  Class SoundStation sets up DirectSound and provides overall control over the samples, including beat-synchronisation and arpeggio effects.  Class SoundStation also acts as a protective filter for class Sample, as no Sample methods may be called directly from the mainline.  This protects against, for example, a method trying to access non-existent samples.
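A hypothetical outline of this two-class arrangement may help; the method and member names below are illustrative, not the report’s actual interface:

#include <windows.h>

class Sample {
public:
    bool load(const char* waveFile);   // creates a static secondary buffer
    void play(bool looping);
    void stop();
    void setVolume(long volume);
    void setPan(long pan);
    void setFrequency(unsigned long hz);
};

class SoundStation {
public:
    bool init(HWND hwnd);              // set up DirectSound, primary buffer
    int  loadSample(const char* waveFile);
    // All Sample calls are routed through SoundStation, which validates
    // the id so that non-existent samples can never be accessed.
    void play(int id, bool looping);
    void stop(int id);
    void setVolume(int id, long volume);
private:
    Sample samples[64];                // pool for polyphonic playback
    int    numSamples;
};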

 

3.4  Possible methods of Control

A number of methods for controlling the system were devised specifically for the project, the more practical of which are presented here.

 

3.4.1    The Squeezebox

One of the more interesting designs allowed the pitch of the notes to be controlled by the distance between the hands, the volume by the height off the ground, and effects by the angles of the hands, such as the tilt.  Using the DataGlove, the fingers of the right hand could be used to play notes in the scale of the current pitch.  In fact the fingers need not be restricted to single notes: they could be used to launch an arpeggio, or perhaps a particular combination could be used to play a major or a minor chord.  This control arrangement also included a virtual foot controller, which could be used to change the underlying sample.

 

Advantages:

Controlling the sound using the distance between the hands to alter the pitch or pan is an intuitive interface.  Some forms of meditation involve imagining a malleable ball of energy between the hands, which can be rolled around.  When the hands are quite close, the power of the imagination can be very convincing, making it feel like there is actually a real force flowing between the hands.  This feeling could be amplified by the presence of a sound which is actually changing with the relative motion.

 

Disadvantages:

When we move one finger, the neighbouring fingers tend to move with it, which makes distinguishing one finger movement from another error-prone.  Moving individual fingers while keeping the others still is also difficult, and possibly leads to repetitive stress injury; as such this is not a viable control mechanism.

 

3.4.2    The Table 

In the table model, the sample to be played is determined by the position of the hands on the table.  Sensors attached to both hands select the notes to be played, dependent on where on the table they are positioned.  Conceptually the table could be divided into a grid, and the player would stand over it pressing 'hot-spots'.

A paper cover could be used with the grid drawn in, so that it would be clear where one note ended and another began.  Different areas of the table could have different functions.  Perhaps the bottom or one of the sides could be reserved as a slider control for volume, pan or frequency.  The purpose of having the table there, when the system would work just as well without it, is that it gives the player something physical to interact with, and therefore incorporates more senses into the process.  The glove could be worn on the right hand for extra control, providing chords and arpeggios.

 

3.4.3    Musical Beams

The Musical Beam model owes much to Jean-Michel Jarre's laser instruments, using infra-red beams and vast amounts of dry ice.  In this instance a music beam would represent a single sample.  When the hand breaks the vertical beam it becomes selected, and perhaps changes colour to reflect the fact.  The height at which the beam was broken would determine its volume and, as in the other models, its pan and frequency could also be altered by various glove gestures.  Alternatively the position of the beams with respect to the listener's ears could be used to determine the pan, to give the illusion that it is actually the beams themselves from which the sound emanates.  This model also opens the door to using DirectSound3D for simulating Doppler shifts and so on.  Graphically the beams could be represented in OpenGL as infra-red beams surrounded by fog, to recreate the feel of a Jean-Michel Jarre light show.

 

3.4.4    The Conjurer

Inspired by the Digital Baton, the Conjurer would allow the user to enter a virtual 3D room, in which would be located various music loops situated in boxes on the floor.  By pointing directly at a particular box the player could then raise it from the ground, its volume increasing with its displacement.  The player would play the role of conductor by selecting which samples were playing at which time.  For a model of this type to work the samples must be of a complementary nature and cropped so that when looping they remain synchronous with one another.  The samples would also have to be synchronised so that they always start on the beat.  This could be done by starting them all at the same time and just raising the volume when they are 'turned on'.  The user would also be able to navigate around the room, 'discovering' new sounds on their path.  By using the sound-cone object in DirectSound 3D it would be possible to only allow the user to hear a sound if they were standing in its cone of projection.

 

3.4.5        Music Boxes

The chosen model for the project is based on a very familiar concept – the box.  The difference is that Music Boxes contain a sample of music that is triggered whenever a sensor enters the box.  In fact the boxes can be designed so that they contain many different samples in different areas.  Conceptually we are associating an area of space with a designated sound sample.  In reality the Music Box is a C++ struct containing a collection of vertices and attributes which we can render to the screen using OpenGL.  
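
As a purely illustrative sketch, such a struct might look like the following; the field names and the MAX_PARTITIONS constant are assumptions, not the project's actual definition.

const int MAX_PARTITIONS = 8;          // assumed maximum grid resolution

struct MusicBox {
	float x_min, x_max;                // box extents relative to the globe (the origin)
	float y_min, y_max;
	float z_min, z_max;
	int   x_parts, y_parts, z_parts;   // number of partitions in each dimension
	// the sample (or control trigger) associated with each MiniBox:
	int   sampleID[MAX_PARTITIONS][MAX_PARTITIONS][MAX_PARTITIONS];
};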

 

( Diagram: a Music Box of width ( X_max – X_min ) divided into X partitions 0 and 1 and Y partitions 0 and 1, giving four MiniBoxes. )

Figure 3.1 A 2D Music Box with four samples.

 

 

The MusicBox model is perhaps the most adaptable of all the models proposed, as it allows a grid of one, two or three dimensions to be specified.  Each section inside the grid can contain a sample or a control trigger.  Visually the box can be arranged to suit the preferences of the user.  Samples could be stacked one on top of the other like pizza boxes, or side by side in long rectangles like a very large piano.  Different setups favour different modes of play.  For playing loop-based music, where the user is playing the role of DJ, it is best to have a grid of largish boxes containing the samples to be looped and another box containing the controls such as volume and pitch.  A selection of different control scenarios is depicted in the Implementation chapter.

 

 

( Diagram: a vertical stack of three MiniBoxes, Y partitions 0 to 2, with Y partition 1 highlighted as active. )

Figure 3.2 A stack-style Music Box with three samples with active sample 1.

 

 

 

 

 

 

( Diagram: a 3 x 3 grid of MiniBoxes, X partitions 0 to 2 and Y partitions 0 to 2, with X partition 1 and Y partition 1 highlighted as active. )

Figure 3.3 A grid-style Music Box with nine samples with active sample 4.

 

3.5  The Graphics

The third phase of design was to represent the system using 3D graphics.  The technology chosen to implement this was OpenGL, as opposed to Microsoft's Direct3D.  Although OpenGL does not explicitly allow for stereo-optics (required for the headset), it does allow 'viewports', or windows of any size, to be created, and different worlds can be rendered at different positions within each.  A number of graphical representations were considered, and once again, some were more practical than others.  One early idea was to represent the world as a continually moving tunnel, to give the sense of passage through time, with the samples arranged around the user on the walls of the tunnel.  In the end it was thought better to keep the world as simple as possible so as not to overly disorientate the user, hence the MusicRoom.

 

3.5.1 The MusicRoom

The MusicRoom consists of a collection of MusicBoxes in a virtual room of specified dimensions, with a specific globe position.  The boxes can be arranged around the room with respect to the globe position, which is always at the origin.  The design of the graphical representation of the room is quite basic.

 

The floor is depicted as a checkerboard, which is produced by a simple algorithm.  The walls were initially designed as texture maps, although this was changed to a simple star-field as the texture maps slowed down the rendering of the graphics to an impractical speed.  The Polhemus globe is represented as two intertwined pulsating spheres (simply glut spheres with modulating radii).  The graphical representation of the MusicBoxes is configurable from the OpenGL menu, and they can be rendered as either solid or wire-frame shapes.  The active partitions are highlighted with blue wire-frames and the active MiniBox is represented by a colour-interpolated cube.
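
For illustration, a checkerboard floor of this kind can be produced in a few lines of immediate-mode OpenGL; the tile size and count below are assumed values, not those used in the project.

#include <GL/gl.h>

void DrawFloor()
{
	const float TILE  = 0.5f;      // side of one tile (assumed value)
	const int   TILES = 16;        // tiles per side (assumed value)

	glBegin(GL_QUADS);
	for (int i = 0; i < TILES; i++) {
		for (int j = 0; j < TILES; j++) {
			// alternate light and dark tiles on the parity of i + j:
			float c = ((i + j) % 2 == 0) ? 1.0f : 0.1f;
			glColor3f(c, c, c);
			glVertex3f( i      * TILE, 0.0f,  j      * TILE);
			glVertex3f((i + 1) * TILE, 0.0f,  j      * TILE);
			glVertex3f((i + 1) * TILE, 0.0f, (j + 1) * TILE);
			glVertex3f( i      * TILE, 0.0f, (j + 1) * TILE);
		}
	}
	glEnd();
}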

 

3.6  The Development Environment

It was decided to develop the application using C++ and the DirectX API, with OpenGL for the graphics.  The sheer size of the GUI meant that after the initial stages the author switched to developing with the Microsoft Foundation Classes (MFC).

 

3.7 System Overview 

The design of the system was not set in stone at the beginning of the project but evolved around the SoundEngine.  The basic overview of the system is shown in Figure 3.4.  A more detailed overview is provided at the end of the next chapter – Chapter 4 (Implementation).

Figure 3.4 System Architecture Overview

 

 

3.8  Summary

This chapter discussed the issues of the design phase.  The design was dealt with in three parts – the design of the sound engine, the design of the control system and the design of the graphical representation.  As the project was concerned with exploring possibilities, a number of ideas that never left the drawing board were also mentioned.

 


Chapter 4

Implementation

 

4.1 Introduction

The purpose of this chapter is twofold.  Firstly it provides a description of the capabilities of the application and how it should be used.  Secondly it discusses, at a high level, how the functionality was implemented.  The author has deliberately not included any source code in this chapter, but some of the more important methods are printed in a stripped-down form in the Code Appendix.  As the design of the project evolved around the GUI, this chapter takes each dialogue box in turn and explains the implementation issues as they arise.

 

4.2 The Implementation Underlying the Dialogue Boxes

Here follows a discussion of the functionality behind each of the Dialogue Boxes in the application.  Important implementation issues are discussed as they arise. 

 

4.2.1 The Opening Dialogue box

The opening dialogue box, shown in Figure 4.1, is provided as a testbed for the user's instruments.  Here the user can load up a Grand Central Instrument (i.e. a file with the .gci extension containing a list of wave files) and play the various samples by clicking on one of the twelve keys on the picture of the piano keyboard.  There are three different ways in which samples can be played, as chosen from the control menu:

·        Single-shot mode:

The user clicks a piano key once and this triggers the sample to be played one time.

·        Double-Touch mode:

The user clicks to start the sample which plays repeatedly until the user clicks on it again to stop it.

·        Dynamic mode:

The user controls the volume of the note in real time using a continuous input from either the Data-Glove or sensors.

 


By clicking on the checkboxes underneath a particular piano key the user can set the sample to loop.  One of the advantages of allowing users to loop samples in the testbed is that it can be used to determine whether or not a sample has been cropped correctly.  Cropping samples correctly is critical in loop-based music and basically involves making the sample the right length, so that if it is set to repeat it comes in on the right beat every time.  By looking at the peaks of the wave file and identifying the important beats it is quite easy to calculate where to crop the file.  Not only should looping samples be in time with themselves, but they also need to be in time with any other samples that may be playing alongside them, and as such a 'Beatmaster' listbox is provided.  The Beatmaster is a sample that keeps the time of the piece; all other samples play in synchronisation with it to hold everything together.  The implementation of the Beatmaster is left till later (see 5.2.1).
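
The calculation itself is straightforward.  As a sketch, assuming the 16-bit mono samples at 22,050 Hz used in this project:

// Length in bytes of a loop of 'beats' beats at 'bpm', assuming
// 16-bit mono samples at 22,050 Hz:
long CropLengthBytes(double bpm, int beats)
{
	const double sampleRate     = 22050.0;    // samples per second
	const int    bytesPerSample = 2;          // 16-bit mono

	double seconds = (60.0 / bpm) * beats;    // duration of the loop
	return (long)(seconds * sampleRate) * bytesPerSample;
}
// e.g. a four-beat bar at 120 BPM lasts 2 seconds
//      = 44,100 samples = 88,200 bytes.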

 


Figure 4.1 – The Opening Dialogue Box

 

4.2.1.1  Timing Drift:

Samples that start in time with each other can sometimes 'drift' out of time if their lengths are not exact multiples of each other.  Even a one-millisecond difference can build up over time to become noticeable.  The ear will forgive a drift of up to 40–50 ms, but any more than that will sound 'out-of-time' (Stewart Acoustical Consultants).  Using these figures, a looping sample whose length differs from the beat length by one millisecond will cause a perceivable drift after roughly 50 iterations.  Fortunately virtually any wave editor provides facilities for cutting and cropping files to the required size.

 

4.2.1.2  The Effects - Tremelo: 

Referring back to Figure 4.1, the reader will notice three effects boxes named after the guitar effects pedals which they attempt to simulate.  The first effect, named after the Roger Mayer Voodoo Vibe, is the Tremelo effect which characterized some of Jimi Hendrix's guitar playing.  It modulates the volume of the input at a specified frequency and to a certain depth.  Depending on the settings it can produce anything from a subtle waver in a sample to a stuttering, machine-gun effect.  In order to emulate the effect in software, an algorithm that varied the volume with respect to time was implemented.  The pseudocode for the algorithm is provided overleaf and a more detailed version can be found in the Code Appendix.  Also provided overleaf is a diagram of how the Tremelo effect changes the volume of the sample.


Definitions:

Tremelo Rate: The rate at which the volume varies from Peak to Peak.  It is inversely proportional to the Period of the resulting sine wave.

 

Tremelo Depth: The Difference in Volume between the minimum or maximum volume and the average volume.

 

Tremelo Constant: The pre-calculated difference in volume to be applied at each step of the tremelo effect.  The tremelo constant is multiplied by tremelo_sign to determine whether the volume should be incremented or decremented.

 

ADSR_GRANULARITY:  The distance in the sound buffer between two successive points at which we may alter the volume.  If the granularity is too big the volume steps are noticeable, and if the granularity is too small the sound quality will suffer from performing too many operations on the buffer.

 

SetupTremelo(){
	tremelo_const      = ( 2 * tremelo_depth * ADSR_GRANULARITY ) / tremelo_rate;
	tremelo_sign       = INCREASING;
	tremelo_curr_depth = 0;
}

// Apply Tremelo effect if required:
if ( fTremelo ){

	newVolume += ( tremelo_sign * tremelo_const );

	// increment or decrement the current depth:
	tremelo_curr_depth += ( tremelo_sign * tremelo_const );

	// if we have reached the maximum or minimum depth then flip direction:
	if ( tremelo_sign * tremelo_curr_depth >= tremelo_depth )
		tremelo_sign *= -1;

}// end Tremelo effect

Code Sample – The Tremelo Effect

 

 

( Diagram: the volume oscillates between + tremelo_depth and – tremelo_depth around the original volume; tremelo_curr_depth tracks the current offset, and one full cycle spans 1 / Tremelo Rate. )

Legend:
X-axis: Time          Short dashed line: Original volume
Y-axis: Volume        Long dashed lines: Depth boundaries
                      Unbroken line: Volume after Tremelo effect

Figure 4.2 – The working of the Tremelo effect

4.2.1.3  The Effects – Vibrato:

The Vibrato effect works in much the same fashion as the Tremelo effect, except that instead of modulating the volume of the sample we modulate the frequency.  In order to sound correct we must firstly determine the frequency around which to modulate.  This frequency, which can be set between 100 Hz and 100,000 Hz, depends on the sampling rate.  All the samples in this project were sampled at 22,050 Hz and therefore this was the frequency used as the basis for modulation, although DirectSound does provide a method for querying the sample rate of any wave sample.  Once again we set the depth of the modulations and their rate depending on the desired effect.
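
A single vibrato step can be sketched by mirroring the Tremelo code, calling the buffer's SetFrequency() method instead of altering the volume.  The vibrato_ variables and pDSB (the sample's DirectSound buffer) are assumptions standing in for the project's own names.

// Apply Vibrato effect if required:
if ( fVibrato ){

	vibrato_curr_depth += ( vibrato_sign * vibrato_const );

	// flip direction at the depth boundaries, as in the Tremelo effect:
	if ( vibrato_sign * vibrato_curr_depth >= vibrato_depth )
		vibrato_sign *= -1;

	// modulate around the base rate of 22,050 Hz, clamped to
	// DirectSound's legal range of 100 – 100,000 Hz:
	DWORD newFreq = (DWORD)( 22050 + vibrato_curr_depth );
	if ( newFreq < 100 )    newFreq = 100;
	if ( newFreq > 100000 ) newFreq = 100000;
	pDSB->SetFrequency( newFreq );
}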

 

4.2.1.4  The Effects – Delay:

Delay effects such as the Boss DD-2 work by feeding the original signal back into the signal path after a specified delay, and repeating this process a number of times.  The Delay effect simulated for the project has three parameters – depth, rate and level.  The depth specifies the number of times the input is to be repeated, while the rate specifies the delay in milliseconds between each of the repetitions.  Finally, the level specifies the volume at which to play each repetition, as a percentage of the previous iteration.  DirectSound's ability to provide duplicate buffers facilitated a very low-overhead implementation of the Delay effect.  A sample set up with Delay capability has an array of duplicate buffers associated with it.  When the sample is played with the Delay effect it starts a series of WM_TIMER messages to be triggered after a specified delay.  These messages are caught by a timer thread, which identifies the originating sample and then calls the appropriate delay function; this updates the new volume, plays the sample and triggers another WM_TIMER message if necessary.  WM_TIMER messages are a Windows message type and are not intended to be entirely accurate, and as such there are sometimes noticeable variations in the delays.  The pseudo-code for the delay effect follows overleaf and a more detailed version of the required code is supplied in the Code Appendix for the reader's interest.

 


       

StartDelay(){
	StartTimer(sampleID, delay_rate)
}

HandleTimerMessage(TimerEvent){
	if ( TimerEvent is a Delay Event )
		Samples[sampleID].PlayDelay()
	else if ( TimerEvent is an Arpeggio Event ){
		if ( sampleID is valid ){
			Samples[sampleID].SetVolume(current_arpeggio_vol)
			Samples[sampleID].Play()
		}
		increment current_arpeggio_depth
		if ( current_arpeggio_depth = end_of_arpeggio )
			StopTimer(TimerEvent)
	}
}

PlayDelay(){
	DelaySampleArray[current_delay_depth].SetVolume(current_delay_vol)
	DelaySampleArray[current_delay_depth].Play()

	if ( current_delay_depth = end_of_delay )
		KillTimer(TimerEvent)
	else
		SetTimer(sampleID, delay_rate)
}

PseudoCode Sample – The Delay Effect

 

4.2.3    The Menu Bar:

The Menu Bar associated with the opening dialogue provides for opening and saving Grand Central Instruments.  It also provides three sound-engine tools designed for creating arpeggios, creating beats and controlling the dynamics of a sample.  Each of these tools is explained in this section.

 

 

4.2.3.1 Tools – The ADSR Envelope Controller

The reader will recall from Chapter 2 our earlier discussion on the workings of the ADSR volume controller.  The purpose of the ADSR controller is primarily to aid the simulation of the dynamics of instruments whose volume varies with time.  This includes everything from the swelling of stringed instruments to staccato or legato notes played on a piano.  When a pianist strikes a key, the volume of that note rises to a Peak, or maximum value, in a short time period.  After hitting the peak, the volume recoils to a Level volume, where it sustains itself until the pianist releases the key.  Depending on the quality of the pianist's after-touch, the note will decay to zero over a period of time.  We specify the cut-off point at which to stop the sample playing as the Base volume.  Using slider bars the user can specify the length of each phase of the envelope, namely the Attack, Decay, Sustain and Release phases.  Furthermore, the user may set the three critical volumes of the envelope – Peak Volume, Level Volume and Base Volume.

   

The reader may ask why both a Release Depth and an After-touch Depth may be specified, when they relate to the same ADSR phase.  The reason is that the release depth is used when we are operating in single-shot mode and the after-touch is used when we are in double-touch mode.  In single-shot mode we pre-calculate the entire envelope, whereas in double-touch mode the release phase is calculated on-the-fly.  It is worth noting that the ADSR envelope may be substantially longer than the length of the sample itself.

 

The workings of the ADSR volume envelope are similar to those of the Beatmaster.  We set multiple triggers in the sample buffer at regular intervals.  The actual interval specified is defined as the ADSR_GRANULARITY, which can be configured to suit the system.  During playback, when the current position hits a trigger an 'ADSR event' is caused.  Once again, we have a specific thread running which catches 'ADSR events', determines which sample caused the event by checking the index of the event handle, and calls the relevant method UpdateEnvelope(), which does all the donkey-work.  If the sample is in the Attack or Decay phases then UpdateEnvelope() sets the volume of the sample to the next value in the pre-calculated ADSR array, otherwise it calculates the new volume on the fly.

 

It is worth noting that a single thread created in class SoundStation handles all the 'ADSR events', which are set in each of the sample buffers in class Sample.
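
For the curious reader, placing such triggers can be done through DirectSound's notification interface.  The following is a minimal sketch, in which NUM_TRIGGERS and hADSREvent are assumed names rather than the project's own:

// Place NUM_TRIGGERS evenly spaced triggers in the buffer; each one
// signals hADSREvent, which the 'ADSR event' thread waits on.
LPDIRECTSOUNDNOTIFY pNotify = NULL;
if ( SUCCEEDED( pDSB->QueryInterface( IID_IDirectSoundNotify,
                                      (void**)&pNotify ) ) )
{
	DSBPOSITIONNOTIFY triggers[NUM_TRIGGERS];
	for ( int i = 0; i < NUM_TRIGGERS; i++ ){
		triggers[i].dwOffset     = i * ADSR_GRANULARITY;  // byte offset in the buffer
		triggers[i].hEventNotify = hADSREvent;
	}
	pNotify->SetNotificationPositions( NUM_TRIGGERS, triggers );
	pNotify->Release();
}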


 

 


Figure 4.3 – The ADSR Envelope Controller Dialogue Box

 

 

4.2.3.4 The Sensitive Mouse Pad

The Sensitive mouse pad was developed to test out the real-time volume and panning control offered by DirectSound.  When the user clicks down on the pad the sample starts up.  The volume is then determined by the vertical position of the mouse cursor and the pan by the horizontal position.  Thus by dragging the mouse to the right the sound follows suit.

Figure 4.4 The Mouse Controller
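
The mapping itself is a simple linear one.  As a sketch, assuming a pad of padWidth x padHeight pixels with the cursor at (x, y), and recalling that DirectSound pans run from -10,000 (full left) to +10,000 (full right) and volumes from -10,000 (silent) to 0 (full volume):

// Map cursor position to pan and volume (the names are illustrative):
long pan    = (long)( ( x / (double)padWidth ) * 20000.0 ) - 10000;
long volume = -(long)( ( y / (double)padHeight ) * 10000.0 );

pDSB->SetPan( pan );        // dragging right moves the sound right
pDSB->SetVolume( volume );  // dragging down fades the sound out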


 

 

 

 


4.2.3.2 Arpitron – The Arpeggiator Tool

 


 


Figure 4.5 An early version of the Arpeggiator Dialogue Box

 

One of the requirements of the Sound Engine was that it be capable of playing a stream of notes when triggered, as opposed to just one.  A feature of many of the latest synths, such as Roland's JP-8000, is a device called an Arpeggiator.  This device allows the player to assign an entire phrase (a sequence of notes) to a single key on the keyboard.  Shown above is an early version of the Arpeggiator used for the project (the final version can be seen in Figure 4.6).  Just like its hardware counterpart, it allows the user to select from a variety of styles of arpeggio – the one depicted on the screen above is particularly suited to a ballad style and was used to play a finger-picked guitar chord instead of just one note.  The reader may also notice four slider bars, which are used to set:

 

·        Depth         - the number of notes in the phrase

·        Rate           - the time between each note

·        Level          - the volume of each consecutive note as a percentage of the previous note

·        Repeat       - the number of times to repeat the entire phrase (defaults to 1)

 


The implementation of the arpeggio is handled by two classes – class Arpeggiator and class SoundStation.  The dialogue-specific functions are all handled by class Arpeggiator, along with saving and opening arpeggio files (files with extension .arp).  The playing of the arpeggios is handled by SoundStation through the use of WM_TIMER messages and a timer event handler.  Essentially the arpeggio is an array of integers which can be used to reference samples, as the sketch below illustrates.
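
As a purely illustrative example (the values below are invented, not taken from a real .arp file), a finger-picked pattern might be stored as:

// Each entry is an offset from the root sample; when the timer fires,
// the next entry selects the sample to play (compare HandleTimerMsg
// in the Code Appendix).  These values are assumptions.
int arp_array[8] = { 0, 4, 7, 12, 7, 4, 0, 12 };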

 

 

Figure 4.6 A later version of the Arpeggiator Dialogue Box with the Ballad preset selected

 

 

 

4.2.3.3 Groove Maker – The Rhythm Tool

 

The reader may find Figure 4.7 somewhat familiar, as it is in fact an extension of the aforementioned Arpeggiator, inspired by the countless drum-beat generating programs on offer.  The Groove Maker demonstrates how the existing Arpeggiator tool can be modified to program some original drum beats.  In the screenshot (Figure 4.7, below) a simple instrument has been loaded up with four percussion sounds – a clap, a deep thud (dum) and two tom drums.  A repeating pattern has been selected by aligning the sliders with the name of the sample to be played.  Rests may also be inserted by either selecting a blank entry or deselecting the appropriate checkbox.  The Clap sample here is the root sample, and this can be changed by scrolling up or down the listbox; the root sample is always played first, so the pattern set would be:

 

 


    Clap    Tom1      Tom2      Tom1      Dum        Tom2      Tom1      Tom2      Clap    ….

 

 

The reader may also note that the Arpeggiator now allows any interval up to a Perfect 5th (ascending or descending) to be specified.

 


 

 


Figure 4.7 – The combined Groove Maker / Arpeggiator with a user programmed drum beat

 


 

 

4.2.4    The Music Box Creator Dialogue:

 


 


Figure 4.8 – The Music Box Creator Dialogue Box

 

Before explaining the role of the Music Box Creator, the reader should recall that a Music Box was the chosen control implementation.  To refresh, a Music Box is represented by a rectangular box in three dimensions, divided up into a grid of Mini Boxes.  The various types of Music Boxes created are detailed in the next chapter.  For the moment let us consider the process of creating a new Music Box:

 

1.                  Select an instrument – By clicking on the small button in the ‘associated instrument file’ section the user can browse for any GCI file.

2.                  Enter in the physical dimensions and location of the box with respect to the globe.

3.                  Select the control style for the box

        If the samples are merely fills, select single touch.

        If the samples are loops, select touch on/off

        If the sample is the solo instrument, select dynamic

4.                  Set up the Sensors:

Each sensor when polled returns six integer values:

            The x_position of the sensor with respect to the globe.

            The y_position of the sensor with respect to the globe.

            The z_position of the sensor with respect to the globe.

            The roll of the sensor.

            The pitch (a.k.a. elevation) of the sensor.

            The yaw (a.k.a. azimuth) of the sensor.

 

Each of the six parameters returned by the sensors can be used to control one of the following options:

Sample                   -           The sample to be played
Volume                   -           The current playback volume
Pan                      -           The current pan position
Frequency                -           The sample rate for playback
Tremelo Rate             -           The speed of the tremelo effect
Vibrato Rate             -           The speed of the vibrato effect
Tremelo Depth            -           The depth of the volume modulations
Vibrato Depth            -           The depth of the frequency modulations
Arpeggio Depth           -           The number of arpeggio notes to be played
Arpeggio Rate            -           The speed at which to play the Arpeggio / Delay

 

5.                  The final step is to select how the glove can interact with the box, if necessary.  The glove can be used to control the Volume, Pan or Frequency of either the current sample (Local area of effect) or all samples (Global area of effect).

 

Using the MusicBox Creator quite a variety of Music Boxes can be created, saved for later, or added to the current music room (a music room is merely a collection of Music Boxes).  The class MusicBoxDlg handles all of the creating, opening and saving of Music Boxes, as well as the mainline for the actual simulation.  As such it is the largest class in the application and warrants a mention.  Most of the functionality is GUI-driven, and in the next section some of the workings behind the screens are unveiled.

 

4.3       The mainline

When the user clicks on the Sensor button as seen in figure 4.9, a thread is spawned to continually poll the active sensors (as selected on the GUI).  The thread then calls a boundary-testing method which rapidly determines whether or not the sensor lies within a Music Box.  If the sensor does lie within a box, the thread calls another method to find out exactly which partitions are active (see Figures 3.1 to 3.3 for an explanation of partitions) and then notifies the sound engine to either play or queue the samples.  The thread then sleeps to allow the OpenGL windows to refresh.
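
Re-using the illustrative MusicBox struct sketched in Chapter 3, the boundary test and partition lookup might look as follows; the function names are assumptions:

// Does the sensor at (x, y, z) lie inside the box?
bool InsideBox(const MusicBox& box, float x, float y, float z)
{
	return x >= box.x_min && x <= box.x_max &&
	       y >= box.y_min && y <= box.y_max &&
	       z >= box.z_min && z <= box.z_max;
}

// Which X partition is the sensor in?  (Y and Z are analogous.)
int XPartition(const MusicBox& box, float x)
{
	float width = ( box.x_max - box.x_min ) / box.x_parts;
	return (int)( ( x - box.x_min ) / width );
}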

 

In order to minimise the number of separate threads running, the glove polling is also implemented in the sensor thread, which ensures that the glove readings supplied to the control methods are always the most recent.  It was discovered that when the glove was polled constantly in a separate thread, the glove supplied data faster than it could be read, hence filling up the buffer, and as such the values being read from the glove were out-of-date by up to 20 seconds.

 

4.4       The OpenGL windows

In order for the system to be truly immersive, some form of graphical feedback was required.  DirectX does provide Direct3D for rendering 3D worlds, but it did not seem as intuitive to develop in as OpenGL.  One of the reasons for using OpenGL is that a library called GLUT (the OpenGL Utility Toolkit) supports viewports, which is the method used to render the 3D world to two separate graphics cards for stereo-optics.  As mentioned earlier, the computer for which the application was developed had two graphics cards installed, and by configuring the display settings in the Windows control panel the desktop can be extended out to the right across both displays.  For the project the displays were both set to a resolution of 640 x 480 pixels.


 

Figure 4.9 Configuring the Display Settings

 

When the user selects stereo-optics, a window of size 1280 x 480 pixels is opened with a glut command.  The draw method draws the entire scene once for the left eye, then translates to the bottom corner of the right-hand screen and redraws the scene for the right eye.  In practice we actually translate a little past the right-hand screen, to compensate for the distance between the pupils.  Because of the limited field of vision in the world (approximately two to three metres in each direction) there is no reason to implement any stereo-perspective correction algorithms (i.e. it is acceptable that both eyes see the same perspective, although in reality they would not).
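
A sketch of the two-viewport draw method follows, assuming a 1280 x 480 window spanning both displays; DrawScene() and EYE_OFFSET stand in for the project's own scene code and inter-pupil translation.

#include <GL/glut.h>

void Display()
{
	glClear( GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT );

	// left eye on the left-hand display:
	glViewport( 0, 0, 640, 480 );
	DrawScene();

	// right eye on the right-hand display, shifted a little past the
	// screen width to account for the distance between the pupils:
	glViewport( 640, 0, 640, 480 );
	glPushMatrix();
	glTranslatef( -EYE_OFFSET, 0.0f, 0.0f );
	DrawScene();
	glPopMatrix();

	glutSwapBuffers();
}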

 

4.5       High Level System Overview

Instead of laboriously describing the class structure of the project, a simplified diagram is provided below to give an idea of the system architecture.  The arrow heads indicate the direction of information flow.  Square boxes represent Dialogue-box MFC-style classes, ovals represent standard classes and rounded boxes represent hardware.  This diagram is a much simplified view of the intercommunication between classes, as only the important links have been included for clarity.

( Diagram: the classes communicate through the Windows Messaging System, which carries the ADSR / Beat Events. )

Figure 4.10 High Level system overview revisited

 

4.6       Summary

This chapter discussed the capabilities of the application and also described some of the workings underlying the application.  To conclude the chapter a diagram of the system overview was presented.

  

 

 

 

 

 

 


 

Chapter 5

Analysis & Applications of the System

 

5.1  Introduction

This chapter deals with the final stage in the application development lifecycle, that of testing.  A number of volunteers were asked to try out the system and provide feedback on what parts they liked and what improvements could be made.  Some of these suggestions were then implemented and others have been left for future work. 

 

5.2  Testing Stage

After completion of each component of the project, i.e. the sound engine, the control engine and the graphics engine, a number of volunteers were asked to give their views on the application thus far.  As such the testing stage was once again distributed across the development.  The final month, however, was spent implementing the various recommendations and improvements inspired by the suggestions of classmates.

 

The next section looks at these suggestions in turn and describes what was done to address the issues that arose from the feedback.

 

 

5.2.1    The Beatmaster and the Sample-Queue:

The most quoted grievance during the early stages of testing was that it was quite hard to trigger looping samples so that they would play in time with the other samples already playing.  Even if the player has a good sense of rhythm, he/she still has to compensate for the slightly jerky nature of the sensor readings.  In fact the jerkiness of the sensor readings can be lessened by using the built-in 'compensation' feature on the UltraTrak system, but this merely takes in a number of readings from the sensors and averages them, thus smoothing the readings but also slowing down the effective output.  This feature may be more suitable for realistic motion capture in slow motion.  In order to ensure that samples would always play in time with each other, the system was adapted to include a Beatmaster.

 

The Beatmaster is simply the sample that keeps the beat.  All other samples that are triggered must play in time with the Beatmaster.  This is achieved by placing triggers in the Beatmaster's buffer, which cause a 'beat event' that is caught by a thread specifically invoked to catch beat events.  On receiving a 'beat event' the thread starts every sample in the Sample-Queue until the queue is entirely empty.  Because of the speed of execution, all these samples appear to have started simultaneously.  The Sample-Queue itself is a simple cyclic queue structure which contains a list of samples waiting to be played.  It operates a FIFO (First-In First-Out) system, which is described pictorially in Figures 5.1 and 5.2.  The pseudo-code for the queue is also provided.

 

( Diagram: a new sample is inserted at the tail of the cyclic Sample-Queue; Head marks the next sample to be played. )

Figure 5.1 Adding a Sample into the Sample-Queue

( Diagram: GetNext() removes the sample at the head of the queue and returns Drums.wav. )

Figure 5.2 Removing a Sample from the Sample-Queue

 

SampleQueue::Add(Sample){

	// Always insert at tail of Q:
	SampleQ[tail++] = Sample;

	// let tail wrap around
	if ( tail >= MAX_QUEUE_SIZE )
		tail = 0;
}

// GetNext
// Gets next element from the queue and returns true if the queue is not empty
bool SampleQueue::GetNext(int & next){

	if ( head == tail ){
		return false;
	}else{
		next = SampleQ[head++];

		if ( head >= MAX_QUEUE_SIZE )
			head = 0;

		return true;
	}
}

SampleQueue::Clear(){
	head = 0;
	tail = 0;
}

PseudoCode Sample – The Sample Queue

 

5.2.2    Labeled boxes

While observing classmates using the system it became apparent that some form of labeling the individual samples would greatly help people remember where each sample was.  Also it would be beneficial to know which samples were currently looping and which segments of the Music Box were active.  To combat the confusion the following improvements were made:

 

·        Firstly each box was labeled with the name of the associated instrument file.

·        Each MiniBox was labeled with the name of the sample it triggers.  The sample names could be displayed in one of two modes:

·        Colour mode – The sample name is displayed in red when it is off and displayed in blue when it is on.

·        Ghost mode – The sample name is not displayed when it is off and displayed in blue when it is on.

 

·        When a sensor enters a Music Box the active x, y and z segments are drawn with a blue wire-frame – that is, if wire-frames have been turned on in the visuals options menu.  Otherwise the entire MiniBox will 'light up' as a solid box, the colour of each face being determined by interpolating the corner points.  Because the set of corner points of each MiniBox is unique, each MiniBox lights up with different colours, which allows the user to identify the sample with that particular colour.

 

The improvements listed above went a good way to making the system more intuitive to use as many people found it a bit disorienting at first.  Also suggested as a possible improvement was that of using icons to represent the samples and boxes.  For example we could use an icon of a drum to represent the drum samples, and to indicate a looping sample we could draw a looping arrow similar to those on stereos et cetera.


5.2.3    The OpenGL menu

The addition of an OpenGL menu served two purposes:

Firstly, during a 'session' when a user has mounted the headset, it is often necessary to switch between the graphical display and the GUI to change various settings.  It can be considerably disorientating for the person wearing the headset to see the world in one eye and the desktop in the other while the various settings are being changed.  Secondly, the MusicBox Dialogue Box at one stage had so many controls on it that it did not actually fit onto the screen in 640 x 480 mode, so some features were inaccessible.  Overly complex GUIs can be quite forbidding, and as such a large number of the controls were moved so they could be accessed from an OpenGL menu.  The ability to add menus to OpenGL windows comes once again with the GLUT package.  The menu appears when the mouse is right-clicked anywhere on the screen, and is a very compact way of offering a large number of controls.
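
Attaching such a menu with GLUT takes only a few calls.  The entries below are illustrative, not the project's actual menu:

#include <GL/glut.h>

void MenuHandler(int entry)
{
	switch ( entry ){
		case 1: /* toggle wire-frames   */ break;
		case 2: /* toggle sample labels */ break;
	}
	glutPostRedisplay();
}

void SetupMenu()
{
	glutCreateMenu( MenuHandler );
	glutAddMenuEntry( "Toggle wire-frames",   1 );
	glutAddMenuEntry( "Toggle sample labels", 2 );
	glutAttachMenu( GLUT_RIGHT_BUTTON );   // menu appears on right-click
}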

 

5.2.4    Nuke control

Another essential addition to the project was the Nuke box.  A Nuke box is a control box whose function is to stop all sounds playing immediately and reset all variables to their initial conditions.  Its usefulness becomes apparent when the player walks through a Music Box, setting off all the samples and generating a cacophony of sound.

 

5.3  Analysis

The final testing stage of the project was concerned with analysing the suitability of the project for:

 

·        Teaching music theory

·        Teaching spatial representation

·        Making music and entertainment  (DJ mode)

 

When using the system for one of these specific purposes the type of MusicBoxes selected should reflect the requirements of the potential users.  The following section describes what configuration would be best suited to each.

 

5.3.1    Teaching Music Theory

One of the most important qualities a musician can possess is that of relative pitch.  Having relative pitch involves being able to tell the distance or ‘interval’ between two notes.  There are two types of relative pitch:

·        Melodic – where the notes are played consecutively.

·        Harmonic – where the notes are played simultaneously.

 

Generally, intervals are learnt by associating each interval with a piece of music exhibiting the interval in important phrases.  For example, the famous theme from the film Jaws is based on minor 2nd intervals.  Nursery rhymes often contain important intervals too; 'Three Blind Mice', for example, is based on descending Major 2nd intervals.  Chords can also be built on the concept of intervals: a Major chord is constructed using a triad of form root-Major 3rd-Perfect 5th.  Figure 5.4 lists the intervals used in Western music and gives their 'width' in semitones.  The distance between the notes C and E is four semitones.  This distance is termed a Major 3rd, an interval which can be heard in the opening line of Summertime by Gershwin.  Learning to associate intervals with various famous melodies can aid the learning, but it would be better to make a direct association between the sound of an interval and a measure of distance, which is exactly what Interval MusicBoxes do.  A MusicBox can be arranged so that neighbouring samples on the x-axis differ by a minor 2nd, and neighbouring samples on the y-axis differ by a Major 2nd (see Figure 5.3).  The concept can be expanded to include Major 3rds by extending the MusicBox to three dimensions, as in Figure 5.5.  In fact, by rearranging the MusicBox any set of intervals can be taught.

( Diagram: a two-dimensional Music Box in which each step along the x-axis raises the note by a minor 2nd and each step along the y-axis raises it by a Major 2nd. )

Figure 5.3 A Music Box set up for learning minor 2nd and Major 2nd intervals

 

Interval Name:          Distance from root:         Note in C scale:
Root                    0 semitones                 C
Minor 2nd               1 semitone                  C#
Major 2nd               2 semitones                 D
Minor 3rd               3 semitones                 D#
Major 3rd               4 semitones                 E
Perfect 4th             5 semitones                 F
Augmented 4th           6 semitones                 F#
Perfect 5th             7 semitones                 G
Minor 6th               8 semitones                 G#
Major 6th               9 semitones                 A
Minor 7th               10 semitones                A#
Major 7th               11 semitones                B
Octave                  12 semitones                C

Figure 5.4 Intervals and Semitones

( Diagram: a three-dimensional Music Box in which steps along the three axes raise the note by a minor 2nd, a Major 2nd and a Major 3rd respectively. )

Figure 5.5 A Music Box set up for learning minor 2nd, Major 2nd and Major 3rd intervals

 

5.3.2    Teaching Spatial Representation

As a follow-on from the above, MusicBoxes can be set up in various manners to teach spatial representation.  A number of ideal configurations for this would be:

·        Volume boxes: Where the height of the sensor within the box determines the volume.

·        Pan boxes: Where the offset of the sensor from the mid-axis determines the degree of pan.

·        Frequency boxes: Where the depth of the sensor into the box determines the playback sample rate.

·        Combination: Any combination or permutation of the above.

 

These boxes are best used when wearing the headset headphones, so that the pan is actually relative to the user's own head.  Further work in the area might consider using a 3D sound engine such as A3D v3.0 to set the centre of each MiniBox as a sound source and use the sensor on the headset as a listener object.  This would really give the sensation that the sample was indeed emanating from that particular area of the box, and would add to the feeling of being within the sound.

 

5.3.3    DJ mode

The most immediately apparent use for the project is entertainment.  The user can take on the role of a modern DJ, triggering loops, fills and vocals and controlling effects with his/her hands.  Here follows a description of one particular setup devised for this mode of practice.

 

The room was set up with four boxes:

·        4loops.gcb – A MusicBox containing 2 looping drumbeats and 2 looping drum breaks

·        6fills.gcb – A single-shot MusicBox containing 6 fills such as vocals and swishes.

·        6misc.gcb – A variety of samples that can be played in accompaniment to the drum loops including synths, vocals, and drum patterns.

·        6controls.gcb – A control box containing six control settings for the glove.

From above the room would look like so:

( Diagram, viewed from above: a 2x2 drum loops box ( Beatmaster box ), a 3x2 fills box ( single-shot box, 6fills.gcb ), a 2x3 complementary riffs box ( double-touch box ) and a 3x2 glove controls box ( control box ) arranged around the room. )

Figure 5.6 A typical setup for DJ mode

 

The DJ is given the headset to wear with a working sensor attached.  The glove is worn on the right hand, and a sensor is also wrapped around the wrist of the glove hand.  Finally, the user is given another sensor to hold in the left hand.  The idea is that the left hand is used to start and stop looping samples, and the glove hand is used to actually control them.  The parameter that the glove is set to control is either selected from the OpenGL menu, or the user may select a particular control by entering the glove hand into one of the MiniBoxes in the control box.  In this particular setup the control box has six options:

 

·        Pan Control

·        Volume Control

·        Frequency Control

·        Reset All

·        Local Glove Control mode

·        Global Glove Control mode

 

The first three are self-explanatory; the fourth, 'Reset All', resets the pan to central, the volume to maximum and the frequency (sample rate) to the original value (typically 22,050 Hz).  The last two options allow the user to swap between Local and Global control.  In Local control mode, when the glove is actively altering a sample parameter the effects are only seen acting on the current sample, whereas in Global mode all playing samples are affected.

 

The MusicBox 4loops contains the Beatmaster sample, and this must be started up before any other looping samples are triggered, the reason being that these samples must sync to the beat of the Beatmaster.  Once started up, the Beatmaster never actually stops playing; if a user turns it 'off' it does not actually turn off, instead its volume is set so low that it is inaudible.

 

5.4  Suggested Further Work

The following section provides some suggestions for further work in the area of Virtual Reality Music Systems.  Those interested should also read the design section of this report as a number of alternate systems were considered at that point.

 

5.4.1    3D Sound 

In the later stages of the project the use of 3D sound became a viable option, but having attempted some 3D sound using DirectSound 3D the author would not recommend its use.  The A3D sound engine developed by Aureal would seem a more attractive option, and could be used to allow each sample to seemingly emanate from a point source.  An application could be developed where the user is allowed to 'catch' a sample and carry it over to another side of the room.  By attaching a positional sensor to the headset, the sound's characteristics would alter as the user moved around the room.  The application could function as an advanced version of the KoanX Ambient Music Generator [KOANX], a program which allows a user to 'push' looping samples around a screen for different mixes.

   

5.4.2    The Design and Implementation of a MIDI Interface for the VR System

With the release of soft-synths such as 'Dream Station', a MIDI interface for the VR system would allow the user to control a large number of filters and signal processors in a Virtual Reality environment.  The user would not be creating any music 'live', as on a keyboard, but instead could control the playback of the tracks in 'Dream Station'.

 

5.4.3    Using the Virtual Reality System to Interpret Dance

An ambitious project would be to allow the computer to take input from a dancing subject and use the data to generate appropriate sounds.  Many projects of a similar nature, such as the Digital Baton, have been attempted at MIT's MediaLab, many of which are available for review on the internet [MIT web].

 

5.4.4    The Virtual Theremin

The Thereminvox was a revolutionary instrument when first made, being the first ever instrument to be played without being touched, which is exactly why it would be the perfect candidate for recreation in a Virtual Reality system.  In fact there would be no need to limit oneself to recreating the exact sound of the Theremin; the control is the most important aspect.  Various different algorithms could be experimented with to achieve different results.

 

5.5  Summary

This chapter concerned itself with issues of analysis.  A number of implemented improvements arising from the analysis stage were documented, such as the beat-synchronisation achieved using the Beatmaster and Sample-Queue, among other alterations.  Also discussed were the possible applications of the project – as an aid for learning musical intervals, as a tool for teaching spatial representation, and purely for entertainment.  The chapter concluded with a few suggestions for further work, such as adapting the project to include MIDI support or 3D sound, or implementing a virtual instrument based on the Theremin.

 

 

 


Chapter 6

Conclusion

 

6.1 Introduction

The final chapter in the report focuses on the results of the project with respect to the initial aims.  The initial aim of the project was ‘to explore the potentials and possibilities of using Virtual Reality equipment to make music’.  Another implicit purpose of the project was to learn, and as such the skills acquired as a result of the project are discussed here also.

  

6.2  Results

As stated previously, the specifications of the project were deliberately kept broad so as to allow a good deal of flexibility in developing the project.  The result was a fully working application which allows a user to make sample-based music using the virtual reality system.  As shown in the analysis chapter, there are a number of ways in which the system can be controlled, which keeps its applications broad.  The project is by no means a complete and final product, but the door has been opened to the possibilities available when Virtual Reality systems are used for making music.  The author hopes that further developments in VR technology will make similar systems available on the market.  As of yet the equipment is still very expensive, and the amount of wires and cables makes the system overly intrusive and unattractive to new users (not to mention dangerous).

 

6.3 Skills Acquired

The development of the application served as a great basis for learning how to use the popular DirectSound API.  Although there was no original plan to use the Microsoft Foundation Classes, the author found during development that most of the time was spent coding and debugging the GUI, and as such MFC was enlisted to automate much of the GUI code.  Developing in this manner was much preferred and comes highly recommended.  Developing with the Microsoft Foundation Classes does involve a learning curve, but the time-saving benefits far outweigh the cost of learning how to use it.  Programming environments such as MFC may be the basis of the next generation of rapid-prototyping languages.

 

6.4  Credits

The samples used for the presentation of the project were acquired from a number of sources.  The samples of single guitar notes were taken from a rhythm machine made by Boss called Dr. Rhythm, as were the string samples initially used to test the ADSR controller.  Various samples were downloaded from the internet resource [SampleNet], but a good deal of searching was required to find samples that were of the right tempo and length and that blended well with each other.  An easier option was to use the samples that come with the various demos of dance software such as [DanceStation] and [Dance E Jay].  The advantage of using the samples that come with these programs is that they are already cropped to the right size and generally play to a common tempo.  Apart from using these programs for their samples, much was learned from the various interfaces each provided for the user.  The registered version of Dance Station comes complete with its own Evolution two-octave keyboard which can be used for triggering samples, whereas Dance E Jay has a built-in sequencer which scrolls along as the music is playing.  As well as using samples from these sources, a number of Drum and Bass beats were created using [Rebirth 2], a program that emulates analogue bass and drum synths, namely the TB-303, the TR-808 and the TR-909.  Another useful source of sounds was Magix [Live Act], a program that allows the user to control both the audio and the visuals for dance tunes.  Although the samples may have been slightly cheesy in the author's opinion, they sufficed for the testing stage of the project.  A number of high-quality demos of similar applications are available for download at the on-line music resource [Harmony Central].

         


6.5 Conclusion

Once again thanks must go to Hugh McCabe for taking on the project; although there was a lot of hard work involved, the fact that the field of study was very interesting provided a great deal of motivation.  The author was satisfied with the outcome and looks forward to further developments in the field.

 


 

References:

 

[Abstract Illustration]

Bill Watterson

The Days are Just Packed – Calvin & Hobbes series, 1994.

 

[Marrin 96]

Teresa Anne Marrin

Toward an Understanding of Musical Gesture:
Mapping Expressive Intention with the Digital Baton
M.S. Thesis, Massachusetts Institute of Technology, June 1996

 

[Sawada 95]

Sawada, Hideyuki, Shin'ya Ohkura, and Shuji Hashimoto.

Gesture Analysis Using 3D Acceleration Sensor for Music Control. Proceedings of the
International Computer Music Conference, 1995. San Francisco: Computer Music Association, 1995, pp. 257-260.

 

[HyperCello]

Tod Machover, Hyperinstruments project, MIT Media Lab

www.media.mit.edu/hyperins

 

[Gershenfeld 98]

Neil Gershenfeld, When Things Start to Think, MIT Press, 1998.

 

[Kientzle 98]

Tim Kientzle, A Programmer’s Guide To Sound, Addison Wesley, 1998.

 

[Sensor Chair]

Tod Machover, Brain Opera project, MIT MediaLab (website)

brainop.media.mit.edu/Archive/Hyperinstruments/chair.html

 

[DreamStation]

Audio Simulations’ Dream Station (software)

www.dreamstation.de

 

[DanceStation]

Evolution Dance Station (software/hardware)

www.dancestation.com

 

[Rebirth 2]

Propellerhead Software – Rebirth 2.0 (software)

www.propellerheads.se

 

[KOANX]

KoanX – Ambient Music Generator (software)

www.sseyo.com

 

[SampleNet]

Online Sampling Resource (website)

www.samplenet.co.uk

 

 

 

[Harmony Central]

Harmony Central online music resources (website)

www.harmonycentral.com

 

[Dance E Jay]

Dance E Jay – FastTrack Software Publishing (software)

www.fasttrack.co.uk

 

[Live Act]

Live Act – Magix (software)

www.magix.com

 

[MIT web]

MIT MediaLab (website)

www.media.mit.edu

 

 


Bibliography

 

Tim Kientzle, A Programmer’s Guide To Sound, Addison Wesley. (1998)

 

André La Mothe, Teach Yourself Game Programming in 21 days, Sams Publishing. (1994) 

 

Alan Douglas, The Electronic Musical Instrument Manual – a guide to theory and design.(1990)

  

Allen L. Wyatt, Blaster Mastery, Sams Publishing. (1993)

 

Eduardo R. Miranda, Computer Sound Synthesis for the electronic musician, Oxford. (1998)

 

 


Code Appendix:

 

Code Sample – Tremelo

 

void Sample::SetTremelo(bool){

	// code to set up Tremelo effect:
	tremelo_const      = ( 2 * tremelo_depth * ADSR_GRANULARITY ) / tremelo_rate;
	tremelo_sign       = 1;
	tremelo_curr_depth = 0;
}

// Apply Tremelo effect if required:
if ( fTremelo ){

	newVolume += ( tremelo_sign * tremelo_const );

	// check bounds, flipping direction at the maximum or minimum depth:
	tremelo_curr_depth += ( tremelo_sign * tremelo_const );
	if ( tremelo_sign * tremelo_curr_depth >= tremelo_depth )
		tremelo_sign *= -1;

}// end Tremelo effect

 

Code Sample – Delay

 

bool Sample::SetupDelay(){

	// The simplest implementation for Delay is to
	// create an array of Duplicates of MAX_DELAY_DEPTH

	HRESULT hr;

	for( int i = 0; i < MAX_DELAY_DEPTH; i++ )
	{
		hr = pDS->DuplicateSoundBuffer( pDSB, &aDSB[i] );

		if( hr != DS_OK )
			return false;
	}

	curr_delay_depth  = 0;
	curr_delay_volume = Peak;

	return true;
}

 

 

 

 

 

 

 

 

 

void Sample::StartDelay(){

	// Set up Timer Msg to send a WM_TIMER msg to the window queue.
	// The msg itself must be non-zero, hence the +1
	if ( (dwTimer = SetTimer(hWnd, sampleID+1, delay_rate, NULL)) == 0 )
	{
		fDelay = false;
		fADSR  = false;
	}
}

 

// Timer to handle Delay, Arpeggios.
void SoundStation::HandleTimerMsg(UINT nIDEvent)
{
	// Use I.D. to figure out which sample caused the timer:
	if ( nIDEvent >= 0 && nIDEvent < Num_samples )
	{
		Samples[nIDEvent].PlayDelay();
	}
	else if( nIDEvent >= arp_id && nIDEvent < arp_id + Num_samples )
	{
		int s = nIDEvent - arp_id + arp_array[arp_curr_depth];

		// if note requested is out of range then don't play it:
		if( s >= 0 && s < Num_samples )
		{
			Samples[s].SetVolume( (LONG)(arp_curr_volume * arp_level) / 100 );
			Samples[s].Play();
		}

		// Increment depth:
		arp_curr_depth += 1;

		// If this was the last note in the arpeggio kill the timer:
		if( arp_curr_depth > arp_depth ){
			KillTimer(Parent, nIDEvent+1);
		}
	}
}

 

void Sample::PlayDelay(){

    HRESULT hr;

    // Wrap around if we have run out of duplicate buffers:
    if ( curr_delay_depth >= MAX_DELAY_DEPTH )
        curr_delay_depth = 0;

    // Alter the volume of the delayed note:
    hr = (aDSB[curr_delay_depth])->SetVolume( (LONG)(curr_delay_volume*delay_level)/100 );

    if ( (hr = (aDSB[curr_delay_depth++])->Play( 0, 0, NULL )) != 0 )
    {
        fDelay = false;
        error  = ERR_NO_PLAY;
    }

    // Set up the next timer msg if necessary:
    if ( curr_delay_depth < delay_depth )
    {
        if ( (dwTimer = SetTimer(hWnd, sampleID+1, delay_rate, NULL)) == 0 )
        {
            fDelay = false;
        }
    }
    else
    {
        // Kill this particular timer event:
        dwTimer = KillTimer(hWnd, sampleID+1);
    }
}
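
The listings above scale each echo by delay_level, but the point at which curr_delay_volume itself falls away between repeats is not shown in this appendix. The following standalone sketch shows one plausible fall-off scheme; all of the values are hypothetical stand-ins, and the only facts assumed are DirectSound's volume range (0 at full scale down to -10000 for silence, in hundredths of a decibel).

#include <cstdio>

int main()
{
    // Hypothetical stand-ins for the Sample members:
    long curr_delay_volume = -1000;   // volume of the first echo
    int  delay_level       = 60;      // each repeat keeps 60% of the level
    int  delay_depth       = 6;       // number of echoes

    for (int i = 0; i < delay_depth; i++)
    {
        printf("echo %d plays at volume %ld\n", i, curr_delay_volume);

        // Fall away geometrically towards silence (-10000):
        curr_delay_volume = -10000 + ((curr_delay_volume + 10000) * delay_level) / 100;
    }
    return 0;
}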

 

Code Sample – The Sample Queue

 

void SampleQueue::Add(int Sample){

    // Always insert at the tail of the Q.  Note that Add() does not
    // check for a full queue: if tail catches up with head, the
    // oldest entries are silently overwritten.
    SampleQ[tail++] = Sample;

    if ( tail >= MAX_QUEUE_SIZE )
        tail = 0;
}

 

// Gets the next element from the Q;
// returns true if the Q was not empty.
bool SampleQueue::GetNext(int & next){

    if ( head == tail ){
        return false;
    }else{

        next = SampleQ[head++];

        if ( head >= MAX_QUEUE_SIZE )
            head = 0;

        return true;
    }
}

void SampleQueue::Clear(){
    head = 0;
    tail = 0;
}
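
For reference, a standalone sketch of how the queue is used: the input side Add()s sample numbers as triggers arrive, and the playback side drains them with GetNext(). The harness below is hypothetical (the MAX_QUEUE_SIZE of 8, the sample numbers, and the printf standing in for Samples[id].Play() are all illustrative).

#include <cstdio>

const int MAX_QUEUE_SIZE = 8;   // hypothetical size

class SampleQueue {
    int SampleQ[MAX_QUEUE_SIZE];
    int head, tail;
public:
    SampleQueue() : head(0), tail(0) {}
    void Add(int Sample) {
        SampleQ[tail++] = Sample;          // insert at the tail
        if (tail >= MAX_QUEUE_SIZE) tail = 0;
    }
    bool GetNext(int &next) {
        if (head == tail) return false;    // Q is empty
        next = SampleQ[head++];
        if (head >= MAX_QUEUE_SIZE) head = 0;
        return true;
    }
    void Clear() { head = tail = 0; }
};

int main()
{
    SampleQueue q;

    // Producer side: three triggers arrive.
    q.Add(2); q.Add(5); q.Add(1);

    // Consumer side: drain until empty.
    int id;
    while (q.GetNext(id))
        printf("play sample %d\n", id);    // stands in for Samples[id].Play()

    return 0;
}

Because Add() never checks for a full queue, more than MAX_QUEUE_SIZE - 1 triggers queued between drains would overwrite the oldest entries, so the queue size needs to comfortably exceed the expected burst of triggers.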

Code Sample – ADSR Envelope Update

 

/************************************************************
 * UpdateEnvelope()                                         *
 * Implements the ADSR Volume Filter.                       *
 * Note: Attack & Decay are precalculated;                  *
 *       Release is calculated on the fly.                  *
 ************************************************************/

void Sample::UpdateEnvelope()
{
    if ( fADSR == true ){

        if ( fAttack && curr_adsr_pos <= peak_ADSR_pos ){

            SetVolume( ADSR[curr_adsr_pos] );

            // Reached the peak: hand over to the Decay phase.
            if ( curr_adsr_pos == peak_ADSR_pos ){
                fAttack = false;
                fDecay  = true;
            }
            curr_adsr_pos += 1;

        }// end Attack
        else if( fDecay && curr_adsr_pos <= stop_ADSR_pos ){

            SetVolume( ADSR[curr_adsr_pos] );

            if ( curr_adsr_pos == stop_ADSR_pos ){
                // Reached the sustain level: the envelope is done
                // until Release is triggered.
                curr_adsr_pos = 0;
                fDecay = false;
                fADSR  = false;
            }
            else
                curr_adsr_pos += 1;

        }// end Decay
        else if ( fRelease )
        {
            int v = Level - curr_rel_pos * Release_const;
            SetVolume( v );

            // -10000 is DirectSound's minimum volume (silence), so
            // stop the sample once the ramp passes it:
            if ( v < -10000 ){
                curr_rel_pos = 0;
                fRelease = false;
                fADSR    = false;
                Stop();
            }
            curr_rel_pos += 1;

        }// end Release
    }
    else {
        // The envelope is not active, so make sure all of the
        // phase flags are cleared:
        fAttack  = false;
        fDecay   = false;
        fSustain = false;
        fRelease = false;
        fADSR    = false;
    }
}
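
The header comment notes that the Attack and Decay segments are precalculated into the ADSR[] table, but the precalculation itself is not reproduced in this appendix. The following standalone sketch shows one plausible scheme using linear ramps; only the array and index names echo the listing above, and the sizes and levels are hypothetical.

#include <cstdio>

// Hypothetical sizes and levels; the real values depend on the
// user's attack/decay settings and on ADSR_GRANULARITY:
const int  PEAK_POS = 4;        // end of the Attack ramp (peak_ADSR_pos)
const int  STOP_POS = 10;       // end of the Decay ramp (stop_ADSR_pos)
const long PEAK     = 0;        // full volume (DirectSound: 0 = 0 dB)
const long SUSTAIN  = -1200;    // sustain level, hundredths of a dB
const long FLOOR    = -10000;   // DirectSound silence

long ADSR[STOP_POS + 1];

// Linear ramp from silence up to the peak (Attack), then a linear
// ramp down from the peak to the sustain level (Decay):
void PrecalculateADSR()
{
    for (int i = 0; i <= PEAK_POS; i++)
        ADSR[i] = FLOOR + ((PEAK - FLOOR) * i) / PEAK_POS;

    for (int i = PEAK_POS + 1; i <= STOP_POS; i++)
        ADSR[i] = PEAK + ((SUSTAIN - PEAK) * (i - PEAK_POS)) / (STOP_POS - PEAK_POS);
}

int main()
{
    PrecalculateADSR();
    for (int i = 0; i <= STOP_POS; i++)
        printf("ADSR[%2d] = %6ld\n", i, ADSR[i]);
    return 0;
}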

 

 

Code Sample – The Arpeggiator

 

The arpeggiator is driven by the same WM_TIMER handler as the Delay effect (SoundStation::HandleTimerMsg, listed in full under Delay above). The arpeggio-specific branch is:

 

    else if( nIDEvent >= arp_id && nIDEvent < arp_id + Num_samples )
    {
        int s = nIDEvent - arp_id + arp_array[arp_curr_depth];

        // If the note requested is out of range then don't play it:
        if( s >= 0 && s < Num_samples )
        {
            Samples[s].SetVolume( (LONG)(arp_curr_volume*arp_level)/100 );
            Samples[s].Play();
        }

        // Increment depth:
        arp_curr_depth += 1;

        // If this was the last note in the arpeggio, kill the timer:
        if( arp_curr_depth > arp_depth ){
            KillTimer(Parent, nIDEvent);
        }
    }
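
The registration of the arpeggio timers is not reproduced in this appendix, but the note selection performed by the branch above can be simulated directly. The following standalone sketch (the offsets in arp_array, the base sample, and the other values are all hypothetical) shows the sequence of samples a single arpeggio would trigger:

#include <cstdio>

int main()
{
    // Hypothetical arpeggio: offsets from the base sample, as held
    // in arp_array (here a major chord plus the octave):
    int arp_array[] = { 0, 4, 7, 12 };
    int arp_depth   = 4;
    int Num_samples = 16;
    int baseSample  = 3;   // nIDEvent - arp_id in the handler above

    // Simulate successive WM_TIMER ticks of the arpeggio branch:
    for (int arp_curr_depth = 0; arp_curr_depth < arp_depth; arp_curr_depth++)
    {
        int s = baseSample + arp_array[arp_curr_depth];
        if (s >= 0 && s < Num_samples)
            printf("tick %d: play sample %d\n", arp_curr_depth, s);
        else
            printf("tick %d: note out of range, skipped\n", arp_curr_depth);
    }
    return 0;
}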