§Perturbare lo spazio latente
Intelligenze artificiali e pratiche artistiche
Where do you place agency in a process-based music system?
by Erik Schoster

In process-based music we sometimes attribute the character of the process to its internal behavior without considering the constellation of other factors involved in the perception of the sounding situation. Some processes like sonification can be especially prone to these attributions since the translation of one territory into another can reveal hidden structures and behaviors that are difficult to observe in their original forms (Polli, 2005). Some composers try to deal with the translation of information to sound in as neutral a manner as possible. In Mark Fell’s work with image-based analysis of particles moving in Brownian motion, for example, he «connect[s] data from tracked activity to low-level synthesis parameters» to avoid «familiar gestural or musical habits, any sense of expression or attempt to elicit emotional responses in the listener» in an attempt to present the process as material unto itself (Fell, 2012).

I’d like to explore the placement of agency in process-based music systems by considering them from the perspective of inputs and outputs configured within a system of action, perception, interaction and experience, drawing from Horacio Vaggione’s conception of the action/perception loop in the composition of process-based music (Vaggione, 2001).

A full accounting of the evolution of process-based music through history is out of scope for this article. In western music the practice of composing musical processes goes back at least to 17th-century exercises in harmonic permutations and classical-era parlor music games (Cope, 1996). Early explorations of musical automata trace the desire to explore musical systems further back still: water-powered clocks with gongs that «sound at set intervals», as well as specifically musical efforts such as Achilles Langenbucher’s instruments, which were «programmed by a sort of barrel-organ device» (de Solla Price, 1964).

During the later part of the 20th century this compositional practice began to be formalized somewhat through the dissemination of works by composers like John Cage and Steve Reich. Reich’s 1974 essay Music as a Gradual Process planted a flag on the term process music, though his conception of process is just one in an otherwise wide and varied field of practice (Christensen, 2003). For Reich the process is necessarily an autonomous and linear carrying-out of a material principle with «no room for improvisation» (Christensen, 2003). A classic example is his 1968 piece Pendulum Music, in which live microphones are suspended above a speaker, creating an acoustic feedback loop. The microphones are set in motion to swing like pendulums, producing slowly shifting pulses of feedback each time they pass close to the speaker until eventually coming to a stop on a continuous tone. Reich does acknowledge that the process is never fully linear: it is shaped by the «psycho-acoustic by-products» in the act of listening (Reich, 1974, quoted in Christensen, 2003).

For the purposes of this article I’d like to adopt the definition of a musical process offered by F. Richard Moore, who articulates it as «any agent or activity that transforms information from one form to another», human or machine alike (Moore, 1990). Here I’ll refer to signals (like an electrical voltage) as carriers of information in the discussion of material processes, alongside those computational or notional processes which deal with information in a more abstract sense. Returning to the example of Pendulum Music, the signal is the information in the form of a voltage being produced by the transducers on either end of the feedback loop. It is mediated by the pendulum-swing process, which in turn is acted on by gravity as well as the hand of the performer who sets the microphone into motion.

Computational or notional processes deal with abstract representations of signals (information, as Moore appropriately characterizes it) such as the encoding of a pitch series in western classical notation, or the stream of bytes sampled from a temperature sensor placed in a block of ice. The process never really produces sound itself. There’s a translation of the information (a voltage mediated by a transducer pair in the case of the microphone feedback) into a form which can be articulated as a sounding. One example is Kaffe Matthews’ 1999 piece Weather Made, in which timing information extracted from a sensor-adorned kite, feeding weather data as control signals, is mapped to the parameters for sampling the acoustic vibration of the kite string itself (Matthews, 2018).

John Cage composed with a «note-to-note procedure» using gamuts while developing early pieces (like the Sonatas and Interludes for prepared piano) as a way to work with sounds in a procedurally detached manner. Sounds were «chosen as one chooses shells while walking along the beach» and became materials for gamut-restricted aleatoric manipulation (Cage, 1966). In procedural and process-based music establishing a lexicon for the materials with which the processes will engage is a natural part of the design of the process itself. From Arnold Schoenberg’s tone rows to later efforts of total serialization by Darmstadt School composers (where Cage famously visited to give a procedural lecture on composing with process) the essential idea of drawing a restricted notional lexicon around some set of sound materials was well established. The practice is built fairly deeply into all notated music as well. Many families of traditional instruments are designed with the gamut of note-based temperament in mind: the keys of a piano, frets on a guitar, the valve system of a trumpet or tuba, and so on. These and many other instruments encode a gamut into their basic operation that has a deep connection to the notional practice of western music.

The gamut is a mediator of the process, and has extraordinary influence over its character and behavior. As a personal anecdote to illustrate its effect, my first time working with Karlheinz Essl’s RTCLib system had me curious as to why his alea-rhythm tool (which generated a random onset time whenever it was asked for) sounded so much more interesting than my own implementations of white noise sampling or other uniform random number generators I’d used. The difference was the gamut: on creation of a new instance of alea-rhythm it first creates a gamut of onset times, and all subsequent onsets are selected from this set. The timings are still derived from a uniform random sampling, but because the possibilities are restricted, patterns of a secondary nature begin to emerge as onset times are revisited in different configurations.

To demonstrate, here is a series of tones whose pitches, onsets and other parameters for synthesis are selected from a pseudo-random number generator with a uniform distribution — in other words, white noise.

Sound Example #1

Here is the same process again with the aleatoric approach applied: the same random number generator is used to select values, but for each set of parameters a gamut is first established, and all subsequent values are drawn from it.

Sound Example #2
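The difference between the two sound examples can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used to render the examples and not Essl’s RTCLib implementation: the function names, the gamut size of five and the range of onset intervals are all hypothetical, and only onset times are shown.

```python
import random

def uniform_onsets(rng, count, low=0.05, high=2.0):
    """Draw every onset interval (in seconds) directly from a uniform distribution."""
    return [rng.uniform(low, high) for _ in range(count)]

def gamut_onsets(rng, count, gamut_size=5, low=0.05, high=2.0):
    """Fix a small gamut of intervals first, then draw every interval from it."""
    # The gamut itself still comes from the same uniform source (hypothetical size).
    gamut = [rng.uniform(low, high) for _ in range(gamut_size)]
    return [rng.choice(gamut) for _ in range(count)]

rng = random.Random(1)  # seeded only so the sketch is repeatable
free = uniform_onsets(rng, 32)
restricted = gamut_onsets(rng, 32)

# The unrestricted series practically never repeats a value, while the
# restricted series can contain at most gamut_size distinct values.
print(len(set(free)), len(set(restricted)))
```

Because the restricted series keeps returning to the same few intervals, recurring rhythmic cells become audible even though each individual choice is still uniformly random.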

Both of these approaches are gamut-restricted. However, the first has a far greater number of values to choose from on each selection (any number which can be represented as a 64-bit float). In the field of music synthesis, and computer music synthesis especially, there is the promise and the fallacy of being able to create any sound imaginable (or unimaginable) if only the appropriate sequence of voltages driving the speakers reproducing the sound can be found. James Whitehead carries this to its logical extreme with his thought experiment All Possible CDs, in which the finite nature of digital sound is used to imagine the full universe of possibilities: every possible CD containing every possible combination of bits (Whitehead, n.d.). Somewhere between this enormous digital gamut and a humble tone row are the weights driving modern LLMs: so massive in scope as to feel just as fluid and unpredictable as a pseudo-random number generator, but still derived from a dataset of human-created documents. They form a deterministic basis for the system that presents the illusion of a human actor, not unlike its rule-based precursor ELIZA (Weizenbaum, 1966).
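To give a rough sense of the scale of these gamuts, the arithmetic can be sketched as follows. The disc parameters are assumptions made for illustration (44.1 kHz, 16-bit, stereo, 74 minutes of audio); the thought experiment itself may use different figures. The count of 64-bit patterns is only an upper bound on the values a random number generator can actually return.

```python
import math

# Assumed disc parameters, chosen for illustration only.
SAMPLE_RATE = 44_100         # samples per second, per channel
BIT_DEPTH = 16               # bits per sample
CHANNELS = 2
DURATION_SECONDS = 74 * 60   # an assumed 74 minute disc

bits_per_cd = SAMPLE_RATE * BIT_DEPTH * CHANNELS * DURATION_SECONDS

# The number of distinct discs is 2 ** bits_per_cd. That integer is far too
# large to print usefully, so estimate how many decimal digits it has instead.
digits_in_cd_count = math.floor(bits_per_cd * math.log10(2)) + 1

# Upper bound on the gamut of a single 64-bit value.
float64_patterns = 2 ** 64

print(f"bits per disc: {bits_per_cd:,}")
print(f"approximate decimal digits in the number of possible discs: {digits_in_cd_count:,}")
print(f"bit patterns in one 64-bit value: {float64_patterns:,}")
```

Under these assumptions the number of possible discs is itself a number with well over a billion digits, while a tone row is drawn from a gamut of twelve.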

In describing musical processes, just considering the inputs and outputs to the process itself does not always encompass all of the information in the process. Many processes utilize some basis as an appendage to their construction which may be tuned or otherwise curated to extend, temper and shape the behavior of its outputs. In processes which utilize machine learning for example the basis can take the form of the numerical weights which are pre-composed during a training stage and feed the initial conditions of a neural net, essentially limiting its behavior in response to any input.

The basis is often generated in compositional time during the construction of the process itself, but may sometimes be mutable in response to inputs. For example, a machine learning system may have an interactive training stage embedded in the process, or a genetic algorithm may re-compose itself in selection feedback stages (Fiebrink & Caramiaux, 2018).
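One way to picture the basis is as a table that is composed ahead of time and then consulted by the process at every step. The sketch below swaps the neural net for a much simpler first-order Markov chain, purely as an illustration: the training sequence, the function names and the use of MIDI note numbers are all hypothetical. Passing an existing table back into `train_basis` imitates a basis that stays mutable in response to inputs.

```python
import random
from collections import defaultdict

def train_basis(sequence, basis=None):
    """Compose (or extend) the basis: for each value, record what followed it."""
    if basis is None:
        basis = defaultdict(list)
    for current, following in zip(sequence, sequence[1:]):
        basis[current].append(following)
    return basis

def run_process(basis, start, steps, rng):
    """Generate a sequence; every choice is limited to what the basis contains."""
    output = [start]
    for _ in range(steps):
        options = basis.get(output[-1])
        if not options:  # no recorded continuation, so the process stops
            break
        output.append(rng.choice(options))
    return output

# A hypothetical training sequence of MIDI note numbers.
corpus = [60, 62, 64, 62, 60, 67, 64, 62, 60]
basis = train_basis(corpus)

melody = run_process(basis, start=60, steps=16, rng=random.Random(7))
print(melody)

# A mutable basis: later inputs are folded back into the same table.
train_basis([60, 72, 60], basis)
```

Whatever the random number generator does, the output can only move between values along transitions that the basis already contains, which is the sense in which the basis limits the behavior of the process.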

For Vaggione «the formal rigor of a generative function does not guarantee by itself the musical coherence of a result». Perception — and interactions with processes in response to their perception — is the method through which the «thesis and constraints [of the process] are revealed» (Vaggione, 2001). Vaggione is speaking primarily of compositional time work as a development of the processes themselves through «successive operations of concretization having as a tuning tool — as a principle of reality — an action/perception feedback loop». However I believe the point can be extended to the performance situation, as well as engagements with processes through successive performance or listening situations when considering other time scales as a feature of the loop.

Consider the experience of listening to a process-based fixed media piece. The performance (playback) is fixed in the sense that the recording (especially a digital recording) can be reproduced very faithfully from one playback to the next. However, this is only really true if we don’t consider the filtering context of the playback system itself, the effect of the resonance of the sound environment in which the playback system resides, the incidental sounds within the environment which can never be fully erased, and the fingerprint of the particular reality of being at a place in a time with others (or alone) participating in a moment. This perceptual context is a mixture of the material reality of the affordances and limitations of the particular sound environment and the experiential background of the individual in a given moment, and its impact contributes significantly to the operational becoming of the musical situation. In other words: all of the components of the system operate together in space(s) through time to synthesize the shared experience.

For a musical situation to become process-based there must be some degree of interactivity, even if it takes place in compositional time (visiting and re-visiting a situation through the action/perception loop), potentially through the course of months or years. One example is experiencing the performance of Pendulum Music on multiple occasions, in multiple situations and contexts. The influence of context and past experience doesn’t need to be over-stated as the primary component of the articulation of perception (though certainly it can be), but I would like to extend Vaggione’s serious consideration of its role beyond the compositional process itself to the perceptual situation inherent in the experience of and engagement with musical processes in general.

To consider the agency within a musical process we can begin by looking at the potential points of action in Vaggione’s action/perception loop. Actions are carried out by human actors and mechanistic actors alike, with myriad stages of human-machine mixtures possibly providing inputs to the system, or acting directly on it in some way. The gravitational field of the Earth is a significant actor in Pendulum Music, and so is the gesture of release by the human who sets the microphone in motion. Any internal mechanism of the process contributes to the action of the system once set in motion. Processes also make decisions by carrying out their procedures and responding to input signals like human actors might.

Returning to the effect of the gamut as a constraint on the behavior of the process, both paths into and out of the internal information arena of the process require a mapping stage to translate inputs from the real world into signals, and signals back into outputs. In an acousmatic situation utilizing direct synthesis — like Herbert Brün’s DUST system — the mapping may be at the margins of the system, a translation of audio data through the DAC into voltages that modulate air pressure and create soundwaves (Blum, 1979). Usually there are several stages of mapping, each with its own set of gamut constraints that translate the internal machinations of the process into concrete sound events.

Considering our alea-rhythm experiment from earlier, the internal process doesn’t really know anything about the sounds it is producing. It just manipulates parameters which are applied to a software-defined synthesis routine. The process maps onset times, pitches, amplitudes, and a timbre shape parameter to the production of each sound event.

These mappings are unavoidable in working with musical processes: at some stage the internal signals need to be translated back into some form which drives a sounding apparatus. That apparatus could be the DAC driving the movement of a speaker cone, a voltage-controlled synthesis module which in turn drives a speaker, or an actuator mechanically coupled to some object, as in the sound sculpture work of Kristian Weeks. As much as we would like to believe that the territory of the signal is represented transparently through the mapping, it is never a transparent translation of information into sound. It is a cartoon that assumes the shape of the articulating mechanism: a filter on the process’ internal signal (Polli, 2005).

As an illustration, here is the aleatoric process from earlier using the same data mapped to two different synthesis routines.

Sound Example #3

Sound Example #4
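The idea of one set of process data mapped to two different synthesis routines can be sketched as follows. This is a hedged illustration rather than the code behind the sound examples: the event list, the two voice functions, the linear decay envelope and the low sample rate are all assumptions, and only onset, pitch and amplitude are mapped.

```python
import math

SAMPLE_RATE = 8_000  # an arbitrary low rate that keeps the sketch fast

# Hypothetical process output: (onset in seconds, frequency in Hz, amplitude 0..1)
events = [(0.0, 220.0, 0.8), (0.25, 330.0, 0.5), (0.5, 440.0, 0.6)]

def sine_voice(freq, t):
    """First synthesis routine: a plain sine oscillator."""
    return math.sin(2 * math.pi * freq * t)

def square_voice(freq, t):
    """Second synthesis routine: a naive square wave at the same frequency."""
    return 1.0 if math.sin(2 * math.pi * freq * t) >= 0 else -1.0

def render(events, voice, note_length=0.2, total_length=1.0):
    """Map each event onto a buffer of samples using the given voice."""
    out = [0.0] * int(SAMPLE_RATE * total_length)
    note_samples = int(SAMPLE_RATE * note_length)
    for onset, freq, amp in events:
        start = int(onset * SAMPLE_RATE)
        for i in range(note_samples):
            if start + i >= len(out):
                break
            envelope = 1.0 - i / note_samples  # simple linear decay
            out[start + i] += amp * envelope * voice(freq, i / SAMPLE_RATE)
    return out

# Identical data, two different mappings into sound.
as_sine = render(events, sine_voice)
as_square = render(events, square_voice)
```

The process data is the same in both renderings; only the final mapping stage changes, and with it the shape the signal takes on in the world.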

Once the signal has been mapped into the world it is perceived. The sound production is a material situation mediated by the reproduction affordance and context: filtered through the reverberant characteristics of the sounding apparatus, and the room itself. It’s also filtered through our past experiences. According to the predictive coding hypothesis, when we receive these perceptual signals our brains engage in a synthesis that merges sensory inputs with past experience. The process «enables the formation of internal models of increasing degrees of abstraction» used somehow to «generate top-down predictions, which are compared with subsequent inputs» (Strauss et al., 2015).

In a group improvisational context this can create a sort of strange loop. Multiple participants may be directly engaging in the action/perception loop, synthesizing the shared sound context through interrelated group actions and individual past experience. The joy of improvisational playing is to participate in these shared situations informed by the experiences of multiple actors. honor ash makes the case that these musical situations (specifically they refer to improvisatory situations here) create a space for participants to «[access] the state of unselfconsciousness» (ash, 2023).

In Social Dissonance Mattin tells the story of a concert at Hampshire College where Moe Kamura, Taku Unami and Jarrod Fowler were scheduled to perform. Kamura and Unami hid in the bushes outside the concert hall for the duration of the performance. «The musicians explained that hiding was their contribution to the concert» (Mattin, 2022). For Mattin «the sonic element» in their action arises from the negative space created by their absence. Their presence and absence from the event is a modulating factor on the social context of the sound environment which is in turn constructed by the people involved: the improvisers, the audience; everyone in the room. Their planned presence and actual absence created a situation where the real sonic situation experienced was altered by removing the potential of what the audience «might have been hearing while the concert was supposed to be happening» if Kamura and Unami had been present (Mattin, 2022).

Mattin’s score for Social Dissonance (from which the book takes its name) came out of an engagement with Cage’s silent piece 4’33” (Mattin, 2022). This piece (which structures time without imposing additional sonic events on the existing sound environment) in Mattin’s view attempts to erase the consideration of the social aspects of the sound environment and direct the experience to «focus purely on the sonic material as an aesthetic experience» (Mattin, 2022). To fully realize the articulation of a musical process this social dimension should be considered as part of the full system of interlocking actions and experiences. It is always present even in its negative form as Kamura and Unami seemed to demonstrate.

Vaggione places his action/perception loop mostly at compositional time scales. At these time scales actions may be disjointed in time by minutes, hours or even years. We can look at all music as an articulation of an interactive process carried out across different time scales. Plucking the string of a guitar is a directly interactive process that plays out in musical time in which we perceive the resonance of the instrument and act directly upon the string while it sounds. When we hear a recording of this process there are new layers of action between its (re)sounding and the original sounding. There is the translation of the shadow of the acoustic environment through the use of the transducer which turns the sound into voltages stored as analog or digital signals — often to facilitate further manipulation of the sound information. This sound-as-information will typically be shaped to either transform it into something germane to the natural habitat of the speaker — the acousmatic situation — or put through filtering, dynamics processing, convolution and so on to reify the original experience by coloring the recording so that the reproduction simulates an experience that never really occurred, but also synthesizes a new contextually-based situation for the listener in the reproduction act (van Eck, 2017).

Figure 1. The social system of actions, perceptions, experiences and interaction. Copyright: Public Domain | Credits: Erik Schoster

To illustrate, Figure 1 describes this system of time-mediated action-perception loops as a diagram showing inputs, outputs and connections with coloring to indicate their role. Red for actions and actors, orange for processes as well as the information and signals they process, purple for perceptible phenomena and perceptors, blue for individual and shared experiential aspects of the system, green for mechanical or materially-based aspects of the system such as the (re)production apparatus and manipulatable input apparatus, and finally pink for aspects of the system potentially mediated at different time scales. Components of the system which are optional — in other words, not an essential part of the system — are outlined or traced with dots. Components of the system which impose a gamut are outlined with double dashed lines.

Starting with the process, information and signals are represented in orange. Within the process is the (optional) basis, and the information flowing through and being transformed by the process. The mapping translation of inputs to the process is represented as a signal pointing inward to the process, and represents one gamut. The inputs themselves represent a gamut and are presented in green to indicate their materiality. On the other side of the process is the output mapping translation represented in orange as a signal pointing outward to the sounding apparatus. The sounding apparatus in green also represents a gamut on the output. Next to the sounding apparatus emerging also from the output mapping translation are optional informational outputs whose signals in orange wrap back around to the inputs of the process, and to autonomous actors (machines) in red.

Beaming outward from the sounding apparatus are purple arrows representing the transmission of perceptible phenomena. Surrounding the process and actors are clouds of context in blue beaming purple arrows of perceptible phenomena.

There are pairs of actors in red (autonomous machine agents) with their actions directed back to the inputs of the process. There are pairs of actors/perceptors (human actors) in red and purple with their actions (through time) directed back to the inputs of the process, and which are surrounded by purple arrows of perceptible phenomena pointing both inward and outward. Inside the actor/perceptors are optional clouds of past experience in blue. Finally there are pairs of perceptors (human observers) in purple with optional blue clouds of past experience, surrounded by purple arrows of perceptible phenomena pointing both inward and outward.

The placement of agency in process music arises from a situation of human perception and experience engaging with processes in a shared interactive context. Musical processes are not isolated demonstrations of a thesis, rather they participate in a conversation mediated by the constellation of action, perception and experience that constitutes some kind of strange loop that necessarily involves human actors.

As an interesting counterpoint, Goodiepal’s Radical Computer Music score could be seen as an example of an attempt at composing a process-based music system which tries to eliminate the human from the loop. It is composed for a hypothetical future AI (he prefers the term alternative intelligence) and explores concepts that could be theoretically appealing to a binary intelligence by subverting the scannability and possible completeness of the piece. As a piece of music it has manifestations in the world which are in turn a component of the score, produced through time and already somewhat lost, as elements of it are spread across short-lived websites, limited runs of physical objects and temporary «Snappidagg» images. It is ultimately a theoretical work, however, until some such alternative intelligence emerges (Goodiepal, 2012).

While we ponder that situation, we can enjoy the conversation.

Bibliography

ash h., Fear, play and getting things wrong: Understanding emotional responses when collaborating in improvised music settings, Masters thesis, Department of Performing Arts, Middlesex University, 2023.
Blum T., Review of Project Sawdust: Work with Computer; Dust; More Dust, by Herbert Brün, Computer Music Journal, 3(1), pp. 6–7, 1979.
Cage J., Silence: Lectures and Writings, The M.I.T. Press, Cambridge, 1966.
Christensen E., Overt and Hidden Processes in 20th Century Music, in Seibt, J. (ed.), Process Theories, Springer, Dordrecht, 2003.
Cope D., Experiments in Musical Intelligence, A-R Editions, Madison, Wisconsin, 1996.
de Solla Price D. J., Automata and the Origins of Mechanism and Mechanistic Philosophy, Technology and Culture, 5(1), p. 9, 1964.
Fell M., Scale Structure Synthesis: Mark Fell in collaboration with Jonathan Howse, essay in accompanying booklet, Scale Structure Synthesis, Alku 97, Barcelona, 2012.
Fiebrink R., Caramiaux B., The Machine Learning Algorithm as Creative Musical Tool, in McLean, A. and Dean, R. T. (eds.), The Oxford Handbook of Algorithmic Music, Oxford University Press, New York, 2018.
Goodiepal, El Camino del Hardcore, Alku 83, Barcelona, 2012.
Matthews K., Beyond Me, in McLean, A. and Dean, R. T. (eds.), The Oxford Handbook of Algorithmic Music, Oxford University Press, New York, 2018.
Mattin, Social Dissonance, Urbanomic, Cambridge, 2022.
Moore F. R., Elements of Computer Music, Prentice Hall, New Jersey, pp. 5–7, 1990.
Polli A., Atmospherics/Weather Works: A Spatialized Meteorological Data Sonification Project, Leonardo, 38(1), pp. 31–36, 2005.
Strauss M. et al., Disruption of Hierarchical Predictive Coding During Sleep, Proceedings of the National Academy of Sciences of the United States of America, 112(11), 2015.
Vaggione H., Some Ontological Remarks About Music Composition Processes, Computer Music Journal, 25(1), pp. 54–61, 2001.
van Eck C., Between Air and Electricity: Microphones and Loudspeakers as Musical Instruments, Bloomsbury, 2017.
Weizenbaum J., ELIZA: A Computer Program for the Study of Natural Language Communication Between Man and Machine, Communications of the ACM, 9, pp. 36–45, 1966.
Whitehead J., All Possible CDs, n.d., [online] Available at LINK

Erik Schoster is an American composer, improviser and instrumentalist of sorts based in Winona, Minnesota. He was a founding member of Geodes, A Name For Tomorrow, Rough Weather and Data Entropy, and not a founding member of Cedar AV. He has also played in The Andrew Weathers Ensemble, Three Arguments Against The Singularity, Inlets, the Dub Sea Tank Quintet and currently Soft Generator with David Newman and Michel Mazza. His music has been published on various independent labels including Audiobulb, Home Normal, JMY and The Leaf Label. Once upon a time he studied composition with Joanne Metcalf, improvisation with Matt Turner and trombone with Nick Keelan.