
hordia at audiores blog [en]: My presentation at the "VI Jornadas de Acústica, Electroacústica y áreas vinculadas (CADAE)"
hordia at audiores blog [es]: Presentation at the VI Jornadas de Acústica, Electroacústica y áreas vinculadas (CADAE)
hordia at audiores blog [en]: Fundamental (in Hz) to a MIDI note
hordia at audiores blog [en]: SMSMorph part1
hordia at audiores blog [en]: SMS Harmonizer part1
hordia at audiores blog [en]: Blogging a little about gsoc
hordia at audiores blog [en]: Catching (phone) SMS pulse train with CLAM...
hordia at audiores blog [es]: A simple example of applying the sinusoids + residual model (SMS)
hordia at audiores blog [es]: Introduction to CLAM
- NetworkEditor tutorial
- Play with the various example networks in the ‘example-data’ directory
- Annotator tutorial
- Annotator: chord recognizer for songs
- Article in linuxjournal
- CLAM Music Annotator, chord analyzer
- Start with the development wiki page
- Subscribe to the development mailing list and join the #clam channel on freenode
- Read these tutorials: “Constructing and playing a simple network“, “Loading and playing a simple network“, “Creating a minimal processing object” (there are several more)
- The code is fairly well documented; the online version can be consulted
Programming, dsp, audio and music
hordia at audiores blog [en]: My GSoC2007 application: "Real-time spectral transformations"
- Port these spectral transformations to real time and to NetworkEditor: Harmonizer, morph, time-stretch and pitch-discretization.
- Fix gender change (already real-time, residual improvement)
- Make a harmonizer prototype with sliders to control each “voice” gain, with the option of controlling them by MIDI too.
- Realtime Voice2MIDI. Piano-roll widget for NetworkEditor/Prototyper.
- More general improvements to SMS transformations, addressing issues that arise during the development of the project.
hordia at audiores blog [es]: Sinusoids plus residual model
- analysis
- sound compression
- sound source separation
- musical acoustics
- music perception
- Xavier Serra: “Musical Sound Modeling With Sinusoids Plus Noise“.
- Xavier Amatriain’s Thesis: “Sinusoidal plus Residual Model“
- DAFX book: Chapter 10 – Spectral Processing.
- CLAM SMSTools: Introduction tutorial, more details.
hordia at audiores blog [en]: Sinusoidal plus Residual Model
- analysis
- sound compression
- sound source separation
- musical acoustics
- music perception
- Xavier Serra: “Musical Sound Modeling With Sinusoids Plus Noise“.
- Xavier Amatriain’s Thesis: “Sinusoidal plus Residual Model“
- DAFX book: Chapter 10 – Spectral Processing.
- CLAM SMSTools: Introduction tutorial, more details.
hordia at audiores blog [en]: Starts a ‘summer’ of code for me
hordia at audiores blog [en]: Hello CLAM!
- CLAM web, wiki, general doc and screenshots.
- Xavier Amatriain’s PhD Thesis, this paper about design patterns: “A Data-flow Pattern Catalog for Sound and Music Computing” and many other publications.
- Main developers blogs: Xavier Amatriain, Pau Arumi and David Garcia.
Yesterday I had the opportunity to give a talk about my recent work in the Google Summer of Code at the VI Jornadas de Acústica, Electroacústica y áreas vinculadas (CADAE). Time was short, so it was a little hard to explain everything in only 20 minutes, but it seems that all went well (at least the audience seemed to like it). Here is my presentation (in Spanish):
Transformaciones espectrales en tiempo real para CLAM
Download: Transformaciones espectrales en tiempo real para CLAM.pdf
Yesterday I had the opportunity to present my Google Summer of Code work at the VI Jornadas de Acústica, Electroacústica y áreas vinculadas (CADAE). Here is the presentation:
Transformaciones espectrales en tiempo real para CLAM
Download: Transformaciones espectrales en tiempo real para CLAM.pdf
While working to get audio-to-MIDI in NetworkEditor (CLAM), I needed to convert a fundamental frequency value to a MIDI note number.
I found some related source code in the Voice2MIDI app, but it was not explained at all, so looking for the reasoning behind that formula I arrived at this:
Knowing about the equal-tempered scale (check this) and the $latex 2^{\frac{n}{12}}$ relation between frequencies, plus the fact that C4 or “middle C” has a MIDI value of 60, it’s easy to conclude that A4 (whose frequency is 440 Hz, the tuning standard, and which is 9 semitones higher) has a MIDI value of 69.
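Then, starting with:
$latex f = 440 \cdot 2^{\frac{n-69}{12}}$
where $latex f$ is the fundamental in Hz and $latex n$ is the MIDI note number, it’s easy to arrive at this:
$latex n = 69 + 12 \log_2\left(\frac{f}{440}\right)$
and then, also taking into account this mathematical relation:
$latex \log_2(x) = \frac{\log(x)}{\log(2)}$
the final formula looks like:
$latex n = 69 + 12 \, \frac{\log\left(\frac{f}{440}\right)}{\log(2)}$
and the final C++ code like: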
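The original listing did not survive here, so what follows is only a minimal sketch of the conversion, assuming A4 = 440 Hz = MIDI note 69 and rounding to the nearest note. The names `frequencyToMidiNote` and `midiNoteToFrequency` are hypothetical and are not taken from the Voice2MIDI or CLAM sources.

```cpp
#include <cmath>

// Reference tuning: A4 = 440 Hz corresponds to MIDI note 69.
const double kA4Frequency = 440.0;
const int kA4MidiNote = 69;

// Hypothetical helper: nearest MIDI note for a fundamental given in Hz.
// Returns -1 for non-positive input, where the logarithm is undefined.
int frequencyToMidiNote(double frequencyHz)
{
    if (frequencyHz <= 0.0)
        return -1;
    // n = 69 + 12 * log(f / 440) / log(2)
    const double note = kA4MidiNote
        + 12.0 * std::log(frequencyHz / kA4Frequency) / std::log(2.0);
    return static_cast<int>(std::floor(note + 0.5)); // round to nearest note
}

// Inverse mapping (MIDI note to Hz), handy for checking the formula.
double midiNoteToFrequency(int midiNote)
{
    return kA4Frequency * std::pow(2.0, (midiNote - kA4MidiNote) / 12.0);
}
```

With these definitions, `frequencyToMidiNote(440.0)` gives 69 and a fundamental near 261.63 Hz (middle C) gives 60. Rounding is one possible design choice; a caller that needs the pitch deviation could keep the fractional part instead.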
Related post: nictuku’s inverse formula (i.e. from MIDI to Hz) here “Translanting MIDI Notes to frequencies in the diatonic scale using the central A (440hz) as reference“.
The morph effect (best known from the image domain) hybridizes two sounds so that the resulting one has intermediate characteristics. This implementation is mainly based on interpolation (of the peaks and the residual spectrum) and a balance of the fundamentals that depends on the interpolation factor.
All the code is mainly based on interpolating between the two inputs, where alpha is the interpolation factor (bounded to the 0..1 range).
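As a rough illustration of that interpolation idea, here is a sketch assuming the usual linear form `(1 - alpha) * a + alpha * b`. The direction of alpha and the helper names `interpolate` and `morphMagnitudes` are assumptions made for this example, not the actual SMSMorph code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: linear interpolation between two values.
// alpha is clamped to [0, 1]; alpha = 0 returns a, alpha = 1 returns b.
double interpolate(double a, double b, double alpha)
{
    alpha = std::min(1.0, std::max(0.0, alpha));
    return (1.0 - alpha) * a + alpha * b;
}

// Interpolates two magnitude arrays element by element
// (up to the length of the shorter one).
std::vector<double> morphMagnitudes(const std::vector<double>& a,
                                    const std::vector<double>& b,
                                    double alpha)
{
    const std::size_t n = std::min(a.size(), b.size());
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i)
        out[i] = interpolate(a[i], b[i], alpha);
    return out;
}
```

The same `interpolate` call could be applied to the two fundamentals, to peak magnitudes, or to residual spectrum bins. A real morph also has to decide how to match peaks between the two sounds, which this sketch leaves out.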

I still have to tweak it a bit… but anyway I’ve made some demos of it:
Sources: Piano C5 and Oboe C5.
Demos: Take 1, Take 2
To hear the online/streaming version go here.
Samples were taken from Freepats / Iowa Musical Instruments Samples.
I’ll start by talking a bit about this effect, which is mainly used for vocal harmonizing. Given an input voice (or any other sound), you obtain as output as many automatically generated, harmonically related voices as you want (a minor/major third, a fifth, a sixth or any musical interval).
This implementation is mainly based on several SMS pitch-shifts (one for each voice) and a gain control for each one. Pitch controls are expressed in equal-tempered scale semitones, following the $latex 2^{\frac{n}{12}}$ relation for each voice.
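As a small illustration of the semitone-to-ratio conversion used for each voice, here is a sketch. The helper name `semitonesToRatio` is hypothetical; the only thing it assumes is the equal-tempered relation between semitones and frequency ratio.

```cpp
#include <cmath>

// Hypothetical helper: frequency ratio for a pitch shift of `semitones`
// in the equal-tempered scale (negative values shift downwards).
double semitonesToRatio(double semitones)
{
    return std::pow(2.0, semitones / 12.0);
}
```

For instance, a major third (4 semitones) corresponds to a ratio of about 1.26, a fifth (7 semitones) to about 1.498, and an octave (12 semitones) to exactly 2.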
This was my first version of the network:

Testing it, my voice never sounded so musical, hehehe… but still awful, so thinking of your ears’ health I made the demos with Elvis’s voice instead
Disclaimer: all audio demos are early testing versions (still with artifacts and clicks that should be removed soon)
Elvis harmonized demo: elvis-harmonized.ogg (to hear the online/streaming version go here)
Prototype:

Configuration:

Note: the demos were done without residual processing because adding the residual does not improve the results much and adds a lot of overhead.
Then, following xamat‘s suggestions, I also added a detuning effect (and a delay, but that one isn’t working properly yet)

Elvis harmonized (detuned version) demo: elvis-harmonized-detunned.ogg (to hear the online/streaming version go here)
But wait! A lot of graphics, and this is also a ‘coding’ blog! Here you have some code… and btw, you can see that programming under CLAM can be very easy once you get the basics…
    // Note: the first line of the signature was cut off in this listing.
    // It is restored from the names used in the body; the class name
    // SMSHarmonizer is an assumption.
    bool SMSHarmonizer::Do( const SpectralPeakArray& inPeaks,
                            const Fundamental& inFund,
                            const Spectrum& inSpectrum,
                            SpectralPeakArray& outPeaks,
                            Fundamental& outFund,
                            Spectrum& outSpectrum
                          )
    {
        // Start from the unmodified input voice
        outPeaks = inPeaks;
        outFund = inFund;
        outSpectrum = inSpectrum;

        // Apply the gain of the input voice to its sinusoidal peaks
        TData gain0 = mInputVoiceGain.GetLastValue();
        mSinusoidalGain.GetInControl("Gain").DoControl(gain0);
        mSinusoidalGain.Do(outPeaks, outPeaks);

        // Temporaries reused for every harmonized voice
        SpectralPeakArray mtmpPeaks;
        Fundamental mtmpFund;
        Spectrum mtmpSpectrum;

        for (int i = 0; i < mVoicesPitch.Size(); i++)
        {
            TData gain = mVoicesGain[i].GetLastValue();
            if (gain < 0.01) // means voice OFF
                continue;

            // Pitch in semitones plus a random detuning
            TData amount = mVoicesPitch[i].GetLastValue()
                + frand() * mVoicesDetuningAmount[i].GetLastValue();
            // Adjust to equal-tempered scale semitones
            amount = CLAM_pow( 2., amount/12. );
            mPitchShift.GetInControl("PitchSteps").DoControl(amount);
            mPitchShift.Do( inPeaks,
                            inFund,
                            inSpectrum,
                            mtmpPeaks,
                            mtmpFund,
                            mtmpSpectrum);

            // Per-voice gain
            mSinusoidalGain.GetInControl("Gain").DoControl(gain);
            mSinusoidalGain.Do(mtmpPeaks, mtmpPeaks);

            // Optional per-voice delay
            TData delay = mVoicesDelay[i].GetLastValue();
            if (delay > 0.)
            {
                mPeaksDelay.GetInControl("Delay Control").DoControl(delay);
                mPeaksDelay.Do(mtmpPeaks, mtmpPeaks);
            }

            // Mix this voice into the output
            outPeaks = outPeaks + mtmpPeaks;
            if (!mIgnoreResidual)
                mSpectrumAdder.Do(outSpectrum, mtmpSpectrum, outSpectrum);
        }
        return true;
    }
The plan includes adding MIDI control for each voice’s pitch (so it will be easy to control them, for example, from a keyboard played by the person singing).
Next post: SMSMorph.
From today I’ll try to blog a little more about my gsoc progress…
First of all I added bounded limits to many transformations, a task that taught me a lot about the CLAM infrastructure (good suggestion, pau!). Then I got pitch discretization working in NetworkEditor (new network) and built a prototype example (which taught me how to make GUI prototypes with QTDesigner). I also worked on some minor bug fixes and new features, such as setting a default value for InControls.


I also wrote a couple of unit tests
(something very easy, but totally new for me). I have to say that testfarm and automatic testing are very cool features for this kind of development.
Testfarm looks like this:

I’ve also added a new network for hoarseness, which I think is very useful as a first approach to working with Sinusoidal+Residual models (SMS)

Anyway, most of my work was with SMS Harmonizer, but that is for a forthcoming post.
I was testing my new harmonizer network (with my mic open) when a new SMS arrived on my phone…

Funny, isn’t it? A perfect pulse…
btw: could any of you give a complete explanation of this kind of effect? I also noticed the same with some speakers (of course only hearing the signal), and I think the wires are a big clue, because with my car speakers the effect only happens when I have my “cassette-to-mini-plug” adaptor plugged in…
Update: check “SMS interference mystery solved” post.
If anyone read the introductory post about CLAM and liked the idea, I think it is worth reading something about the sinusoids + residual model, also called SMS (Spectral Modeling Synthesis), since many of CLAM’s features are based on this model (which I believe is not widely known by the general public).
A practical example, the easiest and most intuitive one I can think of right now, is to analyze a voice signal with this model (that is, decompose it into sinusoids and residual), then apply a gain to the residual and resynthesize. The result is a hoarse or dysphonic voice, like Basile’s (an example for Argentinian football fans) or Louis Armstrong’s.
This network can easily be built with CLAM’s NetworkEditor, and in fact it is already available in the svn version.
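The resynthesis step of that example can be sketched roughly as follows. This is only an illustration under strong assumptions: the analysis is taken as already done, both components are given as time-domain sample buffers for one frame, and `mixWithResidualGain` is a hypothetical name, not a CLAM processing.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: recombines the sinusoidal and residual parts of a
// frame, scaling the residual. A residualGain above 1 emphasizes the noisy
// component, which is what gives the hoarse character.
std::vector<double> mixWithResidualGain(const std::vector<double>& sinusoidal,
                                        const std::vector<double>& residual,
                                        double residualGain)
{
    const std::size_t n = std::min(sinusoidal.size(), residual.size());
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i)
        out[i] = sinusoidal[i] + residualGain * residual[i];
    return out;
}
```

With a gain of 1 the output is the plain sum of both components; raising the gain is the only change needed for the hoarse effect in this simplified view.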

CLAM is a complete framework for research and development on audio and music (this also includes end-user applications). It offers a conceptual model and tools for the analysis, synthesis and processing of audio signals. It has a very friendly interface, is Free Software, cross-platform and written in C++ (many of its applications work in real time).
Although it originated at a university in Barcelona, Spain, documentation in Spanish about this framework is scarce, so I decided to write a short introduction to the basics, with links (mind you, most of them in English) for those who want to go further. I think it may help more than one person get started.
Basically there are two profiles: end users (of the applications) and developers who want to write their own programs on top of this framework.
At the moment it consists of 4 main programs:
NetworkEditor:
It is an application that lets you connect modules into a processing network in the style of pd (but much friendlier), MaxMSP or Reaktor (or, for matlab users, something like simulink). These networks run in real time and can use jack, portaudio, LADSPA or VST as backend.
One of the most interesting features is that the network can be exported and then run with a graphical interface designed in QTDesigner (both programs export to an xml file that is then run with the Prototyper application)
I recommend watching this presentation: “Visual prototyping of audio applications”
That is, a user who is not a programmer can build complex plugins or applications without writing a single line of code. It is also very useful for prototyping future applications or developments.
At the moment it is being integrated with LADSPA and lv2, and there are plans to further strengthen the ability to use external plugins (as one more module) inside NetworkEditor and, vice versa, to use these networks as plugins in other applications.
For those who want to get started, I recommend this:
Annotator:
It is a program for making transcriptions, in the style of Sonic Visualizer. Very powerful, with features that make it unique.
To learn more:
SMSTools:
An audio signal analyzer in the style of wavesurfer that supports different kinds of visualization, such as spectrograms and all those derived from the Sinusoids + Residual model, as well as complex transformations based on this model (gender change, pitch-shifting, morph, etc.) and much more (see tutorial).
Voice2MIDI:
Converts voice to MIDI. It is discussed in this linuxjournal article.
For developers, it serves as an environment for easily building their own applications, or as a tool for prototyping future implementations.
I recommend:
If you wish, you can contribute to the project by sending code patches to the main developers, and even become a ‘developer’ after having sent several of them.
In short, it is a fairly large and ambitious project. There are even thousands of other developments built on it that are not in the ‘main package’, but I thought it useful to give a general overview, because it covers so much that it may be a little overwhelming for someone who is just hearing about it.
As I said, it is cross-platform and available for GNU/Linux (with packages for several distributions), Windows and Mac (see more and download)
Other interesting links:
This Monday I start the Google Summer of Code, a program sponsored by Google to support Free Software and Open Source. My project is about real-time spectral transformations for the CLAM framework; for more information see this page: GSoC2007: "Real-time spectral transformations".
By the way, from now on all the articles I write about audio and music will be at this address: http://audiores.uint8.com.ar/blog and little about these topics will appear on this blog (the occasional duplicate post taken from that blog, or whatever I put in the "projects" section)
Other audio-related things I recommend/collaborate with:
CLAM: a complete cross-platform Free Software framework for working with audio in general and music. Among other things it allows rapid application prototyping using visual flow-control tools.
Musix GNU+Linux: a user-oriented distribution for making music and working with audio and multimedia in general.
Grupo Buena Señal: a Spanish-language group / mailing list about programming and signal processing applied to audio and music.
Next Monday the Google Summer of Code finally starts; here is my final, accepted application:
Title: Real-time spectral transformations
Mentor: Pau Arumí Albó
License: GNU General Public License (GPL)
Abstract: Revamp all CLAM SMS transformations. Make real-time all those that still aren’t and have them working in Network Editor, for example: Harmonizer, Morph and Time Stretch. Make nice prototypes to use them with Prototyper, with special focus on some. Also make Voice2Midi real-time, along with any widgets that may be needed.
Some more words about this:
Over the last few days I was mainly reading about SMS transformations and their model, and the “Spectral Processing” chapter of the DAFX (Digital Audio Effects) book.
More news and details about this project here soon.
It is an analysis/synthesis model for spectral processing oriented to music and audio applications. It can be seen as a generalization of the STFT (short-time Fourier transform) and of sinusoidal models. Basically it adds flexibility to the STFT while keeping good sound fidelity and an efficient representation.
This model is also known as SMS (Spectral Modeling Synthesis) and as HILN in the context of MPEG4.
Basically the sound is modeled as the sum of a set of sinusoids (the stable partials, harmonic or not: the deterministic components of the sound) plus a noise residual (modeled as a stochastic process), as two separate components:
$latex s(t) = \sum_{r=1}^{R} A_r(t) \cos[\Phi_r(t)] + e(t)$
where $latex A_r(t)$ and $latex \Phi_r(t)$ are the instantaneous amplitude and phase of the $latex r$-th sinusoid respectively, and $latex e(t)$ is the noise component at time $latex t$.
The instantaneous phase in the equation is:
$latex \Phi_r(t) = \int_0^t \omega_r(\tau) \, d\tau$
where $latex \omega_r(t)$ is the instantaneous radian frequency of the $latex r$-th sinusoid.
The first analysis step detects the partials present in the spectrum and represents them with time-varying sinusoids. The sinusoidal components are then subtracted from the original sound to obtain the “residual” (see the block diagram).
The residual signal is modeled as a stochastic process and is described as filtered white noise:
$latex e(t) = \int_0^t h(t, \tau) \, u(\tau) \, d\tau$
where $latex u(t)$ is white noise and $latex h(t, \tau)$ is the impulse response of a time-varying filter evaluated at time $latex t$.
The residual comprises the energy due to non-stationary vibrations and any other energy component that is not sinusoidal in nature.
Some areas where this model can be applied:
It is an analysis/synthesis model for spectral processing oriented to audio and music applications. We can see it as a generalization of the STFT and of sinusoidal models; basically it adds more flexibility to the STFT while maintaining good sound fidelity and an efficient representation.
This model is also known as SMS (Spectral Modeling Synthesis) and as HILN in the context of MPEG4.
Basically the sound is modeled as the sum of a set of sinusoids (only the stable partials of a sound, harmonic or not: the deterministic components) plus a noise residual (modeled as a stochastic process), as two separate components:
$latex s(t) = \sum_{r=1}^{R} A_r(t) \cos[\Phi_r(t)] + e(t)$
where $latex A_r(t)$ and $latex \Phi_r(t)$ are the instantaneous amplitude and phase of the $latex r$-th sinusoid respectively, and $latex e(t)$ is the noise component at time $latex t$.
The instantaneous phase in the equation is:
$latex \Phi_r(t) = \int_0^t \omega_r(\tau) \, d\tau$
where $latex \omega_r(t)$ is the instantaneous radian frequency of the $latex r$-th sinusoid.
The first analysis step detects the partials present in the spectrum and represents them with time-varying sinusoids. Then the sinusoidal component is subtracted from the original sound to obtain the remaining “residual” (see the block diagram).
This residual signal is modeled as a stochastic process and is described as filtered white noise:
$latex e(t) = \int_0^t h(t, \tau) \, u(\tau) \, d\tau$
where $latex u(t)$ is white noise and $latex h(t, \tau)$ is the response of a time-varying filter to an impulse at time $latex t$.
The residual comprises the energy produced by non-stationary vibrations plus any other energy component that is not sinusoidal in nature.
Some areas where this model could be applied:
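To make the synthesis side of the model concrete, here is a minimal discrete-time sketch under simplifying assumptions: amplitudes and frequencies are held constant over the block, the phase is accumulated sample by sample as a discrete stand-in for the integral of the instantaneous frequency, and the residual is supplied as ready-made samples rather than generated by filtering white noise. The `Partial` struct and the `synthesize` function are hypothetical names, not CLAM classes.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One sinusoidal component of the deterministic part.
struct Partial {
    double amplitude;   // A_r, constant in this sketch
    double frequencyHz; // instantaneous frequency, constant in this sketch
};

// s[n] = sum_r A_r * cos(Phi_r[n]) + e[n]
// The output has as many samples as the residual buffer.
std::vector<double> synthesize(const std::vector<Partial>& partials,
                               const std::vector<double>& residual,
                               double sampleRate)
{
    const double twoPi = 2.0 * std::acos(-1.0);
    std::vector<double> out(residual);               // start from e[n]
    std::vector<double> phase(partials.size(), 0.0); // Phi_r, starting at 0
    for (std::size_t n = 0; n < out.size(); ++n) {
        for (std::size_t r = 0; r < partials.size(); ++r) {
            out[n] += partials[r].amplitude * std::cos(phase[r]);
            // Accumulate the phase: discrete version of integrating omega_r
            phase[r] += twoPi * partials[r].frequencyHz / sampleRate;
        }
    }
    return out;
}
```

In a real analysis/synthesis system the amplitudes and frequencies would be updated frame by frame and interpolated between frames; accumulating the phase as above is what keeps the sinusoids continuous when the frequency changes.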
Last week I got accepted into Google’s Summer of Code program, so I will be busy with this during the summer… ehm, s/summer/winter here…
I’m very happy with that!
Google granted 6 students to CLAM, so it’s a big success!!! All applications are listed here.
The scope of my app may vary (or change totally! see below) because I still have to meet with my mentor and adjust some details. Indeed, it seems it could end up being another of my gsoc applications. Beyond that, of course it will be released under the GPL.
ATM, it will be:
Title: Educative Vowel Synthesizer
Mentor: Pau Arumí Albó
License: GNU General Public License (GPL)
Abstract: The main goal of this project is to build an application that lets the user synthesize different vowels by placing a point within the vowel triangle and, in reverse, given an input vowel from the microphone, places a dot on the triangle. For example, this is useful to students who want to check their pronunciation. This includes displaying the mouth position for the vowel, visualizing the spectral peaks (and identifying the effect), and changing the pitch and vocal tract characteristics.
A teacher could limit the set of vowels to the ones used in a particular language such as Catalan or English, so that the students just see the relevant ones for the exercise. It also includes some didactic games about identifying vowels by their spectral content.
I really don’t know ATM if it’s going to change or not, but that is what the gsoc page says so far
CONFIRMED: it’s going to change, maybe to something with real-time SMS transformations and real-time voice2midi; news soon… here: “My GSoC2007 application: Real-time spectral transformations”
My mentor will be Pau Arumí Albó, one of the main developers of the CLAM project and a member of Universitat Pompeu Fabra (Barcelona, Spain). I met him recently on irc channels and mailing lists and he seems very kind. He’s a free software enthusiast and teaches software engineering. He also developed other free software like testfarm and MiniCppUnit and has many publications. Recently, for example, he was at the LAC2007 conference with David García Garzón, both presenting the work “Visual prototyping of audio applications“.
I had heard a bit about CLAM before this GSoC (I had played a little with it too), and indeed starting to develop with it was on my (long) ToDo list. But I didn’t know that they were open to new developers too, so as soon as I learned about its GSoC participation (the bad thing was that it was not long before the deadline) I had no doubts about applying. Luckily, Google extended the deadline a little and the time was enough to submit (‘only’) 4 applications, but I would have submitted 20 (the GSoC limit!) if I had had enough time. During that time I was thinking, researching about CLAM, imagining, reading some source code and documentation, and writing proposals of course, hehehe. It may seem that I’m exaggerating, but it’s true
From the start I had discarded the idea of applying to a different organization.
My first goal was to have a good opportunity to develop for or on top of this framework, and I had even decided to stay close to the project regardless of the results. I came here to learn a lot and give my best. They gave me a warm welcome from the beginning. I hope to be able to continue developing for CLAM after completing GSoC (and completing it well, of course!).
Some days ago I introduced myself to the CLAM community here, and I have also become a member of its planet! Great!
And last but not least, I want to express all my thanks to the entire CLAM development group for the opportunity! Thanks!!!
Hi all! I’m Hernán Ordiales. This is my first post to Planet CLAM but not my first post ever; I’ve been blogging since last year, but mostly in Spanish…
What to say about me?
I live in Buenos Aires, Argentina. Among other things, I love programming, audio and music. I’m studying (mainly) Electronics Engineering and Computer Engineering at FIUBA and, luckily (and to my delight), I work on a project related to audio, programming and GNU/Linux. I’m also interested in communications (networking, protocols, etc.) and all kinds of digital systems.
I very much enjoy using and developing Free Software (of course GNU/Linux is my OS of choice) and I also help with the Linux audio distribution called Musix GNU+Linux.
After a long time following Xavier Amatriain’s blog, in the last few weeks I’ve started to get involved with CLAM, and with every step I’m discovering a lot of wonderful new things and designs that I had never seen in other audio projects.
I expect to contribute to and/or develop under CLAM ASAP. I think I’ll start blogging about my progress or new things developed with this framework soon (among other things).
For those who still don’t know much about the CLAM project, I’d recommend:
Here, the “magic sentence” to start developing:
I also encourage you to subscribe to the user & dev mailing lists and to join the #clam channel at freenode.net!
See you!

