Subject: Ph.D. thesis announcement
From: Holger Prante <Prante(at)t-online.de>
Date: Mon, 6 May 2002 22:01:57 +0200

Dear List,

I am pleased to announce the availability of my Ph.D. thesis:

Modeling Judgements of Environmental Sounds by means of Artificial Neural Networks
Holger U. Prante
Berlin University of Technology, Germany, 2001

The document is available in paperback (ISBN 3-89825-364-3) or as a download from http://www.dissertation.de/PDF/hup507.pdf [2.5 MB].

Best Regards,
Holger U. Prante
prante(at)t-online.de

-------------------------------------------------------------------------
Abstract
-------------------------------------------------------------------------

The thesis pursues the following objectives:

(a) to collect and evaluate pairs of adjectives for sound (quality) assessment.

(b) to select and binaurally record environmental sounds.

(c) to perform listening tests to obtain assessments of the recordings (b) using the adjectives (a), to examine the principal hearing dimensions by means of factor analysis applied to the subjects' assessments, and to correlate these dimensions with physical parameters extracted from the sounds.

(d) to set up and compare TEMPORAL supervised and unsupervised neural networks for reproducing the hearing dimensions (c).

(e) to compare the prediction (d) with results from multiple linear regression using "classical" psychoacoustic parameters (loudness, roughness, sharpness etc.) derived from the sounds.

Results:

(a) In a pre-study 384 pairs of adjectives were collected from the literature. A cluster analysis was performed to identify corresponding pairs of adjectives. As an outcome of the analysis, 12 semantic clusters were formed, which can be represented by 24 pairs of adjectives.

(b) Subjects were asked to give sound examples that correspond best to the adjectives. 25 sounds were selected to represent the 12 semantic clusters.
The sounds were recorded by means of a dummy head on digital audio tape [available on CD: Environmental sounds for psychoacoustic testing, K. Johannsen and H. Prante, Supplement to acta acustica ACUSTICA 87 (2) 2001].

(c) In the listening test 20 subjects assessed the environmental sounds using the semantic differentials. The factor analysis produced 6 dimensions explaining 72% of the variance in the data. Each dimension is named after the adjective with the highest factor loading: pleasant, metallic, scratching, powerful, fluctuating, and distinct. Technically, these dimensions represent soft / dull sounds (pleasant), sounds with strong high frequency content (metallic), sounds with slow (fluctuating) or fast (scratching) modulation, sounds with high loudness (powerful) and sounds with high kurtosis (distinct).

(d) Two types of artificial neural networks were investigated: a temporal supervised and a temporal unsupervised one. The supervised network was implemented as an FIR network using temporal backpropagation [Wan, 1994], and the unsupervised network was realized as a temporal self-organizing feature map (SOFM) [Chappell & Taylor, 1993]. The sounds were passed through an auditory model [Slaney, 1994], and the resulting auditory spectra served as input to both connectionist prediction models. Factor scores from (c) were taken as targets. Using cross-validation, the supervised network showed its highest prediction score for the dimension powerful, with 80% correct predictions. The unsupervised method also performed best on the dimension powerful, with 88% correct scores.

(e) As a reference for the neural network models, an alternative approach from classical statistics was applied. Percentiles and other statistical quantities for several psychoacoustic parameters were calculated from the sounds and fed into a multiple linear regression analysis.
The cross-validation results of the regression are the highest in this study: the dimensions powerful and metallic reached 100% correct prediction within the pre-defined boundaries (correlation coefficients 0.96 and 0.97, respectively).

Conclusions:

- The better the pre-processor, the better the outcome of the classifier. The results of the multiple regression analysis show that even a rather simple (linear) classifier using a "problem-adjusted" pre-processing can predict the main cognitive factors of real world sounds with up to 100% correct predictions.

- The analysis of the trained FIR neural networks shows that the connecting weights are trained to act as all-pass filters. This way the fluctuating incoming signals cancel each other out in the hidden layer. Due to the adjusted bias at the hidden units, a DC output is produced, as required by the constant supervised target value. This outcome reflects the flexibility of the FIR network and indicates possible applications, e.g. in the domain of active noise control.

- The temporal SOFM algorithm can be regarded as a stochastic clustering algorithm. The thesis shows that the algorithm is well suited for grouping high dimensional temporal data in pattern classification and recognition applications.

----------------------------------------------------------------------------
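
As a rough illustration of the FIR network idea referred to in the conclusions above, the sketch below computes the forward pass of a single layer in which every connection is a short FIR filter (a tapped delay line) rather than a scalar weight. It is a minimal sketch under stated assumptions, not the thesis implementation: the function name fir_layer_forward, the tanh activation, the zero-padding of samples before t = 0, and all numeric values are hypothetical and chosen only for illustration. The toy 2-tap averaging filter is not the all-pass behaviour reported for the trained networks; it only shows how a fluctuating input can cancel so that the bias determines a constant output.

```python
import math

def fir_layer_forward(inputs, weights, biases):
    """Forward pass of one FIR layer (illustrative sketch, not the thesis code).

    inputs  : list of T time steps, each a list of n_in values
    weights : weights[j][i][k] is tap k of the FIR filter from input i to unit j
    biases  : one bias per output unit
    Returns a list of T time steps, each a list of n_out activations.
    Samples before t = 0 are treated as zero (an assumption of this sketch).
    """
    n_out = len(weights)
    outputs = []
    for t in range(len(inputs)):
        step = []
        for j in range(n_out):
            total = biases[j]
            for i, taps in enumerate(weights[j]):
                for k, w in enumerate(taps):
                    if t - k >= 0:  # skip taps that reach before the signal start
                        total += w * inputs[t - k][i]
            step.append(math.tanh(total))  # tanh is an assumed activation
        outputs.append(step)
    return outputs

# Hypothetical toy data: one input channel alternating between +1 and -1,
# one unit whose 2-tap filter averages each sample with its predecessor.
signal = [[1.0], [-1.0], [1.0], [-1.0], [1.0]]
weights = [[[0.5, 0.5]]]
biases = [0.3]

out = fir_layer_forward(signal, weights, biases)
for t, step in enumerate(out):
    print(t, round(step[0], 4))
```

After the first time step, the alternating input cancels inside the filter and the unit's output stays at tanh(0.3), i.e. a constant value set by the bias alone.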