

    Improved Audio Scene Classification Based on Label-Tree Embeddings and Convolutional Neural Networks


    Abstract

    In this paper, we present an efficient approach for audio scene classification. We aim at learning representations for scene examples by exploring the structure of their class labels. A category taxonomy is automatically learned by collectively optimizing a tree-structured clustering of the given labels into multiple metaclasses. A scene recording is then transformed into a label-tree embedding image, whose elements represent the likelihoods that the scene instance belongs to the metaclasses. We investigate classification with label-tree embedding features learned from different low-level features as well as their fusion, and show that combining multiple features is essential to obtain good performance. While averaging label-tree embedding images over time yields good performance, we argue that average pooling possesses an intrinsic shortcoming, and we propose an improved classification scheme to bypass this limitation. We automatically learn common templates that are useful for the classification task from these images using simple but tailored convolutional neural networks. The trained networks are then employed as a feature extractor that matches the learned templates across a label-tree embedding image and produces the maximum matching scores as features for classification. Since audio scenes exhibit rich content, template learning and matching on low-level features would be inefficient. With label-tree embedding features, the low-level features are quantized and reduced to the likelihoods of the metaclasses, on which template learning and matching are efficient.
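The contrast between average pooling and template matching can be illustrated with a toy example. The sketch below is only an illustration under stated assumptions: the abstract does not say what the shortcoming of average pooling is, so the example assumes it relates to lost temporal structure, and the template is hand-written here rather than learned by a CNN. The function names `average_pool` and `max_match_score` are hypothetical.

```python
def average_pool(image):
    """Average each row (one metaclass likelihood) of an F x T image over time."""
    return [sum(row) / len(row) for row in image]


def max_match_score(image, template):
    """Slide an F x W template along the time axis and return the best score.

    The score at each offset is the sum of element-wise products between the
    template and the image patch; the maximum over all offsets is returned.
    """
    num_frames = len(image[0])
    width = len(template[0])
    scores = []
    for start in range(num_frames - width + 1):
        score = 0.0
        for img_row, tpl_row in zip(image, template):
            for k in range(width):
                score += img_row[start + k] * tpl_row[k]
        scores.append(score)
    return max(scores)


# Two toy 2 x 4 images (F = 2 metaclasses, T = 4 frames) with identical
# time averages but different temporal structure.
image_a = [[1.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 1.0]]
image_b = [[1.0, 0.0, 1.0, 0.0],
           [0.0, 1.0, 0.0, 1.0]]

# Hypothetical template (assumption, not learned): metaclass 0 stays
# active for two consecutive frames.
template = [[1.0, 1.0],
            [0.0, 0.0]]

print(average_pool(image_a), average_pool(image_b))
print(max_match_score(image_a, template), max_match_score(image_b, template))
```

In this toy case both images average to the same vector, while the maximum matching score differs between them, which is the kind of information a template-based feature can retain.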


    Existing System

    Existing approaches to audio scene classification rely on classifiers such as Hidden Markov Models (HMMs), Gaussian Mixture Models (GMMs), Support Vector Machines (SVMs), and Deep Neural Networks (DNNs).


    Proposed System

    We present an efficient approach to the audio scene classification task. The approach has three stages: label-tree embedding (LTE) learning, template learning with simple CNNs, and classification with linear SVMs. Simple CNNs are trained on LTE images to learn templates that are useful for the classification task. The learned templates are then matched on an input LTE image for feature extraction, and the final classification is performed by linear SVMs.

    Using the proposed LTE learning algorithm, a scene instance is mapped into a 2-dimensional LTE image of size F × T, where F is the number of derived features and T is the number of time frames. We investigate three low-level feature sets for LTE learning: Gammatone cepstral coefficients, MFCCs, and log-frequency filter bank coefficients. Random forest classifiers are used in the LTE learning algorithm.

    We also study how the presence or absence of background noise affects the LTE representations. Whenever background noise has to be removed, the input signals are preprocessed using minimum statistics noise estimation and subtraction. As a result, six LTE images are obtained for a single scene instance, namely LTE0-Gam, LTE0-MFCC, LTE0-Log, LTE1-Gam, LTE1-MFCC, and LTE1-Log, where “0” and “1” denote presence/absence of the background noise.

    Average pooling over time is then applied to the LTE images to produce global LTE feature vectors, which are presented to SVM classifiers for classification. We study the combinations of complementary LTE channels derived from the same types of low-level features (LTE-Gam, LTE-MFCC, and LTE-Log), the combinations of those LTEs with the presence/absence of background noise (LTE0-Fusion3, LTE1-Fusion3), and the combination of all six LTE feature vectors (LTE-Fusion6). Combining various features with and without background noise is essential for good performance.
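The pooling and fusion step described above can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes that "combination" means concatenating the average-pooled vectors (the text does not state the fusion rule), and the toy LTE images are made-up values. The channel and fusion names come from the text; `average_pool`, `fuse`, and `toy_images` are hypothetical names.

```python
CHANNELS = ["LTE0-Gam", "LTE0-MFCC", "LTE0-Log",
            "LTE1-Gam", "LTE1-MFCC", "LTE1-Log"]

# Fusion groups named in the text, expressed as lists of member channels.
FUSION_GROUPS = {
    "LTE-Gam": ["LTE0-Gam", "LTE1-Gam"],
    "LTE-MFCC": ["LTE0-MFCC", "LTE1-MFCC"],
    "LTE-Log": ["LTE0-Log", "LTE1-Log"],
    "LTE0-Fusion3": ["LTE0-Gam", "LTE0-MFCC", "LTE0-Log"],
    "LTE1-Fusion3": ["LTE1-Gam", "LTE1-MFCC", "LTE1-Log"],
    "LTE-Fusion6": CHANNELS,
}


def average_pool(image):
    """Collapse an F x T LTE image into a length-F vector by averaging over time."""
    return [sum(row) / len(row) for row in image]


def fuse(images, members):
    """Concatenate the pooled vectors of the given channels (assumed fusion rule)."""
    fused = []
    for name in members:
        fused.extend(average_pool(images[name]))
    return fused


# Made-up 2 x 3 LTE images (F = 2, T = 3), one per channel, for illustration only.
toy_images = {
    name: [[float(i), float(i), float(i)], [0.0, 1.0, 2.0]]
    for i, name in enumerate(CHANNELS)
}

fused = {group: fuse(toy_images, members)
         for group, members in FUSION_GROUPS.items()}

for group, vector in fused.items():
    print(group, len(vector), vector)
```

Each fused vector would then be passed to an SVM classifier; under the concatenation assumption, LTE-Fusion6 has length 6F.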


    Architecture


    BLOCK DIAGRAM

