Inferring social contexts from audio recordings using deep neural networks

Meysam Asgari, Izhak Shafran, Alireza Bayestehtashk

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

In this paper, we investigate the problem of detecting social contexts from audio recordings of everyday life, such as life-logs. Unlike standard corpora of telephone speech or broadcast news, these recordings contain a wide variety of background noise. In such applications, it is by nature difficult to collect and label all representative noise for learning models in a fully supervised manner: the amount of labeled data that can be expected is small relative to the available recordings. This lends itself naturally to unsupervised feature extraction using sparse auto-encoders, followed by supervised learning of a classifier for social contexts. We investigate different strategies for training these models and report results on a real-world application.
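The two-stage pipeline the abstract describes can be sketched as follows. This is a hedged, minimal illustration, not the authors' implementation: a single-layer sparse auto-encoder (here, an L1 penalty on the hidden codes) is trained on plentiful unlabeled data, and a plain logistic-regression classifier is then fit on the learned codes of a small labeled subset. All dimensions, penalties, and the synthetic stand-in data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def train_sparse_autoencoder(X, n_hidden=8, lr=0.1, l1=1e-3, epochs=300):
    """Learn an encoder by minimizing squared reconstruction error + l1 * |codes|."""
    n, d = X.shape
    W1 = rng.normal(scale=0.1, size=(d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)             # sparse codes (learned features)
        err = (H @ W2 + b2) - X              # linear reconstruction error
        dH = (err @ W2.T + l1 * np.sign(H)) / n
        dZ = dH * H * (1.0 - H)              # back through the sigmoid
        W2 -= lr * (H.T @ err) / n; b2 -= lr * err.mean(axis=0)
        W1 -= lr * (X.T @ dZ);      b1 -= lr * dZ.sum(axis=0)
    return lambda A: sigmoid(A @ W1 + b1)    # the trained encoder

def train_classifier(H, y, lr=0.5, epochs=500):
    """Plain logistic regression on the encoded features."""
    w = np.zeros(H.shape[1]); b = 0.0
    for _ in range(epochs):
        g = sigmoid(H @ w + b) - y
        w -= lr * (H.T @ g) / len(y); b -= lr * g.mean()
    return lambda Hn: (sigmoid(Hn @ w + b) > 0.5).astype(int)

# Synthetic stand-in for audio features: many unlabeled frames, few labeled ones.
X_unlabeled = np.vstack([rng.normal(0.0, 1.0, (200, 6)),
                         rng.normal(2.0, 1.0, (200, 6))])
X_labeled = np.vstack([rng.normal(0.0, 1.0, (20, 6)),
                       rng.normal(2.0, 1.0, (20, 6))])
y = np.array([0] * 20 + [1] * 20)

encode = train_sparse_autoencoder(X_unlabeled)    # unsupervised stage
predict = train_classifier(encode(X_labeled), y)  # supervised stage
accuracy = (predict(encode(X_labeled)) == y).mean()
```

The point of the split is that the encoder never sees labels, so it can exploit all available recordings; only the small classifier depends on the scarce labeled subset.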

Original language: English (US)
Title of host publication: IEEE International Workshop on Machine Learning for Signal Processing, MLSP
Editors: Tulay Adali, Jan Larsen, Mamadou Mboup, Eric Moreau
Publisher: IEEE Computer Society
ISBN (Electronic): 9781479936946
State: Published - Nov 14, 2014
Event: 2014 24th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2014 - Reims, France
Duration: Sep 21, 2014 - Sep 24, 2014

Publication series

Name: IEEE International Workshop on Machine Learning for Signal Processing, MLSP
ISSN (Print): 2161-0363
ISSN (Electronic): 2161-0371

Conference

Conference: 2014 24th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2014
Country: France
City: Reims
Period: 9/21/14 - 9/24/14

Keywords

  • Deep neural networks
  • Harmonic model
  • Multi-label classification

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Signal Processing
