Elsevier

Applied Soft Computing

Volume 62, January 2018, Pages 649-666

Analysis of speaker recognition methodologies and the influence of kinetic changes to automatically detect Parkinson's Disease

https://doi.org/10.1016/j.asoc.2017.11.001

Highlights

  • State-of-the-art speaker recognition schemes are used to detect Parkinson's Disease.

  • A detailed study of the influence of the degrees of freedom is carried out.

  • The performance of GMM-UBM and i-Vectors techniques is similar.

  • Rasta-PLP parameterization over read text provides the best results.

  • Derivatives contain very relevant information for the detection of Parkinson's Disease.

Abstract

The diagnosis of Parkinson's Disease is a challenging task which might be supported by new tools to objectively evaluate the presence of deviations in patients' motor capabilities.

In this respect, the dysarthric nature of patients' speech has been exploited in several works to detect the presence of this disease, but none of them has studied in depth the use of state-of-the-art speaker recognition techniques for this task.

In this paper, two classification schemes (GMM-UBM and i-Vectors-GPLDA) are employed separately with several parameterization techniques, namely PLP, MFCC and LPC. Additionally, the influence of the kinetic changes, described by their derivatives, is analysed.

With the proposed methodology, an accuracy of 87% with an AUC of 0.93 is obtained in the optimal configuration. These results are comparable to those obtained in other works employing speech for Parkinson's Disease detection and confirm that the selected speaker recognition techniques are a solid baseline to compare with future works. Results suggest that Rasta-PLP is the most reliable parameterization for the proposed task among all the tested features while the two employed classification schemes perform similarly. Additionally, results confirm that kinetic changes provide a substantial performance improvement in Parkinson's Disease automatic detection systems and should be considered in the future.

Introduction

The second most prevalent neurodegenerative disease, Parkinson's Disease (PD), is usually diagnosed on the basis of the observation of cardinal motor signs [1] and other non-motor indicators (physiological and cognitive manifestations) which are employed in the clinical diagnosis. Although neuropathological diagnosis at autopsy is considered the gold standard, some studies demonstrate that the usual clinical diagnostic criteria can reach 90% accuracy in the final judgement, but the average time needed to reach this accuracy is 2.9 years [2]. To monitor the progress of the disease, specialists often employ the Unified Parkinson's Disease Rating Scale (UPDRS) [3] or the Hoehn and Yahr (H&Y) scale [4], which include objective and non-objective assessments. In this regard, new technologies could accelerate the diagnosis process and provide a more objective monitoring of the disease.

Since PD affects the coordination of movements, it is reasonable to hypothesize that the assessment of the patients' performance during a complex motor task might be employed for diagnosis purposes. Speech production, an ability that is almost universal, might be affected by PD since it involves complex and very precise movements. In spite of being a good candidate for PD detection and evaluation, its capabilities have not yet been fully exploited.

It is well known that the neurodegenerative processes associated with the disorder cause hypokinetic dysarthria, thus reducing loudness and articulation amplitude, sometimes slowing down speech and, above all, reducing intelligibility [5], [6], [7], [8], [9]. The literature shows the influence of PD on speech from early to advanced stages, although it is mainly perceived in mild to advanced phases [10], [11], [12]. In this regard, it is expected that new methods of automatic assessment employing voice and speech can be used to detect signs that are not perceived in the first stages of the disease but which could provide relevant information.

There are several studies and approaches using speech or voice to find biomarkers of the presence of PD or to assess its severity. Most of the literature can be divided into four groups depending on the analysed aspect: phonatory, articulatory, prosodic and linguistic. The phonatory studies are related to the glottal source and resonant structures of the vocal tract. Works based on articulatory and prosodic aspects are more abundant and diverse, as there are more analysis possibilities and the influence of PD on articulation and prosody seems to be more evident [12]. The works within these groups are based on syllable rate analysis, or on the processing of certain segments of the speech to obtain indexes correlated with the disease. Concerning the prosodic works, studies are mainly focused on paralinguistic features such as pitch variation or the manifestation of emotions, among others [13], [14], [15], [16], [17]. Finally, the studies related to deviations in the linguistic domain examine the vocabulary, phrase construction and the existence of word repetitions. Some representative works within this group are found in [18], [19], [20], [21]. The speech material used in each case is a differentiating factor of the four groups. In the phonatory analysis, the most advisable acoustic material is sustained vowels, while in the other three groups running speech is needed. Specifically, in articulatory analysis, diadochokinetic (DDK) speech can be valuable in addition to other running speech materials such as spontaneous speech or read text.

The present study can be framed into both the articulatory and the phonatory groups attending to the type of analysis that is employed and the acoustic material used.

Going into detail about some relevant articulatory works, studies such as [23] indicate that speech processing can produce powerful indicators of imprecise consonant articulation in PD-related dysarthria. The authors analyse DDK tasks (/pa-ta-ka/) in a database of 24 PD patients and 22 controls, reporting 88% efficiency in separating PD patients from controls. In this study all the utterances are automatically subdivided into different representative segments to analyse articulation. Only 13 features are obtained by performing measurements on these segments, each feature describing a different articulatory trait of speech. Its main drawbacks are the use of a small, sex-unbalanced database and the use of only DDK utterances, which limits the possible articulatory combinations. Other works such as [24] employ frequency features, namely Mel Frequency Cepstrum Coefficients (MFCC) and Band Bark Energies (BBE) from running speech, and other features obtained after the segmentation of specific regions, providing good results with three corpora. However, in this case the results are too optimistic due to an over-fitting of the model, since it was optimized during training.

Equally, there are articulatory studies more focused on the fluctuations of the voice onset, offset and break segments during running speech, which are considered to be crucial in the evaluation of voice quality. For instance, [25], [26] show that parkinsonian speech has lower values of relative fundamental frequency, which is the ratio between the fundamental frequency in the cycles of a vowel before or after a voiceless consonant and the typical fundamental frequency during the utterance. The main drawback of these two works is that the databases are unbalanced in sex, which could bias some conclusions. Other studies track vowel formants and the vowel space area (VSA) during articulation, including onset and offset, with heterogeneous results [27], [28], [29]. As formants reflect the position of the tongue, a reduction of the articulation ranges could subsequently limit the frequency ranges of the formants. In [30], a comparison of PD detection techniques is performed using the acoustic material extracted from sustained vowels, sentence repetitions, reading passages and monologues. An accuracy of 80% is achieved using vowels extracted from monologues, an improvement over the results obtained with sustained vowels. The main drawback of this study is the use of a small and unbalanced database (20 patients and 15 controls).
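The relative fundamental frequency defined above is a simple ratio, so it can be illustrated with a short sketch. The code assumes that per-cycle fundamental frequency values and a typical value for the utterance have already been estimated by some pitch tracker; the function names and the numbers are hypothetical and are not taken from [25], [26]. The conversion to semitones is a common convention for pitch ratios, added here as an assumption rather than something stated in the text.

```python
import math


def relative_f0(cycle_f0_hz, reference_f0_hz):
    """Ratio of each cycle's F0 to the typical F0 of the utterance."""
    if reference_f0_hz <= 0:
        raise ValueError("reference F0 must be positive")
    return [f / reference_f0_hz for f in cycle_f0_hz]


def to_semitones(ratio):
    """Express a frequency ratio in semitones (assumed convention)."""
    return 12.0 * math.log2(ratio)


# Hypothetical F0 values (Hz) for the last vowel cycles before a voiceless consonant.
offset_cycles = [118.0, 115.0, 110.0]
typical_f0 = 120.0  # hypothetical typical F0 of the utterance

ratios = relative_f0(offset_cycles, typical_f0)
semitones = [to_semitones(r) for r in ratios]
```

Ratios below 1 (negative values in semitones) indicate cycles whose fundamental frequency is lower than the typical value of the utterance.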

In any case, these and many other works such as [8], [9], [31], [32] evidence that articulation perturbations introduced by PD can provide reliable information about the presence of the disorder.

Regarding the phonatory works, sustained vowels are expected to generate simpler acoustic structures that might be easier to analyse. Some works demonstrate that it is possible to detect the influence of PD on vocal fold vibration through the presence of noise and other perturbations caused by incomplete closure [33], abnormal phase closure and phase asymmetry or vocal tremor [34]. Likewise, some works such as [35], [36], [37] use dysphonia measures including noise or frequency and amplitude perturbations to assess the severity of PD in telemonitoring scenarios, achieving good results. A major drawback, though, is that recordings are made using different portable equipment, introducing noise and variability in the databases which could bias the system.

Finally, other authors employ a combination of techniques as in [38]. In this case, phonatory, prosodic and articulatory features are used jointly, providing an accuracy of 80% in PD detection.

Although there are many approaches using voice and speech as acoustic materials to detect and assess PD, as far as the authors of this study know, no work has thoroughly analysed the use of state-of-the-art speaker recognition techniques for this task. Two major classification schemes in this field are Gaussian Mixture Model – Universal Background Model (GMM-UBM) [39] and i-Vectors [40], which are usually employed in combination with phonatory and articulatory information of the speaker. In this study several PD automatic detectors are analysed using GMM-UBM and i-Vectors in combination with different parameterizations and speech tasks.

The paper is organized as follows: Section 2 summarizes the main guidelines of this study. Section 3 develops the theoretical background about the different parameterizations and classification techniques. Section 4 introduces the experimental setup and describes the databases used in this study. Section 5 presents the obtained results. Lastly, Section 6 presents the discussion and Section 7 the conclusions and future work.
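To give an idea of how a GMM-UBM scheme produces a decision score, the following minimal sketch computes the average per-frame log-likelihood ratio between an adapted model and the UBM, which is the usual scoring rule in speaker verification. It is an illustration under stated assumptions, not the configuration used in this study: the models are tiny diagonal-covariance mixtures with hand-set, hypothetical parameters, whereas a real system would train the UBM by expectation-maximization and derive the class model from it (typically by MAP adaptation). All function and variable names are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of one frame under a GMM, computed with log-sum-exp."""
    terms = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def llr_score(frames, target, ubm):
    """Average per-frame log-likelihood ratio: target model versus UBM."""
    total = sum(
        gmm_log_likelihood(f, *target) - gmm_log_likelihood(f, *ubm)
        for f in frames
    )
    return total / len(frames)


# Hypothetical models: (weights, means, variances), 2 components, 2 dimensions.
ubm = ([0.5, 0.5], [[0.0, 0.0], [2.0, 2.0]], [[1.0, 1.0], [1.0, 1.0]])
target = ([0.5, 0.5], [[0.5, 0.5], [2.5, 2.5]], [[1.0, 1.0], [1.0, 1.0]])

frames_near_target = [[0.5, 0.5], [2.5, 2.5]]
frames_near_ubm = [[0.0, 0.0], [2.0, 2.0]]

score_target = llr_score(frames_near_target, target, ubm)
score_ubm = llr_score(frames_near_ubm, target, ubm)
```

A positive score means the frames are better explained by the adapted model than by the background model; comparing the score against a threshold yields the final decision.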

Section snippets

Overview and contribution

The present work performs a thorough study of the influence of the different parameters and configurations of state-of-the-art speaker recognition techniques for the detection of PD. Mainly, different combinations of acoustic material, parameterization and classification schemes are analysed separately to identify the strengths and weaknesses of each one in PD detection.

As depicted in Fig. 1, the speech material can be a sustained vowel, a DDK task or two different sentences. Three

Theoretical background

In this section, the main techniques and basis used in the methodology, i.e. the feature families, kinetic changes and classification schemes, are introduced along with a critical discussion of their use in automatic PD detection.
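The kinetic changes mentioned above are described in this work by the derivatives of the features (the Δ and ΔΔ coefficients). As a minimal sketch, and assuming the commonly used regression formula over a window of ±n frames (this excerpt does not state the exact formula or window length used by the authors), the derivatives could be computed and appended to the static features as follows. The names `delta` and `add_derivatives` are hypothetical.

```python
def delta(frames, n=2):
    """First-order regression deltas over a +/- n frame window.

    frames: list of feature vectors (one list of floats per frame).
    Edges are handled by repeating the first and last frames.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    denom = 2.0 * sum(k * k for k in range(1, n + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for k in range(1, n + 1):
                ahead = frames[min(t + k, last)][d]
                behind = frames[max(t - k, 0)][d]
                acc += k * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def add_derivatives(frames, n=2):
    """Concatenate static features with their deltas and delta-deltas."""
    d1 = delta(frames, n)
    d2 = delta(d1, n)
    return [f + a + b for f, a, b in zip(frames, d1, d2)]


# Hypothetical one-dimensional feature track that rises by 1 per frame.
ramp = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
ramp_delta = delta(ramp)
ramp_full = add_derivatives(ramp)
```

For a feature that grows linearly, the interior delta values equal the slope, and each output frame of `add_derivatives` is three times as long as the static vector.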

Experimental setup

This section describes the databases and the methodology employed in this work.

Results

For the sake of simplicity in the presentation, this section only includes results leading to best accuracy values and those showing a possible influence of the parameters in performance justified by the presence of PD.
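Performance in this work is reported as accuracy and area under the ROC curve (AUC). As a minimal sketch of how both figures can be obtained from detector scores, the code below computes accuracy at a fixed threshold and AUC through its rank interpretation (the fraction of patient/control pairs in which the patient receives the higher score, with ties counted as one half). The scores, labels and threshold are hypothetical and do not reproduce any result of the paper.

```python
def accuracy(scores, labels, threshold=0.0):
    """Fraction of recordings whose thresholded score matches the label."""
    hits = sum((s > threshold) == bool(y) for s, y in zip(scores, labels))
    return hits / len(scores)


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical detector scores; label 1 = PD patient, 0 = control.
scores = [1.2, 0.4, -0.3, 0.1, -0.8, -1.1]
labels = [1, 1, 1, 0, 0, 0]

acc_value = accuracy(scores, labels)
auc_value = auc(scores, labels)
```

Unlike accuracy, the AUC does not depend on the chosen threshold, which is why the two figures are usually reported together.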

Discussion

The results presented make it possible to analyse the influence of parameterization, τwindow, kinetic changes, speech task and classification scheme on automatic PD detection through speech. The employed methodologies are state-of-the-art techniques for speaker recognition, but different optimum configuration parameters are expected for PD detection.

Influence of the family of features. As can be inferred from Tables 5 and 6, the best results are obtained using Rasta-PLP + Δ + ΔΔ coefficients,

Conclusions and future work

In this work, state-of-the-art speaker recognition techniques are applied and adapted to a different application domain: the detection of PD using the patient's speech. Three families of features are considered, MFCC, Rasta-PLP and LPC along with their respective derivatives, utilizing multiple configurations. Equally, two classification techniques, namely GMM-UBM and i-Vectors, are used to train and test automatic detectors. The objective of this study is mainly twofold: firstly to evaluate

Acknowledgements

The authors of this paper wish to thank Jesús Francisco Vargas Bonilla and Julián David Arias-Londoño from the Faculty of Engineering at Universidad de Antioquia, who cooperated with Juan Rafael Orozco-Arroyave in the recording of the GITA database. This work was supported by the Ministry of Economy and Competitiveness of Spain (grants EEBB-I-17-12092, BES-2013-062984 and project TEC2012-38630-C04-01), Universidad Politécnica de Madrid (Ayudas para la realización del doctorado – RR01/2011, XV

References (71)

  • T. Kinnunen et al. An overview of text-independent speaker recognition: from features to supervectors. Speech Commun. (2010)
  • M. Li et al. Simplified supervised i-vector modeling with application to robust and efficient language identification and speaker verification. Comput. Speech Lang. (2014)
  • H. Behravan et al. Factors affecting i-vector based foreign accent recognition: a case study in spoken Finnish. Speech Commun. (2015)
  • N. Saenz-Lechon et al. Methodological issues in the development of automatic systems for voice pathology detection. Biomed. Signal Process. Control (2006)
  • R.D. Kent et al. Acoustic studies of dysarthric speech: methods, progress, and potential. J. Commun. Disord. (1999)
  • S. Arora et al. Detecting and monitoring the symptoms of Parkinson's disease using smartphones: a pilot study. Parkinsonism Relat. Disord. (2015)
  • R.F. Pfeiffer et al. Parkinson's Disease (2013)
  • A.J. Hughes et al. The accuracy of diagnosis of parkinsonian syndromes in a specialist movement disorder service. Brain (2002)
  • S. Fahn. Recent Developments in Parkinson's Disease (1986)
  • M.M. Hoehn et al. Parkinsonism onset, progression, and mortality. Neurology (1967)
  • F.L. Darley et al. Differential diagnostic patterns of dysarthria. J. Speech Lang. Hear. Res. (1969)
  • H. Ackermann et al. Articulatory deficits in parkinsonian dysarthria: an acoustic analysis. J. Neurol. Neurosurg. Psychiatry (1991)
  • P. Blanchet et al. Speech rate deficits in individuals with Parkinson's disease: a review of the literature. J. Med. Speech – Lang. Pathol. (2009)
  • J.W. Tetrud. Preclinical Parkinson's disease detection of motor and nonmotor manifestations. Neurology (1991)
  • G. Weismer. Philosophy of research in motor speech disorders. Clin. Linguist. Phon. (2006)
  • J.R. Duffy. Motor Speech Disorders: Substrates, Differential Diagnosis, and Management (2013)
  • S. Skodda et al. Speech rate and rhythm in Parkinson's disease. Mov. Disord. (2008)
  • S. Skodda et al. Intonation and speech rate in Parkinson's disease: general and dynamic aspects and responsiveness to levodopa admission. J. Voice (2011)
  • A.B. Walsh. Basic parameters of articulatory movements and acoustics in individuals with Parkinson's disease. Mov. Disord. (2012)
  • K. Tjaden et al. Vowel acoustics in Parkinson's disease and multiple sclerosis: comparison of clear, loud, and slow speaking conditions. J. Speech Lang. Hear. Res. (2013)
  • D. Van Lancker Sidtis et al. Dramatic effects of speech task on motor and linguistic planning in severely dysfluent parkinsonian speech. Clin. Linguist. Phon. (2012)
  • D.V.L. Sidtis et al. Formulaic language in Parkinson's disease and Alzheimer's disease: complementary effects of subcortical and cortical dysfunction. J. Speech Lang. Hear. Res. (2015)
  • H. Ackermann et al. Oral diadochokinesis in neurological dysarthrias. Folia Phoniatr. Logop. (1995)
  • M. Novotný et al. Automatic evaluation of articulatory disorders in Parkinson's disease. IEEE/ACM Trans. Audio, Speech and Lang. Process. (TASLP) (2014)
  • J.R. Orozco-Arroyave et al. Automatic detection of Parkinson's disease in running speech spoken in three different languages. J. Acoust. Soc. Am. (2016)