HAAWAII – SELF-MADE-ATC

The Horizon 2020 funded HAAWAII project developed a reliable, error resilient and adaptable solution to automatically transcribe voice commands from air traffic controllers (ATCO) and pilots.

Using machine learning, the project built on very large collections of speech data, organized with minimal expert effort, to develop a new set of speech recognition models for the complex ATM environments of the London terminal area (TMA) and Icelandic en-route airspace. Speech and surveillance data recordings from real-life pilot-controller communications, i.e., directly from the operations rooms, were used.

HAAWAII aimed to significantly enhance ATM safety and reduce ATCOs' workload. The digitization of controller-pilot communication can be used for a wide variety of safety and performance related ATM improvements. Proof-of-concept applications are readback-error-detection, callsign-highlighting and ATCO-workload-estimation.

In the short clip below you see the HAAWAII prototype in action with automatic radar label maintenance, early callsign highlighting, immediate online recognition and readback-error-detection:

About HAAWAII

Air traffic controllers use flight strips to manage information concerning an aircraft. If an aircraft is given clearance, this information must be logged in the flight strip. Paper flight strips are easy to maintain, but the information they contain is not available in digital form in the overall system. A remedy is offered by electronic flight strips. These, however, increase the workload and, depending on the implementation, also the length of time for which the controller has to turn his eyes away from the radar screen (head-down times).

Voice recognition based on artificial intelligence (AI) offers a solution here. The projects AcListant® and AcListant®-Strips have shown that both good recognition rates and low recognition-error rates can be achieved with assistance-based speech recognition, i.e. by coupling a controller assistance system with a speech recogniser. Both factors result in the assistance system being able to better recognise the intentions of the controller, as a result of which it can support the controller more efficiently in his work.

The subsequent project MALORCA showed that through machine learning, such assistance-based speech recognisers can be adapted – automatically and therefore inexpensively – to different airports. The prerequisite for this is that sufficient speech data and radar data are available to train the machine learning algorithms.

The current project HAAWAII is based on the work of AcListant® and MALORCA. For the first time, it also includes the recognition of pilot radio traffic and will use significantly more voice data to train the AI algorithms: MALORCA used only 25 hours of voice data for learning, while HAAWAII will use more than 1,000 hours. As an example, HAAWAII selects the complex environments of en-route air traffic in Iceland as well as air traffic in the terminal area (TMA) of London. Particular challenges here are, in addition to diverse accents and significantly poorer speech signals, aspects of data protection.

The work in HAAWAII will both improve air traffic safety and reduce the workload of air traffic controllers. One of the main application areas of HAAWAII research will be to recognise whether the pilot has understood exactly what the controller has said to him. This can help to avoid misunderstandings in communication. In order to achieve this, the validity of speech-recognition models must be significantly improved.

The digitisation of spoken messages from air traffic controllers and pilots can be used for a multitude of safety and efficiency-enhancing applications, e.g. in order to create advance entries in electronic flight strips with little effort or to transmit controller commands directly to the aircraft’s on-board computer via data link (Controller Pilot Data Link Communication, CPDLC). A further application is the objective estimation of air traffic controllers’ workload by means of digitised voice recordings of the complex London terminal control area.

Participants

The following partners were involved in this project: Deutsches Zentrum für Luft- und Raumfahrt (German Aerospace Center), Idiap Research Institute, Brno University of Technology, Austro Control, NATS, Isavia ANS, Croatia Control.

References

Metrics and Command Extraction

Robust Command Recognition for Lithuanian Air Traffic Control Tower Utterances, O. Ohneiser, S. Sarfjoo, H. Helmke, S. Shetty, P. Motlicek, M. Kleinert, H. Ehr, Š. Murauskas (Oro navigacija, Lithuania), Interspeech 2021, Brno, Czechia, 30 August – 3 September, 2021, pp. 3291-3295

The joint paper of DLR, Idiap and the Lithuanian Air Navigation Service Provider Oro navigacija describes the recognition performance on word level and on semantic level for utterances from the Lithuanian airspace.

Automated Interpretation of Air Traffic Control Communication: The Journey from Spoken Words to a Deeper Understanding of the Meaning, M. Kleinert, H. Helmke, S. Shetty, O. Ohneiser, H. Ehr, A. Prasad, P. Motlicek, J. Harfmann, 40th Digital Avionics Systems Conference (DASC 21), hybrid conference, San Antonio, Texas, USA, October 3-7, 2021.

The joint paper of DLR, Idiap and NATS describes the rule-based algorithm that transforms a sequence of words from an air traffic controller or pilot utterance into its semantic interpretation, as defined by an extension of the 16-04 ontology. The defined JSON format allows a consistent exchange of Speech-to-Text output, ontology information, or both together between different systems and applications. The format is machine readable by definition and easy to expand with additional key-value pairs while remaining compatible with old data. The paper also shows how strongly extraction performance breaks down when no surveillance data is used or available.
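The actual JSON schema is defined in the paper and is not reproduced here. The following is only a minimal sketch of the compatibility idea, with hypothetical key names (`transcription`, `annotations`, `callsign`, `type`, `value`, `unit`, `speaker`) chosen for illustration: a reader that defaults missing keys and ignores unknown ones can handle both old and extended records.

```python
import json

# Hypothetical record layout; the key names are illustrative assumptions,
# not the schema defined by the project.
OLD_RECORD = json.dumps({
    "transcription": "speedbird one two three descend flight level eight zero",
})
NEW_RECORD = json.dumps({
    "transcription": "speedbird one two three descend flight level eight zero",
    "annotations": [
        {"callsign": "BAW123", "type": "DESCEND", "value": 80, "unit": "FL"},
    ],
    "speaker": "ATCO",  # an additional key-value pair added later
})


def read_record(raw: str) -> dict:
    """Parse a record, defaulting missing keys and ignoring unknown ones."""
    data = json.loads(raw)
    return {
        "transcription": data.get("transcription", ""),
        "annotations": data.get("annotations", []),
    }


old = read_record(OLD_RECORD)  # Speech-to-Text only
new = read_record(NEW_RECORD)  # Speech-to-Text plus semantic annotation
print(old["annotations"])
print(new["annotations"][0]["type"])
```

The same reader accepts a record that carries only the transcription and one that carries transcription and annotation together.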

Measuring Speech Recognition And Understanding Performance in Air Traffic Control Domain Beyond Word Error Rates, H. Helmke, S. Shetty, M. Kleinert, O. Ohneiser, H. Ehr, A. Prasad, P. Motlicek, A. Cerna and C. Windisch, 11th SESAR Innovation Days, online conference, 2021.

The joint paper of DLR, Idiap, Austro Control and the Czech air navigation service provider ANS CR introduces the metrics command extraction rate, callsign extraction rate and command extraction error rate. These rates are evaluated on utterances from Austro Control and ANS CR, which were recorded in the ops room environment in the MALORCA project and in the lab environment in solution 16-04.

A shorter version of this paper was presented during the Satellite Workshop at Interspeech 2021:

How to Measure Speech Recognition Performance in the Air Traffic Control domain? The Word Error Rate is only half of the truth! H. Helmke, S. Shetty, M. Kleinert, O. Ohneiser, H. Ehr, A. Prasad, P. Motlicek, A. Cerna and C. Windisch, Interspeech 2021 Satellite Workshop, Brno, Czechia, 30 August – 3 September, 2021.
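The extraction metrics named above can be illustrated with a small calculation. The sketch below rests on assumed definitions: a command counts as correctly extracted if it matches a manually annotated (gold) command exactly, the error rate counts extracted commands that are not in the gold annotation, and both are divided by the number of gold commands. The papers give the authoritative definitions; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    gold_callsign: str
    gold_commands: set[str]
    found_callsign: str
    found_commands: set[str]


def extraction_rates(utterances: list[Utterance]) -> dict[str, float]:
    """Compute the three rates under the assumed definitions stated above."""
    total_gold = sum(len(u.gold_commands) for u in utterances)
    correct = sum(len(u.gold_commands & u.found_commands) for u in utterances)
    wrong = sum(len(u.found_commands - u.gold_commands) for u in utterances)
    callsigns_ok = sum(u.gold_callsign == u.found_callsign for u in utterances)
    return {
        "command_extraction_rate": correct / total_gold if total_gold else 0.0,
        "command_extraction_error_rate": wrong / total_gold if total_gold else 0.0,
        "callsign_extraction_rate": callsigns_ok / len(utterances) if utterances else 0.0,
    }


# Made-up sample data for illustration only.
sample = [
    Utterance("BAW123", {"DESCEND 80 FL", "REDUCE 220 kt"}, "BAW123", {"DESCEND 80 FL"}),
    Utterance("DLH456", {"HEADING 270"}, "DLH465", {"HEADING 210"}),
]
rates = extraction_rates(sample)
print(rates)
```

In the sample, a perfectly transcribed word sequence would still not guarantee good rates: what is counted is the extracted meaning, which is the point of looking beyond the word error rate.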

Readback Error Detection

The supreme discipline of Automatic Speech Recognition and Understanding is readback error detection. Noisy and heavily abbreviated pilot readbacks require speech recognition and semantic interpretation that still work when word error rates exceed 10%. An even bigger challenge is that readback errors are, fortunately, rare events: only 1% to 4% of conversations contain readback errors.

Readback Error Detection by Automatic Speech Recognition to Increase ATM Safety, H. Helmke, M. Kleinert, S. Shetty, O. Ohneiser, H. Ehr, H. Arilíusson, T. Simiganoschi, A. Prasad, P. Motlicek, K. Veselý, K. Ondřej, P. Smrz, J. Harfmann, C. Windisch, 14th ATM Seminar, 20 – 24 September 2021, virtual conference.

The common paper of DLR, Isavia ANS, Idiap, University of Brno (BUT), NATS and Austro Control shows that a recognition rate on command level of slightly above 50% is already sufficient to achieve a readback error detection of 50%, provided the error rate on command level is below 0.2%. Otherwise a readback error false alarm rate of more than 10% must be accepted.
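A back-of-the-envelope calculation shows why false alarms weigh so heavily when readback errors are rare. The sketch assumes that "detection rate" means the share of actual readback errors that are flagged and "false alarm rate" the share of correct readbacks that are wrongly flagged; the paper may define both differently, so the numbers are illustrative only.

```python
def alarm_precision(base_rate: float, detection_rate: float,
                    false_alarm_rate: float) -> float:
    """Share of raised alarms that point at a real readback error."""
    true_alarms = base_rate * detection_rate
    false_alarms = (1.0 - base_rate) * false_alarm_rate
    return true_alarms / (true_alarms + false_alarms)


# Base rates of 1% and 4% are taken from the text; 50% detection and
# 10% false alarms are the figures quoted from the paper summary.
for base_rate in (0.01, 0.04):
    precision = alarm_precision(base_rate, detection_rate=0.5, false_alarm_rate=0.10)
    print(f"base rate {base_rate:.0%}: {precision:.1%} of alarms are real readback errors")
```

Under these assumptions, fewer than one alarm in five would correspond to a real readback error, which is why a very low error rate on command level matters.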

Readback Error Detection by Automatic Speech Recognition and Understanding, H. Helmke, K. Ondřej, S. Shetty, H. Arilíusson, T. Simiganoschi, M. Kleinert, O. Ohneiser, H. Ehr, J. Zuluaga-Gomez, P. Smrz, SESAR Innovation Days 2022 (SID 2022), Budapest, Hungary, December 6-8, 2022.

The joint paper of DLR, BUT, Isavia ANS and Idiap presents two different algorithms for readback error detection: a rule-based one and a data-driven one, which is based on training a neural network on artificial readback error samples. The paper also presents two different approaches for command extraction, again a rule-based one and a data-driven one.

Application of HAAWAII architecture

The HAAWAII architecture has already been used successfully in several projects. HAAWAII architecture means:

to use Assistant Based Speech Recognition (ABSR), which integrates contextual knowledge (e.g., callsigns) from flight plan and surveillance data into Speech Recognition (Speech-to-Text with so called callsign boosting) and Speech Understanding (Text-to-Concept),
to make very clear that speech recognition (Speech-to-Text) does not automatically incorporate speech understanding (Text-to-Concept); only both together enable automatic speech recognition and understanding (ASRU),
to use contextual knowledge from the conversation (e.g., previous utterance) in Text-to-Concept, e.g. “two zero zero thank you” in a pilot readback is very probably an altitude readback, and not a speed or heading readback, if the ATCo has just given a CLIMB command to flight level 200 (RBA),
to integrate command validation in Text-to-Concept phase (VAL),
to have the same acoustic and language model for ATCo and pilot utterances (ONE),
to have a separate block for detection of voice transmissions, which either relies on push-to-talk (PTT) availability or needs to evaluate the input wave signal in more detail (Voice Activity Detection, VAD),
to repair over- or under-splitting in the Text-to-Concept phase (REP).
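The use of conversational context (RBA) in the list above can be sketched in a few lines. This is a simplified illustration, not the project's Text-to-Concept implementation: it assumes the previous ATCo command is available as a hypothetical `(type, value)` pair and only handles a readback that consists of a bare number.

```python
from typing import Optional

DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8",
          "nine": "9", "niner": "9"}


def spoken_number(words: list[str]) -> Optional[int]:
    """Return the first run of spoken digits as an integer, or None."""
    digits = ""
    for word in words:
        if word in DIGITS:
            digits += DIGITS[word]
        elif digits:
            break  # the digit run has ended
    return int(digits) if digits else None


def interpret_readback(text: str,
                       previous: Optional[tuple[str, int]]) -> Optional[tuple[str, int]]:
    """Guess (command type, value) for a keyword-free pilot readback.

    If the spoken value equals the value of the previous ATCo command,
    the readback inherits that command's type; otherwise it stays UNKNOWN.
    """
    value = spoken_number(text.lower().split())
    if value is None:
        return None
    if previous is not None and previous[1] == value:
        return (previous[0], value)
    return ("UNKNOWN", value)


with_context = interpret_readback("two zero zero thank you", ("CLIMB", 200))
without_context = interpret_readback("two zero zero thank you", None)
print(with_context, without_context)
```

Without the previous utterance, "two zero zero" could equally be a heading or a speed; with it, the altitude interpretation becomes the obvious candidate.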

Apron Controller Support by Integration of Automatic Speech Recognition with an Advanced Surface Movement Guidance and Control System, M. Kleinert, H. Helmke, S. Shetty, O. Ohneiser, H. Ehr, I. Nigmatulina, H. Wiese, M. Maier, SESAR Innovation Days 2022 (SID 2022), Budapest, Hungary, December 6-8, 2022.

The paper above, written by DLR, Idiap, Fraport and Atrics, benefits from the HAAWAII elements ABSR, ASRU, VAL, VAD and REP. It integrates a modern A-SMGCS with speech recognition and understanding to support apron controllers in maintaining flight strip information and to reduce the workload of simulation pilots.

Understanding Tower Controller Communication for Support in Air Traffic Control Displays, O. Ohneiser, H. Helmke, S. Shetty, M. Kleinert, H. Ehr, G. Balogh, A. Tønnesen, W. Rinaldi, S. Mansi, G. Piazzolla, Š. Murauskas, T. Pagirys, G. Kis-Pál, R. Tichy, V. Horváth, F. Kling, H. Usanovic, SESAR Innovation Days 2022 (SID 2022), Budapest, Hungary, December 6-8, 2022.

The joint paper of DLR, Indra Navia AS, LEONARDO S.p.A., the Lithuanian ANSP Oro Navigacija, HungaroControl and Austro Control benefits from ABSR, ASRU, VAL, PTT and REP. It summarizes the results of three exercises conducted in solution 97 of SESAR Industrial Research with respect to speech recognition and understanding support for tower controllers.

Speech and Natural Language Processing Technologies for Pseudo-Pilot Simulator, J. Zuluaga-Gomez, A. Prasad, SESAR Innovation Days 2022 (SID 2022), Budapest, Hungary, December 6-8, 2022.

The paper from Idiap shows how to support simulation pilots with automatic speech recognition. The main ideas are:
(i) ASR to generate transcript from ATCo,
(ii) an entity generator to tag words (callsigns, commands, value) and
(iii) a repetition generator that uses a rule-based system to generate a pilot response based on the generated tags and a text-to-speech system that acts as a pseudo-pilot to repeat the generated pilot response.
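Steps (ii) and (iii) above can be sketched with toy rules. This is not the Idiap system: the transcript is assumed to come from step (i), the tagger is a naive keyword lookup rather than a trained model, the text-to-speech stage is omitted, and the rule of placing the callsign at the end of the readback is an assumption of this illustration. All names are hypothetical.

```python
COMMAND_WORDS = {"descend", "climb", "turn", "reduce", "contact"}


def tag_words(transcript: str, callsign_words: set[str]) -> dict[str, list[str]]:
    """Toy entity tagger: label every word as callsign, command or value.

    It fails when a digit occurs in both callsign and value; a real
    system would use a trained tagger instead of word lookup.
    """
    tags: dict[str, list[str]] = {"callsign": [], "command": [], "value": []}
    for word in transcript.lower().split():
        if word in callsign_words:
            tags["callsign"].append(word)
        elif word in COMMAND_WORDS:
            tags["command"].append(word)
        else:
            tags["value"].append(word)
    return tags


def generate_readback(tags: dict[str, list[str]]) -> str:
    """Rule-based repetition: repeat command and value, callsign last."""
    return " ".join(tags["command"] + tags["value"] + tags["callsign"])


atco_transcript = "lufthansa four five six descend flight level one two zero"
tags = tag_words(atco_transcript, {"lufthansa", "four", "five", "six"})
readback = generate_readback(tags)
print(readback)
```

The generated text would then be passed to a text-to-speech system, which plays the role of the pseudo-pilot.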

Callsign Extraction

The first step in understanding an ATC utterance is to extract the callsign. Once it is known which numbers and letters belong to the callsign, extracting the command values becomes easier. The following papers of the HAAWAII team addressed this challenge.

Early Callsign Highlighting using Automatic Speech Recognition to reduce Air Traffic Controller Workload, S. Shetty, H. Helmke, M. Kleinert, O. Ohneiser, International Conference on Applied Human Factors and Ergonomics (AHFE), 24 – 28 July 2022, New York, USA.

The paper of DLR shows the advantages when callsign information is available and used. The algorithm is described, and the effect of its individual parts is shown: performance decreases measurably when certain parts of the algorithm are excluded.

Improving callsign recognition with air-surveillance data in air-traffic communication, I. Nigmatulina, R. Braun, J. Zuluaga-Gomez, P. Motlicek, Interspeech 2021 Satellite Workshop, Brno, Czechia, 30 August – 3 September, 2021.

The paper from Idiap addresses the improvement on word level when callsign boosting is applied, by using information from the flight plan and surveillance data.

Boosting of Contextual Information in ASR for Air-Traffic Call-Sign Recognition, Martin Kocour, Karel Veselý, Alexander Blatt, Juan Zuluaga Gomez, Igor Szöke, Jan Černocký, Dietrich Klakow, Petr Motlicek, Interspeech 2021, Brno, Czechia, 30 August – 3 September, 2021.

The paper from Brno University of Technology (BUT), Saarland University and Idiap addresses the improvement on word level when callsign boosting is applied.
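The value of surveillance context for callsigns can be illustrated with a simple post-hoc match. Note that this is not the boosting inside the speech recogniser that the papers above describe; it is a swapped-in, much simpler technique that compares the recognised words with the spoken forms of callsigns currently known from surveillance data. The lookup tables are deliberately incomplete and the function names are hypothetical.

```python
from difflib import SequenceMatcher

# Small illustrative tables; a real system needs the full designator list.
TELEPHONY = {"BAW": "speedbird", "DLH": "lufthansa"}
DIGIT_WORDS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
               "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}
LETTER_WORDS = {"A": "alfa", "B": "bravo", "C": "charlie"}


def spoken_form(callsign: str) -> list[str]:
    """Expand e.g. 'BAW123' into ['speedbird', 'one', 'two', 'three']."""
    words = [TELEPHONY.get(callsign[:3], callsign[:3].lower())]
    for ch in callsign[3:]:
        words.append(DIGIT_WORDS.get(ch, LETTER_WORDS.get(ch, ch.lower())))
    return words


def best_callsign(recognised: list[str], active: list[str]) -> str:
    """Pick the active callsign whose spoken form is most similar."""
    return max(
        active,
        key=lambda cs: SequenceMatcher(None, recognised, spoken_form(cs)).ratio(),
    )


# 'too' is a made-up recognition error for 'two'.
recognised_words = "speedbird one too three".split()
match = best_callsign(recognised_words, ["BAW123", "DLH123", "BAW456"])
print(match)
```

Because only callsigns that are actually in the airspace are candidates, a single misrecognised word does not prevent the correct match.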

Speech-to-Text

The following papers concentrate on improvements on speech-to-text level.

M. Kleinert, N. Venkatarathinam, H. Helmke, O. Ohneiser, M. Strake, T. Fingscheidt: Easy Adaptation of Speech Recognition to Different Air Traffic Control Environments using the DeepSpeech Engine, 11th SESAR Innovation Days, online conference, 2021.

The joint paper of DLR and Braunschweig University shows the results of training the DeepSpeech Engine to recognize utterances from the Prague and Vienna ops room environments.

Zuluaga-Gomez, J., Prasad, A., Nigmatulina, I., Sarfjoo, S., Motlicek, P., Kleinert, M., Helmke, H., Ohneiser, O. and Zhan, Q. “How Does Pre-trained Wav2Vec2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications”, 2023 IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar, 2023.

The joint paper of Idiap, DLR and Beijing Institute of Technology addresses the use of pre-trained Wav2Vec2.0 models.

A two-step approach to leverage contextual data: speech recognition in air-traffic communications, Iuliia Nigmatulina, Juan Zuluaga-Gomez, Amrutha Prasad, Seyyed Saeed Sarfjoo and Petr Motlicek, in: Proc. of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022.

The paper from Idiap presents a two-step approach to leverage contextual data.

Contextual Semi-Supervised Learning: An Approach to Leverage Air-Surveillance and Untranscribed ATC Data in ASR Systems, Juan Zuluaga-Gomez, Iuliia Nigmatulina, Amrutha Prasad, Petr Motlicek, Karel Veselý, Martin Kocour, Igor Szöke, Interspeech 2021, Brno, Czechia, 30 August – 3 September, 2021.

The paper above from Idiap, Brno University of Technology (BUT) and ReplayWell addresses contextual semi-supervised learning.

Detecting English Speech in the Air Traffic Control Voice Communication, Igor Szöke, Santosh Kesiraju, Ondřej Novotný, Martin Kocour, Karel Veselý, Jan Černocký, Interspeech 2021, Brno, Czechia, 30 August – 3 September, 2021.

The paper from Brno University of Technology (BUT) describes how to detect English speech in ATC utterances containing more than one language.

Speaker-Role Classification

The joint paper from Idiap and DLR describes the application of BERTraffic to detect the speaker role, i.e. whether an air traffic controller or a pilot is speaking.

BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications, J. Zuluaga-Gomez, S. S. Sarfjoo, A. Prasad, I. Nigmatulina, P. Motlicek, O. Ohneiser, H. Helmke, 2023 IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar, 2023.

The following paper from Idiap and DLR uses a grammar-based approach for identifying the speaker role.

Grammar Based Speaker Role Identification for Air Traffic Control Speech Recognition, Amrutha Prasad, Juan Zuluaga-Gomez, Petr Motlicek, Saeed Sarfjoo, Iuliia Nigmatulina, Oliver Ohneiser, Hartmut Helmke, SESAR Innovation Days 2022 (SID 2022), Budapest, Hungary, December 6-8, 2022.

A shorter version of the paper was presented at Interspeech 2021.

Grammar Based Identification Of Speaker Role For Improving ATCO And Pilot ASR, Amrutha Prasad (Idiap), Juan Pablo Zuluaga, Petr Motlicek, Oliver Ohneiser, Hartmut Helmke, Seyyed Saeed Sarfjoo and Iuliia Nigmatulina, Interspeech 2021 Satellite Workshop, Brno, Czechia, 30 August – 3 September, 2021.

Public Project Deliverables

Deliverable	Description
D1-1	This document contains the operational concept of the HAAWAII project. It addresses the high-level Automatic Speech Recognition use cases read-back error detection, ATCO workload assessment, callsign highlighting, and integration of speech recognition with CPDLC, radar label prefilling, and consistency checking of manual versus verbal input. It is a living document. The final version will be submitted as D6-2 at the end of the project.
D3-2	This deliverable concentrates on the semantic interpretation, i.e., the annotation of the transcribed voice recordings by using the information from corresponding voice recordings from NATS. At the time of its submission, 7.5 hours of manually transcribed pilot and ATCo utterances were available from London airspace. Utterances corresponding to about 57 minutes of voice data were manually annotated, while the remaining 6.5 hours were annotated automatically.
D3-3	This deliverable concentrates on the semantic interpretation, i.e., the annotation of the transcribed voice recordings by using the information from corresponding voice recordings from Isavia. At the time of its submission, 7.5 hours of manually transcribed pilot and ATCo utterances were available from Isavia airspace. 90 minutes of them were manually annotated, while the remaining 6 hours were annotated automatically.
D5-5	Final Project Results Report
D6-1	This deliverable summarizes the dissemination of the HAAWAII project through stakeholder workshops. It reports on the first stakeholder workshop, held at the end of June 2021, and the second stakeholder workshop, held at the end of September 2022.
D6-2	This deliverable is an update of D1-1 and contains the findings added to D1-1 during the project. The document was updated during the last months considering the feedback from SJU and especially from IFATCA.
D6-3	Updated Requirements Document
D6-5	Results of Dissemination, Communication and Exploitation. D6-4 is a living document that was updated during the lifetime of the HAAWAII project. D6-5 is the latest version of D6-4.

References used as starting point for the project

H. Helmke, J. Rataj, T. Mühlhausen, O. Ohneiser, H. Ehr, M. Kleinert, Y. Oualil, and M. Schulder, “Assistant-based speech recognition for ATM applications,” in 11th USA/Europe Air Traffic Management Research and Development Seminar (ATM2015), Lisbon, Portugal, 2015.
H. Helmke, O. Ohneiser, Th. Mühlhausen, M. Wies, “Reducing controller workload with automatic speech recognition,” in IEEE/AIAA 35th Digital Avionics Systems Conference (DASC), Sacramento, California, 2016.
H. Helmke, O. Ohneiser, J. Buxbaum, C. Kern, “Increasing ATM efficiency with assistant-based speech recognition,” in 12th USA/Europe Air Traffic Management Research and Development Seminar (ATM2017), Seattle, Washington, 2017.
M. Kleinert, H. Helmke, G. Siol, H. Ehr, A. Cerna, C. Kern, D. Klakow, P. Motlicek et al., “Semi-supervised Adaptation of Assistant Based Speech Recognition Models for different Approach Areas,” in IEEE/AIAA 37th Digital Avionics Systems Conference (DASC), London, England, 2018.
H. Helmke, M. Slotty, M. Poiger, D. F. Herrer, O. Ohneiser et al., “Ontology for transcription of ATC speech commands of SESAR 2020 solution PJ.16-04,” in IEEE/AIAA 37th Digital Avionics Systems Conference (DASC), London, United Kingdom, 2018.
M. Kleinert, H. Helmke, S. Moos, P. Hlousek, C. Windisch, O. Ohneiser, H. Ehr, and A. Labreuil, “Machine Learning of Air Traffic Controller Command Extraction Models for Speech Recognition Applications,” 9th SESAR Innovation Days, Athens, Greece, 2019.
H. Helmke, M. Kleinert, O. Ohneiser, H. Ehr, and S. Shetty, “Reducing Controller Workload by Automatic Speech Recognition Assisted Radar Label Maintenance,” in IEEE/AIAA 39th Digital Avionics Systems Conference (DASC), Virtual Conference, 2020.
atco2.org