The Horizon 2020 funded HAAWAII project developed a reliable, error-resilient and adaptable solution to automatically transcribe voice communications of air traffic controllers (ATCOs) and pilots.
Using machine learning, the project built on very large collections of speech data, organized with minimal expert effort, to develop a new set of speech recognition models for the complex ATM environments of the London terminal area (TMA) and Icelandic en-route airspace. Speech and surveillance data recorded from real-life pilot-controller communications, i.e., directly from the operations rooms, were used.
HAAWAII aimed to significantly enhance ATM safety and reduce ATCO workload. The digitization of controller-pilot communication can be used for a wide variety of safety- and performance-related ATM improvements. Proof-of-concept applications are readback error detection, callsign highlighting and ATCO workload estimation.
The short clip below shows the HAAWAII prototype in action with automatic radar label maintenance, early callsign highlighting, immediate online recognition and readback error detection:
About HAAWAII
Air traffic controllers use flight strips to manage information concerning an aircraft. When an aircraft is given a clearance, this information must be logged on its flight strip. Paper flight strips are easy to maintain, but the information they contain is not available in digital form to the overall system. Electronic flight strips offer a remedy. However, they increase the workload and, depending on the implementation, also the time during which controllers have to look away from the radar screen (head-down time).
Voice recognition based on artificial intelligence (AI) offers a solution here. The projects AcListant® and AcListant®-Strips have shown that both high recognition rates and low recognition error rates can be achieved with assistant-based speech recognition, i.e., by coupling a controller assistance system with a speech recogniser. Together, these allow the assistance system to better recognise the controller's intentions and thus to support the controller more efficiently.
The subsequent project MALORCA showed that such assistant-based speech recognisers can be adapted to different airports automatically, and therefore inexpensively, through machine learning. The prerequisite is that sufficient speech and radar data are available to train the machine learning algorithms.
HAAWAII builds on the work of AcListant® and MALORCA. For the first time, it also includes the recognition of pilot radio transmissions, and it was designed to use significantly more voice data to train the AI algorithms: MALORCA used only 25 hours of voice data for learning, while HAAWAII targeted more than 1,000 hours. HAAWAII selected two complex environments as use cases: en-route air traffic in Iceland and air traffic in the London terminal area (TMA). Besides diverse accents and significantly poorer speech signal quality, data protection poses a particular challenge.
The work in HAAWAII aims both to improve air traffic safety and to reduce the workload of air traffic controllers. One of the main application areas of HAAWAII research is to recognise whether the pilot has understood exactly what the controller said. This can help to avoid misunderstandings in communication. To achieve this, the accuracy of speech recognition models must be significantly improved.
The digitisation of spoken messages from air traffic controllers and pilots can be used for a multitude of safety- and efficiency-enhancing applications, e.g., to pre-fill electronic flight strips with little effort or to transmit controller commands directly to the aircraft's on-board computer via data link (Controller Pilot Data Link Communication, CPDLC). A further application is the objective estimation of air traffic controller workload from digitised voice recordings of the complex London terminal control area.
Participants
The following partners were involved in this project: Deutsches Zentrum für Luft- und Raumfahrt (DLR, German Aerospace Center), Idiap Research Institute, Brno University of Technology (BUT), Austro Control, NATS, Isavia ANS and Croatia Control.
References
Metrics and Command Extraction
The joint paper of DLR, Idiap and the Lithuanian air navigation service provider Oro navigacija describes the recognition performance at word level and at semantic level for utterances from Lithuanian airspace.
The joint paper of DLR, Idiap and NATS describes the rule-based algorithm that transforms a sequence of words from an air traffic controller or pilot utterance into its semantic interpretation, defined by an extension of the ontology of SESAR solution PJ.16-04. The defined JSON format allows a consistent exchange of Speech-to-Text output, ontology information, or both together between different systems and applications. The format is machine-readable by design and easy to extend with additional key-value pairs while remaining compatible with older data. The paper also clearly shows how extraction performance breaks down when surveillance data is not used or not available.
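As an illustration of the Text-to-Concept step and of an extensible exchange format, the following Python sketch applies two hand-written rules to a word sequence and serializes the result as JSON. It is not the algorithm from the paper: the airline table, the rules, the command labels (only loosely modelled on the 16-04 ontology style, e.g. `DLH123 DESCEND 120 FL`) and the JSON keys `id`, `words`, `callsign` and `commands` are hypothetical.

```python
import json

# Hypothetical lookup tables; a real system derives these from the ontology
# and from flight plan / surveillance data.
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
          "niner": "9"}
AIRLINES = {"lufthansa": "DLH", "speedbird": "BAW", "iceair": "ICE"}
VERBS = {"descend": "DESCEND", "climb": "CLIMB"}


def read_number(words, i):
    """Collect consecutive digit words starting at index i; return (digits, next index)."""
    digits = []
    while i < len(words) and words[i] in DIGITS:
        digits.append(DIGITS[words[i]])
        i += 1
    return "".join(digits), i


def extract(words):
    """Rule 1: '<airline> <digits>' at the start is the callsign.
    Rule 2: '<climb|descend> flight level <digits>' is a flight level command."""
    callsign, commands, i = "NO_CALLSIGN", [], 0
    if words and words[0] in AIRLINES:
        number, i = read_number(words, 1)
        callsign = AIRLINES[words[0]] + number
    while i < len(words):
        verb = VERBS.get(words[i])
        if verb and words[i + 1:i + 3] == ["flight", "level"]:
            value, i = read_number(words, i + 3)
            if value:
                commands.append(f"{callsign} {verb} {value} FL")
        else:
            i += 1
    return callsign, commands


def to_json(utterance_id, text):
    """Bundle the word sequence and its interpretation in one JSON record."""
    callsign, commands = extract(text.lower().split())
    return json.dumps({"id": utterance_id, "words": text,
                       "callsign": callsign, "commands": commands})


def read_commands(record_json):
    """Read only the keys this consumer knows; unknown keys are ignored."""
    return json.loads(record_json).get("commands", [])


record = to_json("utt-1", "lufthansa one two three descend flight level one two zero")
print(read_commands(record))  # ['DLH123 DESCEND 120 FL']

newer = json.dumps({"id": "utt-2", "commands": ["BAW45 CLIMB 250 FL"], "speaker": "ATCO"})
print(read_commands(newer))  # ['BAW45 CLIMB 250 FL']
```

Because `read_commands` only looks up the keys it knows, a record carrying an additional key such as the hypothetical `speaker` field remains readable, which is the kind of backward compatibility described above.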
The joint paper of DLR, Idiap, Austro Control and the Czech air navigation service provider ANS CR introduces the metrics command extraction rate, callsign extraction rate and command extraction error rate. These rates are evaluated on utterances from Austro Control and ANS CR, recorded in the ops room environment in the MALORCA project and in the lab environment in SESAR solution PJ.16-04.
A shorter version of this paper was presented at the satellite workshop of Interspeech 2021.
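The three metrics can be sketched in Python as below. The exact definitions in the paper may differ (for example in how rejected commands or utterance-level results are counted); here, as an assumption, a command counts as correct only if it matches the gold annotation exactly, both command rates are normalized by the number of gold commands, and the callsign rate by the number of utterances.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Utterance:
    gold_callsign: str
    gold_commands: list
    extracted_callsign: str
    extracted_commands: list


def extraction_metrics(utterances):
    """Simplified command- and callsign-level rates (assumed definitions)."""
    total_gold = correct = wrong = callsign_ok = 0
    for u in utterances:
        gold, hyp = Counter(u.gold_commands), Counter(u.extracted_commands)
        matched = sum((gold & hyp).values())   # multiset intersection
        total_gold += sum(gold.values())
        correct += matched
        wrong += sum(hyp.values()) - matched   # wrong values and insertions
        callsign_ok += u.gold_callsign == u.extracted_callsign
    return {
        "command_extraction_rate": correct / total_gold,
        "command_extraction_error_rate": wrong / total_gold,
        "callsign_extraction_rate": callsign_ok / len(utterances),
    }


data = [
    Utterance("DLH123", ["DLH123 DESCEND 120 FL", "DLH123 HEADING 270"],
              "DLH123", ["DLH123 DESCEND 120 FL", "DLH123 HEADING 270"]),
    Utterance("BAW45", ["BAW45 CLIMB 250 FL"], "BAW45", ["BAW45 CLIMB 240 FL"]),
    Utterance("ICE7", ["ICE7 HEADING 090"], "NO_CALLSIGN", []),
]
print(extraction_metrics(data))
# command_extraction_rate 0.5, command_extraction_error_rate 0.25,
# callsign_extraction_rate 2/3
```

The example data are invented: one utterance is fully correct, one has a wrong flight level, and one is missed completely, including its callsign.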
Readback Error Detection
The ultimate discipline of automatic speech recognition and understanding is readback error detection. Noisy and heavily abbreviated pilot readbacks require speech recognition and semantic interpretation that still work when word error rates exceed 10%. An even bigger challenge is that readback errors are, fortunately, rare events: only 1% to 4% of the conversations contain readback errors.
The joint paper of DLR, Isavia ANS, Idiap, Brno University of Technology (BUT), NATS and Austro Control shows that a command-level recognition rate of slightly above 50% is already sufficient to achieve a readback error detection rate of 50%, provided that the command-level error rate is below 0.2%. Otherwise, a readback error false alarm rate of more than 10% must be accepted.
The joint paper of DLR, BUT, Isavia ANS and Idiap presents two different algorithms for readback error detection: a rule-based one and a data-driven one, which trains a neural network on artificially generated readback error samples. The paper also presents two approaches for command extraction, again one rule-based and one data-driven.
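Word error rate (WER) is the word-level Levenshtein distance (substitutions, deletions and insertions) between the recognized and the reference word sequence, divided by the number of reference words; a WER above 10% therefore means more than one error per ten reference words. A minimal Python sketch, assuming a non-empty reference:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j]: minimum edits turning the first i reference words
    # into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


reference = "descending flight level two zero zero speedbird four five"
hypothesis = "descending level two zero zero speedbird four nine"
print(round(word_error_rate(reference, hypothesis), 3))  # 0.222
```

In this invented pilot readback, one deleted word and one substituted digit already give a WER of 2/9, about 22%, even though the flight level is still recognized correctly.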
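The rarity of readback errors explains why a false alarm rate above 10% is hard to accept. The small Python calculation below assumes 1,000 readbacks of which 2% are erroneous (within the 1% to 4% range given above), a detection rate of 50%, and, as an assumption that may differ from the paper's definition, a false alarm rate measured as the share of correct readbacks that are wrongly flagged.

```python
def alarm_statistics(readbacks, error_share, detection_rate, false_alarm_rate):
    """Expected true and false alarms of a readback error detector,
    and the share of alarms that are genuine (precision)."""
    erroneous = readbacks * error_share
    correct = readbacks - erroneous
    true_alarms = erroneous * detection_rate
    false_alarms = correct * false_alarm_rate  # assumed: share of correct readbacks flagged
    precision = true_alarms / (true_alarms + false_alarms)
    return true_alarms, false_alarms, precision


true_alarms, false_alarms, precision = alarm_statistics(1000, 0.02, 0.50, 0.10)
print(f"{true_alarms:.0f} true alarms, {false_alarms:.0f} false alarms, precision {precision:.1%}")
# 10 true alarms, 98 false alarms, precision 9.3%
```

Under these assumptions, fewer than one alarm in ten would point to a real readback error, which is why a very low command-level error rate matters so much.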
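A rule-based comparison of this kind can be sketched in a few lines of Python. This is not the algorithm from the paper: it assumes that command extraction has already turned both the ATCo instruction and the pilot readback into (callsign, type, value) triples with hypothetical type labels, and it only flags commands that were read back with a different value. Missing readbacks and wrongly read-back callsigns are outside the scope of this sketch.

```python
from typing import NamedTuple


class Command(NamedTuple):
    callsign: str
    kind: str    # hypothetical type labels, e.g. "CLIMB", "HEADING"
    value: str


def readback_errors(atco_commands, pilot_commands):
    """Return (instructed, read back) pairs whose values disagree.

    Instructions that were not read back at all are ignored in this sketch.
    """
    readback = {(c.callsign, c.kind): c for c in pilot_commands}
    errors = []
    for instructed in atco_commands:
        echoed = readback.get((instructed.callsign, instructed.kind))
        if echoed is not None and echoed.value != instructed.value:
            errors.append((instructed, echoed))
    return errors


atco = [Command("BAW45", "CLIMB", "250"), Command("BAW45", "HEADING", "270")]
pilot = [Command("BAW45", "CLIMB", "240"), Command("BAW45", "HEADING", "270")]
for instructed, echoed in readback_errors(atco, pilot):
    print(f"{instructed.kind}: instructed {instructed.value}, read back {echoed.value}")
# CLIMB: instructed 250, read back 240
```

The data-driven alternative described in the paper replaces such hand-written comparison rules with a neural network trained on artificially generated readback error samples.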
Application of the HAAWAII Architecture
The HAAWAII architecture has already been used successfully in several projects. Using the HAAWAII architecture means:
- to use Assistant Based Speech Recognition (ABSR), which integrates contextual knowledge (e.g., callsigns) from flight plan and surveillance data into Speech Recognition (Speech-to-Text with so-called callsign boosting) and Speech Understanding (Text-to-Concept),
- to make clear that speech recognition (Speech-to-Text) does not automatically include speech understanding (Text-to-Concept); only both together enable automatic speech recognition and understanding (ASRU),
- to use contextual knowledge from the conversation (e.g., the previous utterance) in Text-to-Concept; e.g., “two zero zero thank you” in a pilot readback is very probably an altitude readback, not a speed or heading readback, if the ATCo has just given a CLIMB command to flight level 200 (RBA),
- to integrate command validation in Text-to-Concept phase (VAL),
- to have the same acoustic and language model for ATCo and pilot utterances (ONE),
- to have a separate block for detecting voice transmissions, which either relies on the availability of the push-to-talk (PTT) signal or needs to analyse the input wave signal in more detail (Voice Activity Detection, VAD),
- to repair over- or under-splitting of utterances in the Text-to-Concept phase (REP).
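The readback context rule (RBA) from the list above can be illustrated with a short Python sketch. It is a simplified stand-in, not the HAAWAII implementation: it assumes the ATCo's previous commands are available as hypothetical (callsign, type, value) tuples and attributes a bare number in the pilot readback to the one previous command carrying the same value.

```python
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
          "niner": "9"}


def interpret_value_only_readback(words, previous_atco_commands):
    """Attribute a bare number in a pilot readback to a previous ATCo command.

    previous_atco_commands: (callsign, command_type, value) tuples (hypothetical format).
    Returns the matching command, or None if zero or several commands match.
    """
    value = "".join(DIGITS[w] for w in words if w in DIGITS)
    if not value:
        return None
    matches = [cmd for cmd in previous_atco_commands if cmd[2] == value]
    return matches[0] if len(matches) == 1 else None


previous = [("DLH123", "CLIMB", "200"), ("DLH123", "HEADING", "270")]
print(interpret_value_only_readback("two zero zero thank you".split(), previous))
# ('DLH123', 'CLIMB', '200')
```

When no previous command carries the read-back value, the sketch leaves the decision open by returning None; in a fuller system such a mismatch could itself be a hint for readback error detection.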
The paper by DLR, Idiap, Fraport and Atrics benefits from the HAAWAII elements ABSR, ASRU, VAL, VAD and REP. It integrates a modern A-SMGCS with speech recognition and understanding to support apron controllers in maintaining flight strip information and to reduce the workload of simulation pilots.
The joint paper of DLR, Indra Navia AS, LEONARDO S.p.A., the Lithuanian ANSP Oro Navigacija, HungaroControl and Austro Control benefits from ABSR, ASRU, VAL, PTT and REP. It summarizes the results of three exercises conducted in solution 97 of SESAR Industrial Research on speech recognition and understanding support for tower controllers.
Speech and Natural Language Processing Technologies for Pseudo-Pilot Simulator, Zuluaga-Gomez A. Prasad, SESAR Innovation Days 2022 (SID 2022), Budapest, Hungary, December 6-8.
The paper from Idiap shows how to support simulation pilots with automatic speech recognition. The main ideas are:
(i) ASR to generate a transcript of the ATCo utterance,
(ii) an entity generator to tag words (callsigns, commands, values), and
(iii) a repetition generator that uses a rule-based system to generate a pilot response from the generated tags, and a text-to-speech system that acts as a pseudo-pilot and speaks the generated response.
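The repetition step (iii) can be sketched as a tiny rule-based generator in Python. The tag names (`CALLSIGN`, `COMMAND`, `VALUE`, `O`) and the readback template are assumptions, not those of the Idiap paper, and the text-to-speech step is omitted. The template follows the common phraseology convention that a readback repeats the instruction and ends with the callsign.

```python
def generate_pilot_response(tagged_words):
    """tagged_words: (word, tag) pairs with hypothetical tags CALLSIGN, COMMAND, VALUE or O."""
    callsign = [w for w, t in tagged_words if t == "CALLSIGN"]
    instruction = [w for w, t in tagged_words if t in ("COMMAND", "VALUE")]
    if not callsign or not instruction:
        return "say again"  # nothing usable was tagged
    # Repeat the instruction first and end with the callsign.
    return " ".join(instruction + callsign)


tagged = [("lufthansa", "CALLSIGN"), ("one", "CALLSIGN"), ("two", "CALLSIGN"),
          ("three", "CALLSIGN"), ("descend", "COMMAND"), ("flight", "VALUE"),
          ("level", "VALUE"), ("one", "VALUE"), ("two", "VALUE"), ("zero", "VALUE"),
          ("good", "O"), ("day", "O")]
print(generate_pilot_response(tagged))
# descend flight level one two zero lufthansa one two three
```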
Callsign Extraction
The first step in understanding an ATC utterance is extracting the callsign. Once it is known which numbers and letters belong to the callsign, extracting the command values becomes easier. The following papers of the HAAWAII team address this challenge.
The paper of DLR shows the advantages of making callsign information available and using it. It describes the algorithm and quantifies the effect of its individual parts: performance decreases measurably when certain parts of the algorithm are excluded.
The paper from Idiap addresses the word-level improvement obtained with callsign boosting, which uses information from flight plan and surveillance data.
The paper from Brno University of Technology (BUT), Saarland University and Idiap also addresses the word-level improvement obtained with callsign boosting.
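A minimal Python sketch of context-based callsign extraction: only callsigns present in the surveillance data are considered, each is expanded into its spoken form, and the candidate that best matches a word window of the utterance is chosen. The telephony table, the threshold and the use of `difflib.SequenceMatcher` as similarity measure are assumptions for illustration; this is not the DLR algorithm.

```python
from difflib import SequenceMatcher

# Hypothetical excerpt of an ICAO designator -> telephony designator table.
TELEPHONY = {"DLH": "lufthansa", "BAW": "speedbird", "ICE": "iceair"}
SPOKEN_DIGITS = ["zero", "one", "two", "three", "four",
                 "five", "six", "seven", "eight", "nine"]


def spoken_form(callsign):
    """'DLH123' -> ['lufthansa', 'one', 'two', 'three'].

    Only numeric flight numbers are handled; alphanumeric suffixes are not.
    """
    airline, number = callsign[:3], callsign[3:]
    return [TELEPHONY[airline]] + [SPOKEN_DIGITS[int(d)] for d in number]


def extract_callsign(words, surveillance_callsigns, min_score=0.6):
    """Return the surveillance callsign whose spoken form best matches some
    window of the utterance, or None if no candidate is similar enough."""
    best, best_score = None, 0.0
    for callsign in surveillance_callsigns:
        spoken = spoken_form(callsign)
        n = len(spoken)
        for start in range(max(1, len(words) - n + 1)):
            window = words[start:start + n]
            score = SequenceMatcher(None, spoken, window).ratio()
            if score > best_score:
                best, best_score = callsign, score
    return best if best_score >= min_score else None


words = "lufthansa two three descend flight level one two zero".split()
print(extract_callsign(words, ["DLH123", "DLH456", "BAW45"]))  # DLH123
```

Restricting the candidates to the surveillance data is what lets the sketch recover `DLH123` from the incomplete "lufthansa two three"; the same words alone would not identify the flight.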
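This sketch does not reproduce the boosting method of the papers. As a much simpler stand-in, it re-ranks an n-best list of (hypothesis, log score) pairs and adds a bonus to every hypothesis containing the spoken form of a callsign expected from flight plan and surveillance data. The bonus value and the example scores are hypothetical.

```python
def boost_callsigns(nbest, expected_callsigns, bonus=2.0):
    """Re-rank (hypothesis, log_score) pairs, rewarding expected callsigns.

    expected_callsigns holds spoken forms such as "lufthansa one two three".
    """
    rescored = []
    for text, log_score in nbest:
        padded = f" {text} "  # pad so that only whole-word matches count
        if any(f" {callsign} " in padded for callsign in expected_callsigns):
            log_score += bonus
        rescored.append((text, log_score))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


nbest = [("lufthansa one to three descend", -10.2),
         ("lufthansa one two three descend", -11.0)]
expected = ["lufthansa one two three", "speedbird four five"]
print(boost_callsigns(nbest, expected)[0][0])  # lufthansa one two three descend
```

The homophone "to" versus "two" illustrates the idea: without context the wrong hypothesis wins, while knowledge of the expected callsign tips the ranking.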
Speech-to-Text
The following papers concentrate on improvements on speech-to-text level.
The joint paper of DLR and Braunschweig University shows the results of training the DeepSpeech engine to recognize utterances from the Prague and Vienna ops room environments.
The joint paper of Idiap, DLR and Beijing Institute of Technology addresses the use of pre-trained wav2vec 2.0 models.
The paper from Idiap presents a two-step approach to leverage contextual data.
The paper from Idiap, Brno University of Technology (BUT) and ReplayWell addresses contextual semi-supervised learning.
The paper from Brno University of Technology (BUT) describes how to detect English speech in ATC utterances containing more than one language.
Speaker-Role Classification
The joint paper of Idiap and DLR describes the application of BERTraffic to detect the speaker role, i.e., whether the air traffic controller or the pilot is speaking.
A further paper from Idiap and DLR uses a grammar-based approach to identify the speaker role.
A shorter version of this paper was presented at Interspeech 2021.
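A grammar-based approach can exploit standard phraseology: ATCo transmissions normally start with the callsign of the addressed aircraft, while pilot readbacks normally end with their own callsign. The Python sketch below turns only this rule into a classifier; it is an illustrative assumption, not the grammar from the paper, and it returns UNKNOWN for anything else, such as pilot calls that begin with the name of the ATC unit.

```python
def classify_speaker(text, spoken_callsign):
    """Guess ATCO or PILOT from the callsign position (simplified phraseology rule)."""
    words, callsign = text.split(), spoken_callsign.split()
    n = len(callsign)
    if words[:n] == callsign:
        return "ATCO"
    if words[-n:] == callsign:
        return "PILOT"
    return "UNKNOWN"


cs = "lufthansa one two three"
print(classify_speaker("lufthansa one two three descend flight level one two zero", cs))  # ATCO
print(classify_speaker("descend flight level one two zero lufthansa one two three", cs))  # PILOT
print(classify_speaker("good morning london control", cs))  # UNKNOWN
```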
Public Project Deliverables
| Deliverable | Description | Link |
|---|---|---|
| D1-1 | This document contains the operational concept of the HAAWAII project. It addresses the high-level Automatic Speech Recognition use cases readback error detection, ATCO workload assessment, callsign highlighting, and integration of speech recognition with CPDLC, radar label prefilling, and consistency checking of manual versus verbal input. It is a living document. The final version was submitted as D6-2 at the end of the project. | Click here |
| D3-2 | This deliverable concentrates on the semantic interpretation, i.e., the annotation of the transcribed voice recordings by using the information from corresponding voice recordings from NATS. At the time of its submission, 7.5 hours of manually transcribed pilot and ATCo utterances were available from London airspace. Utterances corresponding to about 57 minutes of voice data were manually annotated, while the remaining 6.5 hours were annotated automatically. | Click here |
| D3-3 | This deliverable concentrates on the semantic interpretation, i.e. the annotation, of the transcribed voice recordings by using the information from corresponding voice recordings from Isavia. At the time of its submission, 7.5 hours of manually transcribed pilot and ATCo utterances were available from Isavia airspace. 90 minutes of them were manually annotated, the remaining 6 hours were automatically annotated. | Click here |
| D5-5 | Final Project Results Report | Click here |
| D6-1 | This deliverable summarizes the dissemination of the HAAWAII project through stakeholder workshops. It reports on the first stakeholder workshop, conducted at the end of June 2021, and on the second, conducted at the end of September 2022. | Click here |
| D6-2 | This deliverable is an update of D1-1 and contains the findings added to D1-1 during the project. The document was updated during the final months of the project, taking into account feedback from the SJU and especially from IFATCA. | Click here |
| D6-3 | Updated Requirements Document | Click here |
| D6-5 | Results of Dissemination, Communication and Exploitation. D6-4 is a living document that was updated during the lifetime of the HAAWAII project; D6-5 is the latest version of D6-4. | Click here |
References Used as a Starting Point for the Project
- H. Helmke, J. Rataj, T. Mühlhausen, O. Ohneiser, H. Ehr, M. Kleinert, Y. Oualil, and M. Schulder, “Assistant-based speech recognition for ATM applications,” in 11th USA/Europe Air Traffic Management Research and Development Seminar (ATM2015), Lisbon, Portugal, 2015.
- H. Helmke, O. Ohneiser, Th. Mühlhausen, M. Wies, “Reducing controller workload with automatic speech recognition,” in IEEE/AIAA 35th Digital Avionics Systems Conference (DASC), Sacramento, California, 2016.
- H. Helmke, O. Ohneiser, J. Buxbaum, C. Kern, “Increasing ATM efficiency with assistant-based speech recognition,” in 12th USA/Europe Air Traffic Management Research and Development Seminar (ATM2017), Seattle, Washington, 2017.
- M. Kleinert, H. Helmke, G. Siol, H. Ehr, A. Cerna, C. Kern, D. Klakow, P. Motlicek et al., “Semi-supervised Adaptation of Assistant Based Speech Recognition Models for different Approach Areas,” in IEEE/AIAA 37th Digital Avionics Systems Conference (DASC), London, England, 2018.
- H. Helmke, M. Slotty, M. Poiger, D. F. Herrer, O. Ohneiser et al., “Ontology for transcription of ATC speech commands of SESAR 2020 solution PJ.16-04,” in IEEE/AIAA 37th Digital Avionics Systems Conference (DASC), London, United Kingdom, 2018.
- M. Kleinert, H. Helmke, S. Moos, P. Hlousek, C. Windisch, O. Ohneiser, H. Ehr, and A. Labreuil, “Machine Learning of Air Traffic Controller Command Extraction Models for Speech Recognition Applications,” in 9th SESAR Innovation Days, Athens, Greece, 2019.
- H. Helmke, M. Kleinert, O. Ohneiser, H. Ehr, and S. Shetty, “Reducing Controller Workload by Automatic Speech Recognition Assisted Radar Label Maintenance,” in IEEE/AIAA 39th Digital Avionics Systems Conference (DASC), Virtual Conference, 2020.
- atco2.org
