The IT provider Amazon offers a variety of technologies for automated speech recognition and speech processing in its AWS cloud. These include the speech-to-text engine Transcribe, which converts voice recordings into text for further processing. As the company announced on its AWS blog, Transcribe will in future be able to automatically remove personal data from the texts it produces.
According to the announcement, Transcribe is particularly popular in call centers, where it makes customer calls easier to process. The resulting transcripts are used for further statistical analysis or directly for language processing tasks such as sentiment analysis.
Personal data will be removed
Citing the privacy of the people recorded as well as local legislation, Amazon is introducing the option to remove information from the transcribed text that could identify a person. This includes, for example, Social Security numbers, credit card and bank account details, PINs and telephone numbers, as well as names and email addresses.
The feature will initially be available only for US English; other languages may follow. Amazon has also published a fairly simple example on its blog that demonstrates how to remove the data.
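For readers who want to try this, the following is a minimal sketch, not Amazon's published example. It assumes the redaction is requested through the `ContentRedaction` parameter of Transcribe's `StartTranscriptionJob` call as exposed by the boto3 SDK. The job name, S3 URIs and the helper function `build_redacted_job_request` are hypothetical placeholders chosen for illustration. The sketch only builds the request; it does not contact AWS.

```python
from typing import Any, Dict


def build_redacted_job_request(job_name: str, media_uri: str, output_bucket: str) -> Dict[str, Any]:
    """Build keyword arguments for a Transcribe job with PII redaction enabled.

    Hypothetical helper for illustration; the returned dict mirrors the
    parameters of the StartTranscriptionJob API.
    """
    if not media_uri.startswith("s3://"):
        raise ValueError("media_uri is expected to be an s3:// URI")
    return {
        "TranscriptionJobName": job_name,
        # Redaction was announced for US English only at launch.
        "LanguageCode": "en-US",
        "Media": {"MediaFileUri": media_uri},
        "OutputBucketName": output_bucket,
        # Ask Transcribe to redact personally identifiable information.
        "ContentRedaction": {
            "RedactionType": "PII",
            # "redacted" stores only the cleaned transcript;
            # "redacted_and_unredacted" would store both versions.
            "RedactionOutput": "redacted",
        },
    }


# Placeholder names; replace with a real bucket and recording.
params = build_redacted_job_request(
    "call-0001",
    "s3://example-call-recordings/call-0001.wav",
    "example-transcripts",
)
```

With boto3 installed and AWS credentials configured, the dict could then be submitted via `boto3.client("transcribe").start_transcription_job(**params)`. Building the parameters separately keeps the redaction settings easy to inspect and test without a network call.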
Machine learning systems have repeatedly drawn attention in the past for reproducing or even reinforcing existing prejudices. Examples include a sexist tool for screening job applications and image recognition that only works well for white men.
In light of these negative examples, Amazon's tool may well have problems actually recognizing names as such. In addition, the service excludes international phone numbers from the outset; apparently these cannot be recognized automatically.