Audio-Analysis using AWS Transcribe

Introduction to AWS Transcribe

AWS Transcribe is an on-demand, highly customizable speech-to-text transcription service. It can reliably decode several audio formats from data stored in S3, and it produces a transcript with a timestamp for each word. AWS Transcribe can also work on streaming audio, delivering transcribed text in real time. As of October 2019, AWS Transcribe supports 16 languages, including four variants of English, while streaming audio supports only 5 of those languages.

AWS Transcribe can distinguish between multiple speakers, since many recordings contain more than one voice. When you request a job, you specify the maximum number of speakers AWS Transcribe should identify. Alternatively, when the audio is recorded on separate channels (say, a customer on one channel and the call-center agent on another), channel identification can be used to transcribe each channel separately and merge the results with a label for each channel. AWS Transcribe is driven by ML/AI, so it keeps getting smarter over time.
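As a rough sketch of the channel-identification case, the request parameters for the boto3 SDK could look like this (the job name, bucket, and file below are hypothetical placeholders, not values from this post):

```python
# Parameters for a transcription job that uses channel identification.
# Job name, bucket, and file are placeholders for illustration only.
job_params = {
    "TranscriptionJobName": "support-call-demo",
    "Media": {"MediaFileUri": "s3://my-bucket/support-call.mp3"},
    "MediaFormat": "mp3",
    "LanguageCode": "en-US",
    # Transcribe each channel separately and merge the results,
    # labeling which channel each word came from.
    "Settings": {"ChannelIdentification": True},
}

# With AWS credentials configured, the job would be started with:
# import boto3
# boto3.client("transcribe").start_transcription_job(**job_params)
```

Channel identification and speaker identification are alternatives: you pick one or the other depending on whether the voices are already separated onto channels.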

Uses Of AWS Transcribe

Nowadays, companies use AWS Transcribe for a variety of purposes, including:

  • Converting audio files into text.
  • Creating subtitles or captions for a talk or video.
  • Analyzing recordings of customer calls.
  • Converting medical dictation into text notes.
  • Enabling rich search over audio and video oral histories.

Learn how to get started with AWS Transcribe:

  • The first step is to log in to your AWS console (as shown in our previous blog), click on Services, and select Transcribe from the dropdown. You will see a screen like the one in the image below.
  • This view shows the status of all previous transcription jobs. At this point, no transcription job has been requested yet. Click the Create button to start a new transcription job.
  • When you create the job, you have to know the S3 path to the audio file; you can’t browse for it on this screen. After the job description and the source file to be transcribed are given, there are a few output choices.
  • The output data location lets you select where the results of the transcription will be stored. If you pick the service-managed option, the transcript is stored in an AWS-managed, secure S3 bucket, and a URI is provided for you to download the data. The customer-specified option lets you choose your own S3 bucket. Your security posture and organizational policies will determine which option is best for you.
    When you’ve entered the transcription details, press the Create button to submit the transcription job. After submitting, you will be taken back to the jobs page.

You can see the job in progress; when it is complete, you can click on it to see the details.

The first part of the transcribed text is shown in the Transcription Preview section.

After this, you can download the result as a JSON file and use it for further processing.
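To give a feel for what that JSON contains, here is a small sketch that pulls out the transcript text and the per-word timestamps. The sample document below is a hand-made miniature mimicking the shape of a Transcribe result, not actual output:

```python
# A miniature sample mimicking the structure of a Transcribe result file.
sample = {
    "jobName": "demo-job",
    "results": {
        "transcripts": [{"transcript": "hello world"}],
        "items": [
            {"start_time": "0.04", "end_time": "0.52", "type": "pronunciation",
             "alternatives": [{"content": "hello", "confidence": "0.99"}]},
            {"start_time": "0.53", "end_time": "1.10", "type": "pronunciation",
             "alternatives": [{"content": "world", "confidence": "0.98"}]},
        ],
    },
}

def extract_transcript(doc):
    """Return the full transcript string from a result document."""
    return doc["results"]["transcripts"][0]["transcript"]

def extract_words(doc):
    """Return (word, start, end) tuples for each spoken word."""
    return [
        (item["alternatives"][0]["content"],
         item["start_time"], item["end_time"])
        for item in doc["results"]["items"]
        if item["type"] == "pronunciation"
    ]

print(extract_transcript(sample))  # hello world
print(extract_words(sample))
```

The per-word start and end times are what make the captioning and search use cases above possible.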

If you want to do this with the Python SDK (boto3) instead, follow the same process we used for the AWS Rekognition function in our previous blog; below is the code to use for the AWS Lambda function.

import json
import time
from urllib.request import urlopen

import boto3


def create_uri(bucket_name, file_name):
    return "s3://" + bucket_name + "/" + file_name


def lambda_handler(event, context):
    transcribe = boto3.client("transcribe")
    s3 = boto3.client("s3")

    if event:
        file_obj = event["Records"][0]
        bucket_name = str(file_obj["s3"]["bucket"]["name"])
        file_name = str(file_obj["s3"]["object"]["key"])
        s3_uri = create_uri(bucket_name, file_name)

        # Use the file extension (e.g. "mp3") as the media format.
        file_type = file_name.split(".")[-1]
        # The Lambda request id is unique, so it makes a safe job name.
        job_name = context.aws_request_id
        print(job_name)

        transcribe.start_transcription_job(
            TranscriptionJobName=job_name,
            Media={"MediaFileUri": s3_uri},
            MediaFormat=file_type,
            LanguageCode="en-US",
            OutputBucketName="aws-audio-analysis-output-2",
            Settings={
                # 'VocabularyName': 'string',
                "ShowSpeakerLabels": True,
                "MaxSpeakerLabels": 2,
                "ChannelIdentification": False,
            },
        )

        # Optional: poll until the job finishes and copy the transcript
        # JSON back into the source bucket. Left commented out because a
        # long transcription can exceed the Lambda timeout.
        # while True:
        #     status = transcribe.get_transcription_job(
        #         TranscriptionJobName=job_name)
        #     if status["TranscriptionJob"]["TranscriptionJobStatus"] in (
        #             "COMPLETED", "FAILED"):
        #         break
        #     print("its in progress")
        #     time.sleep(5)
        #
        # transcript_uri = status["TranscriptionJob"]["Transcript"]["TranscriptFileUri"]
        # print(transcript_uri)
        # load_json = json.dumps(json.load(urlopen(transcript_uri)))
        # s3.put_object(Bucket=bucket_name,
        #               Key="transcribeFile/{}.json".format(job_name),
        #               Body=load_json)

    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
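To see how the handler pulls the bucket and key out of the S3 trigger, here is a quick local sketch using a hand-made event (the bucket and key names are placeholders following the S3 event record shape):

```python
def create_uri(bucket_name, file_name):
    return "s3://" + bucket_name + "/" + file_name

# A minimal fake S3 event, mimicking the record structure Lambda
# receives when an object is uploaded. Names are placeholders.
fake_event = {
    "Records": [
        {"s3": {"bucket": {"name": "aws-audio-analysis-input"},
                "object": {"key": "recordings/call.mp3"}}}
    ],
}

record = fake_event["Records"][0]
uri = create_uri(record["s3"]["bucket"]["name"],
                 record["s3"]["object"]["key"])
print(uri)  # s3://aws-audio-analysis-input/recordings/call.mp3
```

Running this locally is a cheap way to check the event-parsing logic before wiring the function up to a real S3 trigger.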

So this is how we use AWS Transcribe for speech-to-text conversion. We hope you found this helpful!

Basic architecture of the AWS services used

In my previous blogs, including this one, I have covered what cloud computing is and many AWS services, and we discussed how we can use those services in real life. I hope all those blogs are helpful to you!

Keep reading!


Wakeupcoders - Digital Marketing & Web App Company

We make your business smarter and broader through the power of the internet. Researcher | Web developer | Internet of things | AI | www.wakeupcoders.com