AWS Glue & Athena Tutorial To Search IMDB Movie Ratings

In this video, we will look at two AWS services (Glue and Athena) and build a simple application to query movie ratings.

The objective of this session is to get a basic understanding of AWS Athena and Glue and to work with public datasets. After completing this exercise, remember to delete all the AWS resources you created to avoid ongoing costs.
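For the cleanup step, a minimal sketch using the AWS CLI from CloudShell might look like the following. It assumes the Glue database is named athenademo (as in the sample query further down) and that the crawler was named imdb-crawler. That crawler name is hypothetical, so substitute whatever name you actually used. Note that `aws s3 rb --force` permanently deletes every object in the bucket before removing it.

```bash
# Hypothetical helper: removes the resources created in this tutorial.
# Usage: cleanup_imdb_demo <your-bucket-name>
cleanup_imdb_demo() {
  local bucket="$1"
  # Assumed (hypothetical) crawler name; change it to match your crawler.
  aws glue delete-crawler --name imdb-crawler
  # Deleting the database also removes the tables the crawler created in it.
  aws glue delete-database --name athenademo
  # --force deletes every object in the bucket, then the bucket itself.
  aws s3 rb "s3://${bucket}" --force
}
```

For example, `cleanup_imdb_demo my-imdb-demo-bucket`, where the bucket name is a placeholder for your own. Each step runs even if an earlier one fails (for instance, if the crawler was never created), so one missing resource does not block the rest of the cleanup.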

We use the publicly available IMDb datasets of movie titles and ratings. Visit the IMDb site (https://datasets.imdbws.com) for the terms and conditions of using the IMDb datasets.

We store the data in S3 and use a Glue crawler to catalogue it, so that Athena can then query it with SQL. By the end of this video, you will know what AWS Athena, Glue and CloudShell are, and when and how to use them.
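The crawl-and-catalogue step can also be scripted from CloudShell. The following is a rough sketch with the AWS CLI, not necessarily how the video configures it. It assumes an existing IAM role (passed as the second argument) that Glue can assume and that can read the bucket. It also uses the hypothetical crawler name imdb-crawler, while the database name athenademo matches the sample query. With one S3 target per folder, the crawler should create one table per folder, typically named after the folder (basics and ratings).

```bash
# Hypothetical helper: catalogues the two S3 folders with a Glue crawler.
# Usage: create_imdb_catalog <your-bucket-name> <glue-iam-role-name-or-arn>
create_imdb_catalog() {
  local bucket="$1" role="$2"
  # Database that will hold the crawled tables (name taken from the sample query).
  aws glue create-database --database-input Name=athenademo || return 1
  # One S3 target per folder; "imdb-crawler" is a hypothetical name.
  aws glue create-crawler \
    --name imdb-crawler \
    --role "$role" \
    --database-name athenademo \
    --targets "{\"S3Targets\":[{\"Path\":\"s3://${bucket}/basics/\"},{\"Path\":\"s3://${bucket}/ratings/\"}]}" || return 1
  # The crawl runs asynchronously; this call returns immediately.
  aws glue start-crawler --name imdb-crawler
}

# Prints the crawler state; it goes back to READY once the crawl has finished.
crawler_state() {
  aws glue get-crawler --name imdb-crawler --query 'Crawler.State' --output text
}
```

Call `create_imdb_catalog`, then repeat `crawler_state` until it prints READY before querying in Athena. Glue's built-in CSV classifier typically detects the tab delimiter, the gzip compression and the header row, and it lowercases column names. This is why the sample query refers to `primarytitle` and `averagerating`.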

I would appreciate your feedback and comments on this video. Please subscribe to the channel to receive updates.

IMDb datasets are available at: https://datasets.imdbws.com

CLI commands to run in CloudShell to create a bucket, download the data files and upload them to S3 (replace {your bucket name} with your S3 bucket name; bucket names must be globally unique):
aws s3 mb s3://{your bucket name}
wget https://datasets.imdbws.com/title.basics.tsv.gz
wget https://datasets.imdbws.com/title.ratings.tsv.gz
aws s3 cp ./title.basics.tsv.gz s3://{your bucket name}/basics/basics.tsv.gz
aws s3 cp ./title.ratings.tsv.gz s3://{your bucket name}/ratings/ratings.tsv.gz
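Because the two files are already downloaded in CloudShell, you can sanity-check the join locally before setting up Glue. The sketch below is not part of the original walkthrough. It assumes the standard IMDb file layout: tab-separated, one header row, tconst in the first column of both files, primaryTitle in the third column of title.basics, and averageRating in the second column of title.ratings. It mimics the sample Athena query with awk, using the case-sensitive regex `Shawshank.*Redemption` in place of the LIKE pattern `'%Shawshank%Redemption%'`.

```bash
# Hypothetical helper: local approximation of the sample Athena query.
# Usage: imdb_local_lookup <basics.tsv.gz> <ratings.tsv.gz> <title-regex>
imdb_local_lookup() {
  local basics="$1" ratings="$2" pattern="$3"
  # First input (ratings): remember averageRating ($2) per tconst ($1), skipping the header.
  # Second input (basics): print primaryTitle ($3) and its rating when the title
  # matches and a rating exists (an inner join, like the SQL JOIN).
  awk -F '\t' -v pat="$pattern" '
    NR == FNR { if (FNR > 1) rating[$1] = $2; next }
    FNR > 1 && ($1 in rating) && $3 ~ pat { print $3 "\t" rating[$1] }
  ' <(gzip -dc "$ratings") <(gzip -dc "$basics")
}
```

For example: `imdb_local_lookup title.basics.tsv.gz title.ratings.tsv.gz 'Shawshank.*Redemption'`. The ratings file is read first because it is the smaller of the two, so only it is held in memory. Backslashes in the pattern are interpreted by awk's `-v`, so keep the regex simple.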

Athena sample query:
SELECT
  bas.primarytitle, rat.averagerating
FROM
  "athenademo"."basics" bas
JOIN
  "athenademo"."ratings" rat
ON
  bas.tconst = rat.tconst
WHERE
  bas.primarytitle LIKE '%Shawshank%Redemption%'
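The sample query is written for the Athena console. To run the same query from CloudShell, a sketch along these lines should work. It assumes the default workgroup and writes results to the hypothetical prefix s3://{your bucket name}/athena-results/, because Athena needs a query result location, set either in the workgroup or per query.

```bash
# Hypothetical helper: runs the sample query through the Athena API and prints the rows.
# Usage: run_shawshank_query <your-bucket-name>
run_shawshank_query() {
  local bucket="$1" qid state
  qid=$(aws athena start-query-execution \
    --query-string "SELECT bas.primarytitle, rat.averagerating FROM athenademo.basics bas JOIN athenademo.ratings rat ON bas.tconst = rat.tconst WHERE bas.primarytitle LIKE '%Shawshank%Redemption%'" \
    --query-execution-context Database=athenademo \
    --result-configuration "OutputLocation=s3://${bucket}/athena-results/" \
    --query QueryExecutionId --output text) || return 1
  # Queries run asynchronously: poll until the query leaves QUEUED/RUNNING.
  while :; do
    state=$(aws athena get-query-execution --query-execution-id "$qid" \
      --query 'QueryExecution.Status.State' --output text) || return 1
    case "$state" in
      QUEUED|RUNNING) sleep 2 ;;
      *) break ;;
    esac
  done
  # Any final state other than SUCCEEDED (FAILED, CANCELLED) is reported as an error.
  if [ "$state" != "SUCCEEDED" ]; then
    echo "Query $qid finished in state $state" >&2
    return 1
  fi
  aws athena get-query-results --query-execution-id "$qid" --output table
}
```

For example, `run_shawshank_query my-imdb-demo-bucket`, where the bucket name is a placeholder for your own. The result files written under athena-results/ are ordinary S3 objects, so removing the bucket at the end of the exercise also removes them.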
