Add a data source

Adding Yellow Trips data from Amazon S3

  1. Click the Sources icon and choose Amazon S3.

  2. In the Data source – S3 bucket node, specify the following information:

    • Node properties tab, Name - Yellow Trip Data
    • Data source properties tab, Database - nyctaxi_db
    • Data source properties tab, Table - raw_yellow_tripdata

  3. Click Save. Remember to save your work regularly as you build the transformation steps.
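
For readers who prefer to see the source node as code, the following is a minimal sketch of a roughly equivalent read in a Glue PySpark script. It assumes the node reads the Data Catalog table named in step 2. The `transformation_ctx` label and the helper name `load_yellow_trips` are hypothetical, and the script Glue Studio generates for your job may differ.

```python
# Values taken from the source node settings in step 2.
SOURCE_NODE = {
    "database": "nyctaxi_db",
    "table_name": "raw_yellow_tripdata",
    "transformation_ctx": "YellowTripData_node",  # hypothetical label
}


def load_yellow_trips(glue_context, node=SOURCE_NODE):
    """Read the catalog table into a DynamicFrame.

    glue_context is expected to be an awsglue.context.GlueContext,
    which is only available inside an AWS Glue job or session.
    """
    return glue_context.create_dynamic_frame.from_catalog(**node)


# Usage inside a Glue job (not runnable on a plain local Python install):
#   from awsglue.context import GlueContext
#   from pyspark.context import SparkContext
#   glue_context = GlueContext(SparkContext.getOrCreate())
#   yellow_trips = load_yellow_trips(glue_context)
```

Keeping the node settings in one dictionary makes it easy to compare them against what you entered in the visual editor.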

Reviewing the resulting data schema and previewing data

  1. Go to the Output schema tab to review the resulting data schema.

    AWS Glue Studio now allows you to preview your data at each step of the visual job authoring process so you can test and debug your transformations without having to save or run the job.

    The first time you choose the Data preview tab, you are prompted to choose an IAM role to use. The IAM role you choose must have the necessary permissions to create the data previews. This can be the same role that you plan to use for your job, or it can be a different role.

    After you choose an IAM role, it takes about 20 to 30 seconds before the data appears. You are charged for data preview usage as soon as you choose the IAM role.
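
If you want to cross-check the Output schema tab outside the console, the column definitions can also be read from the Glue Data Catalog. The sketch below assumes the source node reads the catalog table `raw_yellow_tripdata` in `nyctaxi_db`, in which case the tab should generally reflect the same columns. The helper name `catalog_columns` is hypothetical; `get_table` is the boto3 Glue client call it relies on.

```python
def catalog_columns(glue_client, database="nyctaxi_db", table="raw_yellow_tripdata"):
    """Return (name, type) pairs for the table's columns, partition keys last."""
    table_def = glue_client.get_table(DatabaseName=database, Name=table)["Table"]
    columns = table_def.get("StorageDescriptor", {}).get("Columns", [])
    partitions = table_def.get("PartitionKeys", [])
    return [(col["Name"], col.get("Type", "")) for col in columns + partitions]


# Usage, assuming boto3 is installed and AWS credentials are configured:
#   import boto3
#   for name, col_type in catalog_columns(boto3.client("glue")):
#       print(f"{name}: {col_type}")
```

Unlike the Data preview tab, this reads only table metadata and does not start a preview session.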