Add and join another dataset

Add lookup table for Taxi Drop-off Zone

  1. Click on the Source icon, choose Amazon S3.

  2. In the Data source – S3 bucket node, specify the following information:

    • Node properties tab, Name - Dropoff Zone Lookup
    • Data source properties tab, Database - nyctaxi_db
    • Data source properties tab, Table - raw_taxi_zone_lookup

  3. Remember to Save your work.
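
For readers who prefer to see the script view, the source node above can be sketched roughly as a Data Catalog read. This is a minimal sketch, not the script Glue Studio generates: the helper names and the transformation_ctx label are hypothetical, and glue_context is assumed to be an awsglue GlueContext, which is only available inside a Glue job or a local Glue development environment.

```python
def dropoff_lookup_source_args():
    """Catalog coordinates taken from the steps above."""
    return {
        "database": "nyctaxi_db",
        "table_name": "raw_taxi_zone_lookup",
        # Hypothetical label; the tutorial does not specify one.
        "transformation_ctx": "DropoffZoneLookup",
    }


def read_dropoff_lookup(glue_context):
    """Read the lookup table from the Data Catalog as a DynamicFrame.

    `glue_context` is assumed to be an awsglue.context.GlueContext.
    """
    return glue_context.create_dynamic_frame.from_catalog(
        **dropoff_lookup_source_args()
    )
```

Keeping the arguments in a separate function makes it possible to check the database and table names without a running Glue environment.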

Modify column names of Drop-off Taxi Zone Lookup table

  1. Make sure the Amazon S3 - Dropoff Zone Lookup node is selected.

  2. Click on the Transform icon, choose Change Schema.

  3. Specify the following information:

    • Node properties tab, Name - Change Schema - Dropoff Zone Lookup

    • Transform tab, modify the target key of the following:

      • locationid to do_location_id
      • borough to do_borough
      • zone to do_zone
      • service_zone to do_service_zone

  4. Remember to Save your work.
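
The renames above can be illustrated with a small plain-Python sketch. It only shows the effect of the mapping on a single row; it is not the code Glue runs, the function name is hypothetical, and the sample row is illustrative data rather than output from the workshop.

```python
# Source column -> target column, as listed in the Transform tab above.
DROPOFF_RENAMES = {
    "locationid": "do_location_id",
    "borough": "do_borough",
    "zone": "do_zone",
    "service_zone": "do_service_zone",
}


def rename_columns(row, renames=DROPOFF_RENAMES):
    """Return a copy of `row` with keys renamed.

    Keys that are not in `renames` are passed through unchanged.
    """
    return {renames.get(key, key): value for key, value in row.items()}


# Illustrative sample row, not taken from the tutorial.
sample = {
    "locationid": 1,
    "borough": "EWR",
    "zone": "Newark Airport",
    "service_zone": "EWR",
}
renamed = rename_columns(sample)
```

The do_ prefix keeps the drop-off lookup columns distinguishable once they are merged into the trips data in the join that follows.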

Join Yellow Trips data and Dropoff Taxi Zone Lookup data

  1. Click on the Transform icon, choose Join.

  2. Specify the following information:

    • Node properties tab, Name - Yellow Trips Data + Pickup Zone Lookup + Dropoff Zone Lookup

    • Node properties tab, Node parents:

      • Change Schema - Dropoff Zone Lookup
      • Yellow Trips Data + Pickup Zone Lookup

    • Transform properties tab, under the Join conditions, select the following keys:

      • Change Schema - Dropoff Zone Lookup: do_location_id
      • Yellow Trips Data + Pickup Zone Lookup: dolocationid

  3. Remember to Save your work.
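
The join condition above (dolocationid on the trips side matched to do_location_id on the lookup side) can be sketched in plain Python. This assumes an inner join and unique zone ids, neither of which is stated in the steps; the function name and the sample rows, including the fare_amount values, are illustrative only.

```python
def inner_join(trips, zones, trip_key="dolocationid", zone_key="do_location_id"):
    """Keep each trip that has a matching zone row and merge the two dicts.

    Assumes each zone id appears at most once in `zones`.
    """
    zones_by_id = {zone[zone_key]: zone for zone in zones}
    joined = []
    for trip in trips:
        zone = zones_by_id.get(trip[trip_key])
        if zone is not None:
            joined.append({**trip, **zone})
    return joined


# Illustrative rows only; the second trip has no matching zone.
trips = [
    {"dolocationid": 1, "fare_amount": 10.0},
    {"dolocationid": 999, "fare_amount": 5.0},
]
zones = [
    {"do_location_id": 1, "do_borough": "EWR", "do_zone": "Newark Airport"},
]
joined_rows = inner_join(trips, zones)
```

With an inner join, trips whose drop-off location has no entry in the lookup table are dropped from the result, which is worth keeping in mind when comparing row counts before and after this node.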