Catalog transformed data

Create a Glue Crawler

  1. Go to the AWS Glue Console.

  2. In the left navigation menu, click Crawlers.

  3. On the Crawlers page, click Create crawler.

  4. Specify nyc-yellow-tripdata-parquet-crawler as the crawler name, and then click Next.

  5. On the Choose data sources and classifiers screen, specify the following information, and then click Next.

    • Click Add a data source
    • Choose Data source – S3
    • Location of S3 data – In this account
    • Include S3 path – s3://serverlessanalytics-[your-account-id]-transformed/nyc-taxi/yellow-tripdata
    • Subsequent crawler runs – Crawl all sub-folders
    • Then click Add an S3 data source.

  6. On the Configure security settings screen, choose ServerlessAnalyticsRole under Existing IAM role, and then click Next.

  7. On the Set output and scheduling screen, choose nyctaxi_db as the database.

  8. Under Crawler schedule, leave the frequency set to On demand, and then click Next.

  9. Review the crawler details, and then click Create crawler.

  10. On the Crawlers page, select nyc-yellow-tripdata-parquet-crawler, and then click Run crawler.

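The console steps above can also be scripted. The following is a minimal sketch, not part of the original walkthrough, that assumes the boto3 library and credentials allowed to create and start Glue crawlers. The helper names build_crawler_config and create_and_run_crawler are hypothetical; the crawler name, role, database, and S3 path come from the steps above, and the account ID is passed in by the caller.

```python
CRAWLER_NAME = "nyc-yellow-tripdata-parquet-crawler"


def build_crawler_config(account_id: str) -> dict:
    """Return keyword arguments for the Glue create_crawler call.

    Hypothetical helper: it mirrors the values entered in the console steps.
    """
    s3_path = (
        f"s3://serverlessanalytics-{account_id}-transformed"
        "/nyc-taxi/yellow-tripdata"
    )
    return {
        "Name": CRAWLER_NAME,
        "Role": "ServerlessAnalyticsRole",
        "DatabaseName": "nyctaxi_db",
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Assumed equivalent of "Crawl all sub-folders" for subsequent runs.
        "RecrawlPolicy": {"RecrawlBehavior": "CRAWL_EVERYTHING"},
        # No "Schedule" key is set, so the crawler only runs on demand.
    }


def create_and_run_crawler(glue, account_id: str) -> None:
    """Create the crawler, then start it once (steps 9 and 10)."""
    glue.create_crawler(**build_crawler_config(account_id))
    glue.start_crawler(Name=CRAWLER_NAME)


# Example usage (requires boto3 and AWS credentials; the ID is a placeholder):
#   import boto3
#   create_and_run_crawler(boto3.client("glue"), "123456789012")
```

Passing the Glue client in as an argument keeps the configuration easy to inspect before anything is created in the account.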