NYC Taxi Data Pipeline

This GitHub Actions workflow automates the end-to-end data pipeline, from initializing the Snowflake infrastructure to producing analytical tables and views using Python and dbt.


💻 Project source code
📚 Online dbt documentation

📊 Data Source

TLC Trip Record Data - NYC Taxi and Limousine Commission

The data includes:

  • Pickup and dropoff dates/times
  • Pickup and dropoff zones
  • Distances, detailed fares, payment types
  • Passenger count reported by the driver

The data is collected by authorized technology providers and provided to the TLC. The TLC does not guarantee the accuracy of this data.
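
Because the TLC does not guarantee accuracy, it can help to screen records before they reach the analytical tables. The following is a minimal sketch of such a check, not the project's actual implementation: the `TripRecord` class, its field names, and the `quality_issues` function are hypothetical and only mirror the fields listed above, not the real TLC column names.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TripRecord:
    # Hypothetical field names that mirror the fields listed above.
    pickup_at: datetime
    dropoff_at: datetime
    pickup_zone_id: int
    dropoff_zone_id: int
    trip_distance: float
    total_fare: float
    payment_type: str
    passenger_count: int  # reported by the driver, so it may be unreliable


def quality_issues(trip: TripRecord) -> list[str]:
    """Return a list of basic plausibility problems found in one trip."""
    issues = []
    if trip.dropoff_at <= trip.pickup_at:
        issues.append("dropoff is not after pickup")
    if trip.trip_distance < 0:
        issues.append("negative distance")
    if trip.total_fare < 0:
        issues.append("negative fare")
    if trip.passenger_count < 0:
        issues.append("negative passenger count")
    return issues
```

An empty list means the record passed these checks. The thresholds are deliberately loose, and a real pipeline would more likely express such rules as dbt tests on the staging models.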

📄 License

This project is licensed under the MIT License. The source data is provided by the NYC TLC and subject to their terms of use.