Welcome to the Uber Insight Engine project! This project aims to provide valuable insights into NYC taxi data from the TLC trip record dataset. The entire process involves data extraction, transformation, loading into Google Cloud Storage (GCS), building a data pipeline with mage.ai, creating data marts for visualization and ETA prediction, and implementing a test API for random average speed values.
- NYC TLC trip record data
- Python and Jupyter Notebook
- Google Cloud Storage account
- mage.ai
- Google BigQuery
- Power BI Desktop

Google Cloud Storage (GCS)
- Load data into GCS.
- Make the data public for accessibility via URL.
- Connect GCS to the transformation step.
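As a rough illustration of the "public via URL" step, the sketch below builds the HTTPS address of a publicly readable object. It assumes the standard `https://storage.googleapis.com/<bucket>/<object>` form; the bucket and object names are hypothetical and do not come from this project.

```python
from urllib.parse import quote


def public_gcs_url(bucket: str, object_name: str) -> str:
    """Return the HTTPS URL of a publicly readable GCS object."""
    # Percent-encode the object name but keep "/" so folder-like prefixes stay readable.
    return f"https://storage.googleapis.com/{bucket}/{quote(object_name, safe='/')}"


# Hypothetical bucket and object names, for illustration only.
url = public_gcs_url("uber-insight-demo", "raw/yellow tripdata.csv")
print(url)
```

A URL of this form can then be handed to whatever reader the transformation step uses, provided the object really is publicly readable.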

Transform data into fact and dimension tables
- Use a Python script to convert the single flat table into fact and dimension tables.
- Build the data model.
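The flat-to-star conversion can be sketched in plain Python as follows. This is a minimal illustration, not the project's script: the three sample rows are made up, the column names are assumed to follow the TLC yellow-taxi schema, and `build_dimension` and `build_fact` are hypothetical helpers.

```python
from typing import Any, Dict, List

# Toy flat records; values are invented and column names are assumptions.
flat_rows = [
    {"VendorID": 1, "payment_type": 1, "trip_distance": 2.5, "fare_amount": 12.0},
    {"VendorID": 2, "payment_type": 2, "trip_distance": 0.8, "fare_amount": 6.5},
    {"VendorID": 1, "payment_type": 1, "trip_distance": 5.1, "fare_amount": 21.0},
]


def build_dimension(rows: List[Dict[str, Any]], column: str) -> Dict[Any, int]:
    """Map each distinct value of `column` to a surrogate key, in first-seen order."""
    keys: Dict[Any, int] = {}
    for row in rows:
        keys.setdefault(row[column], len(keys))
    return keys


def build_fact(
    rows: List[Dict[str, Any]],
    dim_column: str,
    dim_keys: Dict[Any, int],
    measures: List[str],
) -> List[Dict[str, Any]]:
    """Replace the dimension value with its surrogate key and keep only the measures."""
    fact = []
    for trip_id, row in enumerate(rows):
        record = {"trip_id": trip_id, f"{dim_column}_id": dim_keys[row[dim_column]]}
        record.update({m: row[m] for m in measures})
        fact.append(record)
    return fact


payment_dim = build_dimension(flat_rows, "payment_type")
fact_table = build_fact(flat_rows, "payment_type", payment_dim, ["trip_distance", "fare_amount"])
print(payment_dim)
print(fact_table[0])
```

The same idea carries over to a DataFrame library: deduplicate a column to form the dimension, then join its key back onto the trips to form the fact table.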

Google BigQuery
- Load multiple tables into Google BigQuery for analytics.
- Create two data marts: one for Power BI and one for ETA prediction.
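One way a data mart could be materialized is with a `CREATE OR REPLACE TABLE ... AS SELECT` statement that joins the fact table to a dimension. The sketch below only assembles such a statement as a string. Every identifier in it (`fact_table`, `datetime_dim`, `datetime_id`, and the project, dataset, and mart names) is a hypothetical placeholder rather than this project's schema.

```python
from typing import List


def data_mart_query(project: str, dataset: str, mart: str, columns: List[str]) -> str:
    """Build a BigQuery statement that materializes a mart from a fact/dimension join."""
    column_list = ",\n  ".join(columns)
    return (
        f"CREATE OR REPLACE TABLE `{project}.{dataset}.{mart}` AS\n"
        f"SELECT\n  {column_list}\n"
        # Assumed table and key names; adjust to the real data model.
        f"FROM `{project}.{dataset}.fact_table` AS f\n"
        f"JOIN `{project}.{dataset}.datetime_dim` AS d USING (datetime_id)"
    )


# Hypothetical names, for illustration only.
sql = data_mart_query("my-project", "uber_dataset", "eta_mart", ["f.trip_distance", "d.pick_hour"])
print(sql)
```

The resulting text can be pasted into the BigQuery console or submitted through a client library.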

Data Visualization Mart (Power BI)
- Create an interactive dashboard in Power BI.
- Display key metrics about the overall business.

Test API for Average Speed
- Develop a test API to generate random average speed values within a specified range.
- Use the API to build an input variable with details such as pickup and dropoff location, latitude and longitude, trip distance, and the time taken by the trip.
- Predict trip duration using a Random Forest model.
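The steps above can be sketched as follows. This is a minimal stand-in for the test API, not the project's implementation: `random_average_speed` and `build_trip_record` are hypothetical names, the coordinates and speed range are invented, and the `baseline_minutes` field (distance divided by speed) is an added assumption that serves only as a sanity check next to a model's prediction. The Random Forest itself is not shown.

```python
import random
from typing import Dict, Tuple


def random_average_speed(min_mph: float, max_mph: float, rng: random.Random) -> float:
    """Return a random average speed within [min_mph, max_mph]."""
    if min_mph <= 0 or max_mph < min_mph:
        raise ValueError("expected 0 < min_mph <= max_mph")
    return rng.uniform(min_mph, max_mph)


def build_trip_record(
    pickup: Tuple[float, float],
    dropoff: Tuple[float, float],
    trip_distance_miles: float,
    average_speed_mph: float,
) -> Dict[str, float]:
    """Collect the trip details into one record that a model could take as input."""
    return {
        "pickup_latitude": pickup[0],
        "pickup_longitude": pickup[1],
        "dropoff_latitude": dropoff[0],
        "dropoff_longitude": dropoff[1],
        "trip_distance": trip_distance_miles,
        "average_speed": average_speed_mph,
        # Naive baseline (an assumption, not the model): time = distance / speed, in minutes.
        "baseline_minutes": trip_distance_miles / average_speed_mph * 60,
    }


rng = random.Random(0)  # seeded so the example is repeatable
speed = random_average_speed(8.0, 25.0, rng)
# Illustrative coordinates only.
record = build_trip_record((40.7580, -73.9855), (40.6413, -73.7781), 17.0, speed)
print(record)
```

A trained Random Forest regressor would take a record like this as its feature row and return the predicted trip duration.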
Reference (Darshil Parmar's YouTube channel): https://youtu.be/WpQECq5Hx9g?si=_tKOwjNUkdusUWLt
Data Model:
Power BI Dashboard:

Clone the repository:
git clone https://github.com/vish1108/uber-insight-engine.git