A complete end-to-end data engineering pipeline that extracts, transforms, and loads stock market data for technical analysis and signal detection. Built using Python, pandas, yfinance, and PostgreSQL/SQLite.
This project demonstrates a full ETL pipeline with:
- Data extraction from Yahoo Finance using
yfinance - Data transformation with key financial indicators
- Loading into both SQLite and PostgreSQL databases
- SQL queries to analyze returns, volatility, and trends
- Optional: Visualization or signal-based backtesting extensions
src/
- extract.py # Pulls data from yfinance
- transform.py # Cleans and enriches data with metrics
- load.py # Loads into SQLite and PostgreSQL
- ueries.sql # SQL queries for analysis
- main.py # Runs full ETL pipeline
The following columns are added during the transformation step, grouped per ticker:
intraday_change– Close price minus open pricedaily_close_change– Close[today] - Close[yesterday]daily_percent_return– Percentage change from previous closemoving_avg_5_days– 5-day moving averagemoving_avg_20_days– 20-day moving averagevolatility_10_days– Rolling 10-day standard deviation of returnscumulative_returns– Cumulative price changecumulative_percent_returns– Cumulative return over time
Stored in queries.sql:
- Highest-returning stocks by cumulative return
- Most volatile tickers in the last 10 days
- Stocks trading above their 20-day moving average
- MA crossover signals (5-day > 20-day)
- Daily return outliers
SELECT ticker, ROUND(AVG(volatility_10_days), 2) AS avg_volatility
FROM stocks
WHERE date >= CURRENT_DATE - INTERVAL '10 days'
GROUP BY ticker
ORDER BY avg_volatility DESC;This query returns the average 10-day volatility for each stock, rounded to 2 decimal places, using only the most recent 10 days of data.
Language: Python 3.11 Libraries: pandas, yfinance, sqlalchemy Databases: SQLite (local), PostgreSQL (production) Optional Tools: cron
-
Clone the repository git clone https://github.com/colander21/Stock_Market_Project.git cd Stock_Market_Project
-
Set up your Python environment conda create -n stockmarketenv python=3.11 conda activate stockmarketenv pip install -r requirements.txt
-
Create Postgres database store credentials in .env file
- DB_NAME
- DB_PASSWORD
- DB_USER
- DB_HOST
- DB_PORT
- Run the pipeline python src/main.py
- Add signal-based backtesting (e.g. MA crossover strategies)
- Build a Streamlit dashboard to visualize trends and performance
- Deploy to cloud infrastructure (PostgreSQL on AWS, BigQuery, etc.)
- How to build a clean ETL pipeline with modular Python code
- Feature engineering for financial time-series data
- Managing data workflows with real-world databases
- Writing and organizing reusable SQL for analytics