ContraLegal-AI: An AI-powered legal dashboard that uses NLP and machine learning to identify, classify, and cluster high-risk clauses in PDF commercial contracts.

Team Null Set: Ayush Kumar Singh | Isha Singh | Priyanka Gnana Karanam
Newton School Of Technology

Project Overview

ContraLegal AI is a legal review platform that turns unstructured contract text into clause-level risk assessments. It combines a fine-tuned Legal-BERT classifier with Retrieval-Augmented Generation (RAG) to provide three-class risk scoring (High, Medium, Low), automated redrafting suggestions, and in-place PDF highlights, reducing the manual effort of enterprise contract review.

Core Engineering Capabilities

Capability	Technology	Description
Trimodal Classification	Legal-BERT	Labels each clause as High, Medium, or Low risk
Dynamic Scoring	Hybrid Logic	Combines transformer class probabilities with deterministic keyword heuristics
Explainable AI	RAG + LangChain	Explains why a clause was flagged, in professional legal terminology
Strategic Redrafting	Generative AI	Generates balanced, legally sound alternative phrasing for flagged clauses
Conversational Querying	FAISS Vector Store	Real-time, document-grounded Q&A for complex legal questions
Spatial Annotation	PyMuPDF	Maps clause text to page coordinates for in-place PDF highlighting
Relational Data Export	openpyxl	Exports risk distributions to Excel and CSV
Thematic Clustering	Scikit-Learn	Unsupervised K-Means grouping of clauses by obligation type
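
The Dynamic Scoring row describes a fusion of transformer probabilities and keyword heuristics. One way such a fusion could work is sketched below. The severity values, keyword list, blend weight `alpha`, and function names are illustrative assumptions and are not taken from the repository's keyword_engine.py or predictor.py.

```python
# Hypothetical sketch: blend classifier probabilities with a keyword heuristic.
# All names, keywords, and weights below are illustrative assumptions.

# Assumed numeric severity for each risk class.
SEVERITY = {"low": 0.0, "medium": 0.5, "high": 1.0}

# Assumed keyword list; each hit adds evidence that a clause is risky.
RISK_KEYWORDS = ("indemnify", "unlimited liability", "terminate without notice")


def keyword_score(clause: str) -> float:
    """Fraction of the assumed risk keywords found in the clause (0.0 to 1.0)."""
    text = clause.lower()
    hits = sum(1 for kw in RISK_KEYWORDS if kw in text)
    return hits / len(RISK_KEYWORDS)


def hybrid_risk_score(probs: dict[str, float], clause: str, alpha: float = 0.7) -> float:
    """Weighted blend: alpha * expected model severity + (1 - alpha) * keyword score."""
    model_score = sum(SEVERITY[label] * p for label, p in probs.items())
    return alpha * model_score + (1 - alpha) * keyword_score(clause)


if __name__ == "__main__":
    probs = {"low": 0.1, "medium": 0.3, "high": 0.6}
    clause = "The Supplier shall indemnify the Client against all claims."
    print(round(hybrid_risk_score(probs, clause), 4))
```

A linear blend keeps the deterministic rules visible in the final score, so a clause containing a known risky phrase cannot be scored as fully safe by the model alone.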

Engineering Hierarchy & Contributions

The platform is built upon a high-concurrency architecture with a strict separation of concerns across research and deployment layers.

Deep Learning & Transformation | Ayush Kumar Singh

Fine-tuned the nlpaueb/legal-bert-base-uncased transformer with a class-weighted training objective to handle the imbalanced class distribution.
Engineered the 3-class quantitative heuristic for synthetic label generation spanning over 21,000 samples.
Developed the formal ablation study and multi-class ROC-AUC evaluation suite to validate transformer superiority over statistical baselines.
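
To illustrate what a weighted objective for imbalanced classes does, the sketch below computes inverse-frequency class weights and applies them to a per-example cross-entropy loss. The label counts are made up, and the weighting formula is the common "balanced" heuristic; the repository's bert_trainer.py may use a different scheme.

```python
import math
from collections import Counter


def balanced_class_weights(labels: list[int]) -> dict[int, float]:
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def weighted_cross_entropy(probs: list[float], target: int,
                           weights: dict[int, float]) -> float:
    """Cross-entropy for one example, scaled by the weight of its true class."""
    return -weights[target] * math.log(probs[target])


if __name__ == "__main__":
    # Made-up label distribution: 0 = low, 1 = medium, 2 = high (rare).
    labels = [0] * 6 + [1] * 3 + [2] * 1
    weights = balanced_class_weights(labels)
    probs = [0.4, 0.2, 0.4]  # same predicted probability for classes 0 and 2
    print("loss if true class is low: ", weighted_cross_entropy(probs, 0, weights))
    print("loss if true class is high:", weighted_cross_entropy(probs, 2, weights))
```

With equal predicted probability, a mistake on the rare high-risk class costs more than one on the common class, which pushes training toward better recall on the minority class. In a PyTorch setup, weights like these are typically passed to `torch.nn.CrossEntropyLoss(weight=...)`.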

Generative AI & RAG Orchestration | Priyanka Gnana Karanam

Architected the retrieval-augmented generation pipeline utilizing FAISS for vectorized similarity search.
Engineered the LLM Provider Factory, enabling seamless interoperability between Google Gemini, Groq, and OpenAI.
Validated prompt-engineering strategies for deterministic clause synthesis and document-grounded conversational flows.
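
The provider factory mentioned above can be pictured as a registry that maps a provider key to an object with a shared interface. The sketch below is a minimal illustration under that assumption. The class names, the `create_provider` function, and the `EchoProvider` stand-in are hypothetical; a real implementation would wrap the actual Gemini, Groq, and OpenAI clients.

```python
# Hypothetical sketch of a provider factory; all names here are assumptions.
from abc import ABC, abstractmethod
from typing import Callable


class LLMProvider(ABC):
    """Common interface so calling code does not depend on a specific vendor."""

    @abstractmethod
    def generate(self, prompt: str) -> str:
        ...


class EchoProvider(LLMProvider):
    """Stand-in backend used here instead of a real vendor client."""

    def __init__(self, name: str) -> None:
        self.name = name

    def generate(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


# Registry mapping a provider key to a zero-argument constructor.
_REGISTRY: dict[str, Callable[[], LLMProvider]] = {
    "gemini": lambda: EchoProvider("gemini"),
    "groq": lambda: EchoProvider("groq"),
    "openai": lambda: EchoProvider("openai"),
}


def create_provider(name: str) -> LLMProvider:
    """Return a provider for the given key; raise ValueError on an unknown key."""
    try:
        return _REGISTRY[name.lower()]()
    except KeyError:
        raise ValueError(f"Unknown LLM provider: {name!r}") from None


if __name__ == "__main__":
    provider = create_provider("groq")
    print(provider.generate("Summarize the indemnification clause."))
```

Because callers only see `LLMProvider.generate`, switching vendors becomes a configuration change rather than a code change.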

Spatial NLP & Deployment Systems | Isha Singh

Engineered the spatial highlighting engine using PyMuPDF to perform physical document marking via bounding-box coordinate tracking.
Implemented semantic document segmentation to optimize transformer context windows.
Architected the automated CI/CD infrastructure via GitHub Actions for continuous environment validation.
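
BERT-base models accept at most 512 tokens per input, so long contracts have to be split before classification. The sketch below shows one simple segmentation strategy: pack whole sentences into chunks under a token budget. It is an assumption about the general approach, not the repository's data_pipeline code, and it counts whitespace-separated words as a rough stand-in for model tokens.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '?', or '!' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]


def chunk_text(text: str, max_tokens: int = 512) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens words.

    Whitespace-separated words approximate model tokens here; a real pipeline
    would count tokens with the model's own tokenizer. A single sentence longer
    than max_tokens becomes its own oversized chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for sentence in split_sentences(text):
        n = len(sentence.split())
        # Flush the current chunk if adding this sentence would exceed the budget.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = "Party A shall pay. Party B shall deliver. Both may terminate."
    for chunk in chunk_text(sample, max_tokens=6):
        print(chunk)
```

Splitting on sentence boundaries rather than at a fixed character offset keeps each clause intact, so the classifier never sees half of an obligation.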

Quantitative Performance Matrix

Replacing the Random Forest baseline with the fine-tuned Legal-BERT model improved every reported metric, with the largest gain in High Risk Recall.

Metric	Random Forest Baseline	Legal-BERT Transformer	Improvement (Δ; pp = percentage points)
Accuracy	94.44%	97.01%	+2.57 pp
Weighted F1	0.9441	0.9702	+2.76% (relative)
Macro F1	0.8901	0.9371	+5.28% (relative)
ROC-AUC (Macro)	0.9870	0.9948	+0.79% (relative)
High Risk Recall	73.96%	85.94%	+11.98 pp

Note: The 11.98 percentage-point gain in High Risk Recall is the most important result, because missing a high-risk clause is the most costly failure in legal review.
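
The Δ column in the table above follows two conventions: the Accuracy and High Risk Recall deltas are absolute differences in percentage points, while the F1 and ROC-AUC deltas match the change relative to the baseline. The short check below reproduces both from the table's values. The helper names are illustrative.

```python
def absolute_delta(baseline: float, model: float) -> float:
    """Difference in the metric's own units (percentage points for % metrics)."""
    return model - baseline


def relative_delta_pct(baseline: float, model: float) -> float:
    """Change relative to the baseline, expressed as a percentage."""
    return (model - baseline) / baseline * 100


if __name__ == "__main__":
    # Values copied from the table: (Random Forest baseline, Legal-BERT).
    percent_metrics = {"Accuracy": (94.44, 97.01), "High Risk Recall": (73.96, 85.94)}
    ratio_metrics = {
        "Weighted F1": (0.9441, 0.9702),
        "Macro F1": (0.8901, 0.9371),
        "ROC-AUC (Macro)": (0.9870, 0.9948),
    }
    for name, (base, model) in percent_metrics.items():
        print(f"{name}: +{absolute_delta(base, model):.2f} pp")
    for name, (base, model) in ratio_metrics.items():
        print(f"{name}: +{relative_delta_pct(base, model):.2f}% relative")
```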

Global Repository Schema

ContraLegal-AI/
├── app.py                          # Streamlit Production Environment
├── .github/workflows/              # Automated CI/CD (python-app.yml)
├── src/
│   ├── model_trainer.py            # Phase-integrated Training Orchestrator
│   ├── data_pipeline/              # Semantic Extraction & Normalization
│   ├── inference/
│   │   ├── predictor.py            # Trimodal Detection Engine (BERT/RF)
│   │   ├── llm_engine.py           # RAG Orchestrator & Conversational Layer
│   │   └── keyword_engine.py       # Deterministic Rule Definitions
│   ├── model/
│   │   ├── bert_trainer.py         # Transformer Fine-tuning Suite
│   │   └── evaluator.py            # Quantitative Performance Metrics
│   └── utils/
│       └── pdf_annotator.py        # Spatial Coordinate Highlighting
├── models/
│   ├── legal_bert/                 # Fine-tuned Weights (nlpaueb)
│   └── ablation_study.png          # Baseline vs. Transformer Visualization
├── notebooks/
│   └── train_legal_bert_colab.py   # GPU-accelerated Training Script
└── report/
    ├── report.pdf                  # Formally Published IEEE Paper
    └── report.tex                  # Scientific Manuscript Source

Operational Deployment

Environment Initialization

git clone https://github.com/AyushCoder9/ContraLegal-AI.git
cd ContraLegal-AI
pip install -r requirements.txt

Application Execution

To launch the Streamlit dashboard using the pre-trained model:

streamlit run app.py

Analytical Training (Optional)

To rerun the full training pipeline and regenerate the evaluation artifacts:

python -m src.model_trainer

Scientific Publication

The technical methodology, algorithmic decisions, and empirical evaluations are documented in the associated IEEE conference-format manuscript located in the report/ directory.

Null Set | 2026
Engineered for Legal Precision.
