Issue 7: Make data loaders from the data using a custom image Dataset class #124

@kyanmahajan

Creating DataLoaders Before Model Training

Before training any deep learning model with a framework like PyTorch or TensorFlow, it is important to understand how data is supplied to the model during training and evaluation. This process is handled by DataLoaders.

What is a DataLoader?

A DataLoader acts as a bridge between raw data stored on disk and the training loop. Instead of loading the entire dataset into memory, it fetches data in small batches during runtime.

Together with the Dataset it wraps, a DataLoader typically handles:

  • Reading data samples from disk
  • Converting samples into tensors
  • Batching multiple samples together
  • Shuffling data during training
  • Loading data efficiently (often in parallel)
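
The batching and shuffling steps listed above can be sketched without any framework. This is only an illustration of the idea, not how PyTorch or TensorFlow implement it; `ToyDataset` and `iterate_batches` are hypothetical names invented for this example, and the "samples" are plain integers standing in for images read from disk.

```python
import random


class ToyDataset:
    """Hypothetical stand-in for data on disk: item i is produced only on request."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        # A real dataset would read and decode a file here.
        return idx * 10


def iterate_batches(dataset, batch_size, shuffle=False, seed=None):
    """Yield lists of samples, one batch at a time (a simplified loader)."""
    indices = list(range(len(dataset)))
    if shuffle:
        # Shuffle the order of indices, not the data itself.
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        # Only the samples of the current batch are fetched.
        yield [dataset[i] for i in indices[start:start + batch_size]]


batches = list(iterate_batches(ToyDataset(10), batch_size=4))
print(batches)  # [[0, 10, 20, 30], [40, 50, 60, 70], [80, 90]]
```

Note that the last batch is smaller because 10 is not a multiple of 4. Real loaders add tensor conversion and parallel workers on top of this pattern.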

Core Idea Behind Implementation

The data pipeline is usually split into two parts:

1. Dataset

  • Defines how a single data sample is accessed
  • Knows where the data is stored
  • Handles loading and basic preprocessing (e.g., reading an image, resizing, normalization)

2. DataLoader

  • Wraps the Dataset
  • Manages batching, shuffling, and parallel loading
  • Feeds batches of data to the model during training

This separation keeps data handling modular and independent of the model architecture.
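
A minimal PyTorch sketch of this split might look as follows. It assumes a folder layout of `root/<class_name>/<image>.png`, which is an assumption made for this example and not something the issue prescribes. `FolderImageDataset` and `make_demo_images` are hypothetical names, and the demo images are synthetic solid-color files written to a temporary directory so that the block runs on its own. It relies on `torch`, `numpy`, and Pillow being installed.

```python
import tempfile
from pathlib import Path

import numpy as np
import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset


class FolderImageDataset(Dataset):
    """Hypothetical dataset: expects root/<class_name>/<image>.png."""

    def __init__(self, root, image_size=(32, 32)):
        self.root = Path(root)
        self.image_size = image_size
        # Class names come from sub-folder names; labels are their sorted index.
        self.classes = sorted(p.name for p in self.root.iterdir() if p.is_dir())
        self.samples = [
            (path, label)
            for label, name in enumerate(self.classes)
            for path in sorted((self.root / name).glob("*.png"))
        ]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        # The image is read from disk only when this sample is requested.
        with Image.open(path) as img:
            img = img.convert("RGB").resize(self.image_size)
            pixels = np.asarray(img, dtype=np.float32) / 255.0  # scale to [0, 1]
        image = torch.from_numpy(pixels).permute(2, 0, 1)  # HWC -> CHW
        return image, label


def make_demo_images(root, per_class=3):
    """Write a few solid-color PNGs so the example is self-contained."""
    for name, color in (("cats", (255, 0, 0)), ("dogs", (0, 0, 255))):
        folder = Path(root) / name
        folder.mkdir(parents=True)
        for i in range(per_class):
            Image.new("RGB", (48, 40), color).save(folder / f"{i}.png")


with tempfile.TemporaryDirectory() as tmp:
    make_demo_images(tmp)
    dataset = FolderImageDataset(tmp)
    # The DataLoader only adds batching and shuffling on top of the Dataset.
    loader = DataLoader(dataset, batch_size=4, shuffle=True)
    images, labels = next(iter(loader))

print(tuple(images.shape), labels.dtype)
```

The Dataset knows nothing about batch size or shuffling, and the DataLoader knows nothing about file formats, which is the modularity described above.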

Why This Step Is Important

Proper use of DataLoaders:

  • Ensures consistent preprocessing
  • Helps prevent data leakage between splits when each split has its own Dataset and DataLoader
  • Improves training throughput through batching and parallel loading
  • Makes experiments easier to reproduce and scale

Understanding and designing the data loading pipeline is a fundamental step that should be completed before experimenting with model architectures or training strategies.
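
As one possible illustration of the leakage and reproducibility points, the sketch below splits a dataset with a seeded generator and gives each split its own DataLoader. The data is synthetic (random tensors standing in for images), and the 80/20 split, seeds, and batch size are arbitrary choices for this example. `random_split` and the `generator` argument are part of `torch.utils.data`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Synthetic stand-in for an image dataset: 100 RGB images of 8x8 pixels.
images = torch.rand(100, 3, 8, 8)
sample_ids = torch.arange(100)  # unique ids, used here in place of labels
full = TensorDataset(images, sample_ids)

# A seeded generator makes the split identical on every run.
split_generator = torch.Generator().manual_seed(42)
train_set, val_set = random_split(full, [80, 20], generator=split_generator)

# Shuffle only the training split; keep validation order fixed.
train_loader = DataLoader(
    train_set,
    batch_size=16,
    shuffle=True,
    generator=torch.Generator().manual_seed(0),
)
val_loader = DataLoader(val_set, batch_size=16, shuffle=False)

# The two subsets index disjoint parts of the original dataset.
train_ids = set(train_set.indices)
val_ids = set(val_set.indices)
print(len(train_ids & val_ids))  # 0
```

Splitting once with a fixed seed, before any loader is built, is what keeps validation samples out of the training batches in this sketch.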


Task Description

In this issue, you are expected to:

  • Design a custom image Dataset class implementing methods like __getitem__, __len__, and other relevant functions
  • Wrap the Dataset inside a DataLoader (PyTorch preferred, TensorFlow is also acceptable)
  • Write a small loop demonstrating how the DataLoader works during iteration
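
The demonstration loop could be as small as the sketch below. To stay self-contained it uses a `TensorDataset` of random tensors as a stand-in for the custom image Dataset. In a submission, the custom class would take its place. The sizes (10 samples, batch size 4, 2 epochs) are arbitrary values chosen for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: 10 fake RGB images (16x16) with binary labels.
dataset = TensorDataset(torch.rand(10, 3, 16, 16), torch.randint(0, 2, (10,)))
loader = DataLoader(dataset, batch_size=4, shuffle=True)

batch_sizes = []
for epoch in range(2):
    # Each pass over the loader is one epoch; shuffle=True reorders samples per epoch.
    for batch_idx, (images, labels) in enumerate(loader):
        batch_sizes.append(images.shape[0])
        print(
            f"epoch {epoch} batch {batch_idx}: "
            f"images {tuple(images.shape)}, labels {tuple(labels.shape)}"
        )
```

With 10 samples and a batch size of 4, each epoch yields batches of 4, 4, and 2 samples. Passing `drop_last=True` to the DataLoader would discard the final, smaller batch.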

Contribution Details

Implementation Notes

  • This task must be completed inside the participants folder, in the directory named after your enrolment number
  • You may:
    • Implement it in a separate notebook, or
    • Add it to a previously used notebook

If Working on Kaggle

  • Make the required changes directly in your existing Kaggle notebook
  • Download the updated notebook after making changes
  • Upload the updated version to the repository
  • Follow the PR template as specified in previous issues
