Conversation
Pull request overview
This PR implements asynchronous backup functionality for the index management system, changing from a synchronous blocking backup process to a background job-based approach. The changes also remove compression from backup archives (tar.gz → tar) for improved performance and simplicity.
Changes:
- Converted synchronous `createBackup` to asynchronous `createBackupAsync` with job tracking and persistence
- Removed gzip compression from backup archives, changing from `.tar.gz` to `.tar` format
- Added new `/api/v1/backups/jobs` endpoint to query backup job status
- Implemented write operation blocking during active backups to ensure consistency (see the sketch below)
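
A rough sketch of the write-blocking flow: `checkBackupInProgress`, `active_backup_jobs_`, and `jobs_mutex_` all appear in the diff, but the exact signature and call shape here are assumptions, not the PR's actual code.

```cpp
// Hypothetical shape of the write-path guard; signature assumed.
std::pair<bool, std::string> checkBackupInProgress(const std::string& index_id) {
    std::shared_lock<std::shared_mutex> lock(jobs_mutex_);
    if (active_backup_jobs_.count(index_id)) {
        return {false, "Backup in progress for index: " + index_id};
    }
    return {true, ""};
}

// Each mutating operation (insert, update, delete, ...) would start with:
auto check = checkBackupInProgress(index_id);
if (!check.first) {
    return check;  // reject the write until the backup completes
}
```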
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| src/utils/archive_utils.hpp | Renamed createTarGz/extractTarGz to createTar/extractTar, removed gzip compression |
| src/main.cpp | Updated backup API to return 202 with job_id, changed file extensions to .tar, added jobs endpoint, improved error handling |
| src/core/ndd.hpp | Added BackupJob struct and status tracking, implemented async backup execution, added checkBackupInProgress to all write operations, job persistence with recovery on restart |
```cpp
    return json_error(500, "Failed to read backup file");
}

```

Remove unnecessary blank line with trailing whitespace. This appears to be an unintentional formatting artifact that should be cleaned up.
```cpp
auto completed_time_t = std::chrono::system_clock::to_time_t(completed_at);
j["completed_at"] = std::to_string(completed_time_t);
```

The completed_at field is always serialized to JSON, even for IN_PROGRESS jobs where it hasn't been set yet. This could result in serializing an uninitialized time_point (typically epoch time 0). Consider only including completed_at in the JSON when the job status is COMPLETED or FAILED, similar to how it's handled in the API endpoint at main.cpp:625-628.

Suggested change:

```cpp
if (status == BackupJobStatus::COMPLETED ||
    status == BackupJobStatus::FAILED) {
    auto completed_time_t = std::chrono::system_clock::to_time_t(completed_at);
    j["completed_at"] = std::to_string(completed_time_t);
}
```
```cpp
{
    std::unique_lock<std::shared_mutex> lock(jobs_mutex_);
    backup_jobs_[job_id] = job;
    // Track active backup for this index
    active_backup_jobs_[index_id] = job_id;
    persistJobs();
}
```

Missing check to prevent multiple concurrent backup jobs for the same index. While active_backup_jobs_ tracks ongoing backups per index, createBackupAsync doesn't check if a backup is already in progress for the given index_id. This could allow multiple backup jobs to be initiated for the same index, potentially causing resource contention and inconsistent results. Consider checking active_backup_jobs_.count(index_id) before creating the new job.
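
A minimal sketch of that guard, reusing the lock and maps from the snippet above (the early-return wording is illustrative, not the PR's code):

```cpp
{
    std::unique_lock<std::shared_mutex> lock(jobs_mutex_);
    // Bail out before registering the job if this index already has one.
    auto it = active_backup_jobs_.find(index_id);
    if (it != active_backup_jobs_.end()) {
        return {false, "Backup already in progress for index " + index_id +
                       " (job_id: " + it->second + ")"};
    }
    backup_jobs_[job_id] = job;
    active_backup_jobs_[index_id] = job_id;
    persistJobs();
}
```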
```cpp
std::filesystem::rename(backup_tar_temp, backup_tar_final);
```

The filesystem::rename operation at line 570 can throw an exception if the rename fails (e.g., cross-filesystem move on some systems, insufficient permissions). If this happens after the backup tar has been successfully created and the operation_mutex released, the backup job will be marked as FAILED and the temp tar will be removed, but writes have already been re-enabled. This is unlikely to cause data corruption but could be confusing. Consider wrapping the rename in a try-catch to provide a more specific error message, or using filesystem::copy followed by filesystem::remove as a fallback.

Suggested change:

```cpp
try {
    std::filesystem::rename(backup_tar_temp, backup_tar_final);
} catch (const std::filesystem::filesystem_error& fe) {
    LOG_WARN("Failed to rename backup tar from temp to final location: " << fe.what()
             << " (from: " << backup_tar_temp << " to: " << backup_tar_final
             << "). Attempting copy/remove fallback.");
    try {
        std::filesystem::copy_file(
            backup_tar_temp,
            backup_tar_final,
            std::filesystem::copy_options::overwrite_existing);
        std::filesystem::remove(backup_tar_temp);
        LOG_INFO("Backup tar moved to final location via copy/remove fallback: " << backup_tar_final);
    } catch (const std::filesystem::filesystem_error& fe_fallback) {
        LOG_ERROR("Fallback copy/remove for backup tar failed: " << fe_fallback.what()
                  << " (from: " << backup_tar_temp << " to: " << backup_tar_final << ")");
        throw std::runtime_error(
            std::string("Failed to move backup tar to final location. rename error: ") +
            fe.what() + ", fallback copy/remove error: " + fe_fallback.what());
    }
}
```
```cpp
std::ifstream ifs(jobs_file);
nlohmann::json j_array;
ifs >> j_array;
```

The JSON deserialization at line 668 could throw an exception if the file contains invalid JSON, but the outer try-catch will handle it and log the error. However, if the file is partially written or corrupted, this could silently skip loading valid jobs. Consider checking ifs.good() after reading and providing a more specific error message to distinguish between read errors and parse errors.
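
One way to separate the two failure modes, sketched under the assumption that nlohmann::json is in use as shown above and that the LOG_* macros from elsewhere in the diff are available:

```cpp
std::ifstream ifs(jobs_file);
if (!ifs) {
    LOG_ERROR("Failed to open backup jobs file: " << jobs_file);
    return;
}
// Read the whole file first so I/O failures and parse failures
// surface as distinct errors.
std::string contents((std::istreambuf_iterator<char>(ifs)),
                     std::istreambuf_iterator<char>());
if (ifs.bad()) {
    LOG_ERROR("Read error while loading backup jobs file: " << jobs_file);
    return;
}
nlohmann::json j_array;
try {
    j_array = nlohmann::json::parse(contents);
} catch (const nlohmann::json::parse_error& e) {
    LOG_ERROR("Backup jobs file is corrupted (invalid JSON): " << jobs_file
              << " - " << e.what());
    return;
}
```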
```cpp
// 2. Check if backup file already exists
std::string backup_tar = data_dir_ + "/backups/" + backup_name + ".tar";
if(std::filesystem::exists(backup_tar)) {
    return {false, "Backup already exists: " + backup_name};
}
```

There's no protection against concurrent backup jobs creating backups with the same backup_name. Two concurrent requests with the same backup_name could both pass the existence check at line 1065 and create duplicate jobs. While executeBackupJob rechecks at line 470, both jobs would initially be created and tracked. Consider checking if a job with the same backup_name is already in progress by iterating through backup_jobs_ while holding the jobs_mutex_.
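
A sketch of that scan, assuming the BackupJob fields shown elsewhere in the diff; it would sit inside the existing jobs_mutex_ critical section, before the new job is registered:

```cpp
// Reject duplicate names while a job with that name is still running.
for (const auto& [existing_id, existing_job] : backup_jobs_) {
    if (existing_job.backup_name == backup_name &&
        existing_job.status == BackupJobStatus::IN_PROGRESS) {
        return {false, "A backup named '" + backup_name +
                       "' is already in progress (job_id: " + existing_id + ")"};
    }
}
```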
```cpp
throw std::runtime_error("Insufficient disk space: need " +
                         std::to_string(index_size * 2 / (1024 * 1024)) + " MB");
```

The error message for insufficient disk space is potentially misleading. It states "need X MB" based on index_size * 2, but it doesn't show how much is actually available. Consider including both required and available space in the error message for better debugging, e.g., "Insufficient disk space: need X MB, available Y MB".

Suggested change:

```cpp
size_t required_bytes = index_size * 2;
size_t required_mb = required_bytes / (1024 * 1024);
size_t available_mb = space_info.available / (1024 * 1024);
throw std::runtime_error("Insufficient disk space: need " +
                         std::to_string(required_mb) + " MB, available " +
                         std::to_string(available_mb) + " MB");
```
```cpp
std::pair<bool, std::string> createBackupAsync(const std::string& index_id,
                                               const std::string& backup_name) {
    // 1. Validate backup name
    std::pair<bool, std::string> result = validateBackupName(backup_name);
    if(!result.first) {
        return result;
    }

    // 2. Check if backup file already exists
    std::string backup_tar = data_dir_ + "/backups/" + backup_name + ".tar";
    if(std::filesystem::exists(backup_tar)) {
        return {false, "Backup already exists: " + backup_name};
    }

    // 3. Generate unique job_id (timestamp + random 6 chars)
    auto now = std::chrono::system_clock::now();
    auto timestamp = std::chrono::duration_cast<std::chrono::milliseconds>(
        now.time_since_epoch()).count();
    std::string job_id = std::to_string(timestamp) + "_" +
                         random_generator::rand_alphanum(6);

    // 4. Create BackupJob record with IN_PROGRESS status
    BackupJob job;
    job.job_id = job_id;
    job.index_id = index_id;
    job.backup_name = backup_name;
    job.status = BackupJobStatus::IN_PROGRESS;
    job.started_at = now;
    job.completed_at = std::chrono::system_clock::time_point();

    {
        std::unique_lock<std::shared_mutex> lock(jobs_mutex_);
        backup_jobs_[job_id] = job;
        // Track active backup for this index
        active_backup_jobs_[index_id] = job_id;
        persistJobs();
    }
```

The createBackupAsync method doesn't validate that the index_id exists before creating and persisting the backup job. The validation only happens later in executeBackupJob when getIndexEntry is called. This means invalid backup jobs could be created, persisted, and only fail during execution. Consider checking if the index exists (e.g., by calling metadata_manager_->getMetadata(index_id)) before creating the job, so the API can return a synchronous error for non-existent indexes.
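
The early check could be as simple as the following; metadata_manager_->getMetadata is named above, but its return type is an assumption here:

```cpp
// Near the top of createBackupAsync, before any job state is created.
// Assumes getMetadata returns something falsy (e.g., nullptr or an empty
// optional) for unknown indexes.
if (!metadata_manager_->getMetadata(index_id)) {
    return {false, "Index not found: " + index_id};
}
```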
```cpp
enum class BackupJobStatus : uint8_t {
    IN_PROGRESS = 1,
    COMPLETED = 2,
    FAILED = 3
};
```

The BackupJobStatus enum starts at 1 instead of the more conventional 0. While this works correctly, if the status value defaults to 0 in the fromJson method (line 179), it would create an invalid enum value that doesn't match any of IN_PROGRESS, COMPLETED, or FAILED. Consider either starting the enum at 0 or using a more defensive default value in fromJson that ensures the status is always valid.
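
A defensive decode in fromJson might look like this; the j.value call and the "status" field name are assumptions based on the nlohmann::json usage elsewhere in the diff:

```cpp
// Fall back to FAILED (a valid, conservative state) when the stored
// value doesn't map to a known status.
uint8_t raw = j.value("status", static_cast<uint8_t>(0));
switch (raw) {
    case 1: status = BackupJobStatus::IN_PROGRESS; break;
    case 2: status = BackupJobStatus::COMPLETED;   break;
    case 3: status = BackupJobStatus::FAILED;      break;
    default:
        LOG_WARN("Unknown backup job status value " << static_cast<int>(raw)
                 << "; defaulting to FAILED");
        status = BackupJobStatus::FAILED;
}
```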
```cpp
} else if(job.status == BackupJobStatus::COMPLETED) {
    job_obj["status"] = "completed";
} else if(job.status == BackupJobStatus::FAILED) {
    job_obj["status"] = "failed";
```

The status enum to string conversion lacks a default case or error handling. If job.status contains an invalid value (e.g., due to data corruption or programming error), the status field will simply be omitted from the response without any indication of the problem. Consider adding an else clause to handle unexpected status values, either by logging a warning and setting a default value like "unknown", or by throwing an exception.

Suggested change:

```cpp
    job_obj["status"] = "failed";
} else {
    CROW_LOG_WARNING << "Unexpected BackupJobStatus value for job_id="
                     << job.job_id;
    job_obj["status"] = "unknown";
```