
async backup job #34

Open
hemant-endee wants to merge 3 commits into master from hemant/async_backup

Conversation

@hemant-endee
Collaborator

No description provided.

Copilot AI (Contributor) left a comment

Pull request overview

This PR implements asynchronous backup functionality for the index management system, changing from a synchronous blocking backup process to a background job-based approach. The changes also remove compression from backup archives (tar.gz → tar) for improved performance and simplicity.

Changes:

  • Converted synchronous createBackup to asynchronous createBackupAsync with job tracking and persistence
  • Removed gzip compression from backup archives, changing from .tar.gz to .tar format
  • Added new /api/v1/backups/jobs endpoint to query backup job status
  • Implemented write operation blocking during active backups to ensure consistency

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 12 comments.

File descriptions:

  • src/utils/archive_utils.hpp: Renamed createTarGz/extractTarGz to createTar/extractTar, removed gzip compression
  • src/main.cpp: Updated backup API to return 202 with job_id, changed file extensions to .tar, added jobs endpoint, improved error handling
  • src/core/ndd.hpp: Added BackupJob struct and status tracking, implemented async backup execution, added checkBackupInProgress to all write operations, job persistence with recovery on restart
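
For orientation, here is a minimal sketch of the job-tracking state these changes introduce, reconstructed from the snippets quoted in the comments below; the exact declarations in src/core/ndd.hpp may differ, and the container types are assumptions.

#include <chrono>
#include <cstdint>
#include <shared_mutex>
#include <string>
#include <unordered_map>

enum class BackupJobStatus : uint8_t { IN_PROGRESS = 1, COMPLETED = 2, FAILED = 3 };

struct BackupJob {
    std::string job_id;        // "<timestamp_ms>_<6 random alphanumeric chars>"
    std::string index_id;
    std::string backup_name;
    BackupJobStatus status;
    std::chrono::system_clock::time_point started_at;
    std::chrono::system_clock::time_point completed_at;  // only meaningful once COMPLETED or FAILED
};

// Guarded by jobs_mutex_, persisted via persistJobs(), reloaded on restart.
std::unordered_map<std::string, BackupJob> backup_jobs_;          // job_id -> job
std::unordered_map<std::string, std::string> active_backup_jobs_; // index_id -> job_id of the running backup
std::shared_mutex jobs_mutex_;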


return json_error(500, "Failed to read backup file");
}


Copilot AI Feb 14, 2026

Remove unnecessary blank line with trailing whitespace. This appears to be an unintentional formatting artifact that should be cleaned up.

Comment on lines +168 to +169
auto completed_time_t = std::chrono::system_clock::to_time_t(completed_at);
j["completed_at"] = std::to_string(completed_time_t);
Copilot AI Feb 14, 2026

The completed_at field is always serialized to JSON, even for IN_PROGRESS jobs where it hasn't been set yet. This could result in serializing an uninitialized time_point (typically epoch time 0). Consider only including completed_at in the JSON when the job status is COMPLETED or FAILED, similar to how it's handled in the API endpoint at main.cpp:625-628.

Suggested change:

auto completed_time_t = std::chrono::system_clock::to_time_t(completed_at);
j["completed_at"] = std::to_string(completed_time_t);

becomes:

if (status == BackupJobStatus::COMPLETED ||
    status == BackupJobStatus::FAILED) {
    auto completed_time_t = std::chrono::system_clock::to_time_t(completed_at);
    j["completed_at"] = std::to_string(completed_time_t);
}

Comment on lines +1085 to +1091
{
    std::unique_lock<std::shared_mutex> lock(jobs_mutex_);
    backup_jobs_[job_id] = job;
    // Track active backup for this index
    active_backup_jobs_[index_id] = job_id;
    persistJobs();
}
Copilot AI Feb 14, 2026

Missing check to prevent multiple concurrent backup jobs for the same index. While active_backup_jobs_ tracks ongoing backups per index, createBackupAsync doesn't check if a backup is already in progress for the given index_id. This could allow multiple backup jobs to be initiated for the same index, potentially causing resource contention and inconsistent results. Consider checking active_backup_jobs_.count(index_id) before creating the new job.
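
For illustration, the quoted block could fold that check in under the same lock that publishes the job, which also avoids a race between two concurrent createBackupAsync calls; the error string here is illustrative only.

{
    std::unique_lock<std::shared_mutex> lock(jobs_mutex_);
    // Reject the request if a backup is already running for this index.
    if (active_backup_jobs_.count(index_id) > 0) {
        return {false, "A backup is already in progress for index: " + index_id};
    }
    backup_jobs_[job_id] = job;
    // Track active backup for this index
    active_backup_jobs_[index_id] = job_id;
    persistJobs();
}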

Comment on lines +570 to +571
std::filesystem::rename(backup_tar_temp, backup_tar_final);

Copilot AI Feb 14, 2026

The filesystem::rename operation at line 570 can throw an exception if the rename fails (e.g., cross-filesystem move on some systems, insufficient permissions). If this happens after the backup tar has been successfully created and the operation_mutex released, the backup job will be marked as FAILED and the temp tar will be removed, but writes have already been re-enabled. This is unlikely to cause data corruption but could be confusing. Consider wrapping the rename in a try-catch to provide a more specific error message, or using filesystem::copy followed by filesystem::remove as a fallback.

Suggested change:

std::filesystem::rename(backup_tar_temp, backup_tar_final);

becomes:

try {
    std::filesystem::rename(backup_tar_temp, backup_tar_final);
} catch (const std::filesystem::filesystem_error& fe) {
    LOG_WARN("Failed to rename backup tar from temp to final location: " << fe.what()
             << " (from: " << backup_tar_temp << " to: " << backup_tar_final
             << "). Attempting copy/remove fallback.");
    try {
        std::filesystem::copy_file(
            backup_tar_temp,
            backup_tar_final,
            std::filesystem::copy_options::overwrite_existing);
        std::filesystem::remove(backup_tar_temp);
        LOG_INFO("Backup tar moved to final location via copy/remove fallback: " << backup_tar_final);
    } catch (const std::filesystem::filesystem_error& fe_fallback) {
        LOG_ERROR("Fallback copy/remove for backup tar failed: " << fe_fallback.what()
                  << " (from: " << backup_tar_temp << " to: " << backup_tar_final << ")");
        throw std::runtime_error(
            std::string("Failed to move backup tar to final location. rename error: ") +
            fe.what() + ", fallback copy/remove error: " + fe_fallback.what());
    }
}

Comment on lines +666 to +668
std::ifstream ifs(jobs_file);
nlohmann::json j_array;
ifs >> j_array;
Copilot AI Feb 14, 2026

The JSON deserialization at line 668 could throw an exception if the file contains invalid JSON, but the outer try-catch will handle it and log the error. However, if the file is partially written or corrupted, this could silently skip loading valid jobs. Consider checking ifs.good() after reading and providing a more specific error message to distinguish between read errors and parse errors.
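
One way to separate the two failure modes, sketched under the assumption that the surrounding code already uses nlohmann::json and the LOG_* macros seen elsewhere in this review:

std::ifstream ifs(jobs_file);
if (!ifs) {
    LOG_WARN("Backup jobs file not readable, skipping job recovery: " << jobs_file);
    return;
}
std::string contents((std::istreambuf_iterator<char>(ifs)),
                     std::istreambuf_iterator<char>());
if (ifs.bad()) {
    LOG_ERROR("I/O error while reading backup jobs file: " << jobs_file);
    return;
}
nlohmann::json j_array;
try {
    j_array = nlohmann::json::parse(contents);
} catch (const nlohmann::json::parse_error& e) {
    LOG_ERROR("Backup jobs file is corrupt or truncated, skipping job recovery: "
              << jobs_file << " (" << e.what() << ")");
    return;
}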

Comment on lines +1063 to +1067
// 2. Check if backup file already exists
std::string backup_tar = data_dir_ + "/backups/" + backup_name + ".tar";
if(std::filesystem::exists(backup_tar)) {
    return {false, "Backup already exists: " + backup_name};
}
Copilot AI Feb 14, 2026

There's no protection against concurrent backup jobs creating backups with the same backup_name. Two concurrent requests with the same backup_name could both pass the existence check at line 1065 and create duplicate jobs. While executeBackupJob rechecks at line 470, both jobs would initially be created and tracked. Consider checking if a job with the same backup_name is already in progress by iterating through backup_jobs_ while holding the jobs_mutex_.
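
A sketch of that kind of duplicate-name guard, assuming it runs inside createBackupAsync under the same jobs_mutex_ lock that later inserts the new job; the error message is illustrative:

for (const auto& [existing_id, existing_job] : backup_jobs_) {
    if (existing_job.backup_name == backup_name &&
        existing_job.status == BackupJobStatus::IN_PROGRESS) {
        return {false, "A backup named '" + backup_name +
                       "' is already in progress (job " + existing_id + ")"};
    }
}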

Comment on lines +484 to +485
throw std::runtime_error("Insufficient disk space: need " +
    std::to_string(index_size * 2 / (1024 * 1024)) + " MB");
Copilot AI Feb 14, 2026

The error message for insufficient disk space is potentially misleading. It states "need X MB" based on index_size * 2, but it doesn't show how much is actually available. Consider including both required and available space in the error message for better debugging, e.g., "Insufficient disk space: need X MB, available Y MB".

Suggested change:

throw std::runtime_error("Insufficient disk space: need " +
    std::to_string(index_size * 2 / (1024 * 1024)) + " MB");

becomes:

size_t required_bytes = index_size * 2;
size_t required_mb = required_bytes / (1024 * 1024);
size_t available_mb = space_info.available / (1024 * 1024);
throw std::runtime_error("Insufficient disk space: need " +
    std::to_string(required_mb) + " MB, available " +
    std::to_string(available_mb) + " MB");

Comment on lines +1055 to +1091
std::pair<bool, std::string> createBackupAsync(const std::string& index_id,
                                               const std::string& backup_name) {
    // 1. Validate backup name
    std::pair<bool, std::string> result = validateBackupName(backup_name);
    if(!result.first) {
        return result;
    }

    // 2. Check if backup file already exists
    std::string backup_tar = data_dir_ + "/backups/" + backup_name + ".tar";
    if(std::filesystem::exists(backup_tar)) {
        return {false, "Backup already exists: " + backup_name};
    }

    // 3. Generate unique job_id (timestamp + random 6 chars)
    auto now = std::chrono::system_clock::now();
    auto timestamp = std::chrono::duration_cast<std::chrono::milliseconds>(
        now.time_since_epoch()).count();
    std::string job_id = std::to_string(timestamp) + "_" +
        random_generator::rand_alphanum(6);

    // 4. Create BackupJob record with IN_PROGRESS status
    BackupJob job;
    job.job_id = job_id;
    job.index_id = index_id;
    job.backup_name = backup_name;
    job.status = BackupJobStatus::IN_PROGRESS;
    job.started_at = now;
    job.completed_at = std::chrono::system_clock::time_point();

    {
        std::unique_lock<std::shared_mutex> lock(jobs_mutex_);
        backup_jobs_[job_id] = job;
        // Track active backup for this index
        active_backup_jobs_[index_id] = job_id;
        persistJobs();
    }
Copilot AI Feb 14, 2026

The createBackupAsync method doesn't validate that the index_id exists before creating and persisting the backup job. The validation only happens later in executeBackupJob when getIndexEntry is called. This means invalid backup jobs could be created, persisted, and only fail during execution. Consider checking if the index exists (e.g., by calling metadata_manager_->getMetadata(index_id)) before creating the job, so the API can return a synchronous error for non-existent indexes.
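
Roughly, the early validation could look like the following sketch; the exact return type of metadata_manager_->getMetadata is not visible in this diff, so the null check and error message are illustrative:

// Fail fast if the index does not exist, before a job is created and persisted.
auto metadata = metadata_manager_->getMetadata(index_id);
if (!metadata) {
    return {false, "Index not found: " + index_id};
}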

Comment on lines +142 to +146
enum class BackupJobStatus : uint8_t {
    IN_PROGRESS = 1,
    COMPLETED = 2,
    FAILED = 3
};
Copilot AI Feb 14, 2026

The BackupJobStatus enum starts at 1 instead of the more conventional 0. While this works correctly, if the status value defaults to 0 in the fromJson method (line 179), it would create an invalid enum value that doesn't match any of IN_PROGRESS, COMPLETED, or FAILED. Consider either starting the enum at 0 or using a more defensive default value in fromJson that ensures the status is always valid.
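
A defensive variant of the fromJson parsing could look like this sketch, assuming the status is persisted as its numeric value as the comment implies; the choice of FAILED as the fallback is illustrative:

int raw_status = j.value("status", 0);  // 0 is not a valid BackupJobStatus
switch (raw_status) {
    case 1: status = BackupJobStatus::IN_PROGRESS; break;
    case 2: status = BackupJobStatus::COMPLETED;   break;
    case 3: status = BackupJobStatus::FAILED;      break;
    default:
        // Unknown or missing value: mark the recovered job as FAILED rather than
        // carrying an out-of-range enum value around.
        status = BackupJobStatus::FAILED;
        break;
}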

} else if(job.status == BackupJobStatus::COMPLETED) {
    job_obj["status"] = "completed";
} else if(job.status == BackupJobStatus::FAILED) {
    job_obj["status"] = "failed";
Copilot AI Feb 14, 2026

The status enum to string conversion lacks a default case or error handling. If job.status contains an invalid value (e.g., due to data corruption or programming error), the status field will simply be omitted from the response without any indication of the problem. Consider adding an else clause to handle unexpected status values, either by logging a warning and setting a default value like "unknown", or by throwing an exception.

Suggested change:

job_obj["status"] = "failed";

becomes:

job_obj["status"] = "failed";
} else {
    CROW_LOG_WARNING << "Unexpected BackupJobStatus value for job_id="
                     << job.job_id;
    job_obj["status"] = "unknown";