@ryanbreen

Summary

Fixes intermittent TCP test failures in CI caused by the exec syscall keeping interrupts disabled during ext2 filesystem I/O.

Root Cause

The entire body of sys_execv_with_frame() was wrapped in without_interrupts(), which caused:

  • ext2 filesystem I/O (path resolution, inode reads, file content) to run with interrupts disabled
  • Timer interrupts blocked during VirtIO disk operations
  • Scheduler starvation
  • TCP packet processing delays

Changes

  1. Restructure exec syscall - Only wrap the critical frame manipulation in without_interrupts(), not the ELF loading
  2. Increase TCP retry budget - MAX_LOOPBACK_RETRIES from 3 to 10 for CI resilience
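
For reviewers who want to see why a larger retry budget helps, here is a minimal sketch of a bounded retry loop. Only the constant name and its new value of 10 come from this PR. The helper `retry_with_budget`, its signature, and the choice to count the budget as total attempts are assumptions made for illustration, not the actual test code.

```rust
/// New retry budget from this PR (previously 3).
const MAX_LOOPBACK_RETRIES: u32 = 10;

/// Hypothetical helper: call `attempt` up to `max_retries` times.
/// Returns the first successful value together with the number of
/// attempts that were needed, or `None` if the budget ran out.
fn retry_with_budget<T, F>(max_retries: u32, mut attempt: F) -> Option<(T, u32)>
where
    F: FnMut(u32) -> Option<T>,
{
    for n in 0..max_retries {
        if let Some(value) = attempt(n) {
            return Some((value, n + 1));
        }
        // A kernel test would typically yield to the scheduler here so
        // the loopback packet has a chance to be processed.
    }
    None
}
```

Under these assumptions, an operation that only succeeds on its fifth attempt returns `None` with a budget of 3 and succeeds with `MAX_LOOPBACK_RETRIES`. The larger budget only adds slack for a loaded CI host. It does not address the scheduler starvation itself, which is what change 1 fixes.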

Technical Details

Before:

pub fn sys_execv_with_frame(...) -> SyscallResult {
    x86_64::instructions::interrupts::without_interrupts(|| {
        // ALL code here - including ext2 I/O
    })
}

After:

pub fn sys_execv_with_frame(...) -> SyscallResult {
    // Preparation and ELF loading WITH interrupts enabled
    let elf_data = load_elf_from_ext2(...);  // ext2 I/O works properly
    
    // Only critical section disables interrupts
    x86_64::instructions::interrupts::without_interrupts(|| {
        // Frame manipulation only
    })
}
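
The difference between the two shapes can be checked in a host-side sketch that models the interrupt flag with a thread-local boolean. Everything here is a stand-in: the mock `without_interrupts` only mimics the save, disable, restore behaviour of the real x86_64 function, and `load_elf_stub`, `exec_before`, and `exec_after` are hypothetical names, not kernel code.

```rust
use std::cell::Cell;

thread_local! {
    /// Stand-in for the CPU interrupt flag (assumption: a real kernel
    /// would read RFLAGS.IF instead).
    static INTERRUPTS_ENABLED: Cell<bool> = Cell::new(true);
}

fn interrupts_enabled() -> bool {
    INTERRUPTS_ENABLED.with(|flag| flag.get())
}

/// Mock of `without_interrupts`: disable, run the closure, then restore
/// whatever state was in effect before the call.
fn without_interrupts<F, R>(f: F) -> R
where
    F: FnOnce() -> R,
{
    let was_enabled = INTERRUPTS_ENABLED.with(|flag| flag.replace(false));
    let result = f();
    INTERRUPTS_ENABLED.with(|flag| flag.set(was_enabled));
    result
}

/// Hypothetical stand-in for the ext2 read. It records whether
/// interrupts were enabled while the "I/O" ran.
fn load_elf_stub(irq_during_io: &mut bool) -> Vec<u8> {
    *irq_during_io = interrupts_enabled();
    vec![0x7f, b'E', b'L', b'F']
}

/// Old shape: I/O and frame manipulation both inside the critical section.
/// Returns (interrupts enabled during I/O, enabled during frame work).
fn exec_before() -> (bool, bool) {
    let mut io = false;
    let mut frame = false;
    without_interrupts(|| {
        let _elf = load_elf_stub(&mut io);
        frame = interrupts_enabled();
    });
    (io, frame)
}

/// New shape: I/O first with interrupts enabled, then a short critical
/// section for the frame manipulation only.
fn exec_after() -> (bool, bool) {
    let mut io = false;
    let _elf = load_elf_stub(&mut io);
    let frame = without_interrupts(|| interrupts_enabled());
    (io, frame)
}
```

In this model `exec_before()` reports interrupts disabled for both phases, while `exec_after()` reports them enabled during the I/O and disabled only for the frame step, which is the property this PR is after.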

Test plan

  • All 217 boot stages pass locally
  • TCP tests (stages 85-127) pass
  • ext2 exec tests (stages 179-184) pass

🤖 Generated with Claude Code

Root cause: The entire sys_execv_with_frame() was wrapped in
without_interrupts(), causing ext2 filesystem I/O to run with
interrupts disabled. This blocked timer interrupts, starved the
scheduler, and caused TCP packet processing delays in CI.

Changes:
1. Restructure sys_execv_with_frame() to only disable interrupts
   for the critical frame manipulation section, not ELF loading
2. Increase TCP test MAX_LOOPBACK_RETRIES from 3 to 10 for CI
   environments where system load causes packet processing delays

The ext2 lookup (path resolution, inode reads, file content reads)
now runs with interrupts enabled, allowing proper VirtIO operation
and timer interrupt handling.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@ryanbreen ryanbreen merged commit 1cdf0fd into main Jan 14, 2026
1 check passed