fork() and exec() - FAANG Classic
Quick Reference (TL;DR)
fork(): Creates a copy of the current process. Returns twice (once in parent with child PID, once in child with 0). Uses Copy-on-Write (COW) for efficiency.
exec(): Replaces the current process's memory with a new program (new code, data, and stack). The process ID stays the same. Returns only on error.
Why separate: fork() creates process, exec() loads program. Separation allows setup (redirection, environment) before loading program.
1. Clear Definition
fork()
The fork() system call creates a new process by duplicating the calling process. The new process (child) is a near-exact copy of the parent: it gets a copy of the parent's memory, file descriptors, and execution state, but it has its own PID, and fork() returns a different value in each process.
exec()
The exec() family of system calls replaces the current process's memory image with a new program. The process ID remains the same, but the code, data, and stack are completely replaced.
💡 Key Insight: fork() creates a process; exec() loads a program. They're usually used together: fork() to create, exec() to run a different program.
2. Core Concepts
What fork() Copies vs Shares
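Copied (child gets its own copy):
- Memory: Code, data, stack, heap (logically separate; physically shared via COW until either process writes)
- File descriptor table: The child gets duplicates of the parent's descriptors (closing one in the child does not close it in the parent)
- Signal dispositions and signal mask: Inherited, but each process can change its own afterward
- CPU state: Registers, program counter (only fork()'s return value differs)
Different in the child:
- PID: The child gets a new, unique PID
- PPID: The child's PPID is the parent's PID
- Pending signals: The child starts with none
Shared (same underlying resource):
- Open file descriptions: Each duplicated descriptor refers to the same open file description (file table entry) as the parent's
- File offsets: The offset lives in the open file description, so a read(), write(), or lseek() in one process moves it for the other too
- MAP_SHARED mappings: Stay truly shared; private memory is shared only physically, via COW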
Example:
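#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int x = 42;
    pid_t pid = fork();
    if (pid < 0) { perror("fork"); return 1; }   // fork() failed: no child
    if (pid == 0) {
        // Child process
        x = 100;                         // Modifies the child's copy only
        printf("Child: x = %d\n", x);    // Prints 100
    } else {
        // Parent process
        waitpid(pid, NULL, 0);           // Let the child finish first
        printf("Parent: x = %d\n", x);   // Still prints 42
    }
    return 0;
}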
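The shared-offset point from the list above is easy to miss. A minimal sketch, assuming a POSIX system and a writable file path (the name shared_offset_demo() and the /tmp path used below are illustrative only): parent and child write through one descriptor inherited across fork(), and because they share one offset, neither write overwrites the other.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Parent and child write through the SAME inherited descriptor.
 * Reads the file back into buf (NUL-terminated).
 * Returns the number of bytes read, or -1 on error. */
ssize_t shared_offset_demo(const char *path, char *buf, size_t bufsize) {
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0 || bufsize == 0) return -1;

    pid_t pid = fork();
    if (pid < 0) { close(fd); return -1; }
    if (pid == 0) {
        /* Child: writes 6 bytes, moving the shared offset from 0 to 6 */
        _exit(write(fd, "child\n", 6) == 6 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0) {
        close(fd);
        return -1;
    }
    /* Parent: its write starts at offset 6, so nothing is overwritten */
    if (write(fd, "parent\n", 7) != 7) { close(fd); return -1; }

    /* Rewind to read the file back (this also moves the shared offset) */
    lseek(fd, 0, SEEK_SET);
    ssize_t n = read(fd, buf, bufsize - 1);
    close(fd);
    if (n < 0) return -1;
    buf[n] = '\0';
    return n;
}
```

Calling it with a 64-byte buffer yields "child\nparent\n". If parent and child had each called open() on the file separately, they would have independent offsets and the parent's write would land at offset 0, overwriting the child's bytes.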
Copy-on-Write (COW)
Mechanism: When fork() is called, the OS doesn't immediately copy all memory. Instead:
- Mark pages as COW: Both parent and child page tables point to the same physical pages, marked read-only in both
- On write: When either process writes to such a page:
  - A page fault occurs (write to a read-only page)
  - The OS allocates a new physical page
  - It copies the original page's contents
  - It updates the writing process's page table to point to the copy, now writable
  - The faulting write is retried and succeeds
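To make the fault-handling steps concrete, here is a user-space toy model of COW, not kernel code: Frame, PageTable, cow_fork(), and cow_write() are hypothetical names, the 16-byte page size is arbitrary, and a reference count stands in for the kernel's bookkeeping. It also includes an optimization the steps above don't mention: if the faulting process is the last one mapping the frame, the page is simply made writable without copying.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 16   /* Toy page size; real pages are usually 4 KiB */
#define TOY_NPAGES 4

typedef struct {
    char data[TOY_PAGE_SIZE];
    int refcount;                    /* Page tables currently mapping this frame */
} Frame;

typedef struct {
    Frame *entries[TOY_NPAGES];
    int writable[TOY_NPAGES];        /* 0 = read-only (COW), 1 = writable */
} PageTable;

/* "fork()": the child maps the same frames; both sides become read-only */
void cow_fork(PageTable *parent, PageTable *child) {
    for (int i = 0; i < TOY_NPAGES; i++) {
        child->entries[i] = parent->entries[i];
        child->entries[i]->refcount++;
        parent->writable[i] = 0;
        child->writable[i] = 0;
    }
}

/* Write one byte; a write to a read-only page takes the "fault" path */
int cow_write(PageTable *pt, int page, int offset, char value) {
    if (!pt->writable[page]) {
        Frame *old = pt->entries[page];
        if (old->refcount > 1) {     /* Still shared: allocate, copy, remap */
            Frame *copy = malloc(sizeof *copy);
            if (copy == NULL) return -1;
            memcpy(copy->data, old->data, TOY_PAGE_SIZE);
            copy->refcount = 1;
            old->refcount--;
            pt->entries[page] = copy;
        }
        pt->writable[page] = 1;      /* Sole user of the frame: just unprotect */
    }
    pt->entries[page]->data[offset] = value;   /* The "retried" write succeeds */
    return 0;
}
```

After cow_fork(), both tables point at the same frames; the first cow_write() from the child gives it a private copy of that one page while the parent's byte stays unchanged, and untouched pages remain shared.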
Why COW:
- Efficiency: Most processes don't modify all memory after fork()
- Speed: fork() is fast (just copy page tables, not memory)
- Memory: Saves memory (shared pages until modified)
Example:
// After fork():
// Parent and child share same physical pages
// Both point to same memory (marked read-only)
// Child writes:
x = 100; // Page fault!
// OS: Allocates new page, copies data, allows write
// Now parent and child have separate pages
Cost:
- fork() without COW: Copy the entire address space (proportional to memory in use; slow for large processes)
- fork() with COW: Copy page tables only (far less data, though still proportional to address-space size)
- First write: Page fault + copy (~microseconds per page)
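Rough arithmetic for why copying page tables beats copying memory, assuming 4 KiB pages and 8-byte page-table entries (as on x86-64) and ignoring the small upper levels of the page-table tree:

```latex
\[
\frac{1\ \text{GiB}}{4\ \text{KiB}} = \frac{2^{30}}{2^{12}} = 2^{18} = 262{,}144\ \text{entries},
\qquad
2^{18} \times 8\ \text{B} = 2^{21}\ \text{B} = 2\ \text{MiB}
\]
\[
\frac{2^{30}\ \text{B of data}}{2^{21}\ \text{B of page tables}} = 2^{9} = 512
\]
```

So under these assumptions COW copies about 512 times less for a 1 GiB process, but the work still scales with address-space size, which is why forking a very large process is not free.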
Why fork() is Cheap
Traditional thinking: fork() must be expensive (copying entire process)
Reality: fork() is relatively cheap because:
- Copy-on-Write: Only page tables are copied, not memory
- Lazy copying: Memory is copied only when modified
- Shared pages: Unmodified pages remain shared
- Optimized implementation: Modern OSes optimize fork() heavily
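Typical costs (rough orders of magnitude; they vary with hardware and kernel):
- fork(): ~10-100 microseconds for a small process (mostly page table copying)
- First write: ~1-10 microseconds per page (COW page fault)
Comparison for a 1 GB address space:
- Without COW: Copying 1 GB of data takes on the order of 100 milliseconds
- With COW: Only page tables are copied; this still grows with memory size and can reach milliseconds, but stays far cheaper than copying the data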
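Numbers like these vary widely by machine and kernel, so measuring beats memorizing. A minimal sketch (fork_round_trip_us() is a hypothetical name) that averages fork() plus the child's _exit() and the parent's waitpid(); it therefore overstates the cost of fork() alone, and a small test process says little about forking a large one.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Average microseconds per fork() + child _exit() + parent waitpid().
 * Returns -1.0 on error or if iterations <= 0. */
double fork_round_trip_us(int iterations) {
    if (iterations <= 0) return -1.0;

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iterations; i++) {
        pid_t pid = fork();
        if (pid < 0) return -1.0;
        if (pid == 0) _exit(0);                 /* Child exits immediately */
        if (waitpid(pid, NULL, 0) < 0) return -1.0;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed_us = (double)(end.tv_sec - start.tv_sec) * 1e6 +
                        (double)(end.tv_nsec - start.tv_nsec) / 1e3;
    return elapsed_us / iterations;
}
```

Touching a large heap buffer before calling it (so there are more page tables to copy) is one way to see the cost grow with process size.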
Why exec() Replaces Memory
What exec() does:
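- Unmap old memory: Remove all existing memory mappings
- Load new program: Map the executable file into memory (pages are read from disk on demand; a dynamically linked program also gets its dynamic loader mapped)
- Set up new address space:
  - Code segment (text)
  - Data segment (initialized data, BSS)
  - Stack (initialized with argv and the environment)
  - Heap (empty)
- Reset state: Reset registers and set the program counter to the entry point; signals that had handlers revert to their default action
- Preserve: Process ID, parent PID, and open file descriptors (except those marked close-on-exec)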
Why replace (not append):
- Security: Old code shouldn't remain in memory
- Simplicity: Clean slate for new program
- Efficiency: No need to manage old memory
- Correctness: New program expects clean state
Example:
// Before exec():
// Process has: browser code, data, stack
execl("/bin/ls", "ls", "-l", (char *)NULL);  // Returns only on failure
// After exec():
// Process has: ls code, data, stack
// Browser code is gone
// Process ID is same
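The claim that the PID survives exec() can be checked directly. This sketch assumes /bin/sh exists; exec_keeps_pid() is an illustrative name. The child points its stdout at a pipe and execs a shell that prints its own PID ($$), which the parent compares with the PID that fork() returned.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the exec()ed shell reports the same PID that fork() gave the
 * parent, 0 if it differs, -1 on error. Assumes /bin/sh exists. */
int exec_keeps_pid(void) {
    int fds[2];
    if (pipe(fds) < 0) return -1;

    pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return -1; }
    if (pid == 0) {
        dup2(fds[1], STDOUT_FILENO);   /* Child's stdout now feeds the pipe */
        close(fds[0]);
        close(fds[1]);
        /* $$ expands to the shell's PID, i.e. this same process */
        execl("/bin/sh", "sh", "-c", "echo $$", (char *)NULL);
        _exit(127);                    /* Only reached if execl() failed */
    }

    close(fds[1]);                     /* Parent keeps only the read end */
    char buf[32] = {0};
    ssize_t n = read(fds[0], buf, sizeof buf - 1);
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) < 0 || n <= 0) return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return -1;
    return (pid_t)strtol(buf, NULL, 10) == pid;
}
```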
Why fork() + exec() Instead of One Call?
🎯 Interview Focus: This is a classic question.
Historical Reason: Unix philosophy - "do one thing well"
- fork(): Creates a process
- exec(): Loads a program
- Separation allows flexibility
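Practical Reasons:
- Setup before execution: The child can rearrange its own I/O and environment before the new program starts:
    pid_t pid = fork();
    if (pid == 0) {
        // Child: redirect stdin (open() returns the lowest free fd, here 0)
        close(0);
        open("input.txt", O_RDONLY);
        // Redirect stdout (the new descriptor becomes 1)
        close(1);
        open("output.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        // Now load the program; execl() returns only on failure
        execl("/bin/program", "program", (char *)NULL);
        _exit(127);
    }
- Process management: The parent keeps the child's PID and can wait for it, monitor it, or carry on:
    pid_t pid = fork();
    if (pid == 0) {
        execl("/bin/program", "program", (char *)NULL);
        _exit(127);  // Only reached if execl() failed
    } else {
        // Parent: can wait, monitor, or continue
        int status;
        waitpid(pid, &status, 0);
    }
- Flexibility: fork() can be used without exec() (same program, different behavior in parent and child)
- Efficiency: fork() and exec() can be optimized separately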
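Alternative (Windows): CreateProcess() combines both in one call; any setup must be expressed through its parameters rather than arbitrary code run in the child, which makes it less flexible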
Unix way is better because:
- More flexible (can setup between fork and exec)
- Follows Unix philosophy
- Allows process management before exec
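The fork/exec/waitpid pattern described above can be wrapped in a small helper. A sketch under stated assumptions: run_command() is a hypothetical name, execvp() searches PATH for argv[0], and returning 127 for a failed exec and 128 + signal number for a killed child follows common shell conventions, not anything fork() or exec() requires.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] (looked up in PATH) with arguments argv and waits for it.
 * Returns the exit status (0-255), 128 + signal number if it was killed,
 * or -1 if fork()/waitpid() failed. */
int run_command(char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        /* Child: any setup (redirection, chdir, ...) would go here */
        execvp(argv[0], argv);
        _exit(127);                      /* Shell convention: exec failed */
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return -1;   /* Retry only if interrupted */
    }
    if (WIFEXITED(status)) return WEXITSTATUS(status);
    if (WIFSIGNALED(status)) return 128 + WTERMSIG(status);
    return -1;
}
```

For example, running {"sh", "-c", "exit 3", NULL} returns 3, and a command that kills itself with SIGTERM returns 143.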
3. Use Cases
Common Patterns
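- Shell executing commands:
    pid_t pid = fork();
    if (pid == 0) {
        execvp(argv[0], argv);      // Child: becomes the command
        _exit(127);                 // Only reached if exec failed
    } else {
        waitpid(pid, &status, 0);   // Parent (the shell): wait for it
    }
- Process pools: fork() multiple workers, each running the same program (no exec() needed)
- Daemon processes: fork() so the parent can exit while the child keeps running in the background, then optionally exec() the daemon program
- Pipeline: fork() for each stage, exec() a different program in each, and connect the stages with pipes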
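The pipeline pattern needs one fork()/exec() per stage plus the descriptor plumbing between them. This sketch assumes echo and tr are on PATH and hard-codes the equivalent of `echo hello | tr a-z A-Z`; run_pipeline() and close_all() are illustrative names. Note that every process closes the pipe ends it does not use.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Closes all four ends of the two pipes */
static void close_all(const int p1[2], const int p2[2]) {
    close(p1[0]); close(p1[1]);
    close(p2[0]); close(p2[1]);
}

/* Equivalent of `echo hello | tr a-z A-Z`, one fork()+exec() per stage.
 * Captures tr's output into out (NUL-terminated).
 * Returns the number of bytes captured, or -1 on error. */
ssize_t run_pipeline(char *out, size_t outsz) {
    int p1[2], p2[2];                 /* p1: echo -> tr, p2: tr -> parent */
    if (outsz == 0 || pipe(p1) < 0) return -1;
    if (pipe(p2) < 0) { close(p1[0]); close(p1[1]); return -1; }

    pid_t a = fork();
    if (a < 0) { close_all(p1, p2); return -1; }
    if (a == 0) {                     /* Stage 1: stdout -> p1 */
        dup2(p1[1], STDOUT_FILENO);
        close_all(p1, p2);
        execlp("echo", "echo", "hello", (char *)NULL);
        _exit(127);
    }

    pid_t b = fork();
    if (b < 0) { close_all(p1, p2); waitpid(a, NULL, 0); return -1; }
    if (b == 0) {                     /* Stage 2: stdin <- p1, stdout -> p2 */
        dup2(p1[0], STDIN_FILENO);
        dup2(p2[1], STDOUT_FILENO);
        close_all(p1, p2);
        execlp("tr", "tr", "a-z", "A-Z", (char *)NULL);
        _exit(127);
    }

    /* Parent: close every end except p2's read end. If p1[1] stayed
     * open here, tr would never see EOF and the pipeline would hang. */
    close(p1[0]); close(p1[1]); close(p2[1]);

    size_t total = 0;
    ssize_t n;
    while (total < outsz - 1 &&
           (n = read(p2[0], out + total, outsz - 1 - total)) > 0)
        total += (size_t)n;
    out[total] = '\0';
    close(p2[0]);

    waitpid(a, NULL, 0);
    waitpid(b, NULL, 0);
    return (ssize_t)total;
}
```

With a 64-byte buffer the captured output is "HELLO\n". Forgetting one close() in the parent is the classic way such a pipeline deadlocks.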
4. Advantages & Disadvantages
fork()
Advantages:
✅ Fast: COW makes it efficient
✅ Flexible: The child can be set up before exec()
✅ Simple: One call creates a process
Disadvantages:
❌ Confusing: Returns twice
❌ Memory: COW only defers copying; writes after fork() trigger page faults and copies
❌ File descriptors: Inherited with shared offsets, which can cause interleaved I/O or descriptors leaking into the child
exec()
Advantages:
✅ Clean: Fresh address space
✅ Secure: Old code removed
✅ Simple: One call loads a program
Disadvantages:
❌ Destructive: Replaces the entire process image
❌ No return: Returns only on failure; on success there is no way back to the old program
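Because exec() returns only on failure, any code after the call runs only when it failed, and errno says why. A sketch (exec_failure_errno() is a hypothetical name) that makes the failing call in a child so the calling process survives, passing errno back as the exit status.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that tries to execv() path. If execv() fails, the child
 * reaches the next line and exits with errno as its status.
 * Returns that errno value, or -1 if fork()/waitpid() failed. */
int exec_failure_errno(const char *path) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        char *const argv[] = {(char *)path, NULL};
        execv(path, argv);
        _exit(errno & 0xff);             /* Only reached if execv() failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

Passing a path that does not exist yields ENOENT.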
5. Best Practices
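- Always check fork() return: Handle errors (-1 means no child was created)
- Close unused file descriptors before exec() (or open them with O_CLOEXEC / mark them FD_CLOEXEC)
- Be COW-aware: Minimize writes to memory after fork()
- Handle zombie processes: Reap children with wait() or waitpid()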
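Descriptors survive exec() unless they are marked close-on-exec, which is why closing or flagging them matters. This sketch assumes /bin/sh and /dev/null exist; descriptor 5 and the name fd_survives_exec() are arbitrary choices for illustration (note it overwrites whatever descriptor 5 was). The shell command `true >&5` fails when fd 5 is closed.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define DEMO_FD 5   /* Arbitrary; must match the "5" in the shell command */

/* Opens /dev/null as DEMO_FD, optionally marks it close-on-exec, then
 * fork()s and exec()s /bin/sh to test whether DEMO_FD is still open.
 * Returns 1 if it survived exec(), 0 if it was closed, -1 on error. */
int fd_survives_exec(int close_on_exec) {
    int fd = open("/dev/null", O_WRONLY);
    if (fd < 0) return -1;
    if (fd != DEMO_FD) {
        /* dup2() gives DEMO_FD a copy with FD_CLOEXEC cleared */
        if (dup2(fd, DEMO_FD) < 0) { close(fd); return -1; }
        close(fd);
    }
    if (close_on_exec && fcntl(DEMO_FD, F_SETFD, FD_CLOEXEC) < 0) return -1;

    pid_t pid = fork();
    if (pid < 0) { close(DEMO_FD); return -1; }
    if (pid == 0) {
        /* `true >&5` fails ("Bad file descriptor") if fd 5 was closed */
        execl("/bin/sh", "sh", "-c", "{ true >&5; } 2>/dev/null", (char *)NULL);
        _exit(127);
    }
    close(DEMO_FD);                   /* Parent's copy is no longer needed */

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) return -1;
    if (WEXITSTATUS(status) == 127) return -1;   /* exec() itself failed */
    return WEXITSTATUS(status) == 0;
}
```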
6. Common Pitfalls
⚠️ Mistake: Not checking fork() return value
⚠️ Mistake: Forgetting that file descriptors (and their offsets) are shared
⚠️ Mistake: Not understanding COW (thinking fork() copies everything)
⚠️ Mistake: Calling exec() without fork() when the caller needs to keep running (exec() replaces the current process)
7. Interview Tips
Common Questions:
- "What does fork() do?"
- "Explain Copy-on-Write."
- "Why is fork() cheap?"
- "Why fork() + exec() instead of one call?"
Key Points:
- fork() creates copy, exec() replaces
- COW makes fork() efficient
- fork() + exec() allows setup
- File descriptors are shared
8. Related Topics
- Process Management (Topic 5): Process creation
- Memory Management (Topic 11): COW, address space
- System Calls (Topic 4): How fork/exec work
9. Visual Aids
fork() with COW
Before fork():
Parent Process
┌──────────────┐
│ Memory Pages │
└──────────────┘
After fork() (COW):
Parent Process          Child Process
┌──────────────┐        ┌──────────────┐
│ Page Tables  │        │ Page Tables  │
└──────┬───────┘        └──────┬───────┘
       │                       │
       └───────────┬───────────┘
                   │
          ┌────────▼────────┐
          │ Physical Pages  │ (shared, read-only)
          │  (COW marked)   │
          └─────────────────┘
After child writes:
Parent Process          Child Process
┌──────────────┐        ┌──────────────┐
│ Page Tables  │        │ Page Tables  │
└──────┬───────┘        └──────┬───────┘
       │                       │
       │                       │
┌──────▼──────┐         ┌──────▼──────┐
│Physical Page│         │Physical Page│ (separate)
│ (original)  │         │   (copy)    │
└─────────────┘         └─────────────┘
exec() Replacement
Before exec():
Process
┌──────────────┐
│ Browser Code │
│ Browser Data │
│ Browser Stack│
└──────────────┘
After exec("/bin/ls"):
Process
┌──────────────┐
│ ls Code      │ ← Replaced
│ ls Data      │ ← Replaced
│ ls Stack     │ ← Replaced
└──────────────┘
(PID unchanged)
10. Quick Reference Summary
fork(): Creates process copy, returns twice, uses COW
exec(): Replaces process memory with new program
COW: Pages shared until written, then copied
Why separate: Allows setup between fork and exec
Cost (rough, system-dependent): fork() ~10-100 μs for a small process (COW), exec() ~1-10 ms (load program)