Fix Critical Directory Traversal Vulnerability#87
Conversation
Signed-off-by: fortishield <161459699+FortiShield@users.noreply.github.com>
Reviewer's Guide by SourceryThis PR implements security improvements to prevent directory traversal vulnerabilities in the document upload endpoint. The changes focus on proper path sanitization, validation of upload locations, and improved error handling for temporary files. Sequence diagram for document upload processsequenceDiagram
actor User
participant API as Document Upload API
participant FS as File System
participant Service as Knowledge Space Service
User->>API: POST /knowledge/{space_name}/document/upload
API->>API: Sanitize inputs (space_name, doc_file.filename)
API->>FS: Verify and create upload directory
API->>FS: Create temp file
API->>FS: Write file content
API->>FS: Move temp file to target path
API->>Service: Get knowledge space
alt Space does not exist
API->>Service: Create default knowledge space
end
API->>Service: Create knowledge document
API-->>User: Return success or error
Class diagram for document upload changesclassDiagram
class DocumentUpload {
+document_upload(space_name: str, doc_name: str, doc_type: str, doc_file: UploadFile)
+sanitizeInputs()
+verifyUploadDirectory()
+createTempFile()
+moveFile()
+handleErrors()
}
class KnowledgeDocumentRequest {
+doc_name: str
+doc_type: str
+content: str
}
DocumentUpload --> KnowledgeDocumentRequest
note for DocumentUpload "Handles document upload with security checks"
note for KnowledgeDocumentRequest "Represents a request to create a knowledge document"
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
WalkthroughThe changes in the pull request primarily focus on enhancing the Changes
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Explore these optional code suggestions:
|
There was a problem hiding this comment.
Hey @FortiShield - I've reviewed your changes - here's some feedback:
Overall Comments:
- Consider adding security event logging for failed upload attempts and path validation failures to help track potential attacks
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟡 Security: 1 issue found
- 🟢 Testing: all looks good
- 🟡 Complexity: 1 issue found
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| os.makedirs(upload_dir) | ||
|
|
||
| # Create temp file | ||
| tmp_fd, tmp_path = tempfile.mkstemp(dir=upload_dir) |
There was a problem hiding this comment.
🚨 issue (security): Set secure file permissions when creating temporary file
Use mode=0o600 with mkstemp() to ensure temporary files are only readable by the creating user
|
|
||
| # Create absolute paths and verify they are within allowed directory | ||
| upload_dir = os.path.abspath(os.path.join(KNOWLEDGE_UPLOAD_ROOT_PATH, safe_space_name)) | ||
| target_path = os.path.abspath(os.path.join(upload_dir, safe_filename)) |
There was a problem hiding this comment.
suggestion (performance): Implement file size limits and use chunked reading for large files
Reading entire file into memory could cause issues with large files. Consider using a chunked approach with a maximum file size limit
| target_path = os.path.abspath(os.path.join(upload_dir, safe_filename)) | |
| MAX_FILE_SIZE = 50 * 1024 * 1024 # 50MB | |
| file_size = 0 | |
| with os.fdopen(tmp_fd, "wb") as tmp: | |
| while chunk := await doc_file.read(8192): | |
| file_size += len(chunk) | |
| if file_size > MAX_FILE_SIZE: | |
| os.unlink(tmp_path) | |
| raise ValueError("File too large") | |
| tmp.write(chunk) |
| @@ -332,54 +333,70 @@ def document_delete(space_name: str, query_request: DocumentQueryRequest): | |||
|
|
|||
| @router.post("/knowledge/{space_name}/document/upload") | |||
There was a problem hiding this comment.
issue (complexity): Consider refactoring the file upload logic into smaller, focused functions with better path handling
The security improvements are valuable but can be implemented more cleanly. Consider:
- Extract path validation into a utility function
- Use pathlib.Path for cleaner path handling
- Use a context manager for temp file handling
Example refactor:
def validate_upload_path(base_path: Path, space_name: str, filename: str) -> Path:
"""Validate and return safe upload path"""
safe_space = Path(space_name).name # Sanitize space name
safe_file = Path(filename).name # Sanitize filename
upload_dir = base_path / safe_space
target_path = upload_dir / safe_file
if not target_path.is_relative_to(base_path):
raise HTTPException(status_code=400, detail="Invalid path detected")
return target_path
@router.post("/knowledge/{space_name}/document/upload")
async def document_upload(
space_name: str,
doc_name: str = Form(...),
doc_type: str = Form(...),
doc_file: UploadFile = File(...),
):
if not doc_file:
return Result.failed(code="E000X", msg="doc_file is None")
try:
base_path = Path(KNOWLEDGE_UPLOAD_ROOT_PATH)
target_path = validate_upload_path(base_path, space_name, doc_file.filename)
target_path.parent.mkdir(parents=True, exist_ok=True)
with tempfile.NamedTemporaryFile(dir=target_path.parent, delete=False) as tmp:
tmp.write(await doc_file.read())
tmp_path = Path(tmp.name)
tmp_path.rename(target_path) # atomic operation
request = KnowledgeDocumentRequest(
doc_name=doc_name,
doc_type=doc_type,
content=str(target_path)
)
# Rest of the space creation logic...This maintains all security checks while reducing nesting and improving maintainability through:
- Centralized path validation
- Cleaner path handling with pathlib
- Automatic temp file cleanup
- Reduced nesting depth
| if len(space_res) == 0: | ||
| # create default space | ||
| if "default" != safe_space_name: | ||
| raise Exception(f"you have not create your knowledge space.") |
There was a problem hiding this comment.
issue (code-quality): Raise a specific error instead of the general Exception or BaseException (raise-specific-error)
Explanation
If a piece of code raises a specific exception type rather than the generic [`BaseException`](https://docs.python.org/3/library/exceptions.html#BaseException) or [`Exception`](https://docs.python.org/3/library/exceptions.html#Exception), the calling code can:- get more information about what type of error it is
- define specific exception handling for it
This way, callers of the code can handle the error appropriately.
How can you solve this?
- Use one of the built-in exceptions of the standard library.
- Define your own error class that subclasses
Exception.
So instead of having code raising Exception or BaseException like
if incorrect_input(value):
raise Exception("The input is incorrect")you can have code raising a specific error like
if incorrect_input(value):
raise ValueError("The input is incorrect")or
class IncorrectInputError(Exception):
pass
if incorrect_input(value):
raise IncorrectInputError("The input is incorrect")| upload_dir = os.path.abspath(os.path.join(KNOWLEDGE_UPLOAD_ROOT_PATH, safe_space_name)) | ||
| target_path = os.path.abspath(os.path.join(upload_dir, safe_filename)) | ||
|
|
||
| if not os.path.abspath(KNOWLEDGE_UPLOAD_ROOT_PATH) in target_path: |
There was a problem hiding this comment.
issue (code-quality): We've found these issues:
- Simplify logical expression using De Morgan identities (
de-morgan) - Ensure constant in comparison is on the right (
flip-comparison) - Replace f-string with no interpolated values with string [×2] (
remove-redundant-fstring)
…ssion Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Signed-off-by: fortishield <161459699+FortiShield@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (4)
gptdb/app/knowledge/api.py (4)
6-6: Remove unused import 'pathlib.Path'The
pathlib.Pathmodule is imported but not used in the code. Removing unused imports helps keep the code clean and maintainable.Apply this diff to remove the unused import:
-import logging -import os -import shutil -import tempfile -from typing import List -from pathlib import Path +import logging +import os +import shutil +import tempfile +from typing import List🧰 Tools
🪛 Ruff
6-6:
pathlib.Pathimported but unusedRemove unused import:
pathlib.Path(F401)
377-377: Rewrite condition for better readabilityThe condition
'if "default" != safe_space_name:'can be rewritten as'if safe_space_name != "default":'to improve readability.Apply this diff to adjust the condition:
-if "default" != safe_space_name: +if safe_space_name != "default":🧰 Tools
🪛 Ruff
377-377: Yoda condition detected
Rewrite as
safe_space_name != "default"(SIM300)
378-378: Correct the grammatical error and remove unnecessary f-stringThe exception message contains a grammatical error, and the f-string is unnecessary since no variables are being interpolated. The corrected message should read:
"You have not created your knowledge space."Apply this diff to correct the exception message:
-raise Exception(f"you have not create your knowledge space.") +raise Exception("You have not created your knowledge space.")🧰 Tools
🪛 Ruff
378-378: f-string without any placeholders
Remove extraneous
fprefix(F541)
397-397: Remove unnecessary f-string prefixThe f-string in the return statement is unnecessary since no variables are being interpolated. Remove the
fprefix.Apply this diff to remove the unnecessary f-string:
-return Result.failed(code="E000X", msg=f"doc_file is None") +return Result.failed(code="E000X", msg="doc_file is None")🧰 Tools
🪛 Ruff
397-397: f-string without any placeholders
Remove extraneous
fprefix(F541)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
gptdb/app/knowledge/api.py(2 hunks)
🧰 Additional context used
🪛 Ruff
gptdb/app/knowledge/api.py
6-6: pathlib.Path imported but unused
Remove unused import: pathlib.Path
(F401)
339-339: Do not perform function call File in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
377-377: Yoda condition detected
Rewrite as safe_space_name != "default"
(SIM300)
378-378: f-string without any placeholders
Remove extraneous f prefix
(F541)
397-397: f-string without any placeholders
Remove extraneous f prefix
(F541)
| if not upload_dir.startswith(os.path.abspath(KNOWLEDGE_UPLOAD_ROOT_PATH)) or not target_path.startswith(os.path.abspath(KNOWLEDGE_UPLOAD_ROOT_PATH)): | ||
| raise HTTPException(status_code=400, detail="Invalid path detected") |
There was a problem hiding this comment.
Use a more secure method for path validation
Using startswith for path validation can be unsafe, as it may be bypassed with crafted inputs. To securely ensure that upload_dir and target_path are within the allowed directory, use os.path.commonpath for comparison.
Apply this diff to enhance path validation:
-if not upload_dir.startswith(os.path.abspath(KNOWLEDGE_UPLOAD_ROOT_PATH)) or not target_path.startswith(os.path.abspath(KNOWLEDGE_UPLOAD_ROOT_PATH)):
+if os.path.commonpath([upload_dir, os.path.abspath(KNOWLEDGE_UPLOAD_ROOT_PATH)]) != os.path.abspath(KNOWLEDGE_UPLOAD_ROOT_PATH) or \
+ os.path.commonpath([target_path, os.path.abspath(KNOWLEDGE_UPLOAD_ROOT_PATH)]) != os.path.abspath(KNOWLEDGE_UPLOAD_ROOT_PATH):
raise HTTPException(status_code=400, detail="Invalid path detected")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if not upload_dir.startswith(os.path.abspath(KNOWLEDGE_UPLOAD_ROOT_PATH)) or not target_path.startswith(os.path.abspath(KNOWLEDGE_UPLOAD_ROOT_PATH)): | |
| raise HTTPException(status_code=400, detail="Invalid path detected") | |
| if os.path.commonpath([upload_dir, os.path.abspath(KNOWLEDGE_UPLOAD_ROOT_PATH)]) != os.path.abspath(KNOWLEDGE_UPLOAD_ROOT_PATH) or \ | |
| os.path.commonpath([target_path, os.path.abspath(KNOWLEDGE_UPLOAD_ROOT_PATH)]) != os.path.abspath(KNOWLEDGE_UPLOAD_ROOT_PATH): | |
| raise HTTPException(status_code=400, detail="Invalid path detected") |
User description
Description
Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
Snapshots:
Include snapshots for easier review.
Checklist:
PR Type
Bug fix
Description
document_uploadfunction by sanitizing input paths.HTTPExceptionif not.os.path.basenameto sanitizespace_nameanddoc_file.filenameto prevent malicious path inputs.Changes walkthrough 📝
api.py
Fix directory traversal vulnerability in document uploadgptdb/app/knowledge/api.py
HTTPExceptionfor invalid path detection.os.path.basenameto sanitize file and directory names.Summary by CodeRabbit
New Features
Bug Fixes
Refactor