
fix: handle None paragraph style in DOCX extraction #7

Merged
Grandvizir merged 1 commit into informatique-cdc:main from mmaudet:fix/docx-null-style
Mar 26, 2026

@mmaudet (Contributor) commented Mar 26, 2026

Summary

  • Guard against para.style being None in _extract_docx(), which caused ingestion to crash on DOCX files containing paragraphs with no style

Root cause

para.style can be None for paragraphs in DOCX files created by non-Microsoft editors or converted from other formats. The existing code only guarded against para.style.name being None (via `or ""`), not against para.style itself, so reading `.name` on a None style raised an AttributeError.
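To make the failure mode concrete, here is a minimal sketch that reproduces it without a real DOCX file. It assumes stand-in objects built with `types.SimpleNamespace` in place of python-docx paragraphs (only the `.style` and `.style.name` attributes matter here); `style_name_before` is a hypothetical helper name, not code from the repository.

```python
from types import SimpleNamespace


def style_name_before(para):
    # Original expression: guards against a None *name*,
    # but assumes para.style itself is never None.
    return (para.style.name or "").lower()


# Hypothetical stand-ins for python-docx paragraphs, for illustration only.
styled = SimpleNamespace(style=SimpleNamespace(name="Heading 1"))
unnamed = SimpleNamespace(style=SimpleNamespace(name=None))
unstyled = SimpleNamespace(style=None)

print(style_name_before(styled))   # heading 1
print(style_name_before(unnamed))  # empty string: the `or ""` guard works here
try:
    style_name_before(unstyled)
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'name'
```

The first two cases succeed; only the paragraph whose style is None reaches the unguarded attribute access.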

Fix

```python
# Before
style_name = (para.style.name or "").lower()

# After
style_name = ((para.style.name if para.style else "") or "").lower()
```
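The fixed expression can be checked against all three cases with a small sketch. As before, `SimpleNamespace` objects stand in for python-docx paragraphs, and both helper names are hypothetical. The `getattr` variant is an alternative spelling shown only for comparison; it is not what this PR uses.

```python
from types import SimpleNamespace


def style_name_after(para):
    # Expression from this PR: fall back to "" when para.style is None,
    # then fall back again when style.name is None, before lowercasing.
    return ((para.style.name if para.style else "") or "").lower()


def style_name_getattr(para):
    # Alternative spelling (not the PR's code): getattr(None, "name", None)
    # returns None, so `or ""` covers both a missing style and a missing name.
    return (getattr(para.style, "name", None) or "").lower()


# Hypothetical stand-ins: named style, style with no name, no style at all.
cases = [
    SimpleNamespace(style=SimpleNamespace(name="Heading 1")),
    SimpleNamespace(style=SimpleNamespace(name=None)),
    SimpleNamespace(style=None),
]

for para in cases:
    print(repr(style_name_after(para)), repr(style_name_getattr(para)))
```

Both helpers return `'heading 1'`, `''`, `''` for these inputs. The PR's version keeps the original structure and adds one explicit conditional, which keeps the diff minimal.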

Fixes #6

`para.style` can be None for paragraphs in some DOCX files (e.g.
documents converted from other formats). This caused ingestion to
crash with "'NoneType' object has no attribute 'name'", skipping
the entire document.

Fixes informatique-cdc#6
@Grandvizir merged commit 5384e3e into informatique-cdc:main on Mar 26, 2026
2 checks passed

Linked issue: DOCX ingestion crashes when paragraph has no style
