Feature/related work annotator #14187

jsochava · 2025-10-27T02:36:37Z

This Draft PR introduces a complete, deterministic pipeline for harvesting contextual summaries from a citing paper’s “Related Work” section and appending them to the appropriate BibEntry records.

What and why

RelatedWorkAnnotator.java

Appends contextual summaries from a citing paper’s “Related Work” section into a target BibEntry.
Uses JabRef’s comment-<username> convention (resolved as UserSpecificCommentField).
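
As a rough illustration of the append behavior, here is a minimal sketch that uses a plain `Map<String, String>` as a stand-in for a BibEntry's fields. The class name `CommentAppendSketch` and the method signature are hypothetical and are not the PR's actual API; only the block format `[citingKey]: summary` and the blank-line separator are taken from the commit description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: a field-name -> value map replaces JabRef's BibEntry.
class CommentAppendSketch {

    /** Appends "[citingKey]: summary" to the field "comment-<username>". */
    static void appendSummary(Map<String, String> fields, String username,
                              String citingKey, String summary) {
        String fieldName = "comment-" + username;
        String block = "[" + citingKey + "]: " + summary.strip();
        // Existing content is kept; the new block follows after a blank line.
        fields.merge(fieldName, block, (existing, added) ->
                existing.isBlank() ? added : existing.stripTrailing() + "\n\n" + added);
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        appendSummary(fields, "alice", "LunaOstos_2024", "First summary.");
        appendSummary(fields, "alice", "Smith_2023", "Second summary.");
        System.out.println(fields.get("comment-alice"));
    }
}
```

The real helper presumably goes through BibEntry's field API rather than a map; the sketch only shows the append-after-blank-line rule.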

HeuristicRelatedWorkExtractor.java

Deterministic parser that locates author–year citations (e.g. (Vesce et al., 2016), (Bianchi, 2021)) within Related Work text.
Extracts descriptive snippets surrounding each citation.
Matches each citation to an existing BibEntry by first author surname + year.
Implemented without AI dependencies; designed for reliability and transparent logic.
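
To make the matching step concrete, the following is a sketch of how author–year citations could be located with a regular expression and turned into "surname|year" lookup keys. The pattern, the key format, and the class name `AuthorYearSketch` are assumptions for illustration; the PR's actual AUTHOR_YEAR_INNER regex is not shown in this description and may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the PR's HeuristicRelatedWorkExtractor.
class AuthorYearSketch {
    // A surname: an uppercase letter followed by letters, apostrophes, or hyphens.
    // \p{L} and \p{Lu} admit Unicode names and all-caps acronyms such as "CIA".
    private static final String SURNAME = "\\p{Lu}[\\p{L}'-]*";

    // "(Surname, 2021)", "(Surname et al., 2016)", or "(Surname and Other, 2020)".
    private static final Pattern CITATION = Pattern.compile(
            "\\((" + SURNAME + ")"
            + "(?:\\s+et\\s+al\\.?|\\s+(?:and|&)\\s+" + SURNAME + ")?"
            + ",\\s*(\\d{4})[a-z]?\\)");

    /** Returns one lower-cased "surname|year" key per citation, in text order. */
    static List<String> findKeys(String text) {
        List<String> keys = new ArrayList<>();
        Matcher m = CITATION.matcher(text);
        while (m.find()) {
            keys.add(m.group(1).toLowerCase(Locale.ROOT) + "|" + m.group(2));
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(findKeys("Prior work (Vesce et al., 2016) and (Bianchi, 2021)."));
    }
}
```

An index built from each candidate entry's first-author surname and year, using the same key format, would then let each citation be resolved with a single map lookup.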

RelatedWorkHarvester.java

High-level orchestrator that connects the extractor and annotator:
Accepts PDF-extracted or plain text input.
Calls the extractor to identify citation–context pairs.
Invokes RelatedWorkAnnotator.appendSummaryToEntry(...) for each match.

RelatedWorkSectionLocator.java

Deterministically isolates the “Related Work” / “Literature Review” / “Prior Work” / “Background and Related Work” section from a paper’s full plain text.
Recognizes numeric and textual headers.
Captures content until the next top-level header, ignoring figure/table captions and unrelated content.
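
A minimal sketch of this kind of header-based isolation is shown below. The header patterns and the class name `SectionLocatorSketch` are assumptions; in particular, the "next top-level header" is assumed to look like `3 Method` or `3. Method` on its own line, which is stricter and simpler than whatever the PR actually implements.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the PR's RelatedWorkSectionLocator.
class SectionLocatorSketch {
    // Header line: optional top-level number ("2" or "2."), then a known title.
    private static final Pattern START = Pattern.compile(
            "^\\s*(?:\\d+\\.?\\s+)?"
            + "(?:background and related work|related work|literature review|prior work)\\s*$",
            Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

    // Assumed next top-level header: a number, whitespace, a capitalised title.
    // Subsection numbers such as "2.1" do not match, so subsections stay in the body.
    private static final Pattern NEXT = Pattern.compile(
            "^\\s*\\d+\\.?\\s+\\p{Lu}.*$", Pattern.MULTILINE);

    /** Returns the section body with the header stripped, or empty if absent/blank. */
    static Optional<String> locate(String fullText) {
        Matcher start = START.matcher(fullText);
        if (!start.find()) {
            return Optional.empty();
        }
        int bodyStart = start.end();
        Matcher next = NEXT.matcher(fullText);
        int bodyEnd = next.find(bodyStart) ? next.start() : fullText.length();
        String body = fullText.substring(bodyStart, bodyEnd).strip();
        return body.isEmpty() ? Optional.empty() : Optional.of(body);
    }

    public static void main(String[] args) {
        String paper = "1 Introduction\nIntro.\n2 Related Work\nPrior work did X.\n3 Method\nWe do Y.";
        System.out.println(locate(paper));
    }
}
```

Caption lines such as "Figure 1: ..." do not start with a number, so they do not end the section in this sketch; a body line that happens to start with a number followed by a capitalised word would, which is one reason a real locator needs more careful rules.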

RelatedWorkPipeline.java

Introduces a convenience façade that chains the full extraction process:
- SectionLocator → HeuristicRelatedWorkExtractor → RelatedWorkAnnotator
Enables single-call usage for clients that have the full plain text and a list of candidate BibEntry objects.

PdfTextProvider.java

A tiny SPI interface: Path -> Optional plain-text extraction.
Keeps all PDF specifics behind a seam; facilitates unit testing with fakes/mocks and avoids hard dependencies in core logic.

PdfRelatedWorkTextExtractor.java

Adapter: PDF -> plain text (via PdfTextProvider) → “Related Work” block (via RelatedWorkSectionLocator).
Returns Optional with the body of the section (header stripped), or empty if not found/blank.
Validates inputs and surfaces IO errors; does not depend on any PDF library directly.
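
The seam described above might look roughly like the sketch below. The interface name `TextProviderSketch`, its method name, and the use of a `Function` for the section locator are all assumptions; the PR names the types but this description does not show their signatures. For brevity the sketch declares no checked exceptions, so an IO failure would have to surface as an unchecked exception here, which may differ from the PR.

```java
import java.nio.file.Path;
import java.util.Objects;
import java.util.Optional;
import java.util.function.Function;

// Assumed SPI shape: Path in, optional plain text out.
interface TextProviderSketch {
    Optional<String> extractPlainText(Path pdf);
}

// Hypothetical adapter: PDF -> plain text -> "Related Work" body.
class PdfSectionAdapterSketch {
    private final TextProviderSketch provider;
    private final Function<String, Optional<String>> sectionLocator;

    PdfSectionAdapterSketch(TextProviderSketch provider,
                            Function<String, Optional<String>> sectionLocator) {
        this.provider = Objects.requireNonNull(provider);
        this.sectionLocator = Objects.requireNonNull(sectionLocator);
    }

    /** Returns the section body, or empty if no text, no section, or only blanks. */
    Optional<String> extractRelatedWork(Path pdf) {
        Objects.requireNonNull(pdf);
        return provider.extractPlainText(pdf)
                       .flatMap(sectionLocator)
                       .map(String::strip)
                       .filter(body -> !body.isEmpty());
    }

    public static void main(String[] args) {
        // Fake provider and locator: no PDF library is needed to exercise the adapter.
        PdfSectionAdapterSketch adapter = new PdfSectionAdapterSketch(
                pdf -> Optional.of("2 Related Work\nPrior work exists.\n"),
                text -> Optional.of(text.substring(text.indexOf('\n') + 1)));
        System.out.println(adapter.extractRelatedWork(Path.of("paper.pdf")));
    }
}
```

Because both collaborators are passed in, a unit test can supply lambdas as fakes, which is the testing benefit the description claims for the seam.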

RelatedWorkPdfPipeline.java

End-to-end façade for callers that have a PDF and candidate entries:
PdfRelatedWorkTextExtractor → HeuristicRelatedWorkExtractor → RelatedWorkAnnotator.
Returns the number of annotations appended across matched entries.

RelatedWorkEvaluationRunner.java

Deterministic evaluator comparing extracted (citation → snippet) pairs against gold fixtures.
Computes precision, recall, F1, and coverage statistics.
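
For reference, the standard definitions behind these metrics can be sketched as follows. The sketch assumes that extracted and gold items are compared by exact string equality and that empty denominators score 0; how the PR's runner actually matches snippets and handles those edge cases is not stated here. The class name `PrfSketch` is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of set-based precision, recall, and F1.
class PrfSketch {

    /** Returns {precision, recall, f1}; empty denominators count as 0. */
    static double[] score(Set<String> predicted, Set<String> gold) {
        Set<String> truePositives = new HashSet<>(predicted);
        truePositives.retainAll(gold); // items that are both predicted and gold
        double tp = truePositives.size();
        double precision = predicted.isEmpty() ? 0.0 : tp / predicted.size();
        double recall = gold.isEmpty() ? 0.0 : tp / gold.size();
        double f1 = (precision + recall) == 0.0
                ? 0.0
                : 2 * precision * recall / (precision + recall);
        return new double[] {precision, recall, f1};
    }

    public static void main(String[] args) {
        double[] s = score(Set.of("vesce|2016", "bianchi|2021"), Set.of("vesce|2016", "cia|2021"));
        System.out.printf("P=%.2f R=%.2f F1=%.2f%n", s[0], s[1], s[2]);
    }
}
```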
Supports in-memory or JSON-based fixture definitions.

RelatedWorkMetrics.java

Immutable results object summarizing global and per-entry metrics.
Includes pretty-printed summaries and detailed statistics for debugging.

RelatedWorkFixture.java

Simple model for “gold” fixture data: includes the related-work text and the expected (author, year, snippet) tuples.
Supports direct in-memory creation or loading from a JSON file.

HeuristicExtractorAdapter.java

Bridge layer converting the HeuristicRelatedWorkExtractor output (Map<String, String>) into the Map<BibEntry, List<String>> format expected by the evaluation runner.
Keeps the original extractor untouched.

RelatedWorkSummarizer.java / NoOpRelatedWorkSummarizer.java

SPI interface for optional snippet summarization.
Default no-op implementation returns empty → the harvester keeps individual snippets unchanged.
Future AI integrations (e.g., LangChain4j or OpenAI) can implement this interface.
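
One way to picture the "empty means keep the snippets" contract is the sketch below. The interface and method names are assumptions chosen for illustration and are not taken from the PR's code.

```java
import java.util.List;
import java.util.Optional;

// Assumed SPI shape: several snippets in, at most one summary out.
interface SummarizerSketch {
    Optional<String> summarize(List<String> snippets);
}

// Default behavior: never summarize.
class NoOpSummarizerSketch implements SummarizerSketch {
    @Override
    public Optional<String> summarize(List<String> snippets) {
        return Optional.empty();
    }
}

// Hypothetical caller-side logic, standing in for the harvester.
class SummarizeOrKeepSketch {

    /** Uses the summary when one is produced; otherwise keeps the snippets as they are. */
    static List<String> toAnnotations(List<String> snippets, SummarizerSketch summarizer) {
        Optional<String> summary = summarizer.summarize(snippets);
        return summary.isPresent() ? List.of(summary.get()) : snippets;
    }

    public static void main(String[] args) {
        List<String> snippets = List.of("Snippet one.", "Snippet two.");
        System.out.println(toAnnotations(snippets, new NoOpSummarizerSketch()));
    }
}
```

With this shape, an AI-backed implementation can be swapped in later without touching the caller, since the no-op case and the summarizing case share one code path.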

CitationResolver.java / NoOpCitationResolver.java

SPI interface for resolving missing citations by key or author–year, optionally creating new entries.
Default implementation performs a simple local lookup and never creates entries.

RelatedWorkPluginConfig.java + RelatedWorkPluginsFactory.java

Lightweight configuration object with feature flags:
- enableSummarization
- enableResolution
Provides builder methods to safely compose plugin pipelines.
Used by RelatedWorkHarvester to inject summarizer and resolver instances.
Default build → no-op config (preserves old behavior).

Next steps

PDF text extraction

Integrate JabRef’s existing PDF parsing utilities or the LangChain4j interface to automatically extract the “Related Work” section from PDFs.
Focus on reliable section header detection (e.g., Related Work, Literature Review).

Reference lookup

For each parsed in-text citation:
- Match to an existing library entry.
- If missing, create a new BibEntry and annotate it.

AI-Assisted Context Summarization

Optional integration with LangChain4j for generating concise natural-language summaries when multiple snippets exist.

Steps to test

Run the unit tests for the new features only:

./gradlew :jablib:test --tests "org.jabref.logic.importer.RelatedWorkAnnotatorTest"
./gradlew :jablib:test --tests "org.jabref.logic.importer.relatedwork.HeuristicRelatedWorkExtractorTest"
./gradlew :jablib:test --tests "org.jabref.logic.importer.relatedwork.RelatedWorkHarvesterTest"
./gradlew :jablib:test --tests "org.jabref.logic.importer.relatedwork.RelatedWorkSectionLocatorTest"
./gradlew :jablib:test --tests "org.jabref.logic.importer.relatedwork.PdfRelatedWorkTextExtractorTest"
./gradlew :jablib:test --tests "org.jabref.logic.importer.relatedwork.RelatedWorkMetricsTest"

Mandatory checks

I own the copyright of the code submitted and I license it under the MIT license
I manually tested my changes in running JabRef (always required)
I added JUnit tests for changes (if applicable)
[/] I added screenshots in the PR description (if change is visible to the user)
[/] I described the change in CHANGELOG.md in a way that is understandable for the average user (if change is visible to the user)
[/] I checked the user documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request updating file(s) in https://github.com/JabRef/user-documentation/tree/main/en.

…ries to comment-<username> (JabRef#14085) This helper takes a BibEntry, a username, the citing paper's key, and a summary sentence, and appends a block like: [LunaOstos_2024]: <summary> to the field comment-<username>. If that field already has content, the new block is appended after a blank line. Includes unit tests verifying first append and multi-append behavior.

github-actions · 2025-10-27T02:36:53Z

Hey @jsochava!

Thank you for contributing to JabRef! Your help is truly appreciated ❤️.

We have automatic checks in place, based on which you will soon get automated feedback if any of them are failing. We also use TragBot with custom rules that scans your changes and provides some preliminary comments, before a maintainer takes a look. TragBot is still learning, and may not always be accurate. In the "Files changed" tab, you can go through its comments and just click on "Resolve conversation" if you are sure that it is incorrect, or comment on the conversation if you are doubtful.

Please re-check our contribution guide in case of any other doubts related to our contribution workflow.

…f#14085) Implements a deterministic extractor for author–year style citations in "Related Work" sections and integrates it with RelatedWorkAnnotator.
- Added org.jabref.logic.importer.relatedwork package
- Introduced RelatedWorkExtractor interface
- Implemented HeuristicRelatedWorkExtractor for author–year citation parsing
- Implemented RelatedWorkHarvester orchestrator that uses the extractor and appends summaries via RelatedWorkAnnotator
- Added comprehensive JUnit tests verifying extraction and annotation behavior
This change completes the non-AI (LangChain4j-free) MVP for issue JabRef#14085. Future work may introduce an AI-based RelatedWorkExtractor using LangChain4j.

…critics(JabRef#14085)
- Updated AUTHOR_YEAR_INNER regex to allow all-caps acronyms (e.g., "CIA") and Unicode names (e.g., "Šimić").
- Added acronym indexing in buildIndex() so corporate or multi-word authors (e.g., "Central Intelligence Agency") map to their acronyms.
- Ensures citations like (CIA, 2021) correctly match entries such as "Central Intelligence Agency, 2021".
- Keeps deterministic behavior while improving coverage of real-world citation formats in Related Work sections.
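
The acronym mapping can be illustrated with a small sketch. It assumes the acronym is formed from the first letter of each capitalised word, so lowercase connectives such as "of" and "and" are skipped; the actual rule inside buildIndex() is not shown in this commit message and may differ. The class name `AcronymSketch` is hypothetical.

```java
// Hypothetical sketch of deriving an acronym key for a corporate author.
class AcronymSketch {

    /** Joins the first letters of all capitalised words in the name. */
    static String acronym(String name) {
        StringBuilder sb = new StringBuilder();
        for (String word : name.trim().split("\\s+")) {
            if (!word.isEmpty() && Character.isUpperCase(word.codePointAt(0))) {
                sb.appendCodePoint(word.codePointAt(0));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(acronym("Central Intelligence Agency"));
    }
}
```

Indexing an entry under both its full author name and this derived acronym is what would let a citation such as (CIA, 2021) resolve to the "Central Intelligence Agency" entry.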

…or section detection and overall run(JabRef#14085)
Adds RelatedWorkSectionLocator, a deterministic logic class to extract the “Related Work” or “Literature Review” section from full plain text using common header variants and numeric patterns.
Introduces RelatedWorkPipeline as a high-level façade chaining:
- RelatedWorkSectionLocator → HeuristicRelatedWorkExtractor → RelatedWorkAnnotator
The pipeline provides a clean, dependency-free entry point for integrating related-work extraction into JabRef’s importer pipeline.
Includes comprehensive unit tests for RelatedWorkSectionLocator to verify detection robustness across multiple header variants.
This change enables downstream components to reliably isolate and process related-work text without external dependencies.

…JabRef#14085) Introduces a metrics module to evaluate the heuristic Related Work extractor.
- Added RelatedWorkEvaluationRunner: computes precision, recall, F1, and coverage for extracted citation–context pairs against gold fixtures.
- Added RelatedWorkMetrics: immutable summary object with per-entry statistics.
- Added RelatedWorkFixture: compact JSON or in-memory format for evaluation data.
- Added HeuristicExtractorAdapter: bridges HeuristicRelatedWorkExtractor output (Map<String,String>) to the runner’s expected Map<BibEntry,List<String>> form.
- Added RelatedWorkMetricsTest: self-contained JUnit test that runs the extractor on a gold "Related Work" text and prints evaluation metrics.
This provides a deterministic, non-AI benchmark for assessing heuristic coverage before integrating more complex methods.

…olution(JabRef#14085)
- Introduce RelatedWorkSummarizer and CitationResolver interfaces
- Add no-op default implementations (NoOpRelatedWorkSummarizer, NoOpCitationResolver)
- Add RelatedWorkPluginConfig with feature toggles and DI-friendly builder
- Wire RelatedWorkHarvester to optionally use resolver for missing keys and summarizer for multi-snippet entries
- Keep default behavior unchanged (both features disabled)

koppor

Please check your IDE config. It seems you reformatted too much with the wrong style. Hard to give content feedback.

jsochava · 2025-11-16T02:23:17Z

@koppor when I used the IDE config described in the project's setup instructions, I consistently failed the automatic format checks. Is that expected?

jsochava added 2 commits October 25, 2025 20:42

Update Project.xml

d1172c2

github-actions bot added the first contrib label Oct 27, 2025

jsochava added 2 commits October 27, 2025 15:39

Fixed formatting issues(JabRef#14085)

d60ffe4

Attempt to fix formatting issue (JabRef#14085)

711c3a9

jsochava force-pushed the feature/related-work-annotator branch from 21d4bac to 711c3a9 Compare October 30, 2025 00:47

jsochava added 23 commits October 29, 2025 21:14

Attempt to fix format issues in draft pr(JabRef#14085)

2925c4b

Another attempt to fix formatting- selected "reformat code" on all fl…

10e1621

…agged files (JabRef#14085)

Another attempt to fix formatting issues (JabRef#14085)

eb988d0

Reformat code applied (JabRef#14085)

adbaf3c

Reformat code applied to jablib folder(previous commit was jabgui fol…

b1a239d

…der) (JabRef#14085)

Merge remote-tracking branch 'upstream/main' into feature/related-wor…

9e01bb6

…k-annotator

Applied reformat code to build-support, jabsrv, test-support(JabRef#1…

73c38e2

…4085)

Attempt to fix failed jablib unit tests(JabRef#14085)

fab7ed2

Another attempt to fix format issues in jablib; applied reformat code…

4ce2e77

… to logic and model subfolders(JabRef#14085)

Applied reformat code to 2 files in jablib/src/test (JabRef#14085)

797ec1b

Merge branch 'feature/related-work-extractor' into feature/related-wo…

c044910

…rk-annotator

test: conform tests to OpenRewrite(JabRef#14085)

5016534

Fixed formatting issues in test files(JabRef#14085)

7c5d663

Merge branch 'feature/related-work-extractor' into feature/related-work

ae55296

Attempt to fix Format, Checkstyle, and OpenRewrite failed tests(JabRe…

c0b58d1

…f#14085)

Merge remote-tracking branch 'upstream/main' into feature/related-work

c48936e

Attempt to fix Checkstyle, OpenRewrite, format, and JavaDoc failed te…

404cf6d

…sts(JabRef#14085)

attempt to fix failed jablib unit tests(JabRef#14085)

0398051

Attempt to fix failed format check(JabRef#14085)

0a9903f

feat(jablib): add PdfRelatedWorkTextExtractor (adapter) + PdfTextProv…

3c5082e

…ider(JabRef#14085)

jsochava added 11 commits November 11, 2025 09:37

Trying to fix format isues (JabRef#14085)

05fbb08

Merge remote-tracking branch 'upstream/main' into feature/related-work

7cc10b0

attempt to fix format(JabRef#14085)

01fd87c

attempt to fix format issues(JabRef#14085)

f50e58d

Merge remote-tracking branch 'upstream/main' into feature/related-work

93d5469

Git merge updates

7af8d66

Revert "feat: add plug-in SPI for Related Work summarization and cita…

571250e

…tion resolution(JabRef#14085)" This reverts commit 91af3d9.

attempt to fix JBang (PR) (.jbang/JabKitLauncher.java) failed test(Ja…

27dc30c

…bRef#14085)

another attempt to fix JBang (PR) (.jbang/JabKitLauncher.java) failed…

297d0a3

… test(JabRef#14085)

koppor requested changes Nov 15, 2025

github-actions bot added the status: changes-required Pull requests that are not yet complete label Nov 15, 2025
