Feature/related work annotator #14187
…ries to comment-<username> (JabRef#14085) This helper takes a BibEntry, a username, the citing paper's key, and a summary sentence, and appends a block like: [LunaOstos_2024]: <summary> to the field comment-<username>. If that field already has content, the new block is appended after a blank line. Includes unit tests verifying first append and multi-append behavior.
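The append rule described in this commit can be sketched roughly as follows. This is not the PR's code: a plain `Map` stands in for a `BibEntry`'s fields, and `CommentAppendSketch`, `appendSummary`, and the second citing key `Smith_2023` are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: a plain Map plays the role of a BibEntry's fields.
final class CommentAppendSketch {

    // Appends "[citingKey]: summary" to the "comment-<username>" field.
    // Existing content is kept and separated from the new block by a blank line.
    static void appendSummary(Map<String, String> fields, String username,
                              String citingKey, String summary) {
        String field = "comment-" + username;
        String block = "[" + citingKey + "]: " + summary.trim();
        String existing = fields.get(field);
        if (existing == null || existing.isBlank()) {
            fields.put(field, block);
        } else {
            fields.put(field, existing.stripTrailing() + "\n\n" + block);
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = new HashMap<>();
        appendSummary(fields, "alice", "LunaOstos_2024", "Evaluates the method on a larger corpus.");
        appendSummary(fields, "alice", "Smith_2023", "Uses it as a baseline.");
        System.out.println(fields.get("comment-alice"));
    }
}
```

Stripping trailing whitespace before appending keeps exactly one blank line between blocks, even if the existing field ended with stray newlines.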
Hey @jsochava! Thank you for contributing to JabRef! Your help is truly appreciated ❤️. We have automatic checks in place, based on which you will soon get automated feedback if any of them are failing. We also use TragBot with custom rules that scans your changes and provides some preliminary comments, before a maintainer takes a look. TragBot is still learning, and may not always be accurate. In the "Files changed" tab, you can go through its comments and just click on "Resolve conversation" if you are sure that it is incorrect, or comment on the conversation if you are doubtful. Please re-check our contribution guide in case of any other doubts related to our contribution workflow.
21d4bac to 711c3a9
… to logic and model subfolders (JabRef#14085)
…f#14085)
Implements a deterministic extractor for author–year style citations in "Related Work" sections and integrates it with RelatedWorkAnnotator.
- Added org.jabref.logic.importer.relatedwork package
- Introduced RelatedWorkExtractor interface
- Implemented HeuristicRelatedWorkExtractor for author–year citation parsing
- Implemented RelatedWorkHarvester orchestrator that uses the extractor and appends summaries via RelatedWorkAnnotator
- Added comprehensive JUnit tests verifying extraction and annotation behavior
This change completes the non-AI (LangChain4j-free) MVP for issue JabRef#14085. Future work may introduce an AI-based RelatedWorkExtractor using LangChain4j.
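As a rough illustration of author–year parsing, the sketch below finds parenthetical citations such as `(Smith, 2020)` or `(Smith et al., 2020)` and keeps the sentence each one appears in. It is not the PR's `HeuristicRelatedWorkExtractor`: the class name, the regex, and the `"Surname Year"` map key are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AuthorYearSketch {

    // Group 1: first surname; group 2: four-digit year.
    // Optionally allows "et al." or a second author joined by "and" / "&".
    private static final Pattern AUTHOR_YEAR = Pattern.compile(
            "\\(([\\p{L}][\\p{L}\\p{M}'\\-]*)"
            + "(?:\\s+et\\s+al\\.|\\s+(?:and|&)\\s+[\\p{L}][\\p{L}\\p{M}'\\-]*)?"
            + ",\\s*(\\d{4})\\)");

    // Maps "Surname Year" to the sentence(s) that cite it, in order of appearance.
    static Map<String, String> extract(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        // Naive sentence split: break after ., ! or ? when followed by whitespace.
        for (String sentence : text.split("(?<=[.!?])\\s+")) {
            Matcher m = AUTHOR_YEAR.matcher(sentence);
            while (m.find()) {
                String key = m.group(1) + " " + m.group(2);
                result.merge(key, sentence.trim(), (a, b) -> a + " " + b);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Prior work studied indexing (Lee, 2019). "
                + "Later systems scaled it (Smith et al., 2020).";
        extract(text).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Because the approach is purely regex-based, the same input always yields the same output, which is the deterministic property the commit message emphasizes.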
…critics (JabRef#14085)
- Updated AUTHOR_YEAR_INNER regex to allow all-caps acronyms (e.g., "CIA") and Unicode names (e.g., "Šimić").
- Added acronym indexing in buildIndex() so corporate or multi-word authors (e.g., "Central Intelligence Agency") map to their acronyms.
- Ensures citations like (CIA, 2021) correctly match entries such as "Central Intelligence Agency, 2021".
- Keeps deterministic behavior while improving coverage of real-world citation formats in Related Work sections.
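The acronym indexing idea might look something like the sketch below. The real `buildIndex()` is not shown in this conversation, so the class, the `author|year` lookup format, and the citation key `CIA_2021` are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class AcronymIndexSketch {

    // Builds an acronym from the initials of capitalized words,
    // e.g. "Central Intelligence Agency" becomes "CIA".
    // Words starting in lowercase (such as "of") are skipped; that is an assumption.
    static String acronym(String name) {
        StringBuilder sb = new StringBuilder();
        for (String word : name.trim().split("\\s+")) {
            if (!word.isEmpty() && Character.isUpperCase(word.codePointAt(0))) {
                sb.appendCodePoint(word.codePointAt(0));
            }
        }
        return sb.toString();
    }

    // Hypothetical lookup format: lowercased author text, a pipe, then the year.
    static String lookupKey(String author, String year) {
        return author.toLowerCase(Locale.ROOT) + "|" + year;
    }

    // Indexes an entry under its full author name and, for multi-word names, its acronym.
    static void index(Map<String, String> index, String author, String year, String citationKey) {
        index.put(lookupKey(author, year), citationKey);
        String acr = acronym(author);
        if (acr.length() >= 2) {
            index.put(lookupKey(acr, year), citationKey);
        }
    }

    public static void main(String[] args) {
        Map<String, String> idx = new HashMap<>();
        index(idx, "Central Intelligence Agency", "2021", "CIA_2021");
        System.out.println(idx.get(lookupKey("CIA", "2021")));
    }
}
```

Requiring at least two initials avoids indexing single-surname authors under a one-letter alias, which would otherwise create many spurious matches.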
…or section detection and overall run (JabRef#14085)
Adds RelatedWorkSectionLocator, a deterministic logic class to extract the “Related Work” or “Literature Review” section from full plain text using common header variants and numeric patterns.
Introduces RelatedWorkPipeline as a high-level façade chaining:
- RelatedWorkSectionLocator → HeuristicRelatedWorkExtractor → RelatedWorkAnnotator
The pipeline provides a clean, dependency-free entry point for integrating related-work extraction into JabRef’s importer pipeline.
Includes comprehensive unit tests for RelatedWorkSectionLocator to verify detection robustness across multiple header variants.
This change enables downstream components to reliably isolate and process related-work text without external dependencies.
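A minimal sketch of header-based section location is shown below. It assumes the section ends at the next numbered heading (or at the end of the text); the patterns and the class name are illustrative and are not taken from `RelatedWorkSectionLocator`.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SectionLocatorSketch {

    // A header line such as "2 Related Work", "2.1. Related Works" or "LITERATURE REVIEW".
    private static final Pattern HEADER = Pattern.compile(
            "^\\s*(?:\\d+(?:\\.\\d+)*\\.?\\s+)?(?:related\\s+works?|literature\\s+review)\\s*$",
            Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

    // Assumption: the next section starts with a numbered, capitalized heading.
    private static final Pattern NEXT_HEADING = Pattern.compile(
            "^\\s*\\d+(?:\\.\\d+)*\\.?\\s+\\p{Lu}.*$", Pattern.MULTILINE);

    // Returns the text between the matched header and the next numbered heading.
    static Optional<String> locate(String text) {
        Matcher header = HEADER.matcher(text);
        if (!header.find()) {
            return Optional.empty();
        }
        int start = header.end();
        Matcher next = NEXT_HEADING.matcher(text);
        int end = next.find(start) ? next.start() : text.length();
        return Optional.of(text.substring(start, end).trim());
    }

    public static void main(String[] args) {
        String paper = "1 Introduction\nIntro text.\n"
                + "2 Related Work\nPrior work (Smith, 2020).\n"
                + "3 Method\nOur approach.";
        System.out.println(locate(paper).orElse("<no related-work section found>"));
    }
}
```

A body line that happens to start with a number and a capital letter would end the section early under this rule, so a production locator would need stricter heading detection.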
…JabRef#14085)
Introduces a metrics module to evaluate the heuristic Related Work extractor.
- Added RelatedWorkEvaluationRunner: computes precision, recall, F1, and coverage for extracted citation–context pairs against gold fixtures.
- Added RelatedWorkMetrics: immutable summary object with per-entry statistics.
- Added RelatedWorkFixture: compact JSON or in-memory format for evaluation data.
- Added HeuristicExtractorAdapter: bridges HeuristicRelatedWorkExtractor output (Map<String,String>) to the runner’s expected Map<BibEntry,List<String>> form.
- Added RelatedWorkMetricsTest: self-contained JUnit test that runs the extractor on a gold "Related Work" text and prints evaluation metrics.
This provides a deterministic, non-AI benchmark for assessing heuristic coverage before integrating more complex methods.
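For reference, precision, recall, and F1 over sets of extracted versus gold items can be computed as in the sketch below. It treats each citation–context pair as an opaque string and leaves out "coverage", since its definition is not given here; `MetricsSketch` is a hypothetical name, not the PR's `RelatedWorkMetrics`.

```java
import java.util.HashSet;
import java.util.Set;

final class MetricsSketch {
    final double precision;
    final double recall;
    final double f1;

    private MetricsSketch(double precision, double recall, double f1) {
        this.precision = precision;
        this.recall = recall;
        this.f1 = f1;
    }

    // precision = TP / |predicted|, recall = TP / |gold|, F1 = harmonic mean of the two.
    // Empty denominators yield 0.0 rather than NaN.
    static MetricsSketch evaluate(Set<String> predicted, Set<String> gold) {
        Set<String> truePositives = new HashSet<>(predicted);
        truePositives.retainAll(gold);
        double tp = truePositives.size();
        double p = predicted.isEmpty() ? 0.0 : tp / predicted.size();
        double r = gold.isEmpty() ? 0.0 : tp / gold.size();
        double f1 = (p + r == 0.0) ? 0.0 : 2 * p * r / (p + r);
        return new MetricsSketch(p, r, f1);
    }

    public static void main(String[] args) {
        MetricsSketch m = evaluate(Set.of("a", "b", "c"), Set.of("a", "b", "d", "e"));
        System.out.printf("P=%.3f R=%.3f F1=%.3f%n", m.precision, m.recall, m.f1);
    }
}
```

With three predictions of which two are correct, and four gold items, this gives precision 2/3, recall 1/2, and F1 4/7.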
…olution (JabRef#14085)
- Introduce RelatedWorkSummarizer and CitationResolver interfaces
- Add no-op default implementations (NoOpRelatedWorkSummarizer, NoOpCitationResolver)
- Add RelatedWorkPluginConfig with feature toggles and DI-friendly builder
- Wire RelatedWorkHarvester to optionally use resolver for missing keys and summarizer for multi-snippet entries
- Keep default behavior unchanged (both features disabled)
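A builder with feature toggles that default to off could be sketched as follows. Only the toggle names `enableSummarization` and `enableResolution` come from this PR; the types, the `render` helper, and the newline-joining no-op default are assumptions, and the resolver wiring is omitted.

```java
import java.util.List;

// Hypothetical summarizer seam; the PR's RelatedWorkSummarizer may look different.
interface SummarizerSketch {
    String summarize(List<String> snippets);
}

final class PluginConfigSketch {
    final boolean enableSummarization;
    final boolean enableResolution; // carried as a flag only; resolver wiring is omitted here
    final SummarizerSketch summarizer;

    private PluginConfigSketch(Builder b) {
        this.enableSummarization = b.enableSummarization;
        this.enableResolution = b.enableResolution;
        this.summarizer = b.summarizer;
    }

    static Builder builder() {
        return new Builder();
    }

    // Collapses several snippets into one summary only when the toggle is on.
    String render(List<String> snippets) {
        return enableSummarization ? summarizer.summarize(snippets) : String.join("\n", snippets);
    }

    static final class Builder {
        private boolean enableSummarization = false; // off by default
        private boolean enableResolution = false;    // off by default
        private SummarizerSketch summarizer = snippets -> String.join("\n", snippets); // no-op

        Builder enableSummarization(boolean on) { this.enableSummarization = on; return this; }
        Builder enableResolution(boolean on) { this.enableResolution = on; return this; }
        Builder summarizer(SummarizerSketch s) { this.summarizer = s; return this; }
        PluginConfigSketch build() { return new PluginConfigSketch(this); }
    }

    public static void main(String[] args) {
        PluginConfigSketch defaults = PluginConfigSketch.builder().build();
        System.out.println(defaults.render(List.of("First snippet.", "Second snippet.")));
    }
}
```

Injecting the summarizer through the builder lets tests pass a lambda while production code supplies a real implementation, without changing the default behavior.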
…tion resolution (JabRef#14085)" This reverts commit 91af3d9.
koppor
left a comment
Please check your IDE config. It seems you reformatted too much with the wrong style. Hard to give content feedback.
@koppor when I used the IDE config described in the project's setup instructions, I consistently failed the automatic format checks. Is that expected?
Closes #14085
This Draft PR introduces a complete, deterministic pipeline for harvesting contextual summaries from a citing paper’s “Related Work” section and appending them to the appropriate BibEntry records.
What and why
- SectionLocator → HeuristicRelatedWorkExtractor → RelatedWorkAnnotator
PdfTextProvider.java
- A tiny SPI interface: Path -> Optional plain-text extraction.
- Keeps all PDF specifics behind a seam; facilitates unit testing with fakes/mocks and avoids hard deps in core logic.
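The seam described above might be shaped like the sketch below. The method name `extractPlainText`, the `Optional<String>` return type, and the map-backed fake are assumptions; the actual `PdfTextProvider.java` is not shown in this conversation.

```java
import java.nio.file.Path;
import java.util.Map;
import java.util.Optional;

// Hypothetical shape of the SPI: a file path in, plain text out (or empty on failure).
interface PdfTextProvider {
    Optional<String> extractPlainText(Path pdf);
}

// Test fake that serves canned text, so core logic can be tested without a PDF library.
final class FakePdfTextProvider implements PdfTextProvider {
    private final Map<Path, String> texts;

    FakePdfTextProvider(Map<Path, String> texts) {
        this.texts = texts;
    }

    @Override
    public Optional<String> extractPlainText(Path pdf) {
        return Optional.ofNullable(texts.get(pdf));
    }

    public static void main(String[] args) {
        PdfTextProvider provider = new FakePdfTextProvider(
                Map.of(Path.of("paper.pdf"), "2 Related Work\nPrior work (Smith, 2020)."));
        System.out.println(provider.extractPlainText(Path.of("paper.pdf")).orElse("<no text>"));
    }
}
```

Because the interface has a single method, tests can also supply a lambda such as `path -> Optional.empty()` to simulate an unreadable PDF.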
PdfRelatedWorkTextExtractor.java
- enableSummarization
- enableResolution
Next steps
- Match to an existing library entry.
- If missing, create a new BibEntry and annotate it.
Steps to test
Mandatory checks
I described the change in CHANGELOG.md in a way that is understandable for the average user (if change is visible to the user)