Show merge dialog when importing a single PDF

Context and Problem Statement

PDF files are one of the main formats for transferring documents, especially scientific papers. However, a PDF by itself is like a picture: it contains commands solely for displaying human-readable text, and it might not contain computer-readable metadata.

To overcome this problem, various heuristics and AI models are used to “convert” a PDF into a BibTeX entry. However, this introduces problems of its own, as the heuristics are not ideal: on some PDFs they work perfectly, but on others they generate random output.

PDF importing in JabRef is done via the abstract class PdfImporter and its descendants, and via PdfMergeMetadataImporter. A PdfImporter is typically a single heuristic or method of extracting a BibEntry from a PDF. PdfMergeMetadataImporter collects BibEntry candidates from all PdfImporters and merges them automatically into a single BibEntry.
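
To make the idea of merging candidates concrete, here is a minimal sketch of one possible merge strategy, in which the first non-blank value seen for a field wins. It is an illustration only, not JabRef's actual merging code: the class CandidateMergeSketch, the method merge, and the use of a plain Map as a stand-in for BibEntry are all assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not part of JabRef. A Map<String, String>
// (field name -> value) stands in for a BibEntry candidate.
final class CandidateMergeSketch {

    // Merges candidates in list order: for each field, the first
    // non-blank value encountered is kept; later values are ignored.
    static Map<String, String> merge(List<Map<String, String>> candidates) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> candidate : candidates) {
            for (Map.Entry<String, String> field : candidate.entrySet()) {
                String value = field.getValue();
                if (value != null && !value.isBlank()) {
                    merged.putIfAbsent(field.getKey(), value);
                }
            }
        }
        return merged;
    }
}
```

Under this assumed strategy the result depends on the order of the candidates: if an earlier heuristic produces a wrong title, it shadows a correct title from a later one. A fully automatic merge cannot easily avoid this kind of error.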

The specific problem JabRef has: should JabRef automate all heuristics (automatically merge all BibEntry candidates from several PdfImporters) when importing PDF files, or should every file be analysed thoroughly by users?

Decision Drivers

  • The option should provide good-enough quality.
  • Fine-grained control of PDF importing is desired for power users.

Considered Options

  • Automatically merge all BibEntry candidates from PdfImporters.
  • Open a merge dialog with all candidates.
  • Open a merge dialog with all candidates if a single PDF is imported.

Decision Outcome

Chosen option: “Open a merge dialog with all candidates if a single PDF is imported”, because it comes out best (see below).

Pros and Cons of the Options

Automatically merge all BibEntry candidates from PdfImporters

  • Good, because it requires minimal user interaction and does not disrupt the flow. It also allows batch-processing.
  • Bad, because heuristics are not ideal, and it is even harder to develop a “smarter” merging algorithm.

Open a merge dialog with all candidates

  • Good, because it allows for fine-grained import. In an automatic merge, a correct field may be overridden by a wrong field from another importer, which is undesirable for power users.
  • Bad, because it is a dialog. If many PDFs are imported, there will be many dialogs, which might be too daunting to process manually.

Open a merge dialog with all candidates if a single PDF is imported

Explanation:

  • If a single PDF is imported, then open a merge dialog.
  • If several PDFs are imported, merge candidates for each PDF automatically.
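
The rule above can be sketched as a small decision function. This is a hedged illustration, not JabRef's implementation: the class ImportModeSketch, the enum Mode, and the method chooseMode are hypothetical names, and PDF files are represented simply by their paths as strings.

```java
import java.util.List;

// Hypothetical sketch of the chosen option; not part of JabRef.
final class ImportModeSketch {

    enum Mode { MERGE_DIALOG, AUTOMATIC_MERGE }

    // Exactly one PDF: let the user review the candidates in a dialog.
    // Any other count (including an empty list, where there is nothing
    // to review): merge the candidates for each PDF automatically.
    static Mode chooseMode(List<String> pdfPaths) {
        return pdfPaths.size() == 1 ? Mode.MERGE_DIALOG : Mode.AUTOMATIC_MERGE;
    }
}
```

For example, chooseMode(List.of("paper.pdf")) selects the dialog, while chooseMode(List.of("a.pdf", "b.pdf")) selects the automatic merge.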

Outcomes:

  • Good, because it combines the best of the other two options: it allows both PDF batch-processing and fine-grained control.