Fully Support UTF-8 Only For LaTeX Files
Context and Problem Statement
The “search for citations” feature displays the content of LaTeX files. LaTeX files are text files and might be encoded arbitrarily.
Considered Options
- Support UTF-8 encoding only
- Support ASCII encoding only
- Support (nearly) all encodings
Decision Outcome
Chosen option: “Support UTF-8 encoding only”, because it comes out best (see below).
Positive Consequences
- All content of LaTeX files is displayed in JabRef
Negative Consequences
- When a LaTeX file is encoded in another encoding, the user might see strange characters in JabRef
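The “strange characters” mentioned above can be illustrated with a small sketch. It is not taken from JabRef; the class and method names are hypothetical, and it only assumes the standard behavior of Java's `String(byte[], Charset)` constructor, which substitutes the replacement character U+FFFD for byte sequences that are not valid UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of JabRef.
class EncodingMismatchDemo {

    // Decodes raw bytes as UTF-8; malformed sequences become U+FFFD.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Kaese" with an a-umlaut, as it would be stored in an ISO-8859-1 file
        byte[] latin1Bytes = "K\u00E4se".getBytes(StandardCharsets.ISO_8859_1);

        // The single byte 0xE4 is not valid UTF-8 on its own,
        // so the umlaut is shown as the replacement character.
        System.out.println(decodeAsUtf8(latin1Bytes));
    }
}
```

ASCII-only content is unaffected, because ASCII bytes have the same meaning in UTF-8; only characters outside ASCII are at risk.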
Pros and Cons of the Options
Support UTF-8 encoding only
- Good, because it covers most TeX file encodings
- Good, because easy to implement
- Bad, because it does not support encodings used before around 2010
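As a rough illustration of why the UTF-8-only option is easy to implement, the following sketch reads a LaTeX file with a fixed UTF-8 decoder. It is an assumption about how such a reader could look, not JabRef's actual code; `Utf8LatexReader` and `readLines` are hypothetical names. The decoder is configured to replace malformed input, because `Files.newBufferedReader` with UTF-8 would instead throw an exception on files in another encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of JabRef.
class Utf8LatexReader {

    // Reads all lines as UTF-8. Bytes that are not valid UTF-8 are
    // replaced with U+FFFD instead of aborting the read.
    static List<String> readLines(Path texFile) throws IOException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(texFile), decoder))) {
            return reader.lines().collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 encoded file and read it back.
        Path texFile = Files.createTempFile("example", ".tex");
        Files.writeString(texFile, "\\cite{M\u00FCller2019}\n", StandardCharsets.UTF_8);
        System.out.println(readLines(texFile));
    }
}
```

No charset detection and no additional dependency is involved, which is the trade-off this option makes against files in older encodings.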
Support ASCII encoding only
- Good, because easy to implement
- Bad, because it does not support any non-ASCII encoding at all
Support (nearly) all encodings
- Good, because easy to implement
- Bad, because it relies on Apache Tika’s CharsetDetector, which resides in tika-parsers. This causes issues during compilation (see https://github.com/JabRef/jabref/pull/3421#issuecomment-524532832).
  Example: error: module java.xml.bind reads package javax.activation from both java.activation and jakarta.activation