Fully Support UTF-8 Only For LaTeX Files

Context and Problem Statement

The “search for citations” feature displays the content of LaTeX files. LaTeX files are plain text files and might be encoded arbitrarily.

Considered Options

Support UTF-8 encoding only
Support ASCII encoding only
Support (nearly) all encodings

Decision Outcome

Chosen option: “Support UTF-8 encoding only”, because it comes out best (see below).

Positive Consequences

All content of UTF-8 encoded LaTeX files is displayed correctly in JabRef

Negative Consequences

When a LaTeX file uses an encoding other than UTF-8, the user might see strange characters in JabRef
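The effect of reading a non-UTF-8 file as UTF-8 can be illustrated with a minimal sketch. It is not JabRef's actual implementation: the class `Utf8OnlyDemo`, the helper `decodeAsUtf8`, and the sample citation key are hypothetical names chosen for illustration. The sketch assumes the file content is decoded with `new String(bytes, StandardCharsets.UTF_8)`, which does not throw on malformed input but substitutes the Unicode replacement character U+FFFD.

```java
import java.nio.charset.StandardCharsets;

public class Utf8OnlyDemo {

    // Hypothetical helper: decodes raw file bytes the way a UTF-8-only reader would.
    // new String(bytes, charset) never throws; malformed input is replaced by U+FFFD.
    static String decodeAsUtf8(byte[] fileBytes) {
        return new String(fileBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample LaTeX line containing a non-ASCII character (u-umlaut, U+00FC).
        String line = "\\cite{M\u00FCller2019}";

        // The same line as it would be stored in a UTF-8 file and in an ISO-8859-1 file.
        byte[] utf8Bytes = line.getBytes(StandardCharsets.UTF_8);
        byte[] latin1Bytes = line.getBytes(StandardCharsets.ISO_8859_1);

        // UTF-8 input round-trips unchanged.
        System.out.println(decodeAsUtf8(utf8Bytes));

        // ISO-8859-1 input is not valid UTF-8 at the umlaut, so the decoded
        // text contains a replacement character in its place.
        System.out.println(decodeAsUtf8(latin1Bytes));
    }
}
```

ASCII characters decode identically in both cases, so only the non-ASCII parts of a file in another encoding are affected.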

Pros and Cons of the Options

Support UTF-8 encoding only

Good, because it covers the encoding of most current TeX files
Good, because easy to implement
Bad, because it does not support the encodings commonly used before around 2010

Support ASCII encoding only

Good, because easy to implement
Bad, because it does not support any non-ASCII characters at all

Support (nearly) all encodings

Good, because easy to implement
Bad, because it relies on Apache Tika’s CharsetDetector, which resides in tika-parsers. This causes issues during compilation (see https://github.com/JabRef/jabref/pull/3421#issuecomment-524532832). Example: error: module java.xml.bind reads package javax.activation from both java.activation and jakarta.activation.