LO fails to load document after saving with odftoolkit due to invalid UTF-16 entities #137

@FlorianBruckner

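Xalan's serializer has a nasty bug: it writes a character outside the Basic Multilingual Plane as two numeric character references, one per UTF-16 surrogate, instead of a single reference for the code point. Surrogate values are not legal XML characters, so the saved document is corrupt and LibreOffice fails to load it. For example, this input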

<text:span text:style-name="T19">𝜈</text:span>

is changed to this when the document is saved with odftoolkit:

<text:span text:style-name="T19">&#55349;&#57096;</text:span>

More information about the root cause can be found here:
https://issues.apache.org/jira/browse/XALANJ-2419
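
To make the failure concrete, here is a minimal sketch (not Xalan's actual code) that reproduces the two references from the example above. The character 𝜈 is U+1D708, which Java stores as the surrogate pair 0xD835 0xDF08. The class and method names are made up for illustration; `perCodeUnit` only mimics the observed output, while `perCodePoint` shows what a correct serializer would be expected to emit.

```java
// Illustrative only: SurrogateCharRef and its methods are hypothetical names,
// not part of Xalan or odftoolkit.
class SurrogateCharRef {

    /** Mimics the faulty output: one reference per UTF-16 code unit. */
    public static String perCodeUnit(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append("&#").append((int) c).append(';');
        }
        return sb.toString();
    }

    /** Expected behaviour: one reference per Unicode code point. */
    public static String perCodePoint(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append("&#").append(cp).append(';'));
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1D708 MATHEMATICAL ITALIC SMALL NU, the character from the example
        String nu = new String(Character.toChars(0x1D708));
        System.out.println(perCodeUnit(nu));  // prints &#55349;&#57096;
        System.out.println(perCodePoint(nu)); // prints &#120584;
    }
}
```

The first line of output matches the corrupt markup shown above; the second is the single reference an XML parser will accept.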

As it seems unlikely that there will ever be a new Xalan release that includes a fix for this, one option (and the one I am currently using) is to replace the Xalan serializer dependency with a known-good version, e.g.

        <dependency>
            <groupId>org.docx4j.org.apache</groupId>
            <artifactId>xalan-serializer</artifactId>
            <version>11.0.0</version>
        </dependency>

I cannot vouch for the integrity of this package, but I have verified that it fixes the invalid encoding.
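
One way to check a saved file for this problem is to scan the serialized XML for numeric character references in the surrogate range (0xD800 to 0xDFFF). The following is a rough sketch of such a check, not the procedure used for the verification above; `SurrogateRefScanner` and `findSurrogateRefs` are hypothetical names, and the scan is a plain text match that does not skip comments or CDATA sections.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of odftoolkit or Xalan.
class SurrogateRefScanner {

    // Matches decimal (&#55349;) and hexadecimal (&#xD835;) character references.
    private static final Pattern CHAR_REF =
            Pattern.compile("&#(x[0-9A-Fa-f]+|[0-9]+);");

    /** Returns every character reference whose value is a UTF-16 surrogate. */
    public static List<String> findSurrogateRefs(String xml) {
        List<String> hits = new ArrayList<>();
        Matcher m = CHAR_REF.matcher(xml);
        while (m.find()) {
            String body = m.group(1);
            long value;
            try {
                value = body.startsWith("x")
                        ? Long.parseLong(body.substring(1), 16)
                        : Long.parseLong(body);
            } catch (NumberFormatException e) {
                continue; // too many digits to be a surrogate value
            }
            if (value >= 0xD800 && value <= 0xDFFF) {
                hits.add(m.group());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String broken = "<text:span text:style-name=\"T19\">&#55349;&#57096;</text:span>";
        System.out.println(findSurrogateRefs(broken)); // prints [&#55349;, &#57096;]
    }
}
```

An empty result for the `content.xml` extracted from a saved document suggests the serializer in use is not splitting supplementary characters.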
