Fix many params in internal functions (not really necessary but Doxygen
warns about that in XML mode).
Fix formatting in a few corner cases that automatic conversion can't
handle.
Rearrange some DOC_DISABLE blocks.
Allow drive letters in URI paths. Technically, these should be treated
as URI schemes, but this is not what users expect. This also makes sure
that paths with drive letters are resolved as filesystem paths and
unescaped, for example when used in libxslt's document() function.
Should fix #832.
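A minimal sketch of how a leading drive letter could be recognized, so that "C:\dir\file.xml" is not parsed as a URI with scheme "C". `has_drive_letter` is a hypothetical helper, not a libxml2 function, and requiring a path separator after the colon is an assumption of this sketch; the check in uri.c may differ.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not part of libxml2): returns 1 if the string
 * starts with a Windows drive letter such as "C:\" or "c:/", else 0. */
static int
has_drive_letter(const char *path) {
    if (path == NULL)
        return 0;
    /* Short-circuit evaluation keeps every index inside the string. */
    return isalpha((unsigned char) path[0]) &&
           path[1] == ':' &&
           (path[2] == '/' || path[2] == '\\');
}
```

A caller could use such a predicate to route the string to filesystem path handling before any scheme parsing is attempted.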
Many strings are passed to the library that could be either URIs or
filesystem paths. We now assume that strings are a URI if they contain
the substring "://". This means that they have a scheme and an
authority. Otherwise, URI resolution wouldn't make much sense.
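The "://" heuristic described above can be sketched as a one-line predicate. `looks_like_uri` is a hypothetical name used only for illustration; this is not the library's code.

```c
#include <string.h>

/* Hypothetical helper (not part of libxml2): treat a string as a URI
 * only if it contains "://", i.e. it has a scheme and an authority.
 * Everything else is assumed to be a filesystem path. */
static int
looks_like_uri(const char *str) {
    return str != NULL && strstr(str, "://") != NULL;
}
```

Note that under this rule a string like "mailto:user@example.org" is not classified as a URI, since it has a scheme but no authority.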
Fix xmlBuildURI to work with filesystem paths. If the base URI doesn't
contain "://" it is treated as filename. The resolved URI is unescaped,
appended and the result is normalized. Rewrite xmlNormalizePath to
handle Windows quirks.
All special handling for Windows paths is removed in xmlCanonicPath.
If the path looks like a URI, only escape characters allowed in Legacy
Extended IRIs.
Make xmlPathToURI only call xmlCanonicPath. The additional round-trip
through URI parser and serializer seems useless.
Add a helper function xmlConvertUriToPath in xmlIO.c which checks for
file URIs and unescapes them.
Always process strings with xmlCanonicPath in xmlLoadExternalEntity.
This should be harmless now.
Should help with #334, #387, #611.
Fix many places where malloc failures weren't reported, for example
after calling xmlStrdup.
Introduce new public API functions that return a separate error code if
a memory allocation fails:
- xmlParseURISafe
- xmlBuildURISafe
- xmlBuildRelativeURISafe
Update the fuzzer to check whether malloc failures are reported.
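The general pattern behind these functions is to return a status code and pass the result through an output pointer, so that a NULL result from an invalid input can be told apart from a failed allocation. The sketch below illustrates that pattern only. It does not use the libxml2 API, and the names `dup_checked`, `DUP_OK`, `DUP_INVALID` and `DUP_NOMEM` as well as the specific code values are assumptions made for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes, for this sketch only. */
enum { DUP_OK = 0, DUP_INVALID = 1, DUP_NOMEM = -1 };

/* Copies `in` into a freshly allocated buffer stored in *out.
 * A NULL result alone would not say why the call failed, so the
 * status is returned separately from the result. */
static int
dup_checked(const char *in, char **out) {
    size_t len;
    char *copy;

    if (out == NULL)
        return DUP_INVALID;
    *out = NULL;
    if (in == NULL)
        return DUP_INVALID;

    len = strlen(in) + 1;
    copy = malloc(len);
    if (copy == NULL)
        return DUP_NOMEM;   /* allocation failure is reported, not hidden */
    memcpy(copy, in, len);
    *out = copy;
    return DUP_OK;
}
```

A fuzzer that injects malloc failures can then assert that every injected failure surfaces as the dedicated memory error code.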
Private functions were previously declared
- in header files in the root directory
- in public headers guarded with IN_LIBXML
- in libxml.h
- redundantly in source files that used them.
Consolidate all private header files in include/private.
From what I can tell, some really early Cygwin versions from around
1998-2000 used to erroneously define _WIN32. This was eventually fixed,
but these days, the `defined(_WIN32) && !defined(__CYGWIN__)` idiom is
unnecessary.
Now, we only check for __CYGWIN__ in xmlexports.h when deciding whether
to use __declspec.
Found by running the fuzz/uri.c fuzzer under asan (internal Android bug
171610679).
Always free `ret` when exiting on failure. I've moved the definition of
NULLCHK down past where ret is always initialized to make it clear that
this is safe.
This patch also fixes the indentation of two of the NULLCHK call sites
to make it more obvious that NULLCHK isn't `if`-like.
Raised by Matthias Pigulla <mp@webfactory.de>
In a nutshell, we had this bug in URI composition after some fixes in
the area of localhost empty shortcuts:
./testURI --base file:///some/where file
Without patch: file:/some/file
With patch: file:///some/file
If the relative URI started with './', the 'pos' index was increased
which also affected indexing into the base path. Aside from producing
wrong results, this could also lead to a heap overread of the base
path buffer. The data read from beyond the buffer was only compared
to some char values, so this is mostly harmless.
Inside libxml2, xmlBuildRelativeURI is only called from xinclude.c.
Found with libFuzzer and ASan.
For https://bugzilla.gnome.org/show_bug.cgi?id=765566
In xmlParse3986Port(), uri->port can overflow when parsing the port number.
The type of uri->port is int, so the resulting behavior is undefined and
may differ between compilers and architectures.
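One way to avoid the undefined behavior is to test against INT_MAX before each multiply-and-add, so the signed overflow never actually happens. The following is a minimal sketch under that assumption; `parse_port` is a hypothetical stand-in and not the code of xmlParse3986Port().

```c
#include <limits.h>

/* Hypothetical stand-in for a port parser (not libxml2 code).
 * Parses a decimal port at *str. On success stores the value in *port,
 * advances *str past the digits and returns 0. Returns -1, leaving
 * *port and *str untouched, if the value would not fit in an int.
 * An empty digit run yields port 0. */
static int
parse_port(const char **str, int *port) {
    const char *cur = *str;
    int value = 0;

    while (*cur >= '0' && *cur <= '9') {
        int digit = *cur - '0';

        /* value * 10 + digit <= INT_MAX holds exactly when
         * value <= (INT_MAX - digit) / 10, so check that first. */
        if (value > (INT_MAX - digit) / 10)
            return -1;
        value = value * 10 + digit;
        cur++;
    }
    *port = value;
    *str = cur;
    return 0;
}
```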
As written by Martin Kletzander <mkletzan@redhat.com>:
Since commit 8eb55d782a2b9afacc7938694891cc6fad7b42a5, when you parse
and save an URI that has no server (or similar) part, two slashes
after the 'schema:' get lost. It means 'uri:///noserver' is turned
into 'uri:/noserver'.
basically
foo:///only/path
means a host of "" while
foo:/only/path
means no host at all
So the best fix IMHO is to fix the URI parser to record the first
case as an empty host string and the second case as a NULL host string.
I would not revert the initial patch. We should not 'invent' those
slashes; instead, when parsing, we should keep the information that
it's a host-based path and that foo:/// means the presence of a host,
but an empty one.
Once the resulting patch below is applied, all cases seem to be saved
properly:
thinkpad:~/XML -> ./testURI uri:/noserver
uri:/noserver
thinkpad:~/XML -> ./testURI uri:///noserver
uri:///noserver
thinkpad:~/XML -> ./testURI uri://server/foo
uri://server/foo
thinkpad:~/XML -> ./testURI uri:/noserver/foo
uri:/noserver/foo
thinkpad:~/XML -> ./testURI uri:///
uri:///
thinkpad:~/XML -> ./testURI uri://
uri://
thinkpad:~/XML -> ./testURI uri:/
uri:/
thinkpad:~/XML ->
If you revert the initial patch, that last case fails.
The problem is that I don't want to change the xmlURI structure, to
minimize ABI breakage, so I could not extend the field. The natural
solution is to denote that uri:/// has an empty host by making
the uri server field an empty string, which works very well but breaks
applications (like libvirt ;-) that blindly check uri->server
for not being NULL before trying to reach it!
Simplest was to set the port to -1 in that case, instead of 0.
Applications don't bother looking at the port if there is no server
string. This makes the patch more complex than a one-liner, but
is better for ABI.
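The port == -1 convention can be pictured with a reduced structure. This is only a sketch: `struct mini_uri` is a hypothetical stand-in carrying just the two relevant fields, not the real xmlURI, and `has_authority` is an illustrative name.

```c
#include <stddef.h>

/* Hypothetical, reduced stand-in for the server and port fields
 * of xmlURI. */
struct mini_uri {
    const char *server;  /* NULL when there is no host string */
    int port;            /* -1 marks "authority present, host empty" */
};

/* Returns 1 if the URI should be written with "//" after the scheme,
 * i.e. foo://host/... or foo:///..., and 0 for foo:/... */
static int
has_authority(const struct mini_uri *uri) {
    return uri->server != NULL || uri->port == -1;
}
```

Existing applications that only test the server field for NULL keep working, because the server stays NULL for uri:///noserver.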
For https://bugzilla.gnome.org/show_bug.cgi?id=731063
xmlSaveUri() of libxml2 (snapshot 2014-05-31 and earlier) returns
bogus values when called with URIs that have rootless paths
(e.g. "urx:b:b" becomes "urx://b%3Ab" where "urx:b%3Ab" would be
correct)
so we've got this patch to libxml2 2.7.6 in the LibreOffice code base,
inherited from OOo. it fixes a definite problem, which is that Windows
has a rather low maximum path length restriction, and there is a special
trick on NT whereby path names can be prefixed with "\\?\", in which
case the maximum length is 32k, which ought to be sufficient even for
bloated office suites :)
I'll attach the patch to the xmlCanonicPath function. note that i
didn't write this and am by no means an expert on either Microsoftean
platforms or libxml so maybe it's not the best way to do it.
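For illustration, recognizing the "\\?\" prefix is a plain string comparison. The helper below is hypothetical and not taken from the patch; how xmlCanonicPath then treats such a path (presumably passing it through instead of handling it as a URI) is an assumption here.

```c
#include <string.h>

/* Hypothetical helper (not from the patch): returns 1 if `path`
 * starts with the four characters \\?\ that mark an extended-length
 * Windows path, else 0. */
static int
has_long_path_prefix(const char *path) {
    /* In C source, each backslash of the prefix is written as "\\". */
    return path != NULL && strncmp(path, "\\\\?\\", 4) == 0;
}
```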
* uri.c: clean up the code doing the allocations, set up a structured
error handler to report memory errors, and set up an arbitrary
limit on URI saving size
* error.c include/libxml/xmlerror.h: add a new FROM_URI indication
for structured error reporting, also adding strings for schematron
and buffer which were missing