
Parser fails for lecker.de recipes: JSON-LD block not found (multi-block issue) #3224

Description

@desiue

The Nextcloud Cookbook app fails to import recipes from lecker.de using the URL import feature.

The application returns the error: "Für den angegeben Import konnte kein Parser gefunden werden" (No parser found for the specified import).

Investigation shows that lecker.de recipe pages contain several JSON-LD blocks (Organization, WebSite, BreadcrumbList, and others) before the actual Recipe block. The Cookbook parser appears to inspect only the first JSON-LD block, which is of type Organization, or otherwise fails to identify the block that holds the recipe, which leads to the error.

Since the Recipe schema exists further down on the page within the JSON-LD data (at block index 7), the parser should be updated to iterate through all JSON-LD blocks to locate the one with @type: Recipe.

Reproduction
Steps to reproduce the behavior:

  1. Copy a URL from lecker.de (e.g., https://www.lecker.de/haehnchen-doener-46850.html).
  2. Go to the Cookbook app in Nextcloud.
  3. Click on 'Import'.
  4. Paste the URL and submit.
  5. See the error: "Für den angegeben Import konnte kein Parser gefunden werden."

Expected behavior
The parser should iterate through all available application/ld+json blocks on the page and identify the one where @type is Recipe, allowing a successful automatic import of the recipe data.
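The expected behavior could be sketched roughly as follows. This is a standalone illustration, not the Cookbook app's actual parser: the function name `findRecipeJsonLd` is hypothetical, and the handling of `@graph` wrappers and list-valued `@type` fields is an assumption about what real pages may contain. It requires PHP 8.1 or newer for `array_is_list()`.

```php
<?php
declare(strict_types=1);

/**
 * Return the first JSON-LD object whose @type is (or includes) "Recipe".
 * Hypothetical helper for illustration; not part of the Cookbook code base.
 */
function findRecipeJsonLd(string $html): ?array
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML() rejects an empty string.
    }

    $document = new DOMDocument();
    // Collect libxml warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);
    // Visit every JSON-LD block on the page, not only the first one.
    foreach ($xpath->query('//script[@type="application/ld+json"]') as $script) {
        $data = json_decode($script->textContent, true);
        if (!is_array($data)) {
            continue; // Skip blocks that are not valid JSON objects or lists.
        }

        // Assumption: a block is a single object, a list of objects,
        // or an object wrapping several objects in "@graph".
        if (isset($data['@graph']) && is_array($data['@graph'])) {
            $candidates = $data['@graph'];
        } elseif (array_is_list($data)) {
            $candidates = $data;
        } else {
            $candidates = [$data];
        }

        foreach ($candidates as $candidate) {
            if (!is_array($candidate)) {
                continue;
            }
            // "@type" may be a string or a list of strings.
            $types = (array) ($candidate['@type'] ?? []);
            if (in_array('Recipe', $types, true)) {
                return $candidate;
            }
        }
    }

    return null;
}
```

With a page like the one described here, the Organization and other blocks would simply be skipped until the Recipe block is reached.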

Actual behavior
The import fails with the error message: "Für den angegeben Import konnte kein Parser gefunden werden."
Analyzing the page source reveals 8 JSON-LD blocks, and the actual recipe data is located in the 8th block (index 7). A manual file-based JSON import of this specific block works flawlessly.

Browser
All (Firefox, Chrome, and Edge)

Versions
Nextcloud server version: Nextcloud Hub 26 Spring (34.0.1 RC2)
Cookbook version: 0.11.7
Database system: MySQL/MariaDB (Standard installation)

Issue Description

When attempting to import recipes from certain websites (e.g., those using Cloudflare or specific compression configurations), the import fails with an internal server error.

The underlying cause is a Warning: DOMDocument::loadHTML(): Bytes: 0x1F 0x8B in Entity thrown by libxml inside the HttpJsonLdParser. This happens because the remote server sends the response body compressed with gzip or deflate, while the Cookbook app's internal cURL request neither advertises support for compressed responses nor decompresses them. As a result, DOMDocument attempts to parse raw, binary gzip data (which starts with the magic bytes 0x1F 0x8B), and parsing fails.
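To confirm this diagnosis, or as a last-resort guard, the downloaded body can be checked for the gzip magic bytes before it reaches DOMDocument. The following is only a sketch: `decodeIfGzipped` is a hypothetical helper that does not exist in the Cookbook code, and it assumes PHP's zlib extension is available for `gzdecode()`.

```php
<?php
declare(strict_types=1);

/**
 * Decompress a response body if it still looks like raw gzip data.
 * Hypothetical helper; requires the zlib extension for gzdecode().
 */
function decodeIfGzipped(string $body): string
{
    // gzip streams start with the magic bytes 0x1F 0x8B.
    if (strncmp($body, "\x1f\x8b", 2) !== 0) {
        return $body; // Not gzip: pass the body through unchanged.
    }

    $decoded = gzdecode($body);
    if ($decoded === false) {
        throw new RuntimeException('Body starts with gzip magic bytes but could not be decoded.');
    }

    return $decoded;
}
```

This only covers gzip; deflate or Brotli bodies have no such simple signature, which is one reason to prefer letting cURL handle decompression.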


Workaround / Fix

We resolved the issue locally by applying two changes. The first one fixes the root cause, while the second acts as a defensive buffer for large/complex HTML documents.

1. Fix the Root Cause (Enable cURL Decompression)

We added CURLOPT_ENCODING => '' to the cURL options in HtmlDownloadService.php. An empty string tells cURL to send an Accept-Encoding header listing every encoding the installed libcurl build supports (for example gzip, deflate, br) and to decompress the response automatically before handing the clean HTML string to PHP.

File modified: lib/Service/HtmlDownloadService.php

// Inside fetchHtmlPage(string $url) method (around line 114)
$opt = [
    CURLOPT_USERAGENT => 'Mozilla/5.0 (X11; Linux x86_64; rv:129.0) Gecko/20100101 Firefox/129.0',
    CURLOPT_ENCODING  => '', // <-- ADDED THIS LINE to automatically handle GZIP/Deflate
];
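For context, a self-contained download routine using the same option might look like the sketch below. The function names `buildDownloadOptions` and `downloadHtml` are hypothetical and are not taken from HtmlDownloadService.php; the extra options (`CURLOPT_RETURNTRANSFER`, `CURLOPT_FOLLOWLOCATION`) are assumptions about what a typical fetch needs. The curl extension is required.

```php
<?php
declare(strict_types=1);

/**
 * Build cURL options for fetching an HTML page.
 * Hypothetical helper; requires the curl extension.
 */
function buildDownloadOptions(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true, // Return the body instead of printing it.
        CURLOPT_FOLLOWLOCATION => true, // Assumption: redirects should be followed.
        CURLOPT_ENCODING       => '',   // Advertise all supported encodings and auto-decompress.
    ];
}

/**
 * Download a page and return its (already decompressed) body.
 */
function downloadHtml(string $url): string
{
    $handle = curl_init();
    if ($handle === false) {
        throw new RuntimeException('Could not initialise cURL.');
    }
    curl_setopt_array($handle, buildDownloadOptions($url));

    $body = curl_exec($handle);
    if ($body === false) {
        throw new RuntimeException('Download failed: ' . curl_error($handle));
    }

    return $body;
}
```

Keeping the option list in its own function makes it easy to check that the decompression setting is present without performing a network request.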




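2. Defensive Buffer (LIBXML_PARSEHUGE)

We added LIBXML_PARSEHUGE to the flags passed to loadHTML(). This flag relaxes libxml's hardcoded parser limits, which helps with large or complex HTML documents.

File modified: lib/Helper/HTMLParser/HttpJsonLdParser.php
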
// Inside parse(\DOMDocument $document, ?string $url) method
// Change the loading configuration to include LIBXML_PARSEHUGE
@$document->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD | LIBXML_PARSEHUGE);
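As an alternative to silencing the call with `@`, the libxml warnings can be collected and inspected, which would have surfaced the gzip problem more directly. This is a sketch only: `loadHtmlCollectingErrors` is a hypothetical helper, not code from HttpJsonLdParser.php, and it uses the same flags as the line above.

```php
<?php
declare(strict_types=1);

/**
 * Parse HTML and return the document together with any libxml errors.
 * Hypothetical helper; the input must be a non-empty string.
 *
 * @return array{0: DOMDocument, 1: LibXMLError[]}
 */
function loadHtmlCollectingErrors(string $html): array
{
    $document = new DOMDocument();

    // Route libxml warnings into an internal buffer instead of PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $loaded = $document->loadHTML(
        $html,
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD | LIBXML_PARSEHUGE
    );
    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // Restore the caller's setting.

    if ($loaded === false) {
        throw new RuntimeException('HTML could not be parsed.');
    }

    return [$document, $errors];
}
```

The returned error list can then be logged or checked for entries that indicate binary input.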
