@JoaquinFernandez JoaquinFernandez commented Oct 22, 2025

Problem

The Gmail API returns multipart/alternative emails with both a text/plain and a text/html part. Some senders (e.g., Rentalcars.com, Booking.com) put only a useless fallback message in text/plain, such as:

Your client does not support HTML emails. <!-- comment -->

All of the actual content is in the HTML part, which is often 1000x+ longer.

The current implementation always prioritizes text/plain over text/html, so these emails return only the fallback message.
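
To make the message shape concrete, here is a minimal sketch that builds such an email with the standard library. The subject and both bodies are made-up sample data, not taken from a real Rentalcars.com or Booking.com message.

```python
from email.message import EmailMessage

# Hypothetical sample message, built only to illustrate the structure.
msg = EmailMessage()
msg["Subject"] = "Booking confirmation"

# text/plain part: nothing but the fallback notice.
msg.set_content("Your client does not support HTML emails. <!-- comment -->")

# text/html part: the real content. Adding an alternative turns the
# message into multipart/alternative.
msg.add_alternative(
    "<html><body><h1>Booking confirmed</h1><p>Pick-up: 10:00</p></body></html>",
    subtype="html",
)

plain = msg.get_body(preferencelist=("plain",)).get_content()
html = msg.get_body(preferencelist=("html",)).get_content()
```

A reader that always prefers `plain` here would surface only the fallback notice and drop the booking details.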

Root Cause

_format_body_content() returns text/plain immediately whenever it is non-empty, without checking whether it is useful content or just a fallback placeholder.

Solution

  1. Added BeautifulSoup4 dependency to convert HTML to readable plain text
  2. Implemented fallback detection:
    • Primary: Check for HTML comments (<!--) in text/plain
    • Rationale: Legitimate plain text very rarely contains HTML comments
    • Secondary: Use HTML if it is at least 50x longer than the text
  3. Convert HTML to text instead of returning raw HTML
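
The detection rules above could be sketched roughly as follows. The function name `should_prefer_html` and the constant name are hypothetical and not taken from the PR diff; only the `<!--` check and the 50x ratio come from the description.

```python
# Ratio from the PR description: prefer HTML when it is 50x+ longer.
HTML_TO_TEXT_RATIO = 50


def should_prefer_html(text_body: str, html_body: str) -> bool:
    """Return True if text/plain looks like a fallback placeholder.

    Hypothetical helper for illustration; the PR implements this logic
    inside _format_body_content().
    """
    text = text_body.strip()
    html = html_body.strip()

    if not html:
        # No HTML part to fall back on, so keep whatever text there is.
        return False
    if not text:
        # Empty text/plain: HTML is the only content available.
        return True
    if "<!--" in text:
        # Primary check: an HTML comment inside the text/plain part.
        return True
    # Secondary check: HTML is at least 50x longer than the text.
    return len(html) >= HTML_TO_TEXT_RATIO * len(text)
```

The length ratio is only a heuristic: a short text/plain summary paired with a heavily styled HTML part could also cross the threshold, so the 50x value is best treated as tunable.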

Changes

  • pyproject.toml: Added beautifulsoup4>=4.12.0 dependency
  • gmail/gmail_tools.py:
    • Added _html_to_text() function to convert HTML to readable text
    • Modified _format_body_content() with fallback detection
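
For illustration, an HTML-to-text conversion of the kind `_html_to_text()` performs might look like the sketch below. The PR itself uses BeautifulSoup4; this sketch swaps in the standard library's `html.parser` so that it runs without extra dependencies. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping script/style/head content."""

    _SKIP = {"script", "style", "head"}
    _BLOCK = {"p", "div", "br", "tr", "li", "table",
              "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1
        elif tag in self._BLOCK:
            self._chunks.append("\n")  # block elements start a new line

    def handle_endtag(self, tag):
        if tag in self._SKIP:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self._BLOCK:
            self._chunks.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace per line and drop empty lines.
        lines = (" ".join(line.split())
                 for line in "".join(self._chunks).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text_sketch(html: str) -> str:
    """Hypothetical stand-in for _html_to_text(), stdlib only."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

HTML comments are dropped because the base class ignores them by default, and character references such as `&amp;` are decoded by `convert_charrefs=True`. Real marketing emails with deeply nested layout tables may need more careful whitespace handling than this sketch provides.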

Testing

Tested with Rentalcars.com confirmation emails:

  • Before: Returned only "Your client does not support HTML emails. " (99 chars)
  • After: Correctly extracts full booking information from HTML (183,065 chars → converted to readable text)

Impact

  • ✅ Fixes HTML-only emails from major booking platforms
  • ✅ No breaking changes - legitimate text/plain emails still work
  • ✅ Low risk of false positives - HTML comments are very unlikely to appear in real plain text
  • ✅ Language-independent detection

Note on uv.lock

The uv.lock file was intentionally not included in this PR to avoid format conflicts. The dependency changes in pyproject.toml are sufficient, and the maintainers can regenerate the lock file with their version of uv.

🤖 Generated with Claude Code

@JoaquinFernandez JoaquinFernandez changed the title Fix: Handle HTML-only emails with useless text/plain fallback fix: handle HTML-only emails with useless text/plain fallback Oct 22, 2025
@taylorwilsdon
Owner

Hey there, I'd love to accomplish this without pulling in beautifulsoup - any alternative route we could take?
