
Unexpected Mathjax error in SERP #22

Open
GaurangTandon opened this issue Feb 23, 2018 · 2 comments


GaurangTandon commented Feb 23, 2018

Search results page

  • Entry number 8 shows errors in two MathJax displays ("Missing close brace" and "Extra close brace or missing open brace", respectively).
  • This is a screenshot: [image]
  • The original post on MSE is free of MathJax errors, though.
  • A similar problem exists for entry 7 as well.
  • I could not come up with any possible reason for this.

Broken Mathjax of entry 8 copy-pasted for reference:

...Put it this way \int {\frac{x}{{\sqrt { ... t {\frac{{2ax + b}}{{\sqrt {a{x^2} + bx + c} }}dx}  - \frac{b}{{2a}}\int {\frac{{dx}}{{\sqrt {a{x^2} + bx + c} }}} 
\displaystyle \frac{c}{a} - \frac{{{b^2}}}{{4{a^2}}} < 0 =  -  ... 2}. 

w32zhong (Member) commented Mar 28, 2018

@GaurangTandon Thank you for reporting, will investigate later when I get some time.

w32zhong (Member) commented May 3, 2018

@GaurangTandon Hi, the reason is actually quite simple. Since the search result snippet tries to summarize a document in a short paragraph, it has to skip some content in order to show as many highlighted words as possible. This leads to the problem you have seen: in every case where it appears, the skipped content (which is replaced by a "..." string) lies in the middle of a LaTeX expression, and cutting there is very likely to invalidate the expression.

The current content-skipping strategy is simple: given a number of keyword occurrences in the document (up to the threshold MAX_HIGHLIGHT_OCCURS), pad the left and right side of each keyword. The keywords are displayed along with their "padding", and any content not covered by padding is skipped.
The related logic is here:

left = h->kw_end + h->pad_right;
right = next_h->kw_pos - next_h->pad_left;
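
To illustrate how those two lines determine what gets skipped, here is a minimal sketch. The field names follow the snippet above, but the struct layout, the `gap` type, and the `skipped_gap` helper are assumptions made for illustration, not the project's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical highlight record: field names follow the snippet above,
 * but the struct layout itself is an assumption. */
struct highlight {
    size_t kw_pos;    /* offset of the keyword's first character */
    size_t kw_end;    /* offset one past the keyword's last character */
    size_t pad_left;  /* amount of context kept before the keyword */
    size_t pad_right; /* amount of context kept after the keyword */
};

/* Half-open range [left, right) that would be replaced by "...". */
struct gap {
    size_t left;
    size_t right;
};

/* Compute the gap between two consecutive highlights.
 * Returns 1 and fills *out if some text is skipped; returns 0 if the
 * padded regions touch or overlap, so nothing is skipped. */
int skipped_gap(const struct highlight *h, const struct highlight *next_h,
                struct gap *out)
{
    assert(h != NULL && next_h != NULL && out != NULL);

    size_t left = h->kw_end + h->pad_right;
    /* Guard against unsigned underflow when the left padding would
     * reach past the start of the text. */
    size_t right = next_h->kw_pos > next_h->pad_left
                       ? next_h->kw_pos - next_h->pad_left
                       : 0;

    if (left >= right)
        return 0;

    out->left = left;
    out->right = right;
    return 1;
}
```

Nothing in this computation knows where a LaTeX expression begins or ends, so `left` or `right` can land inside one, which is exactly the case that produces the brace errors reported above.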

One way to fix this issue is to never skip LaTeX content, but some LaTeX content is very long, and this strategy would make some snippets unacceptably long. So a smarter algorithm is needed, one that either includes a complete LaTeX clip or, if the clip is too long, includes no part of it.

We can leave this issue open until a better skipping strategy is implemented.
