Update pagefind to version 1.1.0 #1750

Open
wants to merge 3 commits into
base: main

HiDeoo commented Apr 12, 2024

What kind of changes does this PR include?

  • Something else!

Description

This pull request is a draft to update the pagefind version to the recently released version 1.1.0.

The major changes include improvements to the result ranking algorithm to align it with BM25 and the ability to configure the ranking algorithm.
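For readers unfamiliar with BM25, here is a minimal sketch of the textbook per-term score. It is not pagefind's implementation: the parameter names `k1` and `b`, their default values, and the smoothed IDF variant are the conventional textbook choices and are assumptions here.

```typescript
// Textbook BM25 term score, for illustration only (NOT pagefind's code).
// k1 and b are the conventional BM25 parameter names, assumed here.
interface Bm25Params {
  k1: number; // saturation: how quickly repeated matches stop adding score
  b: number; // length normalization: 0 ignores page length, 1 applies it fully
}

// Smoothed inverse document frequency: rarer terms weigh more.
function idf(totalDocs: number, docsWithTerm: number): number {
  return Math.log(1 + (totalDocs - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
}

function bm25TermScore(
  termCount: number,
  docLength: number,
  avgDocLength: number,
  totalDocs: number,
  docsWithTerm: number,
  { k1, b }: Bm25Params = { k1: 1.2, b: 0.75 },
): number {
  // Pages longer than average get a norm above 1, which lowers their score.
  const lengthNorm = 1 - b + b * (docLength / avgDocLength);
  // Grows with termCount but flattens out as termCount gets large.
  const saturation = (termCount * (k1 + 1)) / (termCount + k1 * lengthNorm);
  return idf(totalDocs, docsWithTerm) * saturation;
}
```

With the same number of matches, a page shorter than average scores higher than a longer one unless `b` is 0, which is the short-page bias discussed in this PR.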

The PR does not change the ranking configuration so far. I think we will end up with some slight changes, but I want to experiment more with the new options first. In the meantime, I wanted to post some preliminary results.

Configuration

Of the new ranking algorithm configuration options, I think the term frequency may be one of the most important to play with.

  • The term frequency controls how the ranking balances the frequency of a term relative to document length against the weighted term count.
  • The default value is 1.0, which means the term frequency is the main factor in the ranking.
  • A value of 0 means the term frequency is not considered in the ranking.
  • In-between values interpolate between the two methods.

Basically, a higher value tends to favor short pages, because longer pages are penalized for having a much lower term frequency, while a lower value does the opposite.
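To make the interpolation more concrete, here is a rough sketch of how such a blend could behave. This is a hypothetical illustration, not pagefind's scoring code: the function name, the linear interpolation, and the rescaling by average page length are all assumptions.

```typescript
// Hypothetical illustration of the termFrequency blend (NOT pagefind's code).
// Assumes a plain linear interpolation between the two methods.
function blendedTermWeight(
  weightedTermCount: number,
  pageLength: number,
  averagePageLength: number,
  termFrequency: number,
): number {
  if (termFrequency < 0 || termFrequency > 1) {
    throw new RangeError("termFrequency must be between 0 and 1");
  }
  // Frequency relative to page length, rescaled by the average page length
  // so both methods share a comparable scale (an assumption of this sketch).
  const relativeFrequency = weightedTermCount / (pageLength / averagePageLength);
  // 1.0 uses only the length-relative frequency, 0 only the raw weighted count.
  return termFrequency * relativeFrequency + (1 - termFrequency) * weightedTermCount;
}

// Made-up pages: a short one with few matches, a long one with many.
const shortPage = { matches: 4, length: 200 };
const longPage = { matches: 10, length: 2000 };
const averageLength = 1000;

for (const termFrequency of [1, 0.5, 0]) {
  const short = blendedTermWeight(shortPage.matches, shortPage.length, averageLength, termFrequency);
  const long = blendedTermWeight(longPage.matches, longPage.length, averageLength, termFrequency);
  console.log(termFrequency, short > long ? "short page ranks first" : "long page ranks first");
}
```

In this made-up example the short page wins at 1.0 and the long page wins at 0, which matches the behavior described above.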

I found out that the next version of git-scm.com is also planning to use 0 to avoid favoring short pages (I think this is what drove the initiative to add these options to pagefind).

Page results comparison

The following tables compare page rankings between the current version, the new version with the default term frequency, and the new version with a term frequency of 0.5 and 0.

Note that the "Sidebar Navigation" guide is often an outlier as the examples mimic the Starlight documentation structure which pollutes the results. This will be addressed in another PR.

Query: setup

| Rank | 1.0 | 1.1 | 1.1 with 0.5 term frequency | 1.1 with 0 term frequency |
| --- | --- | --- | --- | --- |
| 1 | Manual Setup | Plugins Reference | Plugins Reference | Plugins Reference |
| 2 | Plugins Reference | Manual Setup | Manual Setup | Manual Setup |
| 3 | Getting Started | Getting Started | Getting Started | Customizing Starlight |
| 4 | Customizing Starlight | Customizing Starlight | Customizing Starlight | Getting Started |
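As a reading aid for these tables, here is a small sketch that computes how far each page moved between two columns. The `rankShifts` helper is hypothetical and not part of this PR; the sample data is the "1.0" and "1.1" columns of the setup query above.

```typescript
// Hypothetical helper for comparing two ranked lists of page titles.
// Positive shift = moved up, negative = moved down, null = no longer listed.
function rankShifts(before: string[], after: string[]): Map<string, number | null> {
  const shifts = new Map<string, number | null>();
  before.forEach((page, index) => {
    const newIndex = after.indexOf(page);
    shifts.set(page, newIndex === -1 ? null : index - newIndex);
  });
  return shifts;
}

// "setup" query: pagefind 1.0 versus 1.1 with the default term frequency.
const setupV10 = ["Manual Setup", "Plugins Reference", "Getting Started", "Customizing Starlight"];
const setupV11 = ["Plugins Reference", "Manual Setup", "Getting Started", "Customizing Starlight"];

const setupShifts = rankShifts(setupV10, setupV11);
for (const [page, shift] of setupShifts) {
  console.log(page, shift);
}
```

For this query only the top two results swap places; the other two pages keep their positions.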

Query: installation

| Rank | 1.0 | 1.1 | 1.1 with 0.5 term frequency | 1.1 with 0 term frequency |
| --- | --- | --- | --- | --- |
| 1 | Site Search | Site Search | Site Search | Customizing Starlight |
| 2 | Customizing Starlight | Customizing Starlight | Customizing Starlight | Authoring Content in Markdown |
| 3 | CSS & Styling | Authoring Content in Markdown | Authoring Content in Markdown | Site Search |
| 4 | Authoring Content in Markdown | CSS & Styling | CSS & Styling | CSS & Styling |
| 5 | Manual Setup | Manual Setup | Manual Setup | Manual Setup |

Query: page

| Rank | 1.0 | 1.1 | 1.1 with 0.5 term frequency | 1.1 with 0 term frequency |
| --- | --- | --- | --- | --- |
| 1 | Pages | Pages | Pages | Pages |
| 2 | Overriding Components | Overrides Reference | Overrides Reference | Overrides Reference |
| 3 | Site Search | Frontmatter Reference | Frontmatter Reference | Frontmatter Reference |
| 4 | Customizing Starlight | Customizing Starlight | Customizing Starlight | Customizing Starlight |
| 5 | Eco-friendly docs | Overriding Components | Eco-friendly docs | Configuration Reference |

Query: markdown

| Rank | 1.0 | 1.1 | 1.1 with 0.5 term frequency | 1.1 with 0 term frequency |
| --- | --- | --- | --- | --- |
| 1 | Authoring Content in Markdown | Authoring Content in Markdown | Authoring Content in Markdown | Authoring Content in Markdown |
| 2 | Pages | Overrides Reference | Overrides Reference | Components |
| 3 | Make your docs shine with Starlight | Components | Components | Overrides Reference |
| 4 | Components | Pages | Pages | Pages |
| 5 | Overrides Reference | Make your docs shine with Starlight | Make your docs shine with Starlight | Manual Setup |

Query: component

| Rank | 1.0 | 1.1 | 1.1 with 0.5 term frequency | 1.1 with 0 term frequency |
| --- | --- | --- | --- | --- |
| 1 | Overriding Components | Sidebar Navigation | Sidebar Navigation | Overrides Reference |
| 2 | Overrides Reference | Pages | Pages | Components |
| 3 | Components | Components | Components | Overriding Components |
| 4 | Eco-friendly docs | Overrides Reference | Overrides Reference | Sidebar Navigation |
| 5 | Make your docs shine with Starlight | Overriding Components | Overriding Components | Configuration Reference |
| 6 | Configuration Reference | Eco-friendly docs | Eco-friendly docs | Eco-friendly docs |
| 7 | Sidebar Navigation | Make your docs shine with Starlight | Make your docs shine with Starlight | Pages |

Query: CSS

| Rank | 1.0 | 1.1 | 1.1 with 0.5 term frequency | 1.1 with 0 term frequency |
| --- | --- | --- | --- | --- |
| 1 | CSS & Styling | CSS & Styling | CSS & Styling | CSS & Styling |
| 2 | Customizing Starlight | Customizing Starlight | Customizing Starlight | Customizing Starlight |
| 3 | Configuration Reference | Sidebar Navigation | Configuration Reference | Configuration Reference |
| 4 | Sidebar Navigation | Configuration Reference | Sidebar Navigation | Sidebar Navigation |
| 5 | Overriding Components | Overriding Components | Overriding Components | Overriding Components |

Query: language

| Rank | 1.0 | 1.1 | 1.1 with 0.5 term frequency | 1.1 with 0 term frequency |
| --- | --- | --- | --- | --- |
| 1 | Internationalization (i18n) | Overrides Reference | Overrides Reference | Internationalization (i18n) |
| 2 | Make your docs shine with Starlight | Internationalization (i18n) | Internationalization (i18n) | Configuration Reference |
| 3 | Configuration Reference | Configuration Reference | Configuration Reference | Overrides Reference |
| 4 | Overrides Reference | Authoring Content in Markdown | Authoring Content in Markdown | Authoring Content in Markdown |
| 5 | Pages | Pages | Pages | Sidebar Navigation |

Query: sidebar

| Rank | 1.0 | 1.1 | 1.1 with 0.5 term frequency | 1.1 with 0 term frequency |
| --- | --- | --- | --- | --- |
| 1 | Sidebar Navigation | Sidebar Navigation | Sidebar Navigation | Sidebar Navigation |
| 2 | Overrides Reference | Overrides Reference | Overrides Reference | Overrides Reference |
| 3 | Pages | Pages | Pages | Frontmatter Reference |
| 4 | Frontmatter Reference | Frontmatter Reference | Frontmatter Reference | Configuration Reference |
| 5 | Configuration Reference | Configuration Reference | Configuration Reference | Pages |

Query: lastUpdated

| Rank | 1.0 | 1.1 | 1.1 with 0.5 term frequency | 1.1 with 0 term frequency |
| --- | --- | --- | --- | --- |
| 1 | Overrides Reference | Overrides Navigation | Overrides Navigation | Overrides Navigation |
| 2 | Frontmatter Reference | Frontmatter Reference | Frontmatter Reference | Frontmatter Reference |
| 3 | Configuration Reference | Configuration Reference | Configuration Reference | Configuration Reference |

Query: plugin

| Rank | 1.0 | 1.1 | 1.1 with 0.5 term frequency | 1.1 with 0 term frequency |
| --- | --- | --- | --- | --- |
| 1 | Plugins and Integrations | Plugins Reference | Plugins Reference | Plugins and Integrations |
| 2 | Plugins Reference | Plugins and Integrations | Plugins and Integrations | Plugins Reference |
| 3 | Configuration Reference | CSS & Styling | CSS & Styling | Configuration Reference |
| 4 | CSS & Styling | Configuration Reference | Configuration Reference | CSS & Styling |
| 5 | Site Search | Site Search | Site Search | Site Search |

changeset-bot bot commented Apr 12, 2024

⚠️ No Changeset found

Latest commit: 7b515e8

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

vercel bot commented Apr 12, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| starlight | ✅ Ready (Inspect) | Visit Preview | May 3, 2024 5:09pm |

HiDeoo added 2 commits May 3, 2024 18:21
* main: (74 commits)
  Add type checking job to the CI workflow (withastro#1827)
  [ci] format
  i18n(pt-BR): Update `components.mdx` (withastro#1815)
  [ci] format
  i18n(ru): update translations (withastro#1825)
  i18n(pt-BR): Update `css-and-tailwind.mdx` (withastro#1817)
  i18n(es): updates `pages` (withastro#1823)
  i18n(es): update `i18n` (withastro#1822)
  i18n(es): updates `overrides` (withastro#1820)
  i18n(es): update `guides/components` and add `syncKey` to various pages (withastro#1818)
  [ci] format
  i18n(es): update `community-content` (withastro#1824)
  i18n(es): update `configuration` (withastro#1821)
  i18n(es): update `frontmatter` (withastro#1819)
  i18n(fr): update `guides/pages.mdx` (withastro#1800)
  i18n(fr): update `reference/overrides.md` (withastro#1803)
  i18n(fr): update `reference/frontmatter.md` (withastro#1802)
  i18n(fr): update `reference/configuration.mdx` (withastro#1801)
  i18n(fr): update `guides/i18n.mdx` (withastro#1799)
  i18n(fr): update `guides/components` and add `syncKey` to various pages (withastro#1797)
  ...

HiDeoo commented May 3, 2024

Following #1751 which rewrites guides/sidebar examples to be more generic and avoid polluting some search results (as mentioned in the initial post of this PR), here is another round of page results comparison for various queries with the current documentation content:

Query: setup

| Rank | 1.0 | 1.1 | 1.1 with 0.5 term frequency | 1.1 with 0 term frequency |
| --- | --- | --- | --- | --- |
| 1 | Manual Setup | Plugins Reference | Plugins Reference | Plugins Reference |
| 2 | Plugins Reference | Manual Setup | Manual Setup | Manual Setup |
| 3 | Getting Started | Getting Started | Getting Started | Customizing Starlight |
| 4 | Customizing Starlight | Customizing Starlight | Customizing Starlight | Getting Started |

Query: installation

| Rank | 1.0 | 1.1 | 1.1 with 0.5 term frequency | 1.1 with 0 term frequency |
| --- | --- | --- | --- | --- |
| 1 | Site Search | Site Search | Site Search | Customizing Starlight |
| 2 | Customizing Starlight | Customizing Starlight | Customizing Starlight | Authoring Content in Markdown |
| 3 | CSS & Styling | Authoring Content in Markdown | Authoring Content in Markdown | Site Search |
| 4 | Authoring Content in Markdown | CSS & Styling | CSS & Styling | CSS & Styling |
| 5 | Manual Setup | Manual Setup | Manual Setup | Manual Setup |

Query: page

| Rank | 1.0 | 1.1 | 1.1 with 0.5 term frequency | 1.1 with 0 term frequency |
| --- | --- | --- | --- | --- |
| 1 | Pages | Overrides Reference | Internationalization (i18n) | Overrides Reference |
| 2 | Overriding Components | Internationalization (i18n) | Overrides Reference | Frontmatter Reference |
| 3 | Site Search | Site Search | Site Search | Internationalization (i18n) |
| 4 | Customizing Starlight | Frontmatter Reference | Frontmatter Reference | Configuration Reference |
| 5 | Eco-friendly docs | Configuration Reference | Configuration Reference | Pages |

Query: markdown

| Rank | 1.0 | 1.1 | 1.1 with 0.5 term frequency | 1.1 with 0 term frequency |
| --- | --- | --- | --- | --- |
| 1 | Authoring Content in Markdown | Overrides Reference | Authoring Content in Markdown | Authoring Content in Markdown |
| 2 | Pages | Authoring Content in Markdown | Overrides Reference | Components |
| 3 | Make your docs shine with Starlight | Pages | Components | Overrides Reference |
| 4 | Components | Components | Pages | Pages |
| 5 | Overrides Reference | Make your docs shine with Starlight | Make your docs shine with Starlight | Manual Setup |

Query: component

| Rank | 1.0 | 1.1 | 1.1 with 0.5 term frequency | 1.1 with 0 term frequency |
| --- | --- | --- | --- | --- |
| 1 | Overriding Components | Components | Overrides Reference | Overrides Reference |
| 2 | Overrides Reference | Pages | Components | Components |
| 3 | Components | Overrides Reference | Overriding Components | Overriding Components |
| 4 | Eco-friendly docs | Overriding Components | Eco-friendly docs | Configuration Reference |
| 5 | Make your docs shine with Starlight | Eco-friendly docs | Pages | Eco-friendly docs |
| 6 | Configuration Reference | Make your docs shine with Starlight | Make your docs shine with Starlight | Pages |
| 7 | Sidebar Navigation | Configuration Reference | Configuration Reference | Make your docs shine with Starlight |

Query: CSS

| Rank | 1.0 | 1.1 | 1.1 with 0.5 term frequency | 1.1 with 0 term frequency |
| --- | --- | --- | --- | --- |
| 1 | CSS & Styling | CSS & Styling | CSS & Styling | CSS & Styling |
| 2 | Customizing Starlight | Customizing Starlight | Customizing Starlight | Customizing Starlight |
| 3 | Configuration Reference | Configuration Reference | Configuration Reference | Configuration Reference |
| 4 | Sidebar Navigation | Overriding Components | Overriding Components | Overriding Components |
| 5 | Overriding Components | Eco-friendly docs | Eco-friendly docs | Overrides Reference |

Query: language

| Rank | 1.0 | 1.1 | 1.1 with 0.5 term frequency | 1.1 with 0 term frequency |
| --- | --- | --- | --- | --- |
| 1 | Internationalization (i18n) | Overrides Reference | Overrides Reference | Internationalization (i18n) |
| 2 | Make your docs shine with Starlight | Internationalization (i18n) | Internationalization (i18n) | Configuration Reference |
| 3 | Configuration Reference | Configuration Reference | Configuration Reference | Overrides Reference |
| 4 | Overrides Reference | Authoring Content in Markdown | Authoring Content in Markdown | Authoring Content in Markdown |
| 5 | Pages | Pages | Pages | Sidebar Navigation |

Query: sidebar

| Rank | 1.0 | 1.1 | 1.1 with 0.5 term frequency | 1.1 with 0 term frequency |
| --- | --- | --- | --- | --- |
| 1 | Sidebar Navigation | Sidebar Navigation | Sidebar Navigation | Sidebar Navigation |
| 2 | Overrides Reference | Overrides Reference | Overrides Reference | Overrides Reference |
| 3 | Pages | Pages | Pages | Frontmatter Reference |
| 4 | Frontmatter Reference | Frontmatter Reference | Frontmatter Reference | Configuration Reference |
| 5 | Configuration Reference | Configuration Reference | Configuration Reference | Pages |

Query: lastUpdated

| Rank | 1.0 | 1.1 | 1.1 with 0.5 term frequency | 1.1 with 0 term frequency |
| --- | --- | --- | --- | --- |
| 1 | Overrides Reference | Overrides Navigation | Overrides Navigation | Overrides Navigation |
| 2 | Frontmatter Reference | Frontmatter Reference | Frontmatter Reference | Frontmatter Reference |
| 3 | Configuration Reference | Configuration Reference | Configuration Reference | Configuration Reference |

Query: plugin

| Rank | 1.0 | 1.1 | 1.1 with 0.5 term frequency | 1.1 with 0 term frequency |
| --- | --- | --- | --- | --- |
| 1 | Plugins and Integrations | Plugins Reference | Plugins Reference | Plugins and Integrations |
| 2 | Plugins Reference | Plugins and Integrations | Plugins and Integrations | Plugins Reference |
| 3 | Configuration Reference | CSS & Styling | CSS & Styling | Configuration Reference |
| 4 | CSS & Styling | Configuration Reference | Configuration Reference | CSS & Styling |
| 5 | Site Search | Site Search | Site Search | Site Search |

I think the effect of #1751 is clearly beneficial.

Regarding the pagefind update itself now, with the explanation in the initial post of this PR and the current result for queries like markdown, I think I'm leaning towards reducing the term frequency a bit. This PR has been updated to use a termFrequency of 0.5 instead of the default 1.0.
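For reference, a minimal sketch of what a `termFrequency` override could look like. The `RankingOptions` type and the `createRanking` helper are hypothetical and defined locally for illustration; the wiring shown in the final comment is an assumption and is not copied from this PR's diff.

```typescript
// Local type for illustration; it mirrors only the option discussed in this PR.
interface RankingOptions {
  termFrequency: number;
}

// Default value as stated in the initial post of this PR.
const DEFAULT_TERM_FREQUENCY = 1.0;

// Hypothetical helper: rejects values outside the 0 to 1 range described above.
function createRanking(termFrequency: number = DEFAULT_TERM_FREQUENCY): RankingOptions {
  if (!Number.isFinite(termFrequency) || termFrequency < 0 || termFrequency > 1) {
    throw new RangeError(`termFrequency must be between 0 and 1, got ${termFrequency}`);
  }
  return { termFrequency };
}

const ranking = createRanking(0.5);

// Assumed wiring (unverified against this PR): pass the object as the
// `ranking` option when initializing the search UI, for example
//   new PagefindUI({ element: "#search", ranking });
```

Keeping the value behind a single helper like this would make it easy to try other values, such as 0, while comparing results.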

I'm still a bit unsure about some results, like those for the page query. The results make sense as-is, but I would have expected reducing the termSaturation to bump the guides/pages page higher in the results, which is not something I ever observed.

Would love to get some opinions on this.
