Skip to content

Commit

Permalink
Merge pull request #132 from codedex-io/exaai-pt
Browse files Browse the repository at this point in the history
Add Exa AI Project Tutorial
  • Loading branch information
Dusch4593 committed May 10, 2024
2 parents 6c87177 + 38da8dc commit fa80e3a
Show file tree
Hide file tree
Showing 8 changed files with 266 additions and 0 deletions.
@@ -0,0 +1,249 @@
---
title: Build a Custom Search Engine with Exa
author: The Curriculum Team
uid: 3L0xTnnjaJaT53AlwRyTWFR1OXT2
datePublished: 2024-05-10
description: Learn how to build a custom search engine using Exa's API with Large Language Models (LLMs) capabilities.
header: https://raw.githubusercontent.com/codedex-io/projects/main/projects/build-a-custom-search-engine-with-exaai/header.gif
published: false
bannerImage: https://raw.githubusercontent.com/codedex-io/projects/main/projects/build-a-custom-search-engine-with-exaai/header.gif
tags:
- advanced
- python
---

<BannerImage link="https://raw.githubusercontent.com/codedex-io/projects/main/projects/build-a-custom-search-engine-with-exaai/header.gif" description="Title Image" uid={true} cl="for-sidebar"/>

# Build a Custom Search Engine with Exa

<AuthorAvatar author_name="Curriculum Team" author_avatar="/images/projects/authors/codedex_logo.png" uid={true}/>

<BannerImage link="https://raw.githubusercontent.com/codedex-io/projects/main/projects/build-a-custom-search-engine-with-exaai/header.gif" description="Title Image" uid={true}/>

**Prerequisites:** Python, Command Line
**Versions:** Python 3.10
**Read Time:** 45 minutes

## Introduction

Have you ever read something amazing, then completely blanked out on where you read it or who wrote it?

Imagine there was a search engine that could re-discover exactly what you're thinking about. It's now possible with a specific kind of machine learning called **natural language processing (NLP)**. It teaches computers to understand, interpret, and generate human language.

In this tutorial, we'll build a custom search engine using an API with LLM capabilities! 🚀

**LLMs** (short for large language models) are a powerful tool within the broader field of NLP. These models are trained on insane amounts of data and can grasp the intricacies of human language. 🤯 A popular LLM that you might already be using is GPT-3.

<RoundedImage link="https://raw.githubusercontent.com/codedex-io/projects/main/projects/build-a-custom-search-engine-with-exaai/token.gif" description="Tokenization Example" />

LLMs leverage deep learning techniques to generate text, translate languages, answer questions, and so much more. The model estimates the probability of a token or a sequence of tokens in a sequence. A token refers to a unit of language extracted from a larger piece of text.

However, accessing LLMs is a little tricky since training models is both time-consuming and expensive. APIs solve this. APIs serve as a tool to access NLP tasks, such as text generation, translation, summarization, and more, which we'll explore today! Let's learn how we can even build our very own custom search engine! 👀

### Exa API

[Exa](https://exa.ai) (formerly "Metaphor") is an API (application programming interface) that retrieves the best content on the web. With Exa, anyone can semantically search the web to get high-quality, relevant information. With Exa's technology, we can rediscover the content on the internet. Unlike Google, which relies on keyword search (matching the exact words of the query to the web content), Exa can understand both the user's input and the content out there. Wow, right?

Exa utilizes LLMs as a core component, and has been extensively trained to help computers understand human language.

In this tutorial, we will use Exa's capabilities to build a basic search engine that can help you find exactly what you are searching for.

## Set Up

First, we need to [create an Exa account](https://dashboard.exa.ai/login) to access an API key for building out the "search" functionality of our search engine! You'll automatically get 1000 free requests just for signing up. After creating an account, navigate to "Overview" to retrieve your API key.

<RoundedImage link="https://raw.githubusercontent.com/codedex-io/projects/main/projects/build-a-custom-search-engine-with-exaai/ExaDashboard.JPG" description="Exa Dashboard" />

**Note:** Keep your API keys private and safe and avoid sharing and posting them online. 🔒

For this project, we'll need [Python 3](https://www.python.org/downloads/) and [pip](https://pip.pypa.io/en/stable/) (package installer) installed.

Assuming that we have those two installed, let's open up our preferred code editor (we recommend VS Code) and create a new file called **main.py**.

### Packages and Imports

With Python installed, let's go ahead and use pip to download Exa:

```py
pip install exa_py
```

**Note:** If this command doesn't work, try `pip3 install exa_py`.

In our **main.py** file, we're going to import Exa, and initialize Exa with our API key!

```py
from exa_py import Exa

exa = Exa('YOUR_KEY_HERE')
```

## Getting Started with Searching! ⚡

In the same file, let's create a variable called `query`, which will hold the response to the input of what we want to search!

```py
from exa_py import Exa

exa = Exa('YOUR_KEY_HERE')

query = input('Search here: ')
```

Now, let's take a look at Exa's `.search()` function:

```py
from exa_py import Exa

exa = Exa('YOUR_KEY_HERE')

query = input('Search here: ')

response = exa.search(
'Best Chicago cold brew',
num_results=10,
)
```

Here's one result (of 10) from the returned data for searching "Best Chicago cold brew":

```terminal
{
"score": 0.1940334290266037,
"title": "13 Innovative Cold Brew Drinks from Coffee Shops Around the Country",
"id": "https://www.thrillist.com/drink/nation/best-cold-brew-coffee-drinks",
"url": "https://www.thrillist.com/drink/nation/best-cold-brew-coffee-drinks",
"publishedDate": "2017-07-10",
"author": "Dan Gentile"
},
```

For each search result, we get:

- `score`: The score associated indicates its relevance within the query system; higher scores mean greater relevance.
- `title`: This is the name of the article/blog/website.
- `id` & `url`: The link to the result's web page.
- `publishedDate`: This lists the publish date.
- `author`: This credits a writer. This will return "None" if there is no credited writer.

To customize our Exa results further, we can play around with Exa's filters to find the exact type of content we need. You can explore the filters here [Exa's API Search documentation](https://docs.exa.ai/reference/search). Here are some of our favorite filters:

- `numResults`: Number of search results to return.
- `includeDomains`: List of domains to include in the search.
- `category`: A data category to focus on, with higher comprehensivity and data cleanliness. Categories right now include company, news, pdf, papers, tweet, GitHub, movies, songs, and personal sites.

The search example below will return 10 results for "best pizza in Brooklyn" searching through Twitter since May 2023:

```py
response = exa.search(
'best pizza in Brooklyn',
num_results=10,
start_published_date='2023-05-01',
category='tweet',
use_autoprompt= True,
)
```

## Search Coffee on TikTok

This is where the fun customization begins! ✨💖

For this tutorial, you'll learn how to search TikTok to obtain top coffee drink accounts!

In our case, we will want only the top 5 search results, and only from TikTok [https://www.tiktok.com](https://www.tiktok.com/). Additionally, we're going to treat our search function like a keyword, so we need to declare that as well!

Let's go ahead and use the Exa parameters to accomplish this. To display this, let's also add a print statement at the end of our file to search!

```py
from exa_py import Exa

exa = Exa('YOUR_KEY_HERE')

query = input('Search here: ')

response = exa.search(
query,
num_results=5,
type='keyword',
include_domains=['https://www.tiktok.com/'],
)
print(response)
```

Run the script by going to your terminal and typing:

```terminal
python3 main.py
```

After pressing <kbd>Enter</kbd>, you should now see the following:

<RoundedImage link="https://raw.githubusercontent.com/codedex-io/projects/main/projects/build-a-custom-search-engine-with-exaai/Coffee_Exa_Search.gif" description="Input Search typing the word coffee" />

You might notice some information on each result like score, author, and text. Depending on your search query, you may or may not want these to be displayed every time you write a query.

Let's format our code so that for our React documentation search engine, we only get the **Title** and the **URL**. Delete the print statement, and add the following to **main.py**.
```py
for result in response.results:
print(f'Title: {result.title}')
print(f'URL: {result.url}')
print()
```

Now, your search results should look something like this!

<RoundedImage link="https://raw.githubusercontent.com/codedex-io/projects/main/projects/build-a-custom-search-engine-with-exaai/Coffee_Search_Results.png" description="Query Results including Title and URL" />

A lot easier on the eyes, right? 🤩💫

At this point, your **main.py** file should look like this:

```py
from exa_py import Exa

exa = Exa('YOUR_KEY_HERE')

query = input('Search here: ')

response = exa.search(
query,
num_results=5,
type='keyword',
include_domains=['https://www.tiktok.com'],
)

for result in response.results:
print(f'Title: {result.title}')
print(f'URL: {result.url}')
print()
```
## Mission Accomplished

Congrats, you just created a custom search engine!✨

<RoundedImage link="https://raw.githubusercontent.com/codedex-io/projects/main/projects/build-a-custom-search-engine-with-exaai/neural_search.gif" description="video summary of searching on Exa and TikTok" />

Here are some additional ideas on what you could search with the Exa API!

- Wikipedia links
- Coding documentation
- Academic papers
- Tweets
- Research papers
- News and media

## More Resources

Thanks for following along, here are some helpful links!

- [Source Code](https://github.com/codedex-io/projects/blob/main/projects/build-a-custom-search-engine-with-exaai/main.py)
- [Exa.io Documentation](https://docs.exa.ai/reference/getting-started)
- [Exa.io Cheat Sheet](https://docs.exa.ai/reference/cheat-sheet)
- [Intro to LLM](https://developers.google.com/machine-learning/resources/intro-llms)
- [Codedex Discord](https://discord.com/invite/HCShtBqbfV)
- [Exa's Discord](https://discord.com/invite/HCShtBqbfV)

We would love to see what you build with this tutorial! Tag [@codedex_io](https://twitter.com/codedex_io) and [@ExaAILabs](https://twitter.com/ExaAILabs) on Twitter if you make something cool! ⚡



Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
17 changes: 17 additions & 0 deletions projects/build-a-custom-search-engine-with-exaai/main.py
@@ -0,0 +1,17 @@
from exa_py import Exa

exa = Exa('YOUR_KEY_HERE')

query = input('Search here: ')

response = exa.search(
query,
num_results=5,
type='keyword',
include_domains=['https://www.tiktok.com'],
)

for result in response.results:
print(f'Title: {result.title}')
print(f'URL: {result.url}')
print()
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

0 comments on commit fa80e3a

Please sign in to comment.