README Fixing #3481

Open. Wants to merge 8 commits into base: main.
14 changes: 7 additions & 7 deletions backend/README.md
@@ -11,7 +11,7 @@ In root directory, run
a database. The default settings are already configured to connect to the
database at `localhost:5432`. (See
[FAQ](https://projects.laion.ai/Open-Assistant/docs/faq#enable-dockers-buildkit-backend)
-if you face any docker problems).
+if you face any Docker problems).

> **Note:** when running on MacOS with an M1 chip you have to use:
> `DB_PLATFORM=linux/x86_64 docker compose ...`
@@ -21,7 +21,7 @@ the `.python-version` in the project root directory.

### Python Packages

-Next, to install all requirements, You can run
+Next, to install all requirements, you can run:

1. `pip install -r backend/requirements.txt`
2. `pip install -e ./oasst-shared/.`
@@ -58,7 +58,7 @@ information.
Once you have successfully started the backend server, you can access the
default api docs at `localhost:8080/docs`. If you need to update the exported
openapi.json in the docs/ folder you can run below command to `wget` them from
-the relevant local fastapi endpoint. This will enable anyone to just see API
+the relevant local FastAPI endpoint. This will enable anyone to just see API
docs via something like
[Swagger.io](https://editor.swagger.io/?url=https://raw.githubusercontent.com/LAION-AI/Open-Assistant/main/docs/docs/api/openapi.json)
without having to actually set up and run a development backend.
@@ -68,16 +68,16 @@ without having to actually set up and run a development backend.
```
wget localhost:8080/api/v1/openapi.json -O docs/docs/api/backend-openapi.json
```

-Note: The api docs should be automatically updated by the
+Note: The API docs should be automatically updated by the
`test-api-contract.yaml` workflow. (TODO)

## Running Celery Worker(s) for API and periodic tasks

-Celery workers are used for Huggingface API calls like toxicity and feature
+Celery workers are used for HuggingFace API calls like toxicity and feature
extraction. Celery Beat along with worker is used for periodic tasks like user
streak update
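
As a rough sketch of the worker-plus-Beat setup described above (the app name, broker URL, and task body below are placeholders for illustration, not the actual backend code):

```python
# Illustrative sketch only: a Celery app with one Beat-scheduled periodic task.
# The broker URL, app name, and task are assumptions, not Open-Assistant's code.
from celery import Celery

app = Celery("oasst_sketch", broker="redis://localhost:6379/0")

@app.task
def update_user_streaks() -> None:
    # Placeholder for a periodic "user streak update" style job.
    pass

app.conf.beat_schedule = {
    "update-user-streaks": {
        "task": update_user_streaks.name,  # registered task name
        "schedule": 60 * 60 * 24,          # run once a day (seconds)
    },
}
```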

-To run APIs locally
+To run APIs locally:

- update HUGGING_FACE_API_KEY in backend/oasst_backend/config.py with the
correct API_KEY
@@ -87,7 +87,7 @@ To run APIs locally
- run start_worker.sh in backend dir
- to see logs , use `tail -f celery.log` and `tail -f celery.beat.log`

-In CI
+In CI:

- set `DEBUG_SKIP_TOXICITY_CALCULATION=False` and
`DEBUG_SKIP_EMBEDDING_COMPUTATION=False` in docker-compose.yaml
2 changes: 1 addition & 1 deletion copilot/README.md
@@ -25,7 +25,7 @@ Replace with a proper domain to setup SSL certificates.
```
copilot env deploy
```

-This will create a variety of aws roles and services needed for deployment.
+This will create a variety of AWS roles and services needed for deployment.

```sh
copilot deploy
```
@@ -3,7 +3,7 @@ from an annotated version of the code-search-net dataset. The annotated version
of code-search-net dataset can be found
[here](https://huggingface.co/datasets/Nan-Do/code-search-net-python).

-The dataset contains around 450000 python annotated functions. The dataset is
+The dataset contains around 450000 Python annotated functions. The dataset is
split into two blocks, one in which the task is starting from the annotated
summary to generate an instruction to generate the code as a response, and
another one in which the expected response is to generate a description of the
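
For reference, a minimal sketch of loading the annotated dataset with the `datasets` library (the split name and column layout are assumptions and may differ from the actual dataset):

```python
# Sketch: load the annotated code-search-net-python dataset; the split and
# column names are assumptions for illustration, not the confirmed schema.
from datasets import load_dataset

ds = load_dataset("Nan-Do/code-search-net-python", split="train")
print(ds)      # inspect the available columns
print(ds[0])   # look at one annotated function record
```
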
8 changes: 4 additions & 4 deletions data/datasets/poetry_instruction/README.md
@@ -10,16 +10,16 @@ Languages English

Dataset Structure This dataset follows the OA format, which is:

-INSTRUCTION (string): The user asks for a poem (from a variety of premade
+- NSTRUCTION (string): The user asks for a poem (from a variety of premade
Collaborator comment: Should be INSTRUCTION

prompts) with topics (tags). If the given poem has no tags, the user asks for a
poem on it's own.

-RESPONSE (string): The assistant replies with the poem and title (from a variety
+- RESPONSE (string): The assistant replies with the poem and title (from a variety
of premade prompts).

-SOURCE (string): The source is PoetryFoundation.org and the poet's name.
+- SOURCE (string): The source is PoetryFoundation.org and the poet's name.

-METADATA (JSON String): {"author": "author of the original poem", "title":
+- METADATA (JSON String): {"author": "author of the original poem", "title":
andrewm4894 marked this conversation as resolved.
"title of the poem", "tags": "tags from poetry foundation."}

Preparing the Dataset The dataset can be created with prepare.py. Make sure to
2 changes: 1 addition & 1 deletion data/datasets/prosocial_confessions/README.md
@@ -6,7 +6,7 @@
- A [classifier](https://huggingface.co/shahules786/prosocial-classifier)
trained on prosocial dialog dataset is used for pseudo labeling.
- More information on dataset can be found
-[here](https://huggingface.co/datasets/shahules786/prosocial-confessions)
+[here](https://huggingface.co/datasets/shahules786/prosocial-confessions).
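
As an illustration of the pseudo-labeling step, the classifier could plausibly be used via the standard `transformers` pipeline (whether the model is compatible with this task type is an assumption):

```python
# Sketch of pseudo-labeling a confession with the prosocial classifier.
# Assumes the model works with the text-classification pipeline; the returned
# labels and scores are whatever the model defines, not shown here.
from transformers import pipeline

classifier = pipeline("text-classification", model="shahules786/prosocial-classifier")
print(classifier("I read my roommate's diary without asking."))
```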

## Example

2 changes: 1 addition & 1 deletion data/datasets/reasoning_gsm_qna_oa/README.MD
@@ -5,7 +5,7 @@
License: MIT. Contains Parquet of a list of instructions and answers (English
only). Reasoning, logic and programming.

-Each row consists of
+Each row consists of:

- INSTRUCTION
- RESPONSE
4 changes: 2 additions & 2 deletions data/datasets/recipes/README.md
@@ -2,7 +2,7 @@

Here we convert several existing recipe ingredient and instructions datasets
into dialogue. Each notebook processes a different dataset and creates a final
-dataset to be uploaded to huggingface.
+dataset to be uploaded to HuggingFace.

## tasty_recipes.ipynb

@@ -14,7 +14,7 @@ dialogue using a preset list of user prompt templates.
### Some ideas for extending this dataset

This dataset is nicely structured, and the ingredients section includes the
-quantities and units separated out. Some, but not all already include a
+quantities and units separated out. Somehow, but not all already include a
Collaborator comment: This is not correct - "Some" is right

primary_unit (US) and metric_unit. We could find all recipes with both units and
generate dialogue for the prompt 'convert the ingredients into metric', 'what
are the ingredients in UK measurements'? etc..
2 changes: 1 addition & 1 deletion docker/grafana/README.md
@@ -9,6 +9,6 @@ This folder contains various configuration files for Grafana.
Grafana where some pre-configured dashboards live.
- [`./dashboards/fastapi-backend.json`](./dashboards/fastapi-backend.json) - A
json representation of a saved Grafana dashboard focusing on some high level
-api endpoint metrics etc.
+API endpoint metrics etc.
- [`./datasources/datasource.yml`](./datasources/datasource.yml) - A config file
to set up Grafana to read from the local Prometheus source.
4 changes: 2 additions & 2 deletions inference/README.md
@@ -75,8 +75,8 @@ Navigate to http://0.0.0.0:8089/ to view the locust UI.

## API Docs

-To update the api docs, once the inference server is running run below command
-to download the inference openapi json into the relevant folder under `/docs`:
+To update the API docs, once the inference server is running run below command
+to download the inference OpenAPI json into the relevant folder under `/docs`:

```bash
wget localhost:8000/openapi.json -O docs/docs/api/inference-openapi.json
```
8 changes: 4 additions & 4 deletions model/README.md
@@ -83,7 +83,7 @@ To change the model used, i.e. larger pythia version create a new config in
`EleutherAI/pythia-{size}-deduped`. Larger models will probably need to also
adjust the `--learning_rate` and `--per_device_train_batch_size` flags.

-4. Get SFT trained model
+4. Get SFT trained model.

```bash
# choose a specific checkpoint
```
@@ -95,14 +95,14 @@ export SFT_MODEL=$MODEL_PATH/sft_model/$(ls -t $MODEL_PATH/sft_model/ | head -n

### RM Training

-5. Train the reward model
+5. Train the reward model.

```bash
cd ../reward/instructor
python trainer.py configs/deberta-v3-base.yml --output_dir $MODEL_PATH/reward_model
```

-6. Get RM trained model
+6. Get RM trained model.

```bash
# choose a specific checkpoint
```
@@ -114,7 +114,7 @@ export REWARD_MODEL=$MODEL_PATH/reward_model/$(ls -t $MODEL_PATH/reward_model/ |

### RL Training

-7. Train the RL agent
+7. Train the RL agent.

```bash
cd ../../model_training
```
2 changes: 1 addition & 1 deletion notebooks/closed-book-qa/README.md
@@ -1,6 +1,6 @@
# Generate Topics, Questions, and Answers from a paragraph of text

-This python code can be used to generate topics, questions, and answers from a
+This Python code can be used to generate topics, questions, and answers from a
paragraph of text. This is a good way to generate ground truth knowledge about a
topic from a trusted source.

4 changes: 2 additions & 2 deletions oasst-data/README.md
@@ -8,7 +8,7 @@ If you got the exception `ModuleNotFoundError: No module named 'oasst_data'` you
first need to install the `oasst_data` package:

Run `pip install -e .` in the `oasst-data/` directory of the Open-Assistant
-repository to install the `oasst_data` python package in editable mode.
+repository to install the `oasst_data` Python package in editable mode.

## Reading Open-Assistant Export Files

@@ -41,7 +41,7 @@ which is used to load Open-Assistant export data for supervised fine-tuning
(training) of our language models.

You can also load jsonl data completely without dependencies to `oasst_data`
-solely with standard python libraries. In this case the json objects are loaded
+solely with standard Python libraries. In this case the json objects are loaded
as nested dicts which need to be 'parsed' manually by you:

```python
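# Illustrative sketch (not the README's original snippet, which is truncated in
# this diff view): read an export .jsonl file with only the standard library;
# each line is one JSON object that becomes a nested dict you walk yourself.
# The file name below is a placeholder.
import json

records = []
with open("oasst_export.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if line:
            records.append(json.loads(line))  # one nested dict per line

print(f"loaded {len(records)} records")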
```
2 changes: 1 addition & 1 deletion scripts/data-collection/twitter/README.md
@@ -75,6 +75,6 @@ conversation, or at least as a prompt with replies.
- Write script that matches the original tweets and their text with the archive
data to create the prompt/reply dataset. (Optional)
- Decide on final output format and storage options for the dataset. Currently
-in JSONL with tree / node architecture as python dicts which is acceptable I
+in JSONL with tree / node architecture as Python dicts which is acceptable I
believe.
- Alternatively: Store processed tweets into DB or alternative option.(Optional)
8 changes: 4 additions & 4 deletions website/README.md
@@ -36,7 +36,7 @@ To contribute to the website, make sure you have the following setup and install

1. Node 16: if you are on windows, you can [download node from their website](https://nodejs.org/en/download/releases),
if you are on linux, use [NVM](https://github.com/nvm-sh/nvm) (Once installed, run `nvm use 16`)
-1. [Docker](https://www.docker.com/): We use docker to simplify running dependent services.
+1. [Docker](https://www.docker.com/): We use Docker to simplify running dependent services.

### Getting everything up and running

@@ -48,11 +48,11 @@ If you're doing active development we suggest the following workflow:
- If you want to work on the chat api, you need to run the inference profile as well. Your new command would look
like: `docker compose --profile frontend-dev --profile inference up --build --attach-dependencies`
- See [FAQ](https://projects.laion.ai/Open-Assistant/docs/faq#enable-dockers-buildkit-backend) if you face any
-docker problems.
+Docker problems.
- Leave this running in the background and continue:
1. Open another terminal tab, navigate to `${OPEN_ASSISTANT_ROOT/website`.
1. Run `npm ci`
-1. Run `npx prisma db push` (This is also needed when you restart the docker stack from scratch).
+1. Run `npx prisma db push` (This is also needed when you restart the Docker stack from scratch).
1. Run `npm run dev`. Now the website is up and running locally at `http://localhost:3000`.
1. To create an account, login via the user using email authentication and navigate to `http://localhost:1080`. Check
the email listed and click the log in link. You're now logged in and authenticated.
@@ -63,7 +63,7 @@ If you're doing active development we suggest the following workflow:
You can use the debug credentials provider to log in without fancy emails or OAuth.

1. This feature is automatically on in development mode, i.e. when you run `npm run dev`. In case you want to do the
-same with a production build (for example, the docker image), then run the website with environment variable
+same with a production build (for example, the Docker image), then run the website with environment variable
`DEBUG_LOGIN=true`.
1. Use the `Login` button in the top right to go to the login page.
1. You should see a section for debug credentials. Enter any username you wish, you will be logged in as that user.