OSError: rtf_checkpoints/not-best-disc-model does not appear to have a file named config.json. #66

Open
AhmadKajjan-QU opened this issue Mar 26, 2024 · 2 comments


AhmadKajjan-QU commented Mar 26, 2024

```python
# pip install realtabformer
import pandas as pd
from realtabformer import REaLTabFormer

df = pd.read_csv("./data/02-14-2018 - test.csv")

# NOTE: Remove any unique identifiers in the
# data that you don't want to be modeled.

# Non-relational or parent table.
rtf_model = REaLTabFormer(
    model_type="tabular",
    epochs=1,
    gradient_accumulation_steps=4,
    logging_steps=100)

print("fitting the model")
# Fit the model on the dataset.
# Additional parameters can be
# passed to the `.fit` method.
rtf_model.fit(df, num_bootstrap=1)

print("generating synthetic data")
# Generate synthetic data with the same
# number of observations as the real dataset.
samples = rtf_model.sample(n_samples=len(df))

print("saving synthetic data")
# Save the generated synthetic data to a CSV file
samples.to_csv("synthetic_data.csv", index=False)

print("saving the model")
# Save the model to the current directory.
# A new directory `rtf_model/` will be created.
# In it, a directory with the model's
# experiment id `idXXXX` will also be created
# where the artefacts of the model will be stored.
rtf_model.save("rtf_model/")

print("loading the model")
# Load the saved model. The directory to the
# experiment must be provided.
rtf_model2 = REaLTabFormer.load_from_dir(
    path="rtf_model/IDX")
```
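The NOTE in the script asks to remove unique identifiers before fitting. A minimal pandas sketch of that step, assuming an "identifier" is any column whose values are all distinct (such as a per-row timestamp). The helper `find_unique_columns` and the demo column names are hypothetical and not part of REaLTabFormer.

```python
import pandas as pd


def find_unique_columns(df: pd.DataFrame) -> list:
    """Return names of columns with one distinct value per row.

    Hypothetical helper: it treats all-distinct columns as identifiers.
    Continuous numeric features can also be all-distinct, so review the
    list before dropping anything.
    """
    n_rows = len(df)
    if n_rows < 2:
        return []
    # dropna=False counts NaN as a value, so repeated NaNs do not look unique.
    return [c for c in df.columns if df[c].nunique(dropna=False) == n_rows]


# Illustrative frame using values from the sample rows; column names are assumed.
demo = pd.DataFrame({
    "timestamp": ["14/02/2018 08:31:01", "14/02/2018 08:33:50", "14/02/2018 08:36:39"],
    "flag": [0, 0, 0],
    "label": ["Benign", "Benign", "Benign"],
})
id_like = find_unique_columns(demo)
cleaned = demo.drop(columns=id_like)
print(id_like)
print(list(cleaned.columns))
```

Returning the column names instead of dropping them directly keeps the decision visible, which matters on wide tables like this one.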

The error I got:

```
PS C:\Users\Qatar University\Desktop\Akef> python .\start.py
fitting the model
[...] than the duplicate rate (0.757) in the data. This will not give a reliable early stopping condition. Consider using qt_max="compute" argument.
  warnings.warn(
Computing the sensitivity threshold...
C:\Users\Qatar University\AppData\Local\Programs\Python\Python310\lib\site-packages\realtabformer\realtabformer.py:597: UserWarning: qt_interval adjusted from 100 to 16...
  warnings.warn(
Using parallel computation!!!
Bootstrap round: 100%|██████████████████████████████| 1/1 [00:00<?, ?it/s]
Sensitivity threshold summary:
count    1.000000
mean     0.433998
std           NaN
min      0.433998
25%      0.433998
50%      0.433998
75%      0.433998
max      0.433998
dtype: float64
Sensitivity threshold: 0.43399772209567195 qt_max: 0.05
Map: 100%|██████████████████████████████| 2000/2000 [00:10<00:00, 194.27 examples/s]
C:\Users\Qatar University\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\accelerator.py:432: FutureWarning: Passing the following arguments to Accelerator is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches', 'split_batches', 'even_batches', 'use_seedable_sampler']). Please pass an accelerate.DataLoaderConfiguration instead:
dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)
  warnings.warn(
{'train_runtime': 14.4486, 'train_samples_per_second': 138.422, 'train_steps_per_second': 4.291, 'train_loss': 1.2673711469096522, 'epoch': 0.99}
100%|██████████████████████████████| 62/62 [00:14<00:00, 4.29it/s]
1024it [08:01, 2.13it/s]
Generated 0 invalid samples out of total 1024 samples generated. Sampling efficiency is: 100.0000%
Critic round: 5, sensitivity_threshold: 0.43399772209567195, val_sensitivity: -0.020717878427690663, val_sensitivities: [-0.01820568252007412, -0.022510889856876166, -0.022523219814241484, -0.022510889856876166, -0.02251552795031056, -0.020073891625615764, -0.022527812113720645, -0.022539975399753998, -0.01519607843137255, -0.021314496314496313, -0.016960420531849103, -0.022512437810945272, -0.016347342398022248, -0.022513983840894966, -0.02251552795031056]
C:\Users\Qatar University\AppData\Local\Programs\Python\Python310\lib\site-packages\realtabformer\realtabformer.py:834: UserWarning: No best model was saved. Loading the closest model to the sensitivity_threshold.
  warnings.warn(
Traceback (most recent call last):
  File "C:\Users\Qatar University\Desktop\Akef\start.py", line 21, in <module>
    rtf_model.fit(df, num_bootstrap=1)
  File "C:\Users\Qatar University\AppData\Local\Programs\Python\Python310\lib\site-packages\realtabformer\realtabformer.py", line 458, in fit
    trainer = self._train_with_sensitivity(
  File "C:\Users\Qatar University\AppData\Local\Programs\Python\Python310\lib\site-packages\realtabformer\realtabformer.py", line 839, in _train_with_sensitivity
    self.model = self.model.from_pretrained(loaded_model_path.as_posix())
  File "C:\Users\Qatar University\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\modeling_utils.py", line 3006, in from_pretrained
    config, model_kwargs = cls.config_class.from_pretrained(
  File "C:\Users\Qatar University\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\configuration_utils.py", line 602, in from_pretrained
    config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "C:\Users\Qatar University\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\configuration_utils.py", line 631, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "C:\Users\Qatar University\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\configuration_utils.py", line 686, in _get_config_dict
    resolved_config_file = cached_file(
  File "C:\Users\Qatar University\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\utils\hub.py", line 369, in cached_file
    raise EnvironmentError(
OSError: rtf_checkpoints/not-best-disc-model does not appear to have a file named config.json. Checkout 'https://huggingface.co/rtf_checkpoints/not-best-disc-model/tree/main' for available files.
PS C:\Users\Qatar University\Desktop\Akef>
```
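The traceback shows REaLTabFormer falling back to `rtf_checkpoints/not-best-disc-model` after warning that no best model was saved, and transformers' `from_pretrained()` then failing because it found no `config.json` at that path. A minimal sketch of a pre-check for that condition; the helper name `has_hf_config` is hypothetical, and it only inspects the filesystem, it does not explain why the checkpoint was not written.

```python
import tempfile
from pathlib import Path


def has_hf_config(checkpoint_dir) -> bool:
    """Return True if checkpoint_dir is a directory that contains config.json.

    Loading a model from a local directory with transformers'
    from_pretrained() requires this file (hypothetical pre-check).
    """
    path = Path(checkpoint_dir)
    return path.is_dir() and (path / "config.json").is_file()


# Simulate an incomplete checkpoint in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "not-best-disc-model"
    ckpt.mkdir()
    before = has_hf_config(ckpt)  # directory exists, config.json missing
    (ckpt / "config.json").write_text("{}")
    after = has_hf_config(ckpt)   # config.json now present

print(before, after)
```

Running such a check on `rtf_checkpoints/` after a failed fit shows whether the fallback checkpoint was ever written.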

Some rows of my data:
```csv
Pkt Len Std,Flow Byts/s,Flow Pkts/s,Flow IAT Mean,Flow IAT Std,Flow IAT Max,Flow IAT Min,Fwd IAT Tot,Fwd IAT Mean,Fwd IAT Std,Fwd IAT Max,Fwd IAT Min,Bwd IAT Tot,Bwd IAT Mean,Bwd IAT Std,Bwd IAT Max,Bwd IAT Min,Fwd PSH Flags,Bwd PSH Flags,Fwd URG Flags,Bwd URG Flags,Fwd Header Len,Bwd Header Len,Fwd Pkts/s,Bwd Pkts/s,Pkt Len Min,Pkt Len Max,Pkt Len Mean,Pkt Len Std,Pkt Len Var,FIN Flag Cnt,SYN Flag Cnt,RST Flag Cnt,PSH Flag Cnt,ACK Flag Cnt,URG Flag Cnt,CWE Flag Count,ECE Flag Cnt,Down/Up Ratio,Pkt Size Avg,Fwd Seg Size Avg,Bwd Seg Size Avg,Fwd Byts/b Avg,Fwd Pkts/b Avg,Fwd Blk Rate Avg,Bwd Byts/b Avg,Bwd Pkts/b Avg,Bwd Blk Rate Avg,Subflow Fwd Pkts,Subflow Fwd Byts,Subflow Bwd Pkts,Subflow Bwd Byts,Init Fwd Win Byts,Init Bwd Win Byts,Fwd Act Data Pkts,Fwd Seg Size Min,Active Mean,Active Std,Active Max,Active Min,Idle Mean,Idle Std,Idle Max,Idle Min,Label
0,0,14/02/2018 08:31:01,112641719,3,0,0,0,0,0,0,0,0,0,0,0,0,0.026633116,56320859.5,139.3000359,56320958,56320761,112641719,56320859.5,139.3000359,56320958,56320761,0,0,0,0,0,0,0,0,0,0,0,0.026633116,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,-1,-1,0,0,0,0,0,0,56320859.5,139.3000359,56320958,56320761,Benign
0,0,14/02/2018 08:33:50,112641466,3,0,0,0,0,0,0,0,0,0,0,0,0,0.026633176,56320733,114.5512986,56320814,56320652,112641466,56320733,114.5512986,56320814,56320652,0,0,0,0,0,0,0,0,0,0,0,0.026633176,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,-1,-1,0,0,0,0,0,0,56320733,114.5512986,56320814,56320652,Benign
0,0,14/02/2018 08:36:39,112638623,3,0,0,0,0,0,0,0,0,0,0,0,0,0.026633848,56319311.5,301.9345956,56319525,56319098,112638623,56319311.5,301.9345956,56319525,56319098,0,0,0,0,0,0,0,0,0,0,0,0.026633848,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,-1,-1,0,0,0,0,0,0,56319311.5,301.9345956,56319525,56319098,Benign
```
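The first warning in the log mentions a duplicate rate of 0.757 in the data. One plausible way to measure a duplicate rate with pandas is the share of rows that exactly repeat an earlier row. This is a sketch under that assumed definition; REaLTabFormer may compute its rate differently, and `duplicate_rate` is a hypothetical helper.

```python
import pandas as pd


def duplicate_rate(df: pd.DataFrame) -> float:
    """Fraction of rows that exactly repeat an earlier row (assumed definition)."""
    if len(df) == 0:
        return 0.0
    # duplicated() marks every occurrence after the first as True.
    return float(df.duplicated().mean())


# Illustrative data: the first two rows are identical.
rows = pd.DataFrame({"a": [0, 0, 1], "b": ["Benign", "Benign", "Benign"]})
rate = duplicate_rate(rows)
print(rate)
```

Dropping all-unique columns such as a per-row timestamp can only keep or raise this rate, because rows that differed only in the dropped column become identical.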

limhasic commented:

Me too.

limhasic commented:

My dear Ottoman friend,

I found something: it works with an older version.

```
!pip install transformers==4.24.0
```
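To confirm the downgrade actually took effect in the running interpreter, the standard library's `importlib.metadata` can report the installed version and compare it with the 4.24.0 pin. A minimal sketch; the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pin(package: str, pinned: str) -> bool:
    """True only if package is installed at exactly the pinned version."""
    return installed_version(package) == pinned


print("transformers:", installed_version("transformers"))
print("matches 4.24.0 pin:", matches_pin("transformers", "4.24.0"))
```

In a notebook, restart the kernel after `pip install` so the check sees the newly installed version.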

A matching CUDA/PyTorch environment is essential for it to run.

Before the environment matched, I ran into errors like:

  1. `RuntimeError: NCCL Error 2: unhandled system error (run with NCCL_DEBUG=INFO for details)`
  2. An "old driver" error
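Since both errors above stem from a mismatched CUDA/PyTorch setup, it can help to print what PyTorch actually sees before training. A minimal sketch using only long-standing torch attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.device_count()`); the helper name is hypothetical, and it reports placeholders when torch is not installed.

```python
def torch_environment() -> dict:
    """Summarize the PyTorch/CUDA setup, or mark torch as missing."""
    try:
        import torch
    except ImportError:
        return {"torch": None, "cuda_build": None, "cuda_available": False, "gpu_count": 0}
    available = torch.cuda.is_available()
    return {
        "torch": str(torch.__version__),
        # CUDA version PyTorch was built against; None for CPU-only builds.
        "cuda_build": torch.version.cuda,
        "cuda_available": available,
        "gpu_count": torch.cuda.device_count() if available else 0,
    }


for key, value in torch_environment().items():
    print(f"{key}: {value}")
```

If `cuda_build` is set but `cuda_available` is False, the installed driver is a likely suspect.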

But that wasn't the end of it: next I hit an out-of-memory error.

I thought the machine was simply running out of memory, so I changed the environment three times and ended up on something close to a supercomputer, but it still reported out of memory. The hardware was more than powerful enough, so why was this happening?

The answer was the `output_max_length=None` argument.

By setting it to `None`, I allowed an unlimited number of tokens, so even 80 GB of GPU memory was exhausted. On top of that, I had to configure the multi-GPU settings separately, which drove me crazy.
Should this be a reasonably round number, like 1024 or 2048? That seems to be a convention for token lengths. Either way, setting it to 1024 on an A100 80GB made it work.
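A rough way to see why an unbounded length can exhaust even 80 GB: in standard (non-fused) self-attention, a full forward pass materializes a score matrix of `batch * heads * L * L` elements per layer, so that term grows quadratically with the sequence length `L`. The sketch estimates only this one term. The batch size, head count, and 4-byte (fp32) elements are illustrative assumptions, not REaLTabFormer defaults, and real usage also includes weights, activations, and optimizer state.

```python
def attention_scores_bytes(batch: int, heads: int, seq_len: int, bytes_per_elem: int = 4) -> int:
    """Bytes for one layer's attention score matrix (batch * heads * L * L elements)."""
    return batch * heads * seq_len * seq_len * bytes_per_elem


GIB = 1024 ** 3
# Assumed illustrative settings: batch of 8, 12 heads, fp32 scores.
for seq_len in (1024, 2048, 8192):
    size = attention_scores_bytes(batch=8, heads=12, seq_len=seq_len)
    print(f"L={seq_len}: {size / GIB:.2f} GiB per layer")
```

Doubling `L` from 1024 to 2048 multiplies this term by four, which is why capping the length has such a large effect.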

Next, I'm going to test this with the HMA model and generate Airbnb data.

Have a nice day, friend.
