I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of MLRun CE.
Reproducible Example
import oracledb
import pandas as pd

username = "C##xxx"
password = "xxx"
host = "192.168.1.6"
port = "1521"
service_name = "xe"
conn = oracledb.connect(user=username, password=password, host=host, port=port, service_name=service_name)

# Create a cursor
cursor = conn.cursor()

# Define the SQL statement for selecting data
select_sql = "SELECT * FROM NPRA_IMPORT"

# Execute the SQL statement to fetch data
cursor.execute(select_sql)

# Fetch all rows of data
rows = cursor.fetchall()

# Define the SQL statement to fetch column names for a table
table_name = 'NPRA_IMPORT'
select_sql2 = f"SELECT column_name FROM all_tab_columns WHERE table_name = '{table_name}'"

# Execute the SQL statement to fetch data
cursor.execute(select_sql2)

# Fetch all rows of data
columns = cursor.fetchall()

# Define column names
column_names = [column[0].lower() for column in columns]

# Create a DataFrame from the fetched rows with column names
df = pd.DataFrame(rows, columns=column_names)
%%writefile data-prep-oracle-db.py
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_selection import SelectKBest, f_classif
import oracledb
import mlrun


def get_oracle_data():
    # Connection details
    username = "C##xxx"
    password = "xxx"
    host = "192.168.1.6"
    port = "1521"
    service_name = "xe"
    conn = oracledb.connect(user=username, password=password, host=host, port=port, service_name=service_name)

    # Create a cursor
    cursor = conn.cursor()

    # Define the SQL statement for selecting data
    select_sql = "SELECT * FROM NPRA_IMPORT"

    # Execute the SQL statement to fetch data
    cursor.execute(select_sql)

    # Fetch all rows of data
    rows = cursor.fetchall()

    # Define the SQL statement to fetch column names for a table
    table_name = 'NPRA_IMPORT'
    select_sql2 = f"SELECT column_name FROM all_tab_columns WHERE table_name = '{table_name}'"

    # Execute the SQL statement to fetch data
    cursor.execute(select_sql2)

    # Fetch all rows of data
    columns = cursor.fetchall()

    # Define column names
    column_names = [column[0].lower() for column in columns]

    # Create a DataFrame from the fetched rows with column names
    df = pd.DataFrame(rows, columns=column_names)
    return df


def load_data(df):
    df.drop("customer_id", axis=1, inplace=True)
    df.drop_duplicates(inplace=True)
    df.dropna(axis=0, inplace=True)
    return df


def remove_outliers_iqr(df, col):
    Q1 = df[col].quantile(0.25)
    Q3 = df[col].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    df = df[(df[col] >= lower_bound)]
    df = df[(df[col] <= upper_bound)]
    return df


def label_encode_categorical_columns(df):
    le = LabelEncoder()
    for col in df.select_dtypes(include='object'):
        df[col] = le.fit_transform(df[col])
    return df


def preprocess_data(data):
    can_have_outlier = ["person_age", "person_income", "person_emp_length", "cb_person_cred_hist_length", "loan_amnt"]
    for col in can_have_outlier:
        data = remove_outliers_iqr(data, col)
    preprocessed_data = label_encode_categorical_columns(data)
    return preprocessed_data


def data_balance(new_df):
    # Divide by class
    df_class_0 = new_df[new_df['loan_status'] == 0]
    df_class_1 = new_df[new_df['loan_status'] == 1]
    count_class_0, count_class_1 = new_df['loan_status'].value_counts()

    # random over sampling
    df_class_1_over = df_class_1.sample(count_class_0, replace=True)
    new_df = pd.concat([df_class_0, df_class_1_over], axis=0)
    return new_df


@mlrun.handler(outputs=["dataset", "label_column"])
def credit_risk_dataset_generator():
    """ A function which generates the credit risk dataset """
    dataset = get_oracle_data()
    data = load_data(dataset)
    preprocessed_data = preprocess_data(data)
    for_model_df = data_balance(preprocessed_data)
    return for_model_df, "loan_status"
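A side note on the column lookup used in both snippets above: the query against `all_tab_columns` has no `ORDER BY column_id` and no `owner` filter, so the names may come back in a different order than the `SELECT *` columns, or include rows from a same-named table in another schema. A possible alternative, sketched here under assumptions, is to read the names from the DB-API `cursor.description` of the data query itself. The helper name `fetch_with_column_names` is hypothetical, and the in-memory SQLite table only stands in for `NPRA_IMPORT` so the sketch runs without an Oracle instance; the same attribute should be available on a python-oracledb cursor, but that has not been verified against this setup.

```python
import sqlite3


def fetch_with_column_names(cursor, sql):
    """Run `sql` and return (column_names, rows) using DB-API cursor metadata.

    Hypothetical helper: works with any DB-API 2.0 cursor that populates
    `cursor.description` after `execute()`.
    """
    cursor.execute(sql)
    # Each entry of cursor.description describes one result column;
    # index 0 is the column name, in the same order as the fetched rows.
    column_names = [col[0].lower() for col in cursor.description]
    rows = cursor.fetchall()
    return column_names, rows


# Stand-in table (assumption): SQLite replaces the Oracle table for the demo.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE npra_import (CUSTOMER_ID INTEGER, LOAN_STATUS INTEGER)")
cur.execute("INSERT INTO npra_import VALUES (1, 0)")

column_names, rows = fetch_with_column_names(cur, "SELECT * FROM npra_import")
conn.close()
```

The returned pair can be passed straight to `pd.DataFrame(rows, columns=column_names)`, which removes the second query entirely.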
Issue Description
I am trying to fetch data from an Oracle database and feed it into an MLRun pipeline. I can access the database locally, but I get this error when trying to ingest the data in the pipeline.
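Since the error text is not included here, the cause is open. One possibility (an assumption, not a diagnosis) is that the pod running the pipeline step cannot reach `192.168.1.6:1521` even though the workstation can; another is that `oracledb` is not installed in the function image. A minimal sketch for checking the first case from inside the pipeline environment, using only the standard library; the helper name `can_reach` is hypothetical:

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    This only tests network reachability of the listener port; it does not
    authenticate or speak the Oracle protocol.
    """
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


# Example call with the values from the report (run it inside the job/pod):
# print(can_reach("192.168.1.6", "1521"))
```

If this returns `False` inside the pipeline but `True` locally, the problem is likely network routing between the cluster and the database host rather than the data-preparation code.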
Expected Behavior
I should be able to ingest data from the Oracle database into the pipeline.
Installation OS
Windows
Installation Method
Kubernetes
Python Version
3.9.13
MLRun Version
MLRun CE 0.6.2
Additional Information
I'm using Docker Desktop to work with a Kubernetes cluster.
Docker Version- v4.9.1
Kubectl version - v1.24
Helm version - v3.11.3
Dataset using - "credit risk dataset"