[Bug]: Pipeline run status Failed: "InvalidHeader: Invalid leading whitespace, reserved character(s)" #4369

Open
harishgawade1999 opened this issue Sep 28, 2023 · 2 comments


harishgawade1999 commented Sep 28, 2023

MLRun Version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of MLRun CE.

Reproducible Example

import oracledb
import pandas as pd

username = "C##xxx"
password = "xxx"
host = "192.168.1.6"
port = 1521  # python-oracledb documents the port as an integer
service_name = "xe"
    
conn = oracledb.connect(user=username, password=password, host=host, port=port, service_name=service_name)
    
# Create a cursor
cursor = conn.cursor()

# Define the SQL statement for selecting data
select_sql = "SELECT * FROM NPRA_IMPORT"

# Execute the SQL statement to fetch data
cursor.execute(select_sql)

# Fetch all rows of data
rows = cursor.fetchall()
    
# Take the column names from the result set of the SELECT above. Unlike a
# query on all_tab_columns without ORDER BY column_id (and without an owner
# filter), this always matches the order of the values in each row.
column_names = [desc[0].lower() for desc in cursor.description]

# Create a DataFrame from the fetched rows with column names
df = pd.DataFrame(rows, columns=column_names)



%%writefile data-prep-oracle-db.py

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_selection import SelectKBest, f_classif
import oracledb

import mlrun

def get_oracle_data():
    # Connection details
    username = "C##xxx"
    password = "xxx"
    host = "192.168.1.6"
    port = 1521  # python-oracledb documents the port as an integer
    service_name = "xe"
    
    conn = oracledb.connect(user=username, password=password, host=host, port=port, service_name=service_name)
    
    # Create a cursor
    cursor = conn.cursor()

    # Define the SQL statement for selecting data
    select_sql = "SELECT * FROM NPRA_IMPORT"

    # Execute the SQL statement to fetch data
    cursor.execute(select_sql)

    # Fetch all rows of data
    rows = cursor.fetchall()
    
    # Take the column names from the result set of the SELECT above. Unlike a
    # query on all_tab_columns without ORDER BY column_id (and without an owner
    # filter), this always matches the order of the values in each row.
    column_names = [desc[0].lower() for desc in cursor.description]

    # Create a DataFrame from the fetched rows with column names
    df = pd.DataFrame(rows, columns=column_names)

    return df

def load_data(df):
    df.drop("customer_id", axis=1, inplace=True)
    df.drop_duplicates(inplace=True)
    df.dropna(axis=0, inplace=True)
    return df


def remove_outliers_iqr(df, col):
    Q1 = df[col].quantile(0.25)
    Q3 = df[col].quantile(0.75)
    IQR = Q3 - Q1

    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR

    # Keep only rows inside [lower_bound, upper_bound] (inclusive on both ends)
    df = df[df[col].between(lower_bound, upper_bound)]
    return df


def label_encode_categorical_columns(df):
    le = LabelEncoder()
    for col in df.select_dtypes(include='object'):
        df[col] = le.fit_transform(df[col])
    return df


def preprocess_data(data):
    can_have_outlier = ["person_age","person_income","person_emp_length","cb_person_cred_hist_length","loan_amnt"]
    for col in can_have_outlier:
        data = remove_outliers_iqr(data, col)
    preprocessed_data = label_encode_categorical_columns(data) 
    return preprocessed_data


def data_balance(new_df):
    # Divide by class
    df_class_0 = new_df[new_df['loan_status'] == 0]
    df_class_1 = new_df[new_df['loan_status'] == 1]
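    # value_counts() sorts by frequency, not by label, so take the size of
    # class 0 explicitly; this assumes class 0 is the majority class
    count_class_0 = len(df_class_0)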
    # random over sampling
    df_class_1_over = df_class_1.sample(count_class_0, replace=True)
    new_df = pd.concat([df_class_0, df_class_1_over], axis=0)
    return new_df


@mlrun.handler(outputs=["dataset", "label_column"])
def credit_risk_dataset_generator():
    """
    A function which generates the credit risk dataset
    """
    dataset = get_oracle_data()
    data = load_data(dataset)
    preprocessed_data = preprocess_data(data)
    for_model_df = data_balance(preprocessed_data)

    return for_model_df, "loan_status"
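As a quick, hedged sanity check of the fences used in `remove_outliers_iqr` (a row is kept when Q1 - 1.5 * IQR <= value <= Q3 + 1.5 * IQR), the same bounds can be computed with only the standard library. The helper `iqr_bounds` and the sample ages are hypothetical, and the sketch assumes that `statistics.quantiles(..., method="inclusive")` reproduces the linear interpolation that pandas' `Series.quantile` uses by default.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) IQR fences used to flag outliers.

    method="inclusive" interpolates linearly between sorted values, which is
    assumed here to match pandas' default Series.quantile(interpolation="linear").
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical person_age values with one obvious outlier.
ages = [22, 23, 24, 25, 26, 27, 144]
lower, upper = iqr_bounds(ages)
kept = [age for age in ages if lower <= age <= upper]
print(lower, upper)  # 19.0 31.0
print(kept)          # [22, 23, 24, 25, 26, 27]
```

Here Q1 = 23.5 and Q3 = 26.5, so IQR = 3 and only the value 144 falls outside the fences.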

Issue Description

I am trying to fetch data from an Oracle database and feed it into an MLRun pipeline. I can access the database locally, but I get this error when ingesting the data in the pipeline.
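The error in the title has the same wording as the `InvalidHeader` exception raised by the `requests` library when an HTTP header value starts with whitespace or contains a carriage return or line feed. As a hedged diagnostic, assuming some configuration string (for example a credential or environment value pasted with a trailing newline) ends up in a request header, a stdlib-only check like the one below can flag suspicious values before the pipeline is submitted. The helper names and sample values are hypothetical, and the check only approximates the rule described in the error message.

```python
def is_valid_header_value(value: str) -> bool:
    """Approximate the header-value rule from the InvalidHeader message.

    Assumption: an empty value is allowed; otherwise the value must not start
    with whitespace and must not contain CR or LF anywhere.
    """
    if value == "":
        return True
    return not value[0].isspace() and "\r" not in value and "\n" not in value


def find_invalid_header_values(values: dict) -> list:
    """Return the (hypothetical) names whose values would fail the check."""
    return [name for name, value in values.items() if not is_valid_header_value(value)]


if __name__ == "__main__":
    # Hypothetical values, e.g. read from files or environment variables.
    candidates = {
        "ACCESS_KEY": "abc123\n",  # trailing newline from a copied file
        "USERNAME": " admin",      # leading space
        "PROJECT": "credit-risk",  # valid
    }
    print(find_invalid_header_values(candidates))  # ['ACCESS_KEY', 'USERNAME']
    # Stripping surrounding whitespace fixes both problems above.
    cleaned = {name: value.strip() for name, value in candidates.items()}
    print(find_invalid_header_values(cleaned))  # []
```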

Expected Behavior

I should be able to ingest data from the Oracle database into the pipeline.

Installation OS

Windows

Installation Method

Kubernetes

Python Version

3.9.13

MLRun Version

MLRun CE 0.6.2

Additional Information

I'm using Docker Desktop to run the Kubernetes cluster.
Docker Desktop version: v4.9.1
kubectl version: v1.24
Helm version: v3.11.3
Dataset: "credit risk dataset"
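Since the database is reachable locally but the pipeline step runs inside a pod on the Docker Desktop Kubernetes cluster, it may be worth confirming that the Oracle listener is reachable from inside that pod. A minimal stdlib sketch, assuming it is executed from within the function's container (for example as a temporary step); the helper name `can_reach` is hypothetical, and the host and port are the ones from the connection details above.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError
        # subclass) and DNS resolution failures (socket.gaierror).
        return False


if __name__ == "__main__":
    # Connection details taken from the reproducible example.
    print(can_reach("192.168.1.6", 1521))
```

A `False` result from inside the pod, while the same check succeeds on the host machine, would point at networking rather than at the data-prep code.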

xsqian (Contributor) commented Sep 28, 2023

Hi @harishgawade1999
Can you please post 2 more items here?

  • your code for setting up the MLRun function
  • the full error messages

harishgawade1999 (Author) commented

Hello @xsqian,

  1. My code for setting up the MLRun function: https://github.com/harishgawade1999/test-repo/blob/main/MLRUN_Oracle_Fin%20(2).ipynb
  2. The full error messages: https://github.com/harishgawade1999/test-repo/blob/main/error%20log.txt

I have linked both files here; please go through them.
