Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Getting in-memory list from the database stream #428

Open
mickvikt opened this issue Mar 22, 2023 · 13 comments
Open

Getting in-memory list from the database stream #428

mickvikt opened this issue Mar 22, 2023 · 13 comments
Assignees
Labels
documentation Documentation may be added or completed on the portal question Further information is requested

Comments

@mickvikt
Copy link

mickvikt commented Mar 22, 2023

Hello,
is there a way to get an in memory list as an instantiated IList<T> from SQL Server stream and have the database cursor consumed/closed?
First I thought extension method ToList() is for this purpose, but it appears it returns an ISingleStream<List<T>>

        var inMemoryCarrierList = contextStream
            .CrossApplySqlServerQuery("carrier stream", builder => builder
                .FromQuery("select carr_code, carr_name from dbo.carr_old order by carr_code")
                .WithMapping(m => new
                {
                    carr_code = m.ToColumn("carr_code"),
                    carr_name = m.ToColumn("carr_name")
                }), "source1")
            .ToList("in memory carrier list");

If I get a list this way and I don't consume this stream by, say, outputting to a text file and then I try a query like this on the same ODBC connection:

        var arch1 = contextStream
            .CrossApplySqlServerQuery("first query from source 1", o => o
                    .FromQuery("select * from dbo.carr")
                    .WithMapping(i => new
                    {
                        carr_code = i.ToColumn("carr_code"),
                        carr_name = i.ToColumn("carr_name")
                    })
                , "source1")
            .Select("create row to save source1 from first query", i => new { i.carr_name, i.carr_code });

I get error:

Unhandled exception. Paillave.Etl.Core.JobExecutionException: Job execution failed
 ---> System.Data.Odbc.OdbcException (0x80131937): ERROR [HY000] [Microsoft][ODBC SQL Server Driver]Connection is busy with results for another hstmt
@paillave
Copy link
Owner

@Radamonas found the way as a workaround for the issue related to ODBC drivers. It works with a client cursor, witch is the recommended way to avoid memory overload:
#423 (comment)

@paillave paillave added the question Further information is requested label Mar 22, 2023
@mickvikt
Copy link
Author

mickvikt commented Mar 23, 2023

It does work in provided example, the thing is that it drains both stream cursors to files. What makes me wonder is whether it is possible to have definitions of several streams in the same process without actually outputting them to a file, so that they could be used for combining streams, e.g. look-ups, etc. later on?

@paillave
Copy link
Owner

It does work in provided example, the thing is that it drains both stream cursors to files. What makes me wonder is whether it is possible to have definitions of several streams in the same process without actually outputting them to a file, so that they could be used for combining streams, e.g. look-ups, etc. later on?

yes, you can do what ever you want from what you issue from any operators, and that counts as well for CrossApply operators.

As maybe there is something I didn't get in your question, give me a sample of the kind of code you would like to achieve, and I will adapt it to work with existing operators.

@mickvikt
Copy link
Author

mickvikt commented Mar 23, 2023

What I was referring to with this question is actually this statement and example in the documentation:

To make a lookup, extensions for Sql Server don't provide any operator out of the box. The work around 
is to use the in memory lookup of ETL.NET core.

var authorStream = contextStream
    .CrossApplySqlServerQuery("get authors", o => o
        .FromQuery("select a.* from dbo.Author as a")
        .WithMapping(i => new
        {
            Id = i.ToNumberColumn<int>("Id"),
            Name = i.ToColumn("Name"),
            Reputation = i.ToNumberColumn<int>("Reputation")
        }));

postStream
    .Lookup("get related author", authorStream,
        l => l.AuthorId,
        r => r.Id,
        (l, r) => new { Post = l, Author = r })
    .Do("show value on console", i => Console.WriteLine($"{i.Post.Title} ({i.Author.Name})"));

The question is: how to you define the postStream here? We're calling a method on postStream, but how do we define it?

@paillave
Copy link
Owner

you just have to define your postStream exactly in the same way you defined authorStream, and combine them the way you want.

Maybe this near real life example will help you to understand:
https://github.com/paillave/Etl.Net/blob/master/src/Tutorials/Paillave.Etl.Samples/TestImport2.cs
FYI, in this example, the target data is this one: https://github.com/paillave/Etl.Net/blob/master/src/Tutorials/Paillave.Etl.Samples/DataAccess/DbStructure.md
and the input files are here: https://github.com/paillave/Etl.Net/tree/master/src/Tutorials/Paillave.Etl.Samples/InputFiles
There are 2 kind of files: files that define portfolios, and files that define positions

@mickvikt
Copy link
Author

Sorry to bother you again, but the examples deal with files, and I'm referring to the SQL Server data source and this note in documentation: "To make a lookup, extensions for Sql Server don't provide any operator out of the box. The work around
is to use the in memory lookup of ETL.NET core."

How do you actually accomplish this workaround and perform in memory look up if you have two database streams?
Thank you

@paillave
Copy link
Owner

paillave commented Mar 23, 2023

yes but in the same way you produce data from a file, you produce data from an in memory stream.

Does the following example represent what you expected?

var categoriesStream = contextStream.CrossApply("create in memory categories", ctx => new[] {
    new { Code = "a", Label = "label A", Tax = 0.1 },
    new { Code = "b", Label = "label B", Tax = 0.2 },
    new { Code = "c", Label = "label C", Tax = 0.3 },
});

contextStream
    .CrossApplyFolderFiles("list all required files", "*.zip", true).CrossApplyZipFiles("extract files from zip", "*.csv") // could be retrieved from database... anything...
    .CrossApplyTextFile("parse file", FlatFileDefinition.Create(i => new Article
    {
        Name = i.ToColumn("name"),
        Category = i.ToColumn("first name"),
        Price = i.ToNumberColumn<double?>("price", ".")
    }).IsColumnSeparated(','))
    .Lookup("get category label", categoriesStream, l => l.Category, r => r.Code, (l, r) => new
    {
        Name = l.Name,
        CategoryLabel = r.Label,
        PriceWithTax = l.Price * r.Tax,
    })
    [...];

@mickvikt
Copy link
Author

It is clearer now, thank you. The one thing I still can't understand is how to declare an in-memory stream from the SQL Database data, because this is, as I understand, currently the only way to combine two database streams.

@paillave
Copy link
Owner

paillave commented Mar 23, 2023

It is clearer now, thank you. The one thing I still can't understand is how to declare an in-memory stream from the SQL Database data, because this is, as I understand, currently the only way to combine two database streams.

I still don't understand your point. In the previous example, categoriesStream is fully in-memory, and it is combined with something that comes with file (could be database or anything)

FYI, here is the documentation that shows how to combine streams: https://paillave.github.io/Etl.Net/docs/recipes/linkStreams

@mickvikt
Copy link
Author

Yes, categoriesStream is in-memory, but it is initialized by using array initializer at compile time. My question is how do you populate an in-memory stream from a database table? Documentation says you cannot combine two database streams unless one of the streams is an in memory stream. Is there actually no way to combine to database streams?

@paillave
Copy link
Owner

If you want the join to be done by the database engine, your only choice is to make the proper SQL query with the right join statement.
If you want to the join/lookup to be made at ETL side, but you want to combine dataset that come from two different databases (for example) then you do the following:

But I still have the feeling this is not what you expect, so I'm a bit lost about what puzzles you.

var categoriesStream = contextStream.CrossApplySqlServerQuery("get categories", o => o
                    .FromQuery("select * from dbo.ProductCategory")
                    .WithMapping(i => new
                    {
                        Code = i.ToColumn("Code"),
                        Label = i.ToColumn("Label"),
                        Tax = i.ToNumberColumn<decimal>("Tax")
                    }));

contextStream
    .CrossApplyFolderFiles("list all required files", "*.zip", true).CrossApplyZipFiles("extract files from zip", "*.csv") // could be retrieved from database... anything...
    .CrossApplyTextFile("parse file", FlatFileDefinition.Create(i => new Article
    {
        Name = i.ToColumn("name"),
        Category = i.ToColumn("first name"),
        Price = i.ToNumberColumn<double?>("price", ".")
    }).IsColumnSeparated(','))
    .Lookup("get category label", categoriesStream, l => l.Category, r => r.Code, (l, r) => new
    {
        Name = l.Name,
        CategoryLabel = r.Label,
        PriceWithTax = l.Price * r.Tax,
    })
    [...];

@mickvikt
Copy link
Author

This sentence made everything clear,

If you want the join to be done by the database engine, your only choice is to make the proper SQL query with the right join statement.

Thank you, @paillave, and sorry for bombarding you with questions ;)

@paillave
Copy link
Owner

leave it opened for documentation purpose till the answers are on the documentation website

@paillave paillave reopened this Mar 23, 2023
@paillave paillave added the documentation Documentation may be added or completed on the portal label Mar 23, 2023
@paillave paillave self-assigned this Mar 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Documentation may be added or completed on the portal question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants