[QST] Dual Transformer for long-term and short-term sequences #725
Comments
I am doing the same. Any leads would be appreciated.
@vivpra89 is there a specific reason you don't want to use a single transformer architecture with a single sequence? Is the reason accuracy? With a single sequence, the position embeddings would help the network learn which interactions are the most recent. If you really want to have two sequences, yes, you can have two transformer blocks and concat them without applying masking, which means the target (the last item in the user interaction sequence) should be in a different column and not in the input sequence.
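For illustration, the single-sequence point above (position embeddings telling the model which interactions are recent) can be sketched framework-agnostically in numpy. All names and sizes here are hypothetical; in a real model both tables are learned, not random:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, max_len, d_model = 1000, 20, 64  # hypothetical sizes

# Stand-ins for learned lookup tables (random here, trained in a real model).
item_emb = rng.normal(size=(n_items, d_model))
pos_emb = rng.normal(size=(max_len, d_model))  # one vector per position

# A single user sequence: oldest interaction first, most recent last.
seq = np.array([5, 42, 7, 901])

# The transformer input is item embedding + position embedding, so a single
# model can distinguish "most recent item" from "oldest item" in one sequence.
x = item_emb[seq] + pos_emb[: len(seq)]
```

This is why a single tower often suffices: recency is already encoded in the input, without a second short-term sequence.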
@rnyak In the two-sequences approach, we create two separate sequences: one for long-term behavior and another for short-term behavior. The long-term sequence captures the historical preferences and patterns of the user, while the short-term sequence represents recent interactions or activities. Each sequence is encoded by its own transformer, which allows the model to capture complex dependencies and patterns within each kind of data. What is your opinion on this architecture?
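The wiring of that two-tower idea (encode each sequence separately, then concatenate before the prediction head) can be sketched in plain numpy. The `encode` function below is only a stand-in for a real transformer block, and all weights and item IDs are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_items = 32, 500  # hypothetical sizes

def encode(seq_emb, W):
    """Stand-in for a transformer tower: project, then mean-pool over time.
    A real tower would be a full transformer block."""
    h = np.tanh(seq_emb @ W)   # (seq_len, d_model)
    return h.mean(axis=0)      # pooled sequence representation, (d_model,)

item_emb = rng.normal(size=(n_items, d_model))
W_long = rng.normal(size=(d_model, d_model))
W_short = rng.normal(size=(d_model, d_model))

long_seq = np.array([3, 17, 99, 250, 7])  # e.g. historical purchases
short_seq = np.array([42, 8])             # e.g. recent interactions

h_long = encode(item_emb[long_seq], W_long)
h_short = encode(item_emb[short_seq], W_short)

# Concatenate the two tower outputs and score every item in the catalog.
h = np.concatenate([h_long, h_short])       # (2 * d_model,)
W_out = rng.normal(size=(2 * d_model, n_items))
logits = h @ W_out
next_item = int(np.argmax(logits))
```

Because both towers feed one loss, they are trained jointly, unlike training the two models separately.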
Hi @NamartaVij and @vivpra89. Transformers4Rec provides some masking options for training sequential models, Causal Language Modeling and Masked Language Modeling, which you set in the input module. @NamartaVij, you proposed training two models separately, and that definitely is possible. You could, for example, train one of those models first and share its embedding weights with the second model, using the TensorInitializer.
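To make the two masking options concrete, here is a minimal numpy sketch of what each scheme does to a sequence of length 5. This is illustrative only (the masking ratio and seed are arbitrary), not the library's internal implementation:

```python
import numpy as np

seq_len = 5

# Causal Language Modeling (CLM): position i may only attend to
# positions <= i, i.e. a lower-triangular attention mask.
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))

# Masked Language Modeling (MLM): hide a random subset of positions
# and train the model to predict the hidden items from both sides.
rng = np.random.default_rng(0)
mlm_mask = rng.random(seq_len) < 0.3  # True = position is masked out
```

CLM matches next-item prediction at inference time; MLM lets every position use bidirectional context during training.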
What I have in mind is to train both transformers together, like a two-tower model. I am referencing this example to concat before the prediction task. Please take a look at the code below and share your thoughts.

```python
# Note: arguments were truncated in the original post; `...` marks the missing pieces.
long_seq_inputs = mm.InputBlockV2(...)
short_term_inputs = mm.InputBlockV2(...)

mlp_block1 = mm.MLPBlock(...)
mlp_block2 = mm.MLPBlock(...)

lt_dense_block = mm.SequentialBlock(...)
st_dense_block = mm.SequentialBlock(...)

concats = mm.ParallelBlock(...)
mlp_block2 = mm.MLPBlock(...)  # note: mlp_block2 is defined twice in the original

prediction_task = mm.CategoricalOutput(...)
optimizer = tf.keras.optimizers.Adam(...)

model_transformer = mm.Model(concats, mlp_block2, prediction_task)
model_transformer.compile(...)
```
@gabrielspmoreira If we assume long-term sequences are purchases and short-term sequences are ATCs, do you think this works with causal masking for both transformers? I also have a couple of questions:
1. What is the difference between using merlin-models and transformers4rec?
2. How do we modify the head part (number of layers and nodes in the MLP, etc.)?

```python
# Note: arguments were truncated in the original post; `...` marks the missing pieces.

# Create a schema or read one from disk: tr.Schema().from_json(SCHEMA_PATH)
schema: tr.Schema = tr.data.tabular_sequence_testing_data.schema
max_sequence_length, d_model = 20, 64

# Define the input modules to process the tabular input features.
input_module_lt = tr.TabularSequenceFeatures.from_schema(...)
input_module_st = tr.TabularSequenceFeatures.from_schema(...)

# Define a transformer config like the XLNet architecture.
transformer_config_lt = tr.XLNetConfig.build(...)
transformer_config_st = tr.XLNetConfig.build(...)

# Define the model blocks including: inputs, masking, projection and transformer block.
lt_body = tr.SequentialBlock(...)
st_body = tr.SequentialBlock(...)

# Define the evaluation top-N metrics and the cut-offs.
metrics = [NDCGAt(top_ks=[20, 40], labels_onehot=True), ...]

body_concats = mm.ParallelBlock(...)

# Define a head with NextItemPredictionTask.
head = tr.Head(...)

# Get the end-to-end Model class.
model = tr.Model(head)
```
@gabrielspmoreira Can you please help me with a snippet on how to concat TabularSequenceFeatures, or how to use TensorInitializer to share embedding weights? I couldn't find them in the examples.
```python
# Note: arguments were truncated in the original post; `...` marks the missing pieces.

# Define input block with pre-trained embeddings.
pretrained_dim = 256
embeddings_op = EmbeddingOperator(...)

# Set the dataloader with pre-trained embeddings.
data_loader = MerlinDataLoader.from_schema(...)

# Set the model schema from the data loader.
model_schema = data_loader.output_schema

# Define the input module to process tabular input features and to prepare masked inputs.
inputs = tr.TabularSequenceFeatures.from_schema(...)

# Define the XLNetConfig class and set default parameters for the HF XLNet config.
transformer_config = tr.XLNetConfig.build(...)

# Define the model block including: inputs, masking, projection and transformer block.
body = tr.SequentialBlock(...)

prediction_task = tr.NextItemPredictionTask(weight_tying=True, ...)

# Define the head related to the next-item prediction task.
head = tr.Head(...)

# Get the end-to-end Model class.
model_si = tr.Model(head)

# Set arguments for training, then instantiate the T4Rec Trainer,
# which manages training and evaluation.
trainer = Trainer(...)
```
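The embedding-sharing step itself is framework-independent: one model's trained item-embedding table is used to initialize (rather than randomly initialize) the other's. A minimal numpy sketch of the idea, with hypothetical shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, dim = 100, 16  # hypothetical catalog size and embedding dim

# "Model A" has already been trained; take its item-embedding table.
model_a_item_emb = rng.normal(size=(n_items, dim))

# Initialize "model B" from those weights instead of randomly, so both
# towers start from a shared item representation.
model_b_item_emb = model_a_item_emb.copy()

# To freeze the shared weights, simply exclude this table from
# the optimizer's trainable parameters during model B's training.
```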
@gabrielspmoreira @rnyak How do we concatenate both outputs at the end to get the final prediction?
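Conceptually, the final-prediction step the question asks about is a concatenation followed by a small head over the item catalog. A plain numpy sketch (hypothetical shapes and random weights, not the Merlin API):

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(1)
d, n_items = 64, 200  # hypothetical sizes

h_long = rng.normal(size=d)   # output of the long-term tower
h_short = rng.normal(size=d)  # output of the short-term tower

# Joint representation: concatenate, then a small MLP head scores the catalog.
h = np.concatenate([h_long, h_short])            # (2 * d,)
W1 = rng.normal(size=(2 * d, d))
W2 = rng.normal(size=(d, n_items))
scores = softmax(np.maximum(h @ W1, 0) @ W2)     # probability per item
```

The predicted next item is then the argmax (or a top-k cut) over `scores`.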
❓ Questions & Help
Details
I'm working on a project that requires me to produce stable long-term user and item representations and also use short-term user behavior for next-action prediction. Is it possible to create a custom architecture that trains two transformer towers together, with different inputs, and then concatenates them at a later point? What is the recommended architecture for problems like this?