This repository contains our submission, which placed 3rd in the Amazon ML Challenge 2025.
**Team Name:** 00_Team_Rocket
**Team Members:** Parth Rastogi, Angadjeet Singh, Abhishek Jha, Harsh Kumar
**Secured Second Runner-Up Position in Amazon ML Challenge 2025**
Our team developed a hybrid deep learning approach combining DeBERTa-large-v3 with engineered features through a cross-attention fusion mechanism.
This architecture leverages pretrained language understanding enriched with domain-specific product attributes, resulting in robust and generalizable price prediction performance.
Through extensive exploratory data analysis (EDA), we observed that product pricing is influenced by both semantic content and explicit attributes.
Key findings include:
**Approach Type:** Hybrid (Pretrained LM + Feature Engineering + Cross-Attention)
**Core Innovation:**
A two-stream architecture that fuses DeBERTa’s [CLS] embedding with engineered feature embeddings via cross-attention, capturing complex relationships between product descriptions and structured metadata.
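
The two-stream fusion described above can be sketched in PyTorch as follows. This is an illustration under assumptions, not the code in `Main_Training.py`: the class name `CrossAttentionFusion`, the hidden size, the head count, the use of the [CLS] vector as the attention query, and the treatment of each engineered feature as one token are all hypothetical. The default `text_dim=1024` matches the hidden size of a large DeBERTa encoder.

```python
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    """Hypothetical fusion head: the text [CLS] vector attends over feature tokens."""

    def __init__(self, text_dim=1024, num_features=10, hidden_dim=256, num_heads=4):
        super().__init__()
        # Project the [CLS] embedding into the shared attention space.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        # Embed each scalar feature as its own token via a learned weight and bias.
        self.feat_weight = nn.Parameter(torch.randn(num_features, hidden_dim) * 0.02)
        self.feat_bias = nn.Parameter(torch.zeros(num_features, hidden_dim))
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_dim)
        # Linear regression head; the output is interpreted as log(price + 1).
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, cls_embedding, features):
        # cls_embedding: (batch, text_dim); features: (batch, num_features)
        query = self.text_proj(cls_embedding).unsqueeze(1)      # (batch, 1, hidden)
        tokens = features.unsqueeze(-1) * self.feat_weight + self.feat_bias  # (batch, F, hidden)
        attended, _ = self.attn(query, tokens, tokens)          # (batch, 1, hidden)
        fused = self.norm(query + attended).squeeze(1)          # residual connection + norm
        return self.head(fused).squeeze(-1)                     # (batch,)
```

Called with a batch of eight [CLS] vectors of shape `(8, 1024)` and eight feature rows of shape `(8, 10)`, the module returns one prediction per row. Using the text vector as the query is only one possible design; the roles of the two streams could equally be swapped.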
**Dataset Insight:**
As the log(price) distribution approximated a Gaussian, training was performed on log(price + 1).
Final predictions were obtained via exp(pred) - 1.
We also evaluated CLIP-based image embeddings, but analysis via UMAP clustering revealed poor consistency of embedding clusters with price values. Hence, we relied solely on text-based modeling.
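
The log-space target transform described above can be sketched with the standard library alone. The helper names `to_log_target` and `to_price` are hypothetical; `math.log1p` and `math.expm1` compute log(1 + x) and exp(x) - 1 and stay accurate for small values.

```python
import math


def to_log_target(price: float) -> float:
    """Training target: log(price + 1)."""
    return math.log1p(price)


def to_price(pred: float) -> float:
    """Invert the transform on a model output: exp(pred) - 1."""
    return math.expm1(pred)


prices = [0.99, 12.5, 249.0]
targets = [to_log_target(p) for p in prices]
recovered = [to_price(t) for t in targets]

# The round trip recovers the original prices up to floating-point error.
assert all(math.isclose(p, r, rel_tol=1e-9) for p, r in zip(prices, recovered))
```

Training in log space means a fixed-size error in the prediction corresponds to a roughly constant relative error in price, which is a reasonable match for a relative metric such as SMAPE.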
**Loss Function:**
We used Smooth L1 loss instead of MSE, as it yielded lower SMAPE and more stable training.
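
To make the comparison concrete, here is a dependency-free sketch of the two losses and the metric. It assumes the common SMAPE definition with the mean of |prediction| and |actual| in the denominator (range 0 to 200) and a Smooth L1 threshold of `beta = 1.0`; the challenge's exact formula and the team's settings may differ.

```python
def smooth_l1(pred: float, target: float, beta: float = 1.0) -> float:
    """Quadratic for small errors, linear for large ones."""
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


def mse(pred: float, target: float) -> float:
    """Squared error for a single example."""
    return (pred - target) ** 2


def smape(preds, actuals) -> float:
    """Symmetric mean absolute percentage error, in percent."""
    total = 0.0
    for p, a in zip(preds, actuals):
        denom = (abs(p) + abs(a)) / 2.0
        total += abs(p - a) / denom if denom > 0 else 0.0
    return 100.0 * total / len(preds)


# A large residual of 3.0 costs 2.5 under Smooth L1 but 9.0 under MSE,
# so outlier prices pull less strongly on the gradient.
print(smooth_l1(3.0, 0.0), mse(3.0, 0.0))
```

The linear tail is the usual explanation for more stable training on targets with outliers, though the source reports the improvement empirically rather than attributing it to a specific cause.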
| Stage | Model/Component | SMAPE (Validation) | SMAPE (Unstop) |
|---|---|---|---|
| First Approach | BERT Base | 49.1 | 48.1 |
| Pretraining | DeBERTa (Regression Task) | 44.2 | 43.2 |
| Final Hybrid Training | DeBERTa + Cross-Attention Fusion | **39.86** | **40.329** |
is_organic, is_gourmet, is_gluten_free, is_bulk, special_chars, unit_type, value, num_words, num_sentences, uppercase_ratio
**Fusion Strategy:**
Concatenated feature embeddings + DeBERTa [CLS] → Cross-Attention → Linear Regression Head
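
The text-derived features listed before the fusion strategy could be computed along these lines. This is a sketch under assumptions: the function name `extract_text_features`, the keyword patterns, the sentence-splitting rule, and the exact definitions of `special_chars` and `uppercase_ratio` are guesses for illustration. `unit_type` and `value` are omitted because the source does not say how they are parsed.

```python
import re


def extract_text_features(text: str) -> dict:
    """Compute simple flag and count features from a product description."""
    lower = text.lower()
    letters = [c for c in text if c.isalpha()]
    # Treat runs of ., ! or ? as sentence boundaries (an assumption).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    upper = sum(1 for c in letters if c.isupper())
    return {
        "is_organic": int("organic" in lower),
        "is_gourmet": int("gourmet" in lower),
        "is_gluten_free": int(re.search(r"gluten[\s-]?free", lower) is not None),
        "is_bulk": int(re.search(r"\bbulk\b", lower) is not None),
        # Count characters that are neither alphanumeric nor whitespace.
        "special_chars": sum(1 for c in text if not c.isalnum() and not c.isspace()),
        "num_words": len(text.split()),
        "num_sentences": len(sentences),
        "uppercase_ratio": upper / len(letters) if letters else 0.0,
    }


features = extract_text_features("Organic Gluten-Free Oats. BULK pack!")
print(features)
```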
| Model | Validation SMAPE | Notes |
|---|---|---|
| Pretrained DeBERTa Baseline | 43.4 | Text-only |
| Final Hybrid Model | **39.83** | 5% hold-out |
| Challenge Submission | **40.329** | on Unstop |
Our hybrid model demonstrates that structured features can enhance the predictive power of pretrained language models for price regression: adding the cross-attention fusion lowered validation SMAPE from 43.4 (text-only DeBERTa baseline) to 39.83.
The cross-attention fusion effectively learns the interplay between semantic understanding and explicit product attributes, achieving scalable and interpretable performance gains for real-world e-commerce applications.
Amazon-ML-Challenge-2025-3rd/
│
├── Data/
│   ├── preprocessed_train.csv
│   └── preprocessed_test.csv
│
├── Preprocess.py      # Data preprocessing
├── Pretraining.py     # DeBERTa pretraining on regression
├── Main_Training.py   # Final hybrid model training
├── Inference.py       # Inference and submission CSV generation
└── README.md
You can access the final trained model (DeBERTa + Cross-Attention Fusion) used for the final submission here:
👉 **Final_Model.pt (Google Drive)**