Search Ads Web Service

Online search advertising platform and real-time campaign monitoring

Project Description

Designed and developed a web crawler that collected data for 500,000 products from Amazon (Java, Jsoup, proxies)
Developed the search ads workflow: query understanding, ad selection from an inverted index (Memcached), ad ranking, ad filtering, ad pricing, and ad allocation
Designed and implemented a feature engineering pipeline with Spark MapReduce that generates features for query understanding and click prediction

Crawler

Uses Jsoup to crawl product information from Amazon.

Finished
- extract price, product detail URL, product image URL, and category from each web page
- convert each product to an ad
- store ads in a file, each ad in JSON format
- support paging
- log all exceptions
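The conversion and storage steps above can be sketched as follows. This is a minimal illustration in Python rather than the project's Java code. The `Ad` field names and the one-JSON-object-per-line layout are assumptions, since the README only says each ad is stored in JSON format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Ad:
    # Hypothetical field names; the README lists only the extracted values.
    ad_id: int
    price: float
    detail_url: str
    image_url: str
    category: str


def to_json_line(ad: Ad) -> str:
    """Serialize one ad as a single-line JSON object."""
    return json.dumps(asdict(ad), ensure_ascii=False)


def write_ads(ads, path):
    """Write ads to a file, one JSON object per line (assumed layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for ad in ads:
            f.write(to_json_line(ad) + "\n")
```

One object per line keeps the file appendable page by page, which fits a crawler that supports paging.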

Avoid Bot Detection

Proxy IPs and rotating browsers
Distributed crawler
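A rough sketch of the rotation idea, assuming "rotating browsers" means varying the User-Agent header per request. The proxy addresses and User-Agent strings below are placeholders, not values from the project.

```python
import itertools

# Placeholder values (documentation-range IPs), not the project's real proxies.
PROXIES = ["203.0.113.10:8080", "203.0.113.11:8080"]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]


def rotating_identities(proxies, user_agents):
    """Yield (proxy, user_agent) pairs, cycling through both lists round-robin."""
    proxy_cycle = itertools.cycle(proxies)
    ua_cycle = itertools.cycle(user_agents)
    while True:
        yield next(proxy_cycle), next(ua_cycle)
```

Each crawl request would take the next pair from the generator, so consecutive requests do not share the same IP and browser signature.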

Online Search Ads Platform

Search advertising places online advertisements on the front-end pages that show users the results of their search engine queries. This search ads server takes thousands of products as ad candidates and selects, filters, ranks, allocates, and prices the ads when a search query comes in. The selection and ranking of search ads are based on the quality of the ads and the bid price offered by advertisers.

Query Understanding

Clean the query text with Lucene
Train a word2vec model on the ads keyword corpus and use synonyms to rewrite the query
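The two steps can be illustrated with a small Python sketch. The regex tokenizer and stop-word list stand in for Lucene's analysis, and the hand-written `SYNONYMS` table stands in for nearest neighbors from a trained word2vec model. Both are assumptions made for illustration.

```python
import re

# Assumed stop-word list; Lucene's analyzers ship their own.
STOP_WORDS = {"a", "an", "the", "for", "of", "and", "with"}

# Hypothetical synonyms; in the project these would come from word2vec.
SYNONYMS = {"shoes": ["sneakers"]}


def clean_query(query):
    """Lowercase, split on non-alphanumerics, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def rewrite_query(tokens, synonyms):
    """Return the original tokens plus variants with one token swapped for a synonym."""
    rewrites = [list(tokens)]
    for i, tok in enumerate(tokens):
        for syn in synonyms.get(tok, []):
            rewrites.append(tokens[:i] + [syn] + tokens[i + 1:])
    return rewrites
```

Each rewritten query can then be matched against the ad keywords, which widens recall beyond exact word matches.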

Query Relevancy Matching

Ad candidates are first evaluated and filtered by relevance score, which measures how relevant the query is to the keywords of an ad. Here, relevance score = number of ad keywords matched by the query / total number of keywords in the ad. For quick retrieval of ad information, an inverted index of ad keywords is built and stored in a cache.

The data layer supporting the online system:

Forward index for Ad detail information (MySQL)
Inverted index for Ad keywords (Memcached)
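The relevance formula and the inverted index lookup can be sketched as below. A plain dictionary stands in for Memcached, and the function names are hypothetical. The sketch also assumes each ad's keyword list contains no duplicates.

```python
from collections import defaultdict


def build_inverted_index(ads):
    """Map each keyword to the set of ad ids whose keywords contain it.

    ads: dict of ad_id -> list of keywords (assumed unique per ad).
    """
    index = defaultdict(set)
    for ad_id, keywords in ads.items():
        for kw in keywords:
            index[kw].add(ad_id)
    return index


def relevance_scores(query_tokens, index, ads):
    """Score = matched keywords / total keywords, for every ad hit by the query."""
    hits = defaultdict(int)
    for tok in set(query_tokens):
        for ad_id in index.get(tok, ()):
            hits[ad_id] += 1
    return {ad_id: n / len(ads[ad_id]) for ad_id, n in hits.items()}
```

For example, the query "running shoes" against an ad with keywords "running, shoes, nike" matches 2 of 3 keywords, so its relevance score is about 0.67.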

P-Click Prediction

The probability of user click (p-click) plays an important role in ads ranking.

Spark ML processes simulated user click log data and generates the prediction model.

Click log

Log fields: Device IP, Device id, Session id, Query, AdId, CampaignId, Ad_category_Query_category (0/1), clicked (0/1)
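A log line with these fields could be parsed as follows. This is a sketch that assumes comma-separated values in the order listed and queries that contain no commas. The record and attribute names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClickLogRecord:
    device_ip: str
    device_id: str
    session_id: str
    query: str
    ad_id: str
    campaign_id: str
    category_match: int  # Ad_category_Query_category flag (0/1)
    clicked: int         # 1 if the ad was clicked, else 0


def parse_click_log(line: str) -> ClickLogRecord:
    """Parse one comma-separated click log line into a record."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    return ClickLogRecord(*fields[:6], int(fields[6]), int(fields[7]))
```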

Feature space

pClick features are extracted from the search log and stored in a key-value store.
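The README does not list the individual features. One common choice, shown here purely as an assumption, is historical impression and click counts per device IP, ad, and campaign, keyed the way a key-value store would hold them.

```python
from collections import defaultdict


def build_ctr_features(events, fields=("device_ip", "ad_id", "campaign_id")):
    """Aggregate impressions, clicks, and click-through rate per field value.

    events: iterable of dicts holding the chosen fields and a 0/1 "clicked" flag.
    Returns a dict keyed like "ad_id:1001", mimicking key-value store entries.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [impressions, clicks]
    for event in events:
        for field in fields:
            key = f"{field}:{event[field]}"
            counts[key][0] += 1
            counts[key][1] += event["clicked"]
    return {
        key: {"impressions": imp, "clicks": clk, "ctr": clk / imp}
        for key, (imp, clk) in counts.items()
    }
```

At serving time the ads server would look up the keys for the current request and pass the values to the click model.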

Model

Logistic Regression

Gradient Boosting Tree
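For the logistic regression case, scoring a request reduces to a weighted sum passed through a sigmoid. The sketch below shows only that scoring step. The feature names and weights are hypothetical, and in the project the weights would come from the model trained with Spark ML.

```python
import math


def predict_p_click(features, weights, bias=0.0):
    """Logistic regression score: sigmoid(bias + sum of weight * feature value)."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)
```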

Online Ads Ranking and Pricing

Quality Score = 0.25 * Relevance Score + 0.75 * pClick

Rank Score = Quality Score * Bid

Price(Cost Per Click) = next rank score / current quality score + 0.01
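The three formulas above can be worked through in a short sketch. The function names are hypothetical. The README does not say how the last-ranked ad is priced or how zero-quality ads are handled, so the sketch assumes a next rank score of 0 for the last ad and drops ads whose quality score is 0.

```python
def quality_score(relevance, p_click):
    """Quality Score = 0.25 * Relevance Score + 0.75 * pClick."""
    return 0.25 * relevance + 0.75 * p_click


def rank_and_price(candidates):
    """Rank ads by quality * bid and price each one from the next ad's rank score.

    candidates: list of dicts with "ad_id", "relevance", "p_click", and "bid".
    """
    scored = []
    for c in candidates:
        q = quality_score(c["relevance"], c["p_click"])
        if q == 0:
            continue  # assumption: zero-quality ads are filtered out
        scored.append({**c, "quality": q, "rank_score": q * c["bid"]})
    scored.sort(key=lambda ad: ad["rank_score"], reverse=True)
    for i, ad in enumerate(scored):
        # Assumption: the last ad has no competitor below it, so next score is 0.
        next_score = scored[i + 1]["rank_score"] if i + 1 < len(scored) else 0.0
        ad["cpc"] = next_score / ad["quality"] + 0.01
    return scored
```

As a worked case, ad A (relevance 1.0, pClick 0.2, bid 2.00) has quality 0.4 and rank score 0.8, and ad B (relevance 0.5, pClick 0.1, bid 3.00) has quality 0.2 and rank score 0.6. A ranks first and pays 0.6 / 0.4 + 0.01 = 1.51 per click, which stays below its own bid.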

System

When it receives a search query, the system matches the rewritten query against ad keywords using the inverted index to get a relevance score, and predicts the probability of a click with the regression model trained on 50 GB of historical click data. The quality of an ad is determined by both its relevance score and its click probability. The ads engine calculates the quality score and combines it with the ad's bid price for final ranking and pricing.

Real Time Campaign Monitor

The real-time campaign monitor collects the ad-related events generated by the online ads server and visualizes campaign trends.

Join Events Streams

The real-time campaign monitoring system is a streaming pipeline that collects and processes the ad events generated by the online search ads engine. The chance events, impression events, and click events of ads are published to a message queue, then processed and stored in a database as a stream. The front-end dashboard visualizes the budget status and the live impression, click, and pricing trends of campaigns.
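The join itself can be illustrated with an in-memory sketch. It assumes each click event carries the id of the impression it belongs to and that the impression records the campaign and the price charged. All field names are hypothetical, and a real pipeline would consume these events from the message queue rather than from lists.

```python
def join_and_aggregate(impressions, clicks, budgets):
    """Join clicks to impressions by impression_id and aggregate per campaign.

    impressions: dicts with "impression_id", "campaign_id", "price".
    clicks: dicts with "impression_id".
    budgets: dict of campaign_id -> starting budget.
    """
    by_id = {imp["impression_id"]: imp for imp in impressions}
    stats = {}

    def entry(campaign_id):
        # Create the per-campaign counters on first use.
        return stats.setdefault(
            campaign_id,
            {"impressions": 0, "clicks": 0, "spend": 0.0,
             "remaining": budgets.get(campaign_id, 0.0)},
        )

    for imp in impressions:
        entry(imp["campaign_id"])["impressions"] += 1
    for clk in clicks:
        imp = by_id.get(clk["impression_id"])
        if imp is None:
            continue  # click with no matching impression is dropped
        s = entry(imp["campaign_id"])
        s["clicks"] += 1
        s["spend"] += imp["price"]
        s["remaining"] = budgets.get(imp["campaign_id"], 0.0) - s["spend"]
    return stats
```

The per-campaign totals are the kind of values a dashboard would plot for budget status and click trends.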

youhusky/Search_Ads_Web_Service

Folders and files

Latest commit

History

Repository files navigation

Search Ads Web Service

Project Description

Crawler

Avoid Bot Detection

Online Search Ads Platform

Query Understanding

Query Relevancy Matching

P-Click Prediction

Online Ads Ranking and Pricing

System

Real Time Campaign Monitor

Join Events Streams

Streaming Pipeline

Dashboard Visualization

About

Topics

Resources

Stars

Watchers

Forks

Languages