【AI】DSMM Item-based系统过滤

 

DSMM

Advantage: Use DNN models to do this. Key word: DNN Usage: Used in text-based query-to-document retrieval.

Objective

Retrieve documents related with an input query.

Proposed DNN architecture:

Image

Term vector

Term vector use letter or word as a dimension and counts of that term.

Word hashing

In order to reduce the dimensionality of bag-of-words term vectors, it is based on letter n-gram. In this paper, the author used letter trigrams. The collisions are rare under this condition.

Algorithm

They have click-through logs that consist of a list of queries and their clicked documents. This indicates that this query is related with this document. So how they train this model? They want to measure that given a query and its correlated document, maximize the likelihood of the clicked documents. The likelihood of the clicked document is

Image

The probability of a document given the semantic relevance score is

Image

They tried to maximize the