【AI】Wide&Deep Item-based content retrieval (content-based collaborative filtering)
Wide&Deep
Advantage: Tightly combines memorization and generalization.
Key word: DNN, Generalization, Regressor
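Usage: Used for recommendation. It combines the memorization ability of a linear model with a DNN, which generalizes well without heavy feature engineering, to make more accurate predictions.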
Motivation
Categorical data represented with one-hot encoding lets a model memorize feature pairs that correlate with the target label, but this requires feature engineering.
DNNs could learn low-di...
【AI】Logistic Regression Explained
Logistic Regression
Advantage: Uses the logistic function to map a linear score to a probability between 0 and 1.
Usage: Predicting class labels in classification problems.
Introduction
Logistic regression is used to predict the outcome of a classification problem.
Details
Suppose we want to predict whether an email is spam, and we have attributes $x_1, x_2, \dots, x_n$ to predict from.
We define a function $z = \theta_0 + \theta_1x_1 + \th...
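The linear score $z$ and the logistic function can be sketched as follows. This is a minimal illustration, not the post's own code: the function names, the weights in `theta`, and the attribute values in `x` are all hypothetical, and the sigmoid step is the standard definition rather than something shown in the truncated excerpt.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real-valued score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_spam_probability(theta, x):
    """theta[0] is the intercept theta_0; theta[1:] pair with x_1..x_n."""
    z = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
    return sigmoid(z)

# Hypothetical weights and attributes, chosen only for illustration.
theta = [-1.0, 2.0, 0.5]
x = [1.0, 0.0]
p = predict_spam_probability(theta, x)  # z = -1 + 2*1 + 0.5*0 = 1
```

A common convention is to classify the email as spam when `p` exceeds a threshold such as 0.5.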
【AI】DSMM Item-based System Filtering
DSMM
Advantage: Uses DNN models for query-to-document matching.
Key word: DNN
Usage: Used in text-based query-to-document retrieval.
Objective
Retrieve documents related to an input query.
Proposed DNN architecture:
Term vector
A term vector uses each letter or word as a dimension, and the value in that dimension is the count of that term.
Word hashing
In order to reduce the dimensionality of bag...
【AI】Cross Entropy Loss Explained
Cross-entropy loss
Usage: Optimizing a classifier.
Main idea:
It measures how far the predicted output probability distribution is from the actual probability distribution.
https://gombru.github.io/2018/05/23/cross_entropy_loss/
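As a rough sketch of the idea above, the cross-entropy between an actual distribution p and a predicted distribution q is $H(p, q) = -\sum_i p_i \log q_i$. The function name, the `eps` clamp, and the example distributions below are assumptions made for illustration.

```python
import math

def cross_entropy(p_true, q_pred, eps=1e-12):
    """H(p, q) = -sum_i p_i * log(q_i), using the natural logarithm.

    eps clamps predicted probabilities away from 0 to avoid log(0).
    """
    return -sum(p * math.log(max(q, eps)) for p, q in zip(p_true, q_pred))

# One-hot actual label (class 1) against two hypothetical predictions.
p_true = [0.0, 1.0, 0.0]
good = cross_entropy(p_true, [0.1, 0.8, 0.1])  # most mass on the true class
bad = cross_entropy(p_true, [0.7, 0.2, 0.1])   # most mass on a wrong class
```

The loss is lower for the prediction that puts more probability on the true class. Note that cross-entropy is not symmetric in its arguments, so "distance" here is informal.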
【AI】Unigram Distribution (commonly used in NLP)
unigram distribution
Key word: Evaluation, Loss Function
Usage: The non-contextual probability of finding a specific word form in a corpus.
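A minimal sketch of estimating a unigram distribution from raw counts, assuming whitespace tokenization and no smoothing. The function name and the toy corpus are hypothetical.

```python
from collections import Counter

def unigram_distribution(tokens):
    """Relative frequency of each word form, ignoring context."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

# Toy corpus, for illustration only.
corpus = "the cat sat on the mat".split()
dist = unigram_distribution(corpus)  # "the" appears 2 times out of 6 tokens
```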
【AI】GBDT (Gradient Boosting Decision Tree) Explained
GBDT
Advantage: GBDT achieves strong performance.
It is insensitive to feature scale, handles missing features automatically, can be used for feature selection, and gives notably good results.
Key word: Regressor
Usage: A regressor to forecast different labels or values
Great Explanation
GBDT Algorithms: Principles - Develop Paper
See chapter 3 for the formulation.
Adobe Acrobat
Core idea
Whether weak learners could be m...
【AI】ARIMA Model Explained
ARIMA
Key word: Model, Time Series
Usage: A statistical analysis model that uses time series data to either better understand the data set or to better predict future trends.
Background
Stationary and differencing
A stationary time series is one whose properties do not depend on the time at which the series is observed.
Stationary time series will not have any predict...
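Differencing, named in the heading above, replaces each observation with its change from an earlier one, $y'_t = y_t - y_{t-\text{lag}}$. A minimal sketch, assuming first-order differencing on a plain Python list; the function name and the toy series are hypothetical.

```python
def difference(series, lag=1):
    """Return the differenced series y_t - y_{t-lag}."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

# A series with a linear trend: its level depends on time.
trend = [3, 5, 7, 9, 11]
diffed = difference(trend)  # constant differences, the trend is removed
```

The differenced series is shorter than the original by `lag` observations.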
【AI】The Meaning of Term-Vector
Term-vector
It means that each word forms a separate dimension:
For a model containing only three words you would get:
dict = { dog, cat, lion }
Document 1
“cat cat” → (0,2,0)
Document 2
“cat cat cat” → (0,3,0)
Document 3
“lion cat” → (0,1,1)
Document 4
“cat lion” → (0,1,1)
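The four documents above can be reproduced with a short sketch, assuming whitespace tokenization and the vocabulary order dog, cat, lion. The function name `term_vector` is hypothetical.

```python
def term_vector(text, vocabulary):
    """Count how often each vocabulary term occurs in the text."""
    tokens = text.split()
    return tuple(tokens.count(term) for term in vocabulary)

vocabulary = ["dog", "cat", "lion"]
docs = ["cat cat", "cat cat cat", "lion cat", "cat lion"]
vectors = [term_vector(doc, vocabulary) for doc in docs]
```

Documents 3 and 4 map to the same vector because a term vector keeps counts only and discards word order.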
124 articles in total, 16 pages.