【AI】One-hot Vector Example
One-Hot Encoding
Key word: Encoding
Label Encoding
| Food Name | Categorical label | Calories |
| --- | --- | --- |
| Apple | 1 | 95 |
| Chicken | 2 | 231 |
| Broccoli | 3 | 50 |
One-hot encoding
| Apple | Chicken | Broccoli | Calories |
| --- | --- | --- | --- |
| 1 | 0 | 0 | 95 |
| 0 | 1 | 0 | ... |
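As a rough illustration of the two encodings, here is a minimal sketch in plain Python. It uses the three food names from the label-encoding table; the helper names `label_encode` and `one_hot_encode` are hypothetical and chosen only for this example.

```python
# Food categories taken from the label-encoding table.
foods = ["Apple", "Chicken", "Broccoli"]

def label_encode(categories):
    """Map each category to an integer ID, starting at 1 (hypothetical helper)."""
    return {name: idx for idx, name in enumerate(categories, start=1)}

def one_hot_encode(categories):
    """Map each category to a 0/1 vector with a single 1 at its own position."""
    size = len(categories)
    return {
        name: [1 if i == j else 0 for j in range(size)]
        for i, name in enumerate(categories)
    }

labels = label_encode(foods)      # integer ID per food
one_hot = one_hot_encode(foods)   # one column per food
```

Label encoding implies an ordering between categories (Apple < Chicken < Broccoli), while one-hot encoding gives every category its own independent column.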
【AI】ESMM: CVR Prediction Explained
ESMM
Advantage: Combines two neural networks by multiplying their outputs at the end.
Key word: DNN
Usage: Predict CVR and CTR
The problem ESMM addresses is predicting the post-click conversion rate (CVR).
Why the Problem Is Difficult
Current methods have two problems; one is illustrated as follows.
Sample selection bias problem means that the traini...
【AI】Wide&Deep Item-based content retrieval (content-based collaborative filtering)
Wide&Deep
Advantage: A tight combination of memorization and generalization.
Key word: DNN, Generalization, Regressor
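Usage: Used for recommendation. It combines the memorization ability of a linear model with a DNN, which generalizes strongly without requiring careful feature engineering, to achieve more accurate predictions.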
Motivation
A linear model over one-hot encoded categorical data can memorize a feature pair that correlates with the target label, but this requires feature engineering.
DNNs could learn low-di...
【AI】Logistic Regression Explained
Logistic Regression
Advantage: Uses the logistic function
Usage: Predicts outcomes for classification problems
Introduction
Logistic regression is used to predict the result of a classification task.
Details
Suppose we want to predict whether an email is spam, and we have attributes $x_1, x_2, \dots, x_n$ for the prediction.
We define a function $z = \theta_0 + \theta_1x_1 + \th...
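A minimal sketch of how a linear score is turned into a probability, assuming $z$ is the usual linear combination of the attributes and is passed through the logistic (sigmoid) function. The weights and inputs below are made-up values for illustration, and `predict_proba` is a hypothetical helper name.

```python
import math

def sigmoid(z):
    """Logistic function: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(theta, x):
    """Probability of the positive class (e.g. spam).

    theta[0] is the intercept theta_0; theta[1:] are paired with the
    attributes in x. This pairing is an assumption of the sketch.
    """
    z = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
    return sigmoid(z)

theta = [-1.0, 2.0, 0.5]  # hypothetical weights: theta_0, theta_1, theta_2
x = [1.0, 0.0]            # hypothetical attribute values: x_1, x_2
p_spam = predict_proba(theta, x)
is_spam = p_spam >= 0.5   # a common, but not mandatory, decision threshold
```

Here $z = -1 + 2 \cdot 1 + 0.5 \cdot 0 = 1$, so the predicted probability is above 0.5 and the email would be classified as spam under this threshold.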
【AI】DSMM Item-based Collaborative Filtering
DSMM
Advantage: Uses DNN models for query-to-document retrieval.
Key word: DNN
Usage: Used in text-based query-to-document retrieval.
Objective
Retrieve documents related to an input query.
Proposed DNN architecture:
Term vector
A term vector uses each letter or word as a dimension, with the count of that term as its value.
Word hashing
In order to reduce the dimensionality of bag...
【AI】Cross Entropy Loss Explained
Cross-entropy loss
Usage: Optimize a classifier.
Main idea:
It measures the distance between the predicted output probability distribution and the actual probability distribution.
https://gombru.github.io/2018/05/23/cross_entropy_loss/
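As a small worked example of the main idea, the sketch below computes $H(p, q) = -\sum_i p_i \log q_i$ for a one-hot target and two predicted distributions. The numbers are made up for illustration, and the `eps` clamp is an assumption added to avoid `log(0)`.

```python
import math

def cross_entropy(actual, predicted, eps=1e-12):
    """Cross-entropy between the actual and the predicted distribution.

    eps clamps predicted probabilities away from zero so log() is defined.
    """
    return -sum(p * math.log(max(q, eps)) for p, q in zip(actual, predicted))

actual = [0.0, 1.0, 0.0]   # one-hot target: the true class is index 1
good = [0.1, 0.8, 0.1]     # confident and correct prediction
bad = [0.6, 0.2, 0.2]      # most of the mass is on the wrong class

loss_good = cross_entropy(actual, good)
loss_bad = cross_entropy(actual, bad)
```

With a one-hot target the loss reduces to the negative log of the probability assigned to the true class, so the prediction that puts more mass on the correct class gets the lower loss.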
【AI】Unigram Distribution (commonly used in NLP)
unigram distribution
Key word: Evaluation, Loss Function
Usage: The non-contextual probability of finding a specific word form in a corpus.
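A minimal sketch of estimating a unigram distribution by relative frequency: each word's count divided by the total number of tokens. The toy corpus and the whitespace tokenization are assumptions made for the example.

```python
from collections import Counter

def unigram_distribution(tokens):
    """Relative frequency of each word form; assumes tokens is non-empty."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

corpus = "the cat sat on the mat".split()  # toy corpus, whitespace-tokenized
probs = unigram_distribution(corpus)
```

The probabilities ignore word order and context entirely, which is what makes the distribution "non-contextual".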
【AI】GBDT (Gradient Boosting Decision Tree) Explained
GBDT
Advantage: GBDT achieves strong performance.
It is insensitive to feature scale, handles missing features automatically, can be used for feature selection, and performs notably well.
Key word: Regressor
Usage: A regressor to forecast different labels or values
Great Explanation
GBDT Algorithms: Principles - Develop Paper
See chapter 3 for the formulation
Adobe Acrobat
Core idea
Whether weak learners could be m...
142 articles in total, 18 pages.