While the group-sparsity constraint forces our model to consider only a few features, those features are largely shared across all tasks. All of the previous approaches thus assume that the tasks used in multi-task learning are closely related. However, a given task might not be closely related to every other available task. In those cases, sharing information with an unrelated task might actually hurt performance, a phenomenon known as negative transfer.

Minimizing L2 loss comes from the assumption that the data is drawn from a Gaussian distribution

The sentence “Minimizing L2 loss comes from the assumption that the data is drawn from a Gaussian distribution” is taken from the paper Deep multi-scale video prediction beyond mean square error. The following is my understanding, based on some useful resources.
Suppose we have a dataset $\mathcal{X}=\{\mathbf{x}^{(1)},\mathbf{x}^{(2)}, \cdots, \mathbf{x}^{(m)}\}$. The data comes from an unknown distribution $p_{data}(\mathbf{x})$, and we use $p_{model}(\mathbf{x}; \theta)$, parametrized by $\theta$, to model it.
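
As a minimal sketch of where this setup leads, assume (these are my assumptions for illustration, not something stated above) that the samples are drawn independently, that $p_{model}$ is an isotropic Gaussian $\mathcal{N}(\mathbf{x}; \boldsymbol{\mu}, \sigma^2 I)$ with fixed variance $\sigma^2$, and that $\theta$ is just its mean $\boldsymbol{\mu}$. Writing $n$ for the dimension of $\mathbf{x}$, maximum likelihood estimation then gives

$$
\begin{aligned}
\theta_{ML} &= \arg\max_{\theta} \prod_{i=1}^{m} p_{model}(\mathbf{x}^{(i)}; \theta) = \arg\max_{\theta} \sum_{i=1}^{m} \log p_{model}(\mathbf{x}^{(i)}; \theta) \\
&= \arg\max_{\boldsymbol{\mu}} \sum_{i=1}^{m} \left[ -\frac{\|\mathbf{x}^{(i)} - \boldsymbol{\mu}\|_2^2}{2\sigma^2} - \frac{n}{2}\log\left(2\pi\sigma^2\right) \right] \\
&= \arg\min_{\boldsymbol{\mu}} \sum_{i=1}^{m} \|\mathbf{x}^{(i)} - \boldsymbol{\mu}\|_2^2 .
\end{aligned}
$$

The second term does not depend on $\boldsymbol{\mu}$ and the factor $1/(2\sigma^2)$ is a positive constant, so under these assumptions maximizing the likelihood is the same as minimizing the L2 loss. In a prediction setting, $\boldsymbol{\mu}$ would presumably be replaced by the model's output for each input.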

Tonight I attended Dr. Li Wen's talk “Domain Generalization and Adaptation using Low-Rank Examplar Classifiers”, which was excellent. It was the first time I had heard of domain generalization

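Top Journals and Conferences

Based on online searches and the general consensus among peers, the top conference in data mining is KDD (ACM SIGKDD Conference on Knowledge Discovery and Data Mining). The widely recognized, top-ranked conferences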

Main steps

Step 1: For wordcount1.0, follow http://www.cloudera.com/content/cloudera/en/documentation/HadoopTutorial/CDH4/Hadoop-Tutorial/ht_usage.html#topic_5_2 to run