Proper grammar tells you how to use your language. Linguists call it prescriptive grammar because it prescribes certain words and structures; there is also such a thing as proscription, which forbids certain usages.
Lately many ML theorists have become interested in the generalization mystery: why do trained deep nets perform well on unseen data? The main experimental finding of Zhang et al. is that if you take a classic convnet architecture, say AlexNet, and train it on images with random labels, you can still achieve very high accuracy on the training data. Furthermore, the usual regularization strategies, which are believed to promote better generalization, do not help much.
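The paper's experiments use real convnets on GPUs, but the core phenomenon, that a model with more parameters than training samples can fit even random labels, already shows up in a plain linear model. A minimal numpy sketch (my illustration, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 30, 100                            # fewer samples than parameters: overparameterized
X = rng.standard_normal((n, d))
y = rng.integers(0, 2, size=n) * 2 - 1    # completely random +/-1 labels

# "Training" by minimum-norm least squares: w = X^+ y interpolates the labels
w = np.linalg.pinv(X) @ y
train_acc = np.mean(np.sign(X @ w) == y)
print(train_acc)  # 1.0: the random labels are fit perfectly
```

Since the 30 Gaussian rows of X are linearly independent with probability 1, the pseudoinverse solution satisfies Xw = y exactly, so training accuracy is 100% no matter what the labels are.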
The paper notes that the ability to fit a classifier to data with random labels is also a traditional measure in machine learning, called Rademacher complexity, which we will discuss shortly; thus Rademacher complexity gives no meaningful bounds on sample complexity.
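For concreteness, here is the standard textbook definition (this is background material, not something taken from the paper): the empirical Rademacher complexity of a hypothesis class $\mathcal{H}$ on a sample $S = (x_1, \dots, x_n)$ is

```latex
\hat{\mathfrak{R}}_S(\mathcal{H})
  = \mathbb{E}_{\sigma \in \{\pm 1\}^n}
    \left[ \sup_{h \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i \, h(x_i) \right],
```

where the $\sigma_i$ are i.i.d. uniform random signs. Random signs are exactly random labels, so a class that can fit random labels nearly perfectly has empirical Rademacher complexity close to its maximum possible value; this is why the random-label experiment is informative.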
I found this paper entertainingly written and recommend reading it, despite having given away the punchline. Congratulations to the authors for winning the best-paper award at ICLR. Some experts felt that similar issues had been studied earlier in the context of simpler models such as kernel SVMs, which, to be fair, is clearly mentioned in the paper.
It is trivial to design SVM architectures with high Rademacher complexity that nevertheless train and generalize well on real-life data. Furthermore, theory was developed to explain this generalization behavior, for SVMs and also for related models like boosting. But regardless of such complaints, we should be happy about the attention brought to these questions by Zhang et al.
Indeed, the passionate discussants at the Simons semester themselves banded together in subgroups to address this challenge. Before surveying these results, let me start by suggesting that some of the controversy over the title of Zhang et al.'s paper stems from a confusion between two kinds of theory, descriptive and prescriptive.
These confusions arise from the standard treatment of generalization theory in courses and textbooks, as I discovered while teaching the recent developments in my graduate seminar. Prescriptive versus descriptive theory: To illustrate the difference, consider a patient who describes his symptoms to two doctors. Doctor 1 merely attaches a learned-sounding label to the symptoms; this is descriptive. Doctor 2, after careful physical examination, identifies the cause and prescribes a cure: "Removing it will resolve your problems." This is prescriptive.
Generalization theory notions such as VC dimension, Rademacher complexity, and PAC-Bayes bounds consist of attaching a descriptive label to the basic phenomenon of lack of generalization: a trained classifier generalizes when its error on unseen (test) data closely tracks its error on the training data. When this fails to happen, we have

    (test error) − (training error) is large.

See also the scribe notes of my lecture.
But a glance at this definition shows that failing to generalize on randomly labeled data implies high Rademacher complexity. The VC dimension bound is similarly descriptive. Why do students get confused and think that such tools of generalization theory give some powerful technique to guide the design of machine learning algorithms?
Probably because the standard presentation in lecture notes and textbooks seems to pretend that we are computationally omnipotent beings who can compute VC dimension and Rademacher complexity, and thus arrive at meaningful bounds on the sample sizes needed for training to generalize.
While this may have been possible in the old days with simple classifiers, today we have complicated classifiers with millions of variables, which furthermore are products of nonconvex optimization techniques like backpropagation.
The only way to actually lower-bound the Rademacher complexity of such complicated learning architectures is to try training a classifier and detect lack of generalization via a held-out set.
Every practitioner in the world already does this without realizing it, and kudos to Zhang et al. for making this point explicit. Toward a prescriptive generalization theory: The authors of the new papers intuitively grasp this point, and try to identify properties of real-life deep nets that may lead to better generalization.
Both Bartlett et al. and other recent authors take this route. I will present my take on these results, as well as some improvements, in a future post.
Note that these methods do not as yet give any nontrivial bounds on the number of data points needed for training the nets in question. Dziugaite and Roy take a slightly different tack, based on PAC-Bayes bounds. Now we arrive at the crucial idea: assuming that (a) the noised version of the trained classifier, viewed as a distribution Q over classifiers, still has small training error, and (b) the divergence of Q from a fixed prior can be suitably bounded, it follows that the average classifier drawn from Q works reasonably well on unseen data.
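For readers who want to see the shape of the argument: a standard PAC-Bayes bound (McAllester-style; the constants vary across versions) says that for any prior $P$ fixed before seeing the data, with probability at least $1 - \delta$ over the draw of $n$ training samples, every posterior $Q$ satisfies

```latex
\mathop{\mathbb{E}}_{h \sim Q} \big[ \mathrm{err}(h) \big]
  \;\le\;
\mathop{\mathbb{E}}_{h \sim Q} \big[ \widehat{\mathrm{err}}(h) \big]
  + \sqrt{ \frac{ \mathrm{KL}(Q \,\|\, P) + \ln \frac{n}{\delta} }{ 2(n-1) } }.
```

So if adding noise to the trained net (this defines the distribution $Q$) keeps the empirical error $\widehat{\mathrm{err}}$ small while keeping $\mathrm{KL}(Q \,\|\, P)$ modest, the resulting bound on the true error of the averaged classifier can be nonvacuous.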
Note that this method only proves generalization of a noised version of the trained classifier. Hence the title of their paper, which promises nonvacuous generalization bounds.
Introduction to nursing theory: What are the building blocks of theory, including prescriptive theories? One foundation is ontology, the study of being (what is, or what exists), and the building blocks of a theory are related to one another.
Descriptive versus prescriptive theory in nursing: A phenomenon is an observable fact that can be perceived through the senses and explained. The purpose of a descriptive theory is to provide observation and meaning regarding such phenomena, and it is generated and tested by descriptive research techniques.
Descriptive, predictive, and prescriptive analytics explained: a two-minute guide to understanding and selecting the right kind of analytics. With the flood of data available to businesses regarding their supply chains these days, companies are turning to analytics solutions to extract meaning from the huge volumes involved.
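As a toy illustration of the three kinds of analytics (the demand numbers and the safety-stock rule are hypothetical, chosen only to make the distinction concrete):

```python
import numpy as np

# Hypothetical monthly demand for one product
demand = np.array([100, 110, 120, 130, 140, 150], dtype=float)

# Descriptive: what happened? Summarize the past.
avg = demand.mean()                             # 125.0

# Predictive: what is likely to happen? Extrapolate a linear trend.
t = np.arange(len(demand))
slope, intercept = np.polyfit(t, demand, 1)
forecast = slope * len(demand) + intercept      # 160.0 for the next month

# Prescriptive: what should we do? Recommend an order quantity
# (forecast plus a 10% safety stock, an arbitrary illustrative policy).
order = forecast + 0.1 * forecast               # 176.0
print(avg, forecast, order)
```

Descriptive analytics only summarizes, predictive analytics estimates what comes next, and prescriptive analytics turns that estimate into a recommended action.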
Results showed that in all eight countries, prescriptive normativity was the primary determinant of respondents' social control reactions. In addition, respondents from collectivistic cultures reported that they would exert more social control than respondents from individualistic cultures.
Descriptive grammar versus prescriptive grammar: A descriptive grammar is a set of rules about language based on how it is actually used; in a descriptive grammar there is no right or wrong language.
The most widely accepted descriptive account of the passive, be + past participle, is one based on voice.
A voice analysis of the passive brings with it a range of problems, for which generative grammarians have "postulated" various "formal" devices in order to "explain" them.