Criteria for selecting features for SVM (support vector machine) classification?

Hi, friends!


Please help with advice or a link.


How do I choose the right features for SVM classification? And do I need to normalize the numerical values of those features?


The task is to use an SVM to learn to separate the wheat from the chaff.

The grains have certain characteristic features by which they can be distinguished, but which features should I take?

Here is an example. Say a grain has a weight in milligrams. Chaff also has weight, but its average differs from that of grain. Can I take the weight itself as a feature, or should I take the logarithm of the weight, since some grains are very small and some are very large?


How do I choose the right ratio of grain to chaff in the training set? What should it be? 50/50? Or taken from real life: harvest the grain, grab a handful from it, and make that the sample (i.e., a ratio close to the real one)?


What if the number of grains in reality (and in the training set) relates to the amount of chaff as 1:200? Does that spoil the training sample?

After all, it is the grain that needs to be picked out: it is what matters, and there is just very little of it.


Is there any "SVM for dummies" manual that covers these simple questions in plain terms, without solving complex systems of equations?

1 Answer

First of all, there is no need to fixate on SVM: it is just one of many classification methods. Yes, SVM has its own specifics (as do the other methods), but at this stage you can use the usual data preprocessing techniques.

"Which features should I take?"
This is called feature selection and feature extraction.

In simple terms, the process looks like this:
1. Make a list of the available features.
2. Add various functions of those features (such as the logarithm of the weight you mentioned) and combinations of different features (e.g. length*width*height), and so on. What to combine and which transformations to use is suggested by knowledge of the task and common sense. This step belongs to feature extraction; see the sketch after this list.
3. Define an error function, i.e. decide how classification quality will be assessed. For example, it may be the ratio of correctly recognized examples to their total number. Here it is useful to read about precision and recall.
4. Move one level of abstraction higher.
Imagine a black box with a classifier and the training and test samples inside. The input of the box is a binary vector indicating which features the classifier should use; the output is the classification error (on the test set).
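
To make step 2 concrete, here is a minimal sketch of feature extraction; the measurements, their values, and the column meanings (weight in mg, length, width, height) are invented for illustration:

```python
import numpy as np

# Invented raw measurements, one row per object: weight (mg), length, width, height.
raw = np.array([
    [42.0, 6.1, 3.2, 2.9],   # a grain
    [0.8,  5.5, 0.9, 0.1],   # a piece of chaff
])
weight, length, width, height = raw.T

# Derived features: the logarithm tames the huge spread between tiny and
# large objects; the product length*width*height approximates volume.
log_weight = np.log(weight)
volume = length * width * height

features = np.column_stack([weight, log_weight, volume])
print(features)
```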

Thus, feature selection reduces to an optimization problem: find the input vector for which the box's output (the classification error) is minimal. You can, for example, add features one at a time, starting with those that improve the result the most (cf. gradient descent), or use something heavier, such as genetic algorithms.
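
As an illustration (not part of the original recipe), a minimal sketch of that greedy, add-one-feature-at-a-time selection, assuming scikit-learn and toy data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Invented data: 300 objects, 8 candidate features.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

selected, remaining = [], list(range(X.shape[1]))
best_score = 0.0

while remaining:
    # Try adding each remaining feature and score the resulting subset.
    scores = {
        f: cross_val_score(SVC(), X[:, selected + [f]], y, cv=5).mean()
        for f in remaining
    }
    f_best = max(scores, key=scores.get)
    if scores[f_best] <= best_score:
        break  # no remaining feature improves the result
    selected.append(f_best)
    remaining.remove(f_best)
    best_score = scores[f_best]

print("selected features:", selected, "cv accuracy:", round(best_score, 3))
```

Each pass keeps the feature that raises cross-validated accuracy the most and stops as soon as nothing helps; genetic algorithms or exhaustive search would plug into the same black-box interface.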

"Do I need to normalize the numerical values of these features?"
It strongly depends on the specific task and on the features themselves.
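
That said, SVMs compare examples through distances or dot products, so features on very different scales tend to drown each other out; standardization is the usual first thing to try. A sketch assuming scikit-learn (the data is invented):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Invented data standing in for grain/chaff measurements.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardize each feature to zero mean and unit variance, then fit the SVM.
# The pipeline ensures the scaler is fitted on training data only.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```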

"What if the number of grains relates to the amount of chaff as 1:200? Does that spoil the training sample?"
In general, it does: if examples of one class are far fewer than examples of the other, there is a risk that the classifier will simply memorize the training examples and fail to recognize other, similar examples adequately (overfitting).
Besides, if you use a naive error function (number correct / sample size), a philosophically minded classifier can always answer "chaff" and will be right in 99.5% of cases :)
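
A common mitigation (my addition, not claimed by the answer above) is to reweight the classes and to judge the result by per-class precision and recall instead of raw accuracy. A sketch assuming scikit-learn, with invented 1:200-style data:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Invented imbalance: class 1 ("grain") is the rare one.
X, y = make_classification(n_samples=4000, n_features=5,
                           weights=[0.995, 0.005], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# class_weight="balanced" penalizes mistakes on the rare class more heavily.
clf = SVC(class_weight="balanced").fit(X_train, y_train)

# Per-class precision and recall are far more telling than overall accuracy here.
print(classification_report(y_test, clf.predict(X_test)))
```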