Reconstructing 3D objects with machine learning: what am I doing wrong?


Good evening. For three months I have been struggling with reconstructing a 3D model of a hand from a photo.
I generated a training set of 10,000 images of a hand rendered in Blender with different bone positions. As the output vector I decided to use the positions of the vertices of the 3D model. (Yes, I realize it makes more sense to predict the bone positions, but I decided to try it this way.)
I augmented the data so that it is almost impossible to find two identical images in the sample: the hand image is overlaid on an arbitrary other photo and filters are applied so that it does not stand out from the background. I also added noise to simulate poor recording quality. One of the images looks like this:
5ae5e080b2dae428024412.jpeg
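The augmentation described above (overlaying the rendered hand on another photo and adding noise) could be sketched roughly as follows. This is only an illustration, not the asker's actual pipeline: the function name `composite_and_add_noise`, the RGBA input layout, and the `noise_std` parameter are all assumptions, and the filters mentioned in the question are not reproduced.

```python
import numpy as np

def composite_and_add_noise(hand_rgba, background_rgb, noise_std=0.05, rng=None):
    """Alpha-blend a rendered hand over a background and add Gaussian noise.

    hand_rgba:      float array (H, W, 4) in [0, 1], alpha in the last channel
                    (assumed render format, e.g. a Blender render with transparency).
    background_rgb: float array (H, W, 3) in [0, 1].
    noise_std:      hypothetical knob for the strength of the simulated sensor noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    alpha = hand_rgba[..., 3:4]                      # keep a trailing axis for broadcasting
    blended = alpha * hand_rgba[..., :3] + (1.0 - alpha) * background_rgb
    noisy = blended + rng.normal(0.0, noise_std, size=blended.shape)
    return np.clip(noisy, 0.0, 1.0)                  # stay in the valid pixel range
```

Passing an all-zero `background_rgb` and `noise_std=0.0` gives the plain hand on a black background, which makes it easy to dial the augmentation strength up or down while debugging.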
I built a convolutional neural network in Keras (I cannot provide a diagram because I could not install Graphviz).

from keras.layers import (Input, BatchNormalization, Conv2D, MaxPooling2D,
                          Flatten, Dense, Dropout)

# res, primitives and out_size are defined elsewhere in the script.
# border_mode is the Keras 1 spelling of what Keras 2 calls padding.
inp = Input(shape=(res, res, 3))
bath_0 = BatchNormalization(axis=1)(inp)
x1 = Conv2D(primitives, kernel_size=(9, 9), border_mode='same', activation='relu')(bath_0)
pool_1 = MaxPooling2D(pool_size=(2, 2))(x1)
bath_1 = BatchNormalization(axis=1)(pool_1)
x2 = Conv2D(primitives*2, kernel_size=(3, 3), border_mode='same', activation='relu')(bath_1)
x3 = Conv2D(primitives*2, kernel_size=(3, 3), border_mode='same', activation='relu')(x2)
x4 = Conv2D(primitives*2, kernel_size=(3, 3), border_mode='same', activation='relu')(x3)
pool_2 = MaxPooling2D(pool_size=(2, 2))(x4)
bath_2 = BatchNormalization(axis=1)(pool_2)
x5 = Conv2D(primitives*4, kernel_size=(3, 3), border_mode='same', activation='relu')(bath_2)
x6 = Conv2D(primitives*4, kernel_size=(3, 3), border_mode='same', activation='relu')(x5)
x7 = Conv2D(primitives*4, kernel_size=(3, 3), border_mode='same', activation='relu')(x6)
pool_3 = MaxPooling2D(pool_size=(2, 2))(x7)
x8 = Flatten()(pool_3)
x9 = Dense(1700, activation='relu')(x8)
d_1 = Dropout(0.5)(x9)
x10 = Dense(1700, activation='relu')(d_1)
d_2 = Dropout(0.5)(x10)
x11 = Dense(1700, activation='relu')(d_2)
out = Dense(out_size, activation='tanh')(x11)
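Because the output layer uses `tanh`, the network can only produce values in [-1, 1], so the vertex coordinates used as targets have to fit that range. The question does not say how the targets were scaled, so the following is only a sketch under that assumption. The helper names `fit_target_scaler`, `encode_targets` and `decode_targets` are hypothetical, and the `(n_samples, n_vertices, 3)` layout is assumed.

```python
import numpy as np

def fit_target_scaler(vertices):
    """Compute per-axis center and half-range from training vertices.

    vertices: array of shape (n_samples, n_vertices, 3) (assumed layout).
    """
    flat = vertices.reshape(-1, 3)
    lo, hi = flat.min(axis=0), flat.max(axis=0)
    center = (hi + lo) / 2.0
    scale = np.maximum((hi - lo) / 2.0, 1e-8)   # avoid division by zero on a flat axis
    return center, scale

def encode_targets(vertices, center, scale):
    """Map vertices into [-1, 1] and flatten to (n_samples, n_vertices * 3)."""
    return ((vertices - center) / scale).reshape(len(vertices), -1)

def decode_targets(flat, center, scale):
    """Invert encode_targets: back to (n_samples, n_vertices, 3) in model units."""
    return flat.reshape(len(flat), -1, 3) * scale + center
```

The scaler would be fitted on the training set only and reused unchanged for validation data and for decoding the network's predictions.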

What I have achieved so far: the trained network has learned to turn the hand at the wrist in the right direction, but the fingers always remain in the same position, regardless of the photo.
By the way, initially there was another problem: the network always produced the same model (absolutely identical). I solved that by adding images without hands to the sample. For those images the target for every output neuron is zero.

So what exactly is the problem?
The problem is that the fingers are in the same position in every predicted model. Only the wrist bends. Like here:
5ae5e3e4e2d15891957013.png
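One way to check the symptom numerically, rather than by eye, is to compare the network's per-vertex error with the error of a trivial baseline that always predicts the mean training shape. This is only a diagnostic sketch: the function names are hypothetical and the arrays are assumed to have shape `(n_samples, n_vertices, 3)`. Since the whole hand rotates at the wrist, finger vertices can still score somewhat better than the baseline even when the finger pose itself is constant, so the ratio is a hint rather than proof.

```python
import numpy as np

def per_vertex_error(pred, true):
    """Mean Euclidean distance per vertex, averaged over samples -> (n_vertices,)."""
    return np.linalg.norm(pred - true, axis=-1).mean(axis=0)

def error_vs_mean_baseline(pred, true, eps=1e-12):
    """Ratio of model error to the error of always predicting the mean shape.

    Values near 0: the vertex is predicted well.
    Values near 1: the network is no better than a constant output there.
    """
    mean_shape = np.broadcast_to(true.mean(axis=0), true.shape)
    baseline = per_vertex_error(mean_shape, true)
    return per_vertex_error(pred, true) / np.maximum(baseline, eps)
```

Sorting the vertices by this ratio shows whether the weak spots really are concentrated in the fingers.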

Please answer the following questions, because I no longer know what to think.
1. What is my mistake? What am I doing wrong?
2. Should I consider using Convolution3D?
3. How would you solve this task?

Thank you for your attention.

1 Answer

Maybe the dataset is too complicated: start with more pictures and less aggressive augmentation (you can begin with just a black background).

Also, the architecture is not well suited to this task. Read up on what has been done on this problem in recent years: https://github.com/xinghaochen/awesome-hand-pose-e...