30 Oct 2016

Pewter - Data Visualization and Analysis

What is Pewter?

Pewter is an open-source project for the acquisition, analysis, and visualisation of raw data from the Myo armband, and for conducting experiments on that data. I developed this application when I started working on SigVoiced, a project for sign language to speech conversion. Feel free to contribute, or make use of it if you are working with raw data from the Myo armband. GitHub: https://github.com/sigvoiced/pewter

Data Acquisition

Well, data acquisition is not a trivial task. A lot of things need to be taken care of, and it needs to be done carefully and precisely. So, to make sure we are on the right track, we first acquire a small dataset to build a proof of concept, and then plan the real data acquisition accordingly. I will continue with the 'sign language to voice' example that I mentioned in my previous post.

Step-1 Planning

  1. Select Gestures: For the proof of concept, select a few gestures that you want to work on. In our case we took the following single-handed signs in Indian Sign Language (ISL), which are among the most commonly used (see the reference below the list).

    • Deaf

    • Hearing

    • Parent

    • Father

    • Mother

    • Name

    • Bad

    • Good

Reference: Talking Hands

  2. Number of participants: Since there were 2 people (including me) in my lab who were willing to give time to the data acquisition task, I took 10 instances per person per sign to build a proof of concept and, as mentioned in the previous post, carry out Step-1 of the modelling process. If you are an expert you might skip this step and plan modelling and data acquisition directly, but I highly recommend some analysis before jumping into generating different models and evaluating them.

  3. Experiment Setup and Ground Truth: Before conducting any experiment, think about the conditions you are going to experiment in and how you are going to use the device. In our case, we had the following conditions:

    • Wear the Myo armband: Follow the instructions in the Myo armband manual while putting it on.

    • Experiment with right-handed people: Since we only had right-handed participants for data acquisition, we stuck to that for now.

    • Single-handed signs: Collect data for only single-handed signs, as we had just one armband to work with.

    • Learn the signs: Since we were not pros in ISL, we referred to Talking Hands before collecting data.

    • Ground Truth: Data for eight signs in ISL: Deaf, Hearing, Parent, Father, Mother, Name, Bad, and Good.

  4. Data Analysis: We planned to analyse two things:

    1. Universal model: Is there any significant difference in the signal data of two different participants for the same sign? If there is, which sensors differentiate the data most?

    2. Personal model: Is the data from a single person consistent for the same sign? Does data from a single person differentiate between different signs?

Step-2 Data Acquisition

  1. We collected 10 instances per sign per person using Pewter for analysis. Please find the data here. The data is in JSON format and the schema can be found in Pewter’s readme file.
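Before visualising anything, it helps to load an instance and sanity-check its shape. Below is a minimal loading sketch; the field names ("emg", "imu", "data") and the file name are my assumptions for illustration, so adjust them to the actual schema in Pewter's readme.

```python
import json
import numpy as np

def load_instance(path):
    """Load one recorded instance and return the EMG and IMU
    streams as numpy arrays. Field names are assumed; check the
    schema in Pewter's readme and rename accordingly."""
    with open(path) as f:
        instance = json.load(f)
    emg = np.array(instance["emg"]["data"], dtype=float)  # 8 EMG pods
    imu = np.array(instance["imu"]["data"], dtype=float)  # orientation/accel/gyro
    return emg, imu

# Hypothetical file name, for illustration only
emg, imu = load_instance("deaf_p1_instance1.json")
print(emg.shape, imu.shape)
```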

  2. Documentation: We documented some metadata about the participants, like name, age, gender, etc., for reference.

We asked the participants whether we could share their data publicly, and they were absolutely fine with it.

Step-3 Visualization

We used the visualisation module of Pewter for analysing our data.

  1. Personal Model: We observed the following:
    • IMU (Inertial Measurement Unit) signals showed less variation for signs with less hand movement. This was expected, but now it was evident in the data.
    • Not all EMG pods carried differentiating data, but signs with finger movements produced highly variant EMG signals.

    • There was a little noise in the signals.

    • We could see some observable patterns in the data, giving us an idea for feature selection.
  2. Universal Model: We observed the following:

    • The observations were similar to the above, but the IMU data was shifted between participants.
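If you want to repeat this kind of inspection outside Pewter's visualisation module, a few lines of matplotlib are enough. A minimal sketch, assuming the emg array from the loading sketch above:

```python
import matplotlib.pyplot as plt

# emg: (n_samples, 8) array, one column per EMG pod
fig, axes = plt.subplots(8, 1, sharex=True, figsize=(8, 10))
for pod, ax in enumerate(axes):
    ax.plot(emg[:, pod], linewidth=0.8)
    ax.set_ylabel(f"pod {pod}")
axes[-1].set_xlabel("sample index")
fig.suptitle("EMG pods for one instance")
plt.show()
```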

Step-4 Preprocessing Analysis

  1. Noise Removal: We decided against explicit noise removal, since no matter how hard we tried we would never get rid of all the noise.

  2. Resampling: Because the EMG and IMU sensors have different sampling rates, we decided to resample the data and normalise it (see the sketch after this list).

  3. Scaling: We planned to use an absolute scaler to bring the data into the range [0, 1].
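The Myo streams EMG at 200 Hz and the IMU at 50 Hz, so one simple option is to upsample the IMU stream to the EMG length by linear interpolation and then min-max scale every channel to [0, 1]. A minimal sketch of that idea (my own illustration, not Pewter's preprocessing code):

```python
import numpy as np

def resample_to(signal, n_samples):
    """Linearly interpolate each channel of `signal`
    (shape (n, channels)) onto n_samples points."""
    old_t = np.linspace(0.0, 1.0, num=signal.shape[0])
    new_t = np.linspace(0.0, 1.0, num=n_samples)
    return np.column_stack(
        [np.interp(new_t, old_t, signal[:, c]) for c in range(signal.shape[1])]
    )

def minmax_scale(signal):
    """Scale each channel to the range [0, 1]."""
    lo = signal.min(axis=0)
    span = signal.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero on flat channels
    return (signal - lo) / span

# Upsample IMU to match the EMG sample count, then scale everything
imu_resampled = resample_to(imu, emg.shape[0])
processed = minmax_scale(np.hstack([emg, imu_resampled]))
```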

Step-5 Deciding Algorithms

Since we want to classify time-series data, the natural first choice is Hidden Markov Models, which are quite popular for this task. Another well-known option is Recurrent Neural Networks (in this case LSTMs) with Connectionist Temporal Classification; both are state-of-the-art classifiers for time-series data. Alongside these, we would also evaluate a few other models to allow a fair comparison:

  1. Hidden Markov Model: With raw data

  2. Hidden Markov Model: With features from sliding windows

  3. Support Vector Machine: With global features

  4. K-Nearest Neighbors With Dynamic Time Warping: With signal templates (see the DTW sketch after this list)

  5. Naive Bayes: With global features

  6. Recurrent Neural Networks (Long Short Term Memory - LSTM with Connectionist Temporal Classification): With raw data
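For item 4, dynamic time warping gives a distance between two sequences of different lengths, which a k-nearest-neighbours classifier can use directly. A minimal numpy sketch of the classic DTW recurrence (Euclidean local cost, no window constraint), meant as an illustration rather than a tuned implementation:

```python
import numpy as np

def dtw_distance(a, b):
    """DTW distance between sequences a (n, d) and b (m, d)."""
    n, m = len(a), len(b)
    # cost[i, j] = best cumulative cost aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

def knn_dtw_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote of its k DTW-nearest
    training sequences (the signal templates)."""
    dists = [dtw_distance(x, query) for x in train_X]
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(np.asarray(train_y)[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```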

Step-6 Features

The following are the features that we plan to extract from the processed signals (a small extraction sketch follows the list).

  1. Time domain

    1. Mean
    2. Variance
    3. Sum of Squares
    4. Zero Crossings
    5. Gradient Changes
    6. 1st Order Derivative (of raw signal)
    7. 2nd Order Derivative (of raw signal)
    8. Root Mean Square
    9. Peaks
    10. Maxima
    11. Minima
  2. Frequency Domain

    1. Mean Power
    2. 1st Dominant Frequency
    3. 2nd Dominant Frequency
    4. Number of peaks
    5. Variance
    6. Total Power
    7. Maxima
    8. Minima
  3. MFCC (Mel-frequency cepstral coefficients): These have been widely used in audio and speech analysis. We thought of experimenting with MFCCs as features as well.
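As a sketch of how a handful of these could be computed per channel with numpy (my own illustration, not Pewter's feature code):

```python
import numpy as np

def channel_features(x, fs=200.0):
    """A few of the time- and frequency-domain features listed
    above, for one 1-D channel x sampled at fs Hz (Myo EMG rate)."""
    feats = {
        "mean": x.mean(),
        "variance": x.var(),
        "sum_of_squares": float(np.sum(x ** 2)),
        "rms": float(np.sqrt(np.mean(x ** 2))),
        "zero_crossings": int(np.sum(x[:-1] * x[1:] < 0)),
        "maxima": x.max(),
        "minima": x.min(),
    }
    # Frequency domain: power spectrum of the de-meaned signal
    spectrum = np.abs(np.fft.rfft(x - x.mean())) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    feats["total_power"] = float(spectrum.sum())
    feats["mean_power"] = float(spectrum.mean())
    feats["dominant_freq_1"] = float(freqs[np.argmax(spectrum)])
    return feats
```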

We would then analyse the features and remove insignificant ones by visualising them as parallel coordinates (see the sketch below).
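pandas ships a convenient parallel-coordinates plot for exactly this. A sketch, assuming the per-instance features have been collected into rows with a "sign" label column (feature_rows is a hypothetical variable here):

```python
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

# feature_rows: list of dicts like channel_features(...) above,
# each augmented with a "sign" label for its instance
df = pd.DataFrame(feature_rows)
parallel_coordinates(df, class_column="sign")
plt.show()
```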

What Next?

In my next series of posts I will cover the following:

  1. Data Preprocessing

  2. Feature Extraction

  3. Model Evaluation for all the methods mentioned above

I will provide all the data and scripts required for everything that I do. If you already know the concepts mentioned above, well and good; otherwise, you can go through the references below for further reading on these topics.

References

  1. Hidden Markov Model: http://www.ece.ucsb.edu/Faculty/Rabiner/ece259/Reprints/tutorial%20on%20hmm%20and%20applications.pdf

  2. LSTM: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

  3. Recurrent Neural Networks: https://en.wikipedia.org/wiki/Recurrent_neural_network

  4. Connectionist Temporal Classification: ftp://ftp.idsia.ch/pub/juergen/icml2006.pdf

  5. Dynamic time warping: https://en.wikipedia.org/wiki/Dynamic_time_warping

  6. K-Nearest Neighbor: https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm

  7. Naive Bayes Classifier: https://en.wikipedia.org/wiki/Naive_Bayes_classifier

  8. Mel-frequency cepstrum: https://en.wikipedia.org/wiki/Mel-frequency_cepstrum

  9. Parallel Coordinates: https://en.wikipedia.org/wiki/Parallel_coordinates

