https://bbengfort.github.io/2016/05/text-classification-nltk-sckit-learn/