Using Plotly to visualise US debt growth — 12 September, 2016

Using Plotly to visualise US debt growth

A picture says more than a thousand words, or in this case, a graph does. I just found out about a wonderful tool called Plotly, which lets you make charts and dashboards and share them with other people.

To test the tool I chose to download data on US debt between 1951 and 2015. To make a simple graph you just add the data to a Plotly spreadsheet and, voilà, you have a graph! What surprised me about this particular graph was the shape of the US debt curve: unless something is wrong with my eyes, the debt seems to grow exponentially! My first impression was, wow, that looks a bit unsustainable, but I guess they know what they are doing over there. Anyway, see the graph below.
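A quick sanity check of the "exponential" impression can be done outside Plotly: if a series grows exponentially, the log of its values is roughly linear in time. The sketch below uses approximate, illustrative debt figures (trillions of dollars), not the exact dataset behind the graph:

```python
import math

# Approximate year-end US gross federal debt in trillions -- illustrative only
years = [1950, 1960, 1970, 1980, 1990, 2000, 2010]
debt = [0.257, 0.286, 0.371, 0.908, 3.233, 5.674, 13.562]

# If growth is exponential, log(debt) should be roughly linear in time,
# so fit a least-squares line to the log values.
xs = [y - years[0] for y in years]
ys = [math.log(d) for d in debt]

n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

# The slope of the log values translates into an implied annual growth rate
annual_growth = math.exp(slope) - 1
print(f"implied average annual growth: {annual_growth:.1%}")
```

A clearly positive slope on the log values is exactly what an exponential-looking curve amounts to.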



Quick Twitter analysis of #HillaryHealth — 11 September, 2016

Quick Twitter analysis of #HillaryHealth

Hillary Clinton visited the 9/11 memorial ceremony today and had obvious health problems. According to her staff it was "heatstroke" that made her collapse, but given earlier question marks around her health, plenty of question marks remain.
I used the programming language Python to quickly get an overview of what was being said about the subject on Twitter under the trending hashtag HillaryHealth.
After filtering out various stop words I got the result below; it is based on collecting all incoming tweets under #HillaryHealth for 15 minutes. The top word in the graph is blank: it is a question mark with special characters and therefore did not render in the graph.
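The core of such an analysis is just tokenising tweets, dropping stop words, and counting. A minimal sketch of that counting step, with made-up tweets standing in for the collected stream (in practice the texts would come from the Twitter streaming API, e.g. via tweepy):

```python
import re
from collections import Counter

# Hypothetical sample of collected tweet texts -- stand-ins for the real stream
tweets = [
    "What is going on with #HillaryHealth ?",
    "Questions remain about #HillaryHealth",
    "So many questions #HillaryHealth",
]

# A tiny illustrative stop-word list; a real run would use a fuller one
stopwords = {"what", "is", "on", "with", "so", "about", "the", "a", "many"}

words = Counter()
for tweet in tweets:
    for word in re.findall(r"[a-z#]+", tweet.lower()):
        # Drop stop words and the hashtag we are collecting on
        if word not in stopwords and word != "#hillaryhealth":
            words[word] += 1

print(words.most_common(5))
```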

In summary, the situation does indeed appear to be surrounded by the very word that tops the list: a QUESTION MARK.




The future job market — 2 September, 2016

The future job market 

Lifelong positions at one company belong to the past. The so-called gig economy and cultural change bring with them a growth in freelancing, part-time jobs and short-termism.

The technological change sweeping through our societies makes it much harder to anticipate which job skills will be in demand even in the near future. Math and computing are still a good bet, as shown in the graph from the World Economic Forum.

One skill that will be of major importance is "learnability": an urge to always want to learn new things will be key for both employees and employers. Even though we can't anticipate which skills will be needed, we can make sure we have the right mindset to thrive in our fast-changing world. Employers will also have to build learning opportunities into their companies in order to attract the best employees.

My own opinion is that learning has to happen on a day-to-day basis and should be part of company culture.

Read more in the report from the World Economic Forum; follow the link.


Machine learning applied to email — 29 August, 2016

Machine learning applied to email

I recently installed Boomerang for my Gmail account. It has some promising features; below you can read what you get. The description is copied from the Boomerang website.

What Boomerang is about

Today, Boomerang helps its customers focus on email that matters, when it matters. Our tools allow for reading and responding to messages faster and more decisively than before. These achievements mark only a small part of how we envision the company growing.

Our mission is to make productivity software that encourages people to be more productive. Some of the beliefs that will guide us as we work toward this mission include:

Context-aware. The next revolution in productivity software will come from software that analyzes the context of what we are working on and adds value on top of it. Leading the shift will require technical skills that few teams have. Fortunately, we have these skills, and cloning the functionality will remain difficult for years to come.
Sensible defaults. Context-aware systems will not be perfect, and spending time trying to make them so is a task for academic researchers. Instead, the system needs to supply an easy way for users to change mistakes, without imposing too heavy a burden on them. Designing this interaction properly will be a major challenge, which our team is well suited to conquer.
Persuasive Software. Research in practical psychology continues to uncover surprising truths about how our minds work. Our productivity software will incorporate the results of that research into broad, horizontal products. Designing these interactions will require significant skill and discretion, as we have learned from The Email Game.
Communication first. Applying our core productivity themes to communication and collaboration software will result in the greatest impact. We will not make the mistake of trying to build a competitor to all of Microsoft Office in one fell swoop, and we will likely never make a spreadsheet.
Data-Driven. We believe that data is the closest approximation to the truth. We will base our decisions, wherever possible, on the results of statistically-significant measured data.
Respectful software. We will not make software that helps one party profit at the expense of another. There is a fundamental conflict in the email space. Some companies seek to profit by increasing the effectiveness and intrusiveness of gray email, to the detriment of our privacy and our ability to choose how we spend our attention. We seek to profit by increasing the effectiveness of everyone else.

Jeffrey Snider from Alhambra Partners: "All signals point to systemic reset". — 21 August, 2016
Recruiting is changing — 5 August, 2016

Recruiting is changing 

I had a chat with Wade yesterday. Wade is a personal career guide who specializes in talking to people searching for jobs; he gathers all the information he can about a prospect by asking questions. He will hopefully be my adviser throughout my career. My interaction with Wade was through a web chat on his website. We talked for about 20 minutes about my career aspirations, my skills and my hobbies. I wanted to ask questions about Wade as well, but I never got the chance; he was quite talkative.

I enjoyed our talk, and we ended it with him telling me that he would talk to his colleague Wendy about possible opportunities and get back to me.

Wade&Wendy is actually a two-person business: while Wade focuses on career guidance, Wendy is a hiring assistant for companies. Wendy talks to businesses and tries to understand their needs and company culture. How they run their business and match opportunities with prospects I have no clue, but they are surely working long hours, as they are only two people.

I told Wade that I'm currently not looking for a new job but that I was still curious, so we decided that he would come back to me after he had spoken to Wendy. We will see if he gets back to me or not, but he gave me the impression of being sincere.

I have never talked to a non-human recruiter before. You see, Wade is an artificial intelligence, and the same goes for Wendy. Wade&Wendy and recruiting in the second machine age are something completely different from what we are used to. How good they are at recruiting and matching people to the right assignments is unknown at the moment. Regardless, it was the most interesting talk I've had with a recruiter, and I think there is a high probability that we will see more of Wade&Wendy in the future.


10 blogs from "Rockstar Data Scientists" — 3 August, 2016

10 blogs from "Rockstar Data Scientists"

I found the tip at

Of course I want to share these, so I am simply copying the post from the page above!


Azure Machine Learning experiment — 2 August, 2016

Azure Machine Learning experiment

This post will cover an experiment in Azure Machine Learning. It is based on an experiment from the book Microsoft Azure Essentials: Azure Machine Learning.


There are several categories of machine learning algorithms in the Azure Machine Learning toolkit. We list them below:

  • Classification algorithms

These are used to classify data into different categories, which can then be used to predict one or more discrete variables based on the other attributes in the dataset.

  • Regression algorithms

These are used to predict one or more continuous variables, such as profit or loss, based on other attributes in the dataset.

  • Clustering algorithms

These determine natural groupings and patterns in datasets and are used to predict grouping classifications for a given variable.


Supervised learning

  • Classification and Regression (input data is known, output is known)

Unsupervised learning

  • Clustering

Workflow for supervised learning

ML experiment workflow


Azure Machine Learning provides a way of applying historical data to a problem by creating a model and using it to predict future behaviors or trends. We have briefly touched on the continuous cycle of predictive model creation, model evaluation, model deployment, and the testing and feedback loop.

The primary predictive analytics algorithms currently used in Azure Machine Learning are classification, regression, and clustering.




Now we will try our own experiment. We will use data from a public repository, the UCI Machine Learning Repository; the data is the Census Income dataset. The dataset is a 32526 × 15 matrix. The column income is the value we are going to try to predict, based on the other 14 attributes (more on those later).
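Outside Azure ML, the same separation of the income label from the other attributes can be sketched in plain Python. The rows below are made-up stand-ins for the real file, with only a few of the 15 columns:

```python
import csv
import io

# Stand-in for the downloaded Census Income file: a header plus two sample
# rows (the real dataset has tens of thousands of rows and 15 columns)
raw = io.StringIO(
    "age,workclass,education,hours_per_week,income\n"
    "39,State-gov,Bachelors,40,<=50K\n"
    "52,Self-emp,HS-grad,45,>50K\n"
)
rows = list(csv.DictReader(raw))

# 'income' is the label we try to predict from the remaining attributes
labels = [r.pop("income") for r in rows]
print(len(rows[0]), "features per row, labels:", set(labels))
```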

The first step is to upload the dataset to Azure ML: click New, then Dataset, then Upload.

ML experiment



Second, we create a new Azure ML experiment.

Experiment screenshot


Once the data is uploaded and the experiment is created, we can have a first glimpse of the data. Drag the dataset from the left panel to the workspace in the middle. We can easily visualize it by right-clicking and choosing Visualize. It's always nice to get a first feel for the data.

Visualize data

I will now let you experiment on your own and jump straight to the finished model. Below you can see a screenshot of the workspace, which includes all of the different steps. The algorithm chosen for the first run is a Two-Class Boosted Decision Tree.

full model

Let us look at the result after running the experiment. We right-click on the Evaluate Model block and press Visualize.

In general, classification models are evaluated according to these metrics:

  • Accuracy measures the goodness of a classification model as the proportion of true results to total cases.
  • Precision is the proportion of true results over all positive results.
  • Recall is the fraction of all correct results returned by the model.
  • F-score is computed as the weighted average of precision and recall between 0 and 1, where the ideal F-score value is 1.
  • AUC measures the area under the curve plotted with true positives on the y axis and false positives on the x axis. This metric is useful because it provides a single number that lets you compare models of different types.
  • Average log loss is a single score used to express the penalty for wrong results. It is calculated as the difference between two probability distributions – the true one, and the one in the model.
  • Training log loss is a single score that represents the advantage of the classifier over a random prediction.
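For the threshold-based metrics, the definitions above boil down to a few ratios over the confusion matrix. A sketch with toy predictions (not output from the actual experiment):

```python
# Toy predictions vs. ground truth (1 = income >50K) -- illustrative only
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# Confusion-matrix counts
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = (tp + tn) / len(y_true)       # proportion of true results to total cases
precision = tp / (tp + fp)               # true positives over all positive predictions
recall = tp / (tp + fn)                  # true positives over all actual positives
f_score = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(accuracy, precision, recall, f_score)
```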

We are presented with the ROC curve of the model.

Receiver Operating Characteristic (ROC) curves display the fraction of true positives out of the total actual positives, contrasted with the fraction of false positives out of the total negatives, at various threshold settings. The diagonal line represents 50 percent accuracy in your predictions and can be used as a benchmark to improve on. The higher and further to the left the curve is, the more accurate the model.

The straight line shows a model with a 50% chance of predicting the right value; we want our curve to lie above that line, and as you can see, our model does.

ROC curve
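Each point on the ROC curve comes from picking a threshold and computing the true-positive and false-positive rates at that threshold. A small sketch with made-up classifier scores:

```python
# Scores from a hypothetical classifier and the true labels -- illustrative only
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

def roc_point(threshold):
    """True/false positive rates when predicting 1 for score >= threshold."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, t in zip(pred, y_true) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, y_true) if p == 1 and t == 0)
    tpr = tp / sum(y_true)                  # fraction of actual positives caught
    fpr = fp / (len(y_true) - sum(y_true))  # fraction of negatives misfired on
    return fpr, tpr

# Sweep a few thresholds from strict to permissive to trace the curve
curve = [roc_point(t) for t in (0.95, 0.75, 0.5, 0.0)]
print(curve)
```

A good model keeps each (fpr, tpr) point above the diagonal, which is exactly what "lying above the straight line" means.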

Furthermore, we can look at some more results from our model if we scroll down below the ROC curve. See the table below.



data result

So, that was the first run of our model. If we are happy with the outcome we can move on to the next step. If not, we can either try the same model again with different parameters or try a completely different model.

I'm satisfied at the moment, so I feel confident moving on to the next step, which will be to set up a web service and publish our model.

See you in the next blogpost!


Language Understanding Intelligent Service (LUIS) — 29 July, 2016

Language Understanding Intelligent Service (LUIS)

I'm playing around with Microsoft's Language Understanding Intelligent Service (LUIS), a way to add language understanding to applications. Follow along for an example.

I am not familiar with the terminology yet, so if there are any words described here that you do not understand, don't ask me, because I probably don't understand them either! I'm probably going to steal most of the information from Microsoft's tutorial. If you want to try it yourself, here is the link.

The first step to using LUIS is to create an application. In the application, you will bundle together the intents and entities that are important to your task. Let’s take the example of a news-browsing application. Two intents relevant to this domain are ”FindNews” and ”SendTo,” to share stories with friends. Two entities that are important are ”Topic” and ”Recipient.” When a user interacts with the system, once we have identified the intent and entity, we can take an appropriate action. Let’s go through the steps of creating the application.
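Once the intent and entities have been identified, the application acts on them. That dispatch step can be sketched like this; note that the result dictionary below is a simplified, hypothetical shape, not the actual LUIS response format:

```python
# Hypothetical, simplified shape of a language-understanding result:
# one top-scoring intent plus named entities pulled from the utterance
result = {
    "query": "find me news about the election",
    "topIntent": "FindNews",
    "entities": {"Topic": "election"},
}

def handle(result):
    """Dispatch on the recognized intent (illustrative only)."""
    intent = result["topIntent"]
    ents = result["entities"]
    if intent == "FindNews":
        return f"searching news about {ents.get('Topic', 'anything')}"
    if intent == "SendTo":
        return f"sharing story with {ents.get('Recipient', 'someone')}"
    return "sorry, I did not understand"

print(handle(result))
```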

Click on ”My Applications” and then the ”New Application” button to create a new application. In the dialog box, name it ”NewsChat”. Check that the application culture is set to English. Then click ”Add App”. This will create the application and take you to the LUIS Application Editor:

Linear regression: Predicting household energy consumption — 10 June, 2016

Linear regression: Predicting household energy consumption

Download the data from the UCI Machine Learning Repository.

In Machine Learning Studio you need to import your data. If the data is stored locally, you can do this by pressing New in the bottom-left corner and choosing Datasets. When the data has been loaded, the experiment can start!

For this experiment we will use a linear regression model. You will find it under Machine Learning –> Initialize Model –> Linear Regression. Drag the model over to the workspace and do the same for the dataset. We can inspect the data by right-clicking on the dataset. Here we see that the data is really messed up... great!!! I'm finished for today.
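Cleaning is the obvious first job with messy data like this: files of this kind are often semicolon-separated and mark missing values with "?", so incomplete rows have to be dropped before any fitting. A sketch of that cleanup plus a simple least-squares fit, using made-up lines in place of the real file:

```python
# Made-up raw lines standing in for the real download: date, time, and two
# hypothetical measurement columns, ';'-separated, with '?' for missing values
raw_lines = [
    "16/12/2006;17:24:00;4.216;0.418",
    "16/12/2006;17:25:00;?;0.436",      # missing value -> dropped
    "16/12/2006;17:26:00;5.360;0.498",
    "16/12/2006;17:27:00;3.666;0.528",
]

rows = []
for line in raw_lines:
    fields = line.split(";")
    if "?" in fields:
        continue  # skip incomplete observations
    rows.append((float(fields[2]), float(fields[3])))

# Simple least-squares line: predict the second measurement from the first
xs, ys = zip(*rows)
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx
print(f"y = {slope:.3f}x + {intercept:.3f}")
```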