From reducing email clutter to machine learning: probabilistic programming goes large scale



Why do you seem to spend more and more time every day sifting through and prioritizing email messages? According to research by The Radicati Group, Inc., the number of legitimate emails you receive, already upwards of 100 per day, will only continue to increase.


George Thomas Jr. writes:

So how can you stem the tide of information overload without sacrificing more of your already precious time?

That's where probabilistic programming becomes relevant to Microsoft's efforts to enhance productivity. In what is believed to be the first large-scale commercial use of this innovative programming paradigm, a recently released feature in Office 365 called Clutter intelligently learns which emails matter most to you and sorts them accordingly, filtering those less-urgent emails into a Clutter folder and allowing users to focus on the most immediately important emails.

Given that computer programming is based on precision coding, "probabilistic programming" may seem like an oxymoron, but probability and uncertainty actually are key to its charm, especially when applied to machine learning.

"It's a way of doing machine learning without writing algorithms," says John Winn, a Principal Researcher at Microsoft's research lab in Cambridge, U.K., whose team collaborated on the development of Clutter.

In standard computer programs, variables have set values, but in probabilistic programs, variables can have uncertain values. For example, a variable could hold the value "some number between 1 and 100." This ability allows for "don't-know" variables in the program, the values of which you want to learn from data.

Probabilistic programming is a method of reasoning backwards from given data by assuming it was the output of the program with some setting of the don't-know variables. This lets you learn about the value of these variables, without having to write a machine learning algorithm.
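To make "reasoning backwards" concrete, here is a minimal sketch in plain Python. It does not use Infer.NET or reflect its API. The forward program is a hypothetical toy: a don't-know variable holds some whole number between 1 and 100, and each run outputs a whole number picked uniformly from 1 up to that value. Given a few observed outputs, Bayes' rule applied by brute-force enumeration recovers what the data say about the don't-know variable. The names `forward_likelihood` and `infer` are illustrative assumptions.

```python
from fractions import Fraction


def forward_likelihood(secret, observed):
    """Probability that the toy forward program outputs `observed`
    when the don't-know variable equals `secret`.

    Assumed toy program: output a whole number drawn uniformly from 1..secret.
    """
    if 1 <= observed <= secret:
        return Fraction(1, secret)
    return Fraction(0)


def infer(observations, low=1, high=100):
    """Reason backwards from data: weigh every candidate value of the
    don't-know variable by how likely it was to produce the observations."""
    prior = Fraction(1, high - low + 1)  # "some number between low and high"
    weights = {}
    for secret in range(low, high + 1):
        weight = prior
        for x in observations:
            weight *= forward_likelihood(secret, x)
        weights[secret] = weight

    total = sum(weights.values())
    if total == 0:
        raise ValueError("observations are impossible under this program")
    # Normalise so the beliefs sum to one.
    return {secret: w / total for secret, w in weights.items()}


posterior = infer([37, 12, 58])
most_likely = max(posterior, key=posterior.get)
print(most_likely, float(posterior[most_likely]))
```

Every value below 58 is ruled out because it could not have produced the output 58, and among the remaining values the smaller ones are favoured because they make the observed outputs more probable. Enumeration only works for tiny problems like this one; a general-purpose inference engine has to do the same job for far larger programs.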

One tool for probabilistic programming is Infer.NET, a .NET compiler and runtime Winn and his colleagues created and in turn applied to the development of Clutter.

"You write a simulator of the world, which is a probabilistic program, and Infer.NET takes that program and runs it backwards," he says.

The machine learning underlying Clutter examines how users prioritize email and, over time, learns from users' patterns and infers which emails can automatically be sorted into the Clutter folder for later reading, thus reducing inbox clutter.
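The article does not describe Clutter's actual model, so the following is only a toy illustration of learning from a user's patterns, written in plain Python. It assumes a single hypothetical signal (whether the user read past emails from a sender) and a standard Beta-Bernoulli update; the class `SenderModel`, its method names, and the 0.3 threshold are all invented for this sketch.

```python
from collections import defaultdict


class SenderModel:
    """Toy per-sender belief about how likely the user is to read an email.

    Hypothetical sketch: each sender starts with one pseudo-count of "read"
    and one of "ignored" (a uniform Beta(1, 1) prior), and every observed
    email adds one real count to the matching side.
    """

    def __init__(self, prior_read=1.0, prior_ignored=1.0):
        self.read = defaultdict(lambda: prior_read)
        self.ignored = defaultdict(lambda: prior_ignored)

    def observe(self, sender, was_read):
        """Update the counts with one observed user action."""
        if was_read:
            self.read[sender] += 1
        else:
            self.ignored[sender] += 1

    def prob_read(self, sender):
        """Posterior mean probability that the next email gets read."""
        r = self.read[sender]
        i = self.ignored[sender]
        return r / (r + i)

    def is_clutter(self, sender, threshold=0.3):
        """Assumed rule: file as clutter when reading looks unlikely."""
        return self.prob_read(sender) < threshold


model = SenderModel()
for _ in range(8):
    model.observe("newsletter@example.com", was_read=False)
for _ in range(5):
    model.observe("boss@example.com", was_read=True)

print(model.prob_read("newsletter@example.com"), model.is_clutter("newsletter@example.com"))
print(model.prob_read("boss@example.com"), model.is_clutter("boss@example.com"))
```

The point of the sketch is the shape of the behaviour: with no history the model is undecided, and its predictions sharpen as the user's actions accumulate.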

Clutter product manager Jim Edelen says the feature is among users' favorite features of Outlook in Office 365, with customers proactively singing its praises.

"I've worked on few other features like Clutter," adds Kumar Venkateswar, also a Clutter product manager. "Infer.NET has created a productivity-enhancing experience that users describe as 'life-changing!'"

The benefits of large-scale probabilistic programming are plentiful, Winn says. "It's less coding, which allows rapid prototyping, makes it much faster to develop machine learning systems, is easier to debug; there are tons of advantages."

Winn says his team is continuing to enhance Clutter and notes potential applications of this technology in Bing and Office, as well as other Microsoft products, though he can't share any details yet.

Still, he says, "We're very excited about where this technology is going and what we can do with it next. Clutter is just the beginning of what we can do with probabilistic programming. It's a new way of writing programs. You can use it to do any machine learning task."

See also: Microsoft's Machine Learning and Perception research group


Microsoft Research Ltd

The Microsoft Research Cambridge laboratory was set up in July 1997 and was Microsoft Corporation's first research laboratory established outside the United States. Today, 100 researchers, mostly from Europe, are engaged in computer research at the lab.
