How Statisticians Found Air France Flight 447 Two Years After It Crashed Into Atlantic

After more than a year of unsuccessful searching, authorities called in an elite group of statisticians. A new search guided by their recommendations found the wreckage just a week later.

“In the early morning hours of June 1, 2009, Air France Flight AF 447, with 228 passengers and crew aboard, disappeared during stormy weather over the Atlantic while on a flight from Rio de Janeiro to Paris.” So begin Lawrence Stone and colleagues from Metron Scientific Solutions in Reston, Virginia, in describing their role in the discovery of the wreckage almost two years after the loss of the aircraft.

Stone and co are statisticians who were brought in to reëxamine the evidence after four intensive searches had failed to find the aircraft. What’s interesting about this story is that their analysis pointed to a location not far from the last known position, in an area that had almost certainly been searched soon after the disaster. The wreckage was found almost exactly where they predicted at a depth of 14,000 feet after only one week’s additional search.

Today, Stone and co explain how they did it. Their approach was to use a technique known as Bayesian inference, which takes into account all the prior information about the crash location as well as the evidence from the unsuccessful search efforts. The result is a probability distribution for the location of the wreckage.
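The paper spells out the full model, but the underlying mechanic of Bayesian search theory fits in a few lines. Here is a minimal Python sketch (a toy illustration, not Metron's actual model): a grid of ocean cells carries a prior probability of containing the wreck, and an unsuccessful search of a cell with imperfect detection shifts probability mass away from that cell.

```python
import numpy as np

# Toy illustration of Bayesian search theory (not Metron's actual model):
# a 5x5 grid of ocean cells with a prior probability of holding the wreck.
rng = np.random.default_rng(0)
prior = rng.random((5, 5))
prior /= prior.sum()  # normalize into a probability distribution

def update_after_failed_search(p, cell, p_detect):
    """Bayes update when a search of `cell` finds nothing.

    P(wreck in cell | not found) is proportional to
    P(not found | wreck in cell) * P(wreck in cell) = (1 - p_detect) * p[cell];
    every other cell's likelihood is 1, so it gains only by renormalization.
    """
    posterior = p.copy()
    posterior[cell] *= 1 - p_detect
    return posterior / posterior.sum()

# Search the most likely cell with 90% detection probability and come up empty:
best = np.unravel_index(prior.argmax(), prior.shape)
posterior = update_after_failed_search(prior, best, p_detect=0.9)
print(prior[best], posterior[best])  # mass shifts away from the searched cell
```

Crucially, an imperfect search never drives a cell's probability all the way to zero, which is how an area that had "almost certainly been searched" early on could still emerge as the most probable location once every other possibility had been discounted.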


The economics of adultery

The financial crisis of 2008 may have driven many people to betray their wedding vows, according to data from Ashley Madison, an unusual and apparently very popular dating Web site for those seeking extramarital relations.

Ashley Madison has expanded rapidly, but 2008 was a banner year for the company. According to the site, membership swelled 166 percent worldwide that year and 192 percent in the United States, compared with average yearly growth of 50 percent worldwide and 71 percent domestically since the site’s launch 12 years ago. Each month, around 130 million people around the world visit Ashley Madison.

Analysts at Ashley Madison found evidence of a relationship between the economy and infidelity when they examined user data in individual states. They compared the change in the number of employed people in each state with the growth in Ashley Madison’s membership there. The tentative conclusion: People who’ve lost their jobs might be more likely to cheat — or, at least, are more likely to sign up for an adultery dating site.
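Ashley Madison has not published the underlying figures, but the state-by-state comparison the analysts describe is easy to sketch. The numbers below are made up for illustration; a negative correlation between employment change and membership growth is what their tentative conclusion would predict.

```python
import pandas as pd

# Hypothetical state-level figures (illustrative only, not Ashley Madison data):
# year-over-year change in employment vs. growth in site membership.
df = pd.DataFrame({
    "state": ["CA", "NY", "TX", "FL", "OH"],
    "employment_change_pct": [-4.1, -3.2, -1.0, -5.0, -2.7],
    "membership_growth_pct": [180, 150, 95, 210, 140],
})

# A negative correlation would suggest that states shedding more jobs saw
# faster membership growth -- consistent with the article's tentative claim.
r = df["employment_change_pct"].corr(df["membership_growth_pct"])
print(f"Pearson r = {r:.2f}")
```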


Hadoop and Data Warehouses

I see a lot of confusion when it comes to Hadoop and its role in a data warehouse solution. Hadoop should not be a replacement for a data warehouse, but rather should augment and complement one. Hadoop and a data warehouse will often work together in a single information supply chain: Hadoop excels at handling raw, unstructured, and complex data with vast programming flexibility; data warehouses, on the other hand, manage structured data, integrating subject areas and providing interactive performance through BI tools.

There are three main use cases for Hadoop with a data warehouse:

1. Archiving data warehouse data to Hadoop (move)
Hadoop as cold storage / long-term raw-data archiving:
– Avoids having to buy a bigger PDW appliance, SAN, or tape library

2. Exporting relational data to Hadoop (copy)
Hadoop for backup/DR, analysis, and cloud use:
– Export conformed dimensions to compare incoming raw data with what is already in PDW
– Dimensions can be used against older fact tables
– Send validated relational data to Hadoop
– Land Hadoop data in WASB so other tools and products (e.g., Cloud ML Studio) can use it
– Incremental Hadoop loads and reporting

3. Importing Hadoop data into the data warehouse (copy)
Hadoop as a staging area (see the sketch after this list):
– Great for real-time data, social networks, sensor data, log data, automated data, RFID data (ambient data)
– Capture all the data, but pass only the relevant data on to PDW
– Process the data as it sits in Hadoop (clean it, aggregate it, transform it)
– Some processing is better done in Hadoop than in SSIS
– Provides a place to keep staging data
– Long-term raw-data archiving on cheap storage that is online all the time (instead of tape) – great if you need to keep the data for legal reasons
– Others can analyze it there and later pull it into the data warehouse if they find something useful
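As a concrete sketch of use case 3, here is what staging raw sensor logs in Hadoop and handing only a slim, cleaned extract to the warehouse might look like in PySpark. The paths, column names, and output format are illustrative assumptions, not a prescribed pipeline.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical staging flow: raw logs land in Hadoop, get cleaned and
# aggregated there, and only the relevant slice moves on to the warehouse.
spark = SparkSession.builder.appName("staging-to-dw").getOrCreate()

raw = spark.read.json("hdfs:///staging/sensor_logs/2015/06/")  # raw landing zone

relevant = (
    raw.filter(F.col("reading").isNotNull())              # drop malformed records
       .groupBy("device_id", F.to_date("ts").alias("day"))
       .agg(F.avg("reading").alias("avg_reading"))        # aggregate before loading
)

# Write a compact, structured extract for a loader such as PolyBase or SSIS
# to move into PDW.
relevant.write.mode("overwrite").parquet("hdfs:///export/daily_device_readings/")
```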


The TOFU (Top of Funnel Users) Approach to Business Intelligence

An interesting article on Forbes.com entitled “Why Top Of The Funnel BI Will Drive The Next Wave Of Adoption”, written by Dan Woods, sparked some great conversations about bottom-of-the-funnel users (the 20–30% who want specific business information) and Top of Funnel Users (or TOFU), who want to interact with information in a personalized way and express their interests. I was fortunate to have Matt Milella, Director of Product Development for Oracle Business Intelligence Mobile Apps, and Jacques Vigeant, Product Strategy Director for Oracle Business Intelligence & Enterprise Performance Management, join me for a podcast to discuss their opinions about “the TOFU approach to business intelligence (BI)”.

Jacques explained that the article is basically about how BI has historically focused on what we refer to as the ‘business analyst’ or the ‘power user’. That’s the person in a company who has the unenviable task of analyzing data, finding trends, and synthesizing data into dashboards that he/she then shares with management. The common thinking in BI companies is that roughly 20% of the users prepare data that the ‘rest of us’ consume. There are many practical and technical reasons why BI started using this model 30 years ago, but the world of technology has come a long way since then. Today, the average user can do much more with much less help from IT.


Box Partners With Roambi To Attack The BI Market

Box and Roambi just announced a partnership that is both old-fashioned and empowering, and may be an accelerator for companies struggling to expand the use of data without creating a mess.

Spreadsheets are at the core of the Top of the Funnel BI challenge that companies all over the world have faced for decades. The challenge defined by TOFU BI (as I’ve discussed in “Why Top of the Funnel BI Will Drive the Next Wave of Adoption”) is how to get everyone in the enterprise using data to maximum effect.

… the point of this partnership is to keep the wildly popular paradigm of self-service spreadsheets and add a delivery mechanism created for the modern, mobile world. Both Box and Roambi are well suited to solve parts of the problem and work together. Box acts as the repository that helps control the sprawl of hundreds or thousands of spreadsheets and makes them manageable. Roambi Analytics extracts data from spreadsheets and other sources and creates attractive dashboards or e-books (in the Roambi Flow product) that present data in an attractive way. …


New Approaches to Analytics to Revolutionize Logistics

Three stages are commonly used to categorize an organization’s maturity in its use of business intelligence and analytics technologies:

  1. Descriptive: What happened in the past?
  2. Predictive: What will (probably) happen in the future?
  3. Prescriptive: What should we do to change the future?

Descriptive analytics typically means good old-fashioned business intelligence (BI) – reports and dashboards. But there is a newish technology in the Descriptive category – one that I might argue is worthy of a category in its own right. That technology is visual data discovery. The visual data discovery approach has a rapidly growing fan base for many reasons, but one stands out: it increases the probability that business managers will find the information they need in time to influence their decisions.

Visual data discovery tools typically provide:

  1. Unrestricted navigation through, and exploration of, data.
  2. Rich data visualization so that information can be comprehended rapidly.
  3. The ability to introduce new data sources into an analysis to expand it further.
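Dedicated discovery tools make this interactive, but the spirit of the workflow can be sketched in a few lines of Python. The file name and columns below are hypothetical: load the data, pivot it along a dimension nobody anticipated needing, and look at the picture.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical discovery pass over shipment data (file and columns invented):
# pivot by an unanticipated dimension and visualize the result in seconds.
df = pd.read_csv("shipments.csv", parse_dates=["ship_date"])

late_by_region = (
    df.assign(month=df["ship_date"].dt.to_period("M"))
      .groupby(["month", "region"])["days_late"].mean()
      .unstack("region")                     # one line per region over time
)

late_by_region.plot(title="Average days late by region")
plt.ylabel("days late")
plt.show()
```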

By helping to answer a different class of question – the unanticipated one – visual data discovery tools increase the probability that managers will find the information they need in time to influence their decisions.  And that, after all, is the only valid reason for investing in business intelligence solutions.


Bywaters waste management uses BI to improve customers’ recycling

Bywaters, a recycling and waste management company, has improved productivity by 4% using Pentaho data integration and business intelligence software.

Sasha Korniak, head of analytics and data science at Bywaters, masterminded the project at the family-owned company, which operates nationally and counts Nando’s, Guy’s and St. Thomas’ Hospital, and BNP Paribas among its 2,000-plus customers.

“I wanted Bywaters to embrace a data-driven culture that would give authority and confidence to make autonomous decisions substantiated by credible data and enable consumers to increase recycling and sustainability,” says Korniak.

“We are no longer just a waste management company, we are a waste consultancy, improving our customers’ recycling through providing the data”, says Korniak. “If you are not data driven, but just go out and collect bins, the sustainability of your business will be damaged”.


A Harvest of Company Details, All in One Basket

Trolling government records for juicy details about companies and their executives can be a ponderous task. I often find myself querying the websites of multiple federal agencies, each using its own particular terminology and data forms, just for a glimpse of one company’s business.

But a few new services aim to reduce that friction not just for reporters, but also for investors and companies that might use the information in making business decisions. One site, rankandfiled.com, is designed to make company filings with the Securities and Exchange Commission more intelligible. It also offers visitors an instant snapshot of industry relationships, in a multicolored “influence” graph that charts the various companies in which a business’s officers and directors own shares. According to the site, pooh-bahs at Google, for example, have held shares in Apple, Netflix, LinkedIn, Zynga, Cisco, Amazon and Pixar.

Another site, Enigma.io, has obtained, standardized and collated thousands of data sets — including information on companies’ lobbying activities and their contributions to state election campaigns — made public by federal and state agencies. Starting this weekend, the public will be able to use it, at no charge, to seek information about a single company across dozens of government sources at once.


Why Google Flu is a failure: the hubris of big data

People with the flu (the influenza virus, that is) will probably go online to find out how to treat it, or to search for other information about the flu. So Google decided to track such behavior, hoping it might be able to predict flu outbreaks even faster than traditional health authorities such as the Centers for Disease Control (CDC).

Instead, as the authors of a new article in Science explain, we got “big data hubris.” David Lazer and colleagues explain that:
“Big data hubris” is the often implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis.

The problem is that most people don’t know what “the flu” is, and relying on Google searches by people who may be utterly ignorant about the flu does not produce useful information. Or to put it another way, a huge collection of misinformation cannot produce a small gem of true information. Like it or not, a big pile of dreck can only produce more dreck. GIGO, as they say.

Google’s scientists first announced Google Flu in a Nature article in 2009. With what now seems to be a textbook definition of hubris, they wrote:
“…we can accurately estimate the current level of weekly influenza activity in each region of the United States, with a reporting lag of about one day.”


KB – Neural Data Mining with Python sources

The aim of this book is to present and describe in detail the algorithms for extracting the knowledge hidden inside data, using the Python language. Because the algorithms are given as readable source, the nature and characteristics of the computations are easy to understand, unlike commercial applications, which are available only as running code that cannot be modified.

The algorithms in the book are described in detail, documented, and provided in Python source form, and they serve to extract the hidden knowledge within data of both textual and numerical kinds. Various usage examples underline their characteristics and method of execution and comment on the results obtained.
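To give a flavor of the kind of algorithm such a book walks through in source form, here is a minimal perceptron trained on the logical AND function. This example is mine, for illustration; it is not taken from the book’s sources.

```python
import numpy as np

# Illustrative perceptron (not from the book): learn the logical AND function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])  # target: AND of the two inputs

w = np.zeros(2)  # weights
b = 0.0          # bias
for epoch in range(10):               # perceptron learning rule
    for xi, target in zip(X, y):
        pred = int(w @ xi + b > 0)    # threshold activation
        w += 0.1 * (target - pred) * xi
        b += 0.1 * (target - pred)

print([int(w @ xi + b > 0) for xi in X])  # -> [0, 0, 0, 1]
```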
