
The problem of so-called ‘dirty data’ – data that contains duplications, omissions or other errors – has been a serious issue in corporate IT for many a year. Analyst firm Gartner said in March last year that three-quarters of large enterprises will make little to no progress towards improving data quality until 2010, potentially costing large firms millions of dollars.
But is there an open source solution to the problem?
Gartner insisted that, “To gain competitive advantage from information, organisations need to identify ‘data stewards’ in the business and manage information as a corporate asset.”
Gartner Research vice president Andreas Bitterer said that ‘dirty data’, or poor data quality, is an often-overlooked issue that can have a large negative impact on a business.
There are numerous commercial data quality tools on the market already, from the likes of Informatica, DataFlux, Datanomic and many more.
But today Talend announced what it claims is the first open source data profiler. The company says the tool enables companies to assess the quality of their data and decide which actions to take to correct ‘dirty data’ that irritates customers and can cost companies time and money.
Talend argues the first step in improving the quality of a company’s data is to ‘profile’ or evaluate the data.
“Talend Open Profiler provides project teams with the ability to understand the characteristics of their data and discover its quality level,” the firm said. “Accurate profiling reduces the time and resources needed to find problematic data and allows companies to identify potential problems before beginning data-intensive projects such as data integration or new application development. It also allows business analysts to have more control over the maintenance and management of the data.”
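To give a rough idea of what ‘profiling’ means in practice, here is a minimal Python sketch that counts missing values, distinct values and repeated rows in a small customer list. It is purely illustrative and is not how Talend Open Profiler works; the `profile` function, the sample records and the rule that blank strings count as missing are all assumptions made for this example.

```python
from collections import Counter


def profile(records, columns):
    """Return simple quality metrics for a list of dict records (illustrative only)."""
    total = len(records)
    report = {"rows": total, "columns": {}}
    for col in columns:
        values = [r.get(col) for r in records]
        # Assumption: None and blank or whitespace-only strings count as missing.
        present = [v for v in values if v is not None and str(v).strip() != ""]
        report["columns"][col] = {
            "missing": total - len(present),
            "distinct": len(set(present)),  # distinct raw values, not normalised
        }
    # Count rows that repeat another row once values are trimmed and lower-cased.
    keys = Counter(
        tuple(str(r.get(c) or "").strip().lower() for c in columns) for r in records
    )
    report["duplicate_rows"] = sum(n - 1 for n in keys.values() if n > 1)
    return report


# Hypothetical sample data: one near-duplicate and two missing e-mail addresses.
customers = [
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": "ann lee", "email": "ANN@example.com "},
    {"name": "Bob Ray", "email": ""},
    {"name": "Cy Dunn", "email": None},
]
report = profile(customers, ["name", "email"])
print(report)
```

Even a report this crude shows where the gaps and repeats are before anyone attempts to fix them, which is the point Talend is making about profiling as a first step.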

Spock was horrified to discover Data was 'dirty'. Pic TM & © 2008 CBS Studios Inc.
As well as the new Open Profiler tool, Talend offers a range of open source data integration software: Talend Open Studio, its flagship data integration product provided at no cost under a GPL license; Talend Integration Suite, a subscription-based service that extends the functionality of Talend Open Studio; and Talend On Demand, said to be the first software-as-a-service (SaaS) open data integration tool.
“Organizations are spending a ton of money to share data among different departments but far too often this data is incorrect,” said Bertrand Diard, CEO and co-founder of Talend. “Companies in every business face significant losses and inefficiencies that are caused directly by poor data quality. Yet very few organizations are equipped with professional data quality solutions. The first open source data profiling product... leverages Talend’s well-respected data expertise to help companies understand and regain control of the quality of their data.”
There are two things to bear in mind when it comes to data quality tools, however. The first is that data is not static: it grows rapidly in volume and is often in a state of constant flux. For that reason, a data quality initiative should be seen not as a one-off project but as an ongoing effort.
Secondly, although technology can help, ‘dirty data’ is not an IT problem; it is a business problem. If companies do not tackle ‘dirty data’ at its source, which usually means going right back to the point of its creation by business users, and appoint ‘data stewards’ to keep a check on data quality, they are unlikely ever to get it under control.
“Data quality is not an IT problem,” agrees Gartner’s Bitterer. “IT can help fix it, but the business must own the problem. For example, company culture can have a significant influence. Organisations need ‘data stewards’, people within the business who are responsible for the quality of the information. However, technology will play a role in fixing many data quality issues, and organisations need to invest in a portfolio of data quality solutions such as profiling, cleansing, matching and enrichment.”
Talend Open Profiler is available now at no cost under a GPL license. To download, visit www.talend.com.