Kaskade, the newly-appointed CEO of Infochimps, believes CIOs are ready to embrace open source big data software and that the established IT players, lacking open source experience, will have to buy their way into the market.
FORBES: Infochimps' New CEO on the Next Big Data Acquisition and Getting Rid of Data Scientists
But adoption of open source software does not necessarily mean adoption of open source-based big data technologies.
FORBES: Infochimps' New CEO on the Next Big Data Acquisition and Getting Rid of Data Scientists
An intercontinental project that bridged citizen science, open data, open source hardware, civic hacking and the Internet of things to monitor, share and map radiation data?
Not coincidentally, that same day EMC released an open source version of its data warehousing software, hoping to attract the kind of developers who might otherwise work on R.
Built on top of LucidWorks Search, LucidWorks Big Data certifies and integrates all of the Apache open source components necessary to develop and manage a big data application, including Hadoop, Mahout, HBase, Zookeeper, Pig, and other software tools.
Pentaho is an open source business intelligence and big data analytics company that was founded by five deeply nerdy people.
FORBES: Ideas for Solving the 'Data' Problem First, the 'Big' Problem Second: The Pentaho Way
With the amount of digital data in data centers growing annually and storage budgets highly constrained, open source storage system designs can fill an important niche for data centers and IT people comfortable with installing and maintaining their own hardware.
But what if the open source community builds products that automate Big Data complexity on commodity hardware?
FORBES: Hadoop Alpha Geeks Disrupting The Rapidly Growing Big-Data Market
Bringing this experience with search and the open source community to bear on emerging big data needs, LucidWorks launched in May a big data beta project.
The Open Compute Project intends to develop servers and data centers that follow the model traditionally associated with open source software projects.
"The future is sharing -- open data, open participation, open source, open everything," Newsom writes.
Another, called Bloom, was developed by computer scientists at UC Berkeley for analysis of large data sets in the popular open source Hadoop database.
But if you take into account the use of open source products like Xen and KVM in massive data centers, web server farms, and by web application providers like Google and Salesforce.com, the number of CPUs running open source virtualization likely dwarfs that of the commercial market.
See my stories on Forbes about NYSE Technologies and big data, data analytics, and providing an open source middleware service.
FORBES: IntercontinentalExchange (ICE) Merger With NYSE Euronext Will Reshape Markets
Today, less than 5% of revenues come from emerging Big Data pure play vendors and open-source contributors such as Cloudera, Hortonworks and others.
FORBES: Hadoop Alpha Geeks Disrupting The Rapidly Growing Big-Data Market
But unlike those projects, OONI uses only open-source software and plans to make the raw data gathered by its tools public and accessible to any researcher.
FORBES: The Tor Project's New Tool Aims To Map Out Internet Censorship
It works well when cities open their data to the public or when, as in the case of Wikipedia, an open-source software platform supports open-source content.
FORBES: Why Open-Source Principles Are a Recipe For Innovation
Financial firms are sometimes reluctant to look at big data because they associate it with the open source Hadoop, he added.
But ubiquitous digital consumer data, cheap computing power and open-source software have created a tremendous opportunity for smart marketers to change the research game.
FORBES: Advertising Week 2010: Digital Fuses Data, Media Buying
EBay also adopted new tools for big-data processing, such as the open-source software framework Hadoop and a system called Mobius that was developed in its labs.
The Bill and Melinda Gates Foundation requires every research project it funds to make its full data set freely available, like open-source software code.
Facebook released server, data center and infrastructure technology today, making it open source for others to build on.
FORBES: Facebook Open Sources Its Server and Data Center Technology With Open Compute
Google Scholar, the open source database used to identify published cases, is not a complete database of state cases involving Shariah because as a search tool it allows retrieval only of those reported appellate and trial court level cases that are available through open sources.
The open source software also contains a journal and automatically saves and backs up all data.
HBaseCon highlights the growing importance of the open-source community as a leader in expanding pure performance and capability as the big data market rapidly expands.
FORBES: Hadoop Alpha Geeks Disrupting The Rapidly Growing Big-Data Market
And outside the open source search domain, companies like Rosslyn Analytics have developed a platform approach to complex data integration problems.
FORBES: How HP Could Have Saved Itself $10 Billion (And Its Share Price From Falling)
Data on movie selections are not particularly sensitive, but this release of supposedly anonymized data could set a precedent for our health insurance companies to start having open source competitions.
Erlang, developed by telecommunications company Ericsson for massively parallel computing in a phone network, then later released as an open source language, both influenced Scala and is enjoying some rebirth in the world of global data centers.
Unlike early pioneers SAS and SPSS (now owned by IBM), Revolution Analytics fits the mold of other key new players in the big data world: it supports and sells the commercial version of an open-source product, built and nurtured over many years by a dedicated community of enthusiasts.
While an open source approach isn't the right solution for every software need, using and contributing back to open source software is one way that we're making it easier for the government to share data, improve tools and services, and return value to taxpayers.