Quite which way is likely to produce what we all ultimately desire, a machine intelligence that we can query with natural language and get answers based upon all of human knowledge, is as yet unknown.
When they sell it, the entity that buys it will be looking to query it with other data, and this is where the flawed data and algo duping appear, as who knows what other linear or non-linear data will be rolled in here.