Saturday, 7 February 2015

Out Googling Google on Big Data Searches


PORTLAND, Ore. — Almost every algorithm for searching unstructured Big Data uses a technique called latent Dirichlet allocation (LDA). Northwestern University professor Luis Amaral became curious as to why LDA-based searches appear to be 90 percent inaccurate and unrepeatable 80 percent of the time, often delivering different "hit lists" for the same search string. To solve the conundrum, Amaral took apart LDA, found its flaws, and fixed them.
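The unrepeatability is easy to see in practice: standard LDA inference is stochastic, so two runs over the same corpus can produce different topics and therefore different rankings. The short sketch below uses scikit-learn's LatentDirichletAllocation on a made-up three-document corpus; it is only an illustration of the effect, not the code or data Amaral audited.

```python
# Minimal sketch: fit LDA twice on the same tiny corpus with different
# random seeds and compare the top words per topic. Corpus, seeds, and
# parameters are illustrative only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "stock market trading prices shares",
    "genome sequencing dna biology cells",
    "market shares dna biology prices cells",  # a mixed-topic document
]

vec = CountVectorizer()
X = vec.fit_transform(docs)
vocab = vec.get_feature_names_out()

def top_words(seed, n_top=3):
    lda = LatentDirichletAllocation(n_components=2, random_state=seed)
    lda.fit(X)
    return [
        [vocab[i] for i in comp.argsort()[::-1][:n_top]]
        for comp in lda.components_
    ]

# Different seeds can yield differently ordered or differently composed
# topics, which is why the same query may return different "hit lists".
print(top_words(seed=0))
print(top_words(seed=42))
```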

Now he is offering the improved version, which not only returns more accurate results but also returns exactly the same list every time it is run on the same database. He is offering it all for free to Google, Yahoo, Watson, and any other maker of search engines, for uses ranging from recommendation systems to spam filtering to digital image processing and scientific investigation.

"The common algorithmic implementation of the LDA model is incredibly naive," Amaral told EE Times. "First, there is this unrealistic belief that one is able to detect topics when documents have a significant mixture of topics. Our systematic analysis reveals that as soon as the corpus is generated with a large value of alpha (which in LDA controls the amount of mixing of topics in documents), its algorithms fail miserably."



Source:
http://www.eetimes.com/document.asp?doc_id=1325551
