A set-similarity join (SSJ) finds pairs of set-based records such that each pair is similar enough based on a similarity function and a threshold. Many applications require efficient SSJ solutions, such as record linkage and plagiarism detection. This talk studies how to efficiently perform SSJs on large data sets using Hadoop. It proposes a 3-stage approach to the problem, to efficiently partition the data across nodes to balance the workload and minimize the need for replication. It reports results from extensive experiments on real datasets, synthetically increased in size, to evaluate the speedup and scaleup properties of the proposed algorithms using Hadoop.
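To make the definition above concrete, here is a minimal single-machine sketch of a set-similarity self-join. It is not the talk's 3-stage Hadoop approach. The choice of Jaccard similarity, the size-based pruning rule, and the names `jaccard` and `set_similarity_join` are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def set_similarity_join(records, threshold):
    """Return (id_a, id_b, similarity) for every pair with similarity >= threshold.

    `records` maps a record id to a set of tokens (hypothetical input format).
    """
    # Sort by set size so the length filter can stop the inner loop early.
    items = sorted(records.items(), key=lambda kv: len(kv[1]))
    results = []
    for i, (id_a, set_a) in enumerate(items):
        for id_b, set_b in items[i + 1:]:
            # Length filter: with |a| <= |b|, Jaccard(a, b) <= |a| / |b|,
            # so every larger set after this one fails as well.
            if len(set_a) < threshold * len(set_b):
                break
            sim = jaccard(set_a, set_b)
            if sim >= threshold:
                results.append((id_a, id_b, sim))
    return results


records = {
    "r1": {"a", "b", "c"},
    "r2": {"a", "b", "c", "d"},
    "r3": {"x", "y"},
}
pairs = set_similarity_join(records, 0.7)
```

Here `pairs` holds the single qualifying pair, `r1` and `r2`, whose Jaccard similarity is 3/4. A distributed version would still have to decide how to partition records across nodes, which is the problem the talk addresses.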
Similarity join in distance spaces constrained by the metric postulates is a necessary complement to the better-known similarity range and nearest-neighbor search primitives. However, the quadratic computational complexity of similarity joins prevents their application to large data collections. We first study the underlying principles of such joins and suggest three categories of implementation strategies based on filtering, partitioning, or similarity range searching. Then we study an application of the D-index to implement the most promising alternative, range searching. Although this approach also cannot eliminate the intrinsic quadratic complexity of similarity joins, experiments confirm significant performance improvements.
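As a rough illustration of the filtering category mentioned above, the sketch below prunes candidate pairs with a single pivot and the triangle inequality before computing any exact distance. It is not the D-index. The function names, the choice of the first point as pivot, and the use of Euclidean distance are assumptions made for this example only.

```python
import math


def euclidean(p, q):
    """Euclidean distance, one example of a metric satisfying the postulates."""
    return math.dist(p, q)


def pivot_filtered_join(points, epsilon, distance=euclidean):
    """Return index pairs (i, j), i < j, whose distance is at most epsilon."""
    if not points:
        return []
    pivot = points[0]  # assumption: any object can serve as the pivot
    # Precompute each object's distance to the pivot and sort by it.
    keyed = sorted((distance(pivot, p), i) for i, p in enumerate(points))
    pairs = []
    for a in range(len(keyed)):
        d_a, i = keyed[a]
        for b in range(a + 1, len(keyed)):
            d_b, j = keyed[b]
            # Triangle inequality: d(x, y) >= |d(pivot, x) - d(pivot, y)|.
            # Once the gap exceeds epsilon, later objects cannot qualify either.
            if d_b - d_a > epsilon:
                break
            if distance(points[i], points[j]) <= epsilon:
                pairs.append((min(i, j), max(i, j)))
    return sorted(pairs)


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 5.5)]
close_pairs = pivot_filtered_join(points, 1.0)
```

The filter only cuts the number of exact distance computations. In the worst case every pair survives it, so the quadratic bound noted in the abstract still applies.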
-
Agricultural officials believe the disease may have come from a graft of pomelo grapefruit on the lemon tree. Now, they are trying to trace the graft back to its source and hoping to slow the spread of the disease by destroying diseased trees and mandating spraying around them
-
There’s no doubt that spraying selected insecticides can help control populations of Asian citrus psyllids, which are responsible for transmitting citrus greening disease or huanglongbing
-
Huanglongbing (HLB) was discovered in a backyard in Hacienda Heights.
-
HLB (Yellow Dragon Disease) is known in the West as citrus greening and is a bacterial disease with no known cure. Its vector is the Asian psyllid, an aphid-sized citrus pest that can transmit the disease from tree to tree. HLB was first discovered in Southeast Asia in the early 20th century and has been slowly devastating the citrus-growing world
-
US seismologists then cancelled the tsunami warning two hours after the first Indonesian quake at 2.30pm (5.30pm AEST) yesterday
-
The initial earthquake struck 431km off the Indonesian city of Banda Aceh, on the island of Sumatra, at 6.38pm (AEST) and at a depth of 33km
-
An 8.2-magnitude aftershock struck off the coast of Sumatra island at 8.43pm (AEST)
-
The Pacific Tsunami Warning Centre in Hawaii said a tsunami watch was in effect for Indonesia, India, Sri Lanka, Australia, Myanmar, Thailand, the Maldives and other Indian Ocean islands, Malaysia, Pakistan, Somalia, Oman, Iran, Bangladesh, Kenya, South Africa and Singapore.
A tsunami watch means there is the potential for a tsunami, not that one is imminent
-
The US Geological Survey said an 8.7-magnitude earthquake struck off the coast of Sumatra at 1408 IST at a depth of 33 kilometers
-
Two hours after the quakes - one with a magnitude of 8.6, the other measuring 8.3
-
The US Geological Survey (USGS), which documents quakes worldwide, said the first Aceh quake was centred at a depth of 33km (20 miles), about 495km from Banda Aceh, the provincial capital.
It was initially reported as 8.9 magnitude but was later revised down to 8.6 by the USGS
Jeff Kubina's Public Lists (3)