Alain Antone's Library
The boilerpipe library provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page.
The library already provides specific strategies for common tasks (for example: news article extraction) and may also be easily extended for individual problem settings.
Extraction is very fast (milliseconds), needs only the input document (no global or site-level information), and is usually quite accurate.
Boilerpipe is a Java library written by Christian Kohlschütter. It is released under the Apache License 2.0.
in list: Formation Angie
-
Yahoo! Search BOSS
BOSS (Build your Own Search Service) is Yahoo!'s open search web services platform. The goal of BOSS is simple: to foster innovation in the search industry. Developers, start-ups, and large Internet companies can use BOSS to build and launch web-scale search products that utilize the entire Yahoo! Search index. BOSS gives you access to Yahoo!'s investments in crawling and indexing, ranking and relevancy algorithms, and powerful infrastructure. By combining your unique assets and ideas with our search technology assets, BOSS is a platform for the next generation of search innovation, serving hundreds of millions of users across the Web.
Mapstraction is a library that provides a common API over various JavaScript mapping APIs, so that switching from one to another is as smooth as possible. Developers can code their applications once, and then easily switch mapping provider based on project needs, terms and conditions, and new functionality.
Users can switch maps as desired based on personal taste and quality of maps in their local area. Various tools built on top of Mapstraction allow users to easily integrate maps into their own sites, and configure them with different controls, styles, and provider
in list: Formation Angie
As a real-time news organization, TPM is obsessed with stats. For both our editorial process and publishing strategy, we need to know how our audience is interacting with us. But as an organization built from scratch, we’ve always managed to be Ramen Profitable by using free or inexpensive software solutions. For metrics, that meant Google Analytics. Their application is excellent, but it is slow. Accurate data isn’t reported for hours, sometimes not until a day later. Chartbeat, by contrast, is real time. It reports exactly how many people are sitting on these stories *right now*. Neither one of these tools paints a full picture of site activity, but put together they form a powerful analytics package.
Using Chartbeat’s API (alongside the new Google Analytics API), I have developed a few tools for TPM that have changed our workflow to account for the instant feedback our readers are giving us with their clicks.
-
The challenge, then, is to move editorial policy toward a multichannel model of distribution (a networked one, to be more exact) that spills beyond the single furrow of the current production chain: selection and validation of works, layout, printing, book production, and distribution.
-
The key to this networked distribution is APIs, as the interfaces for accessing the publishing house's digital data.
