
Mar
23
2011

The boilerpipe library provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page.

The library already provides specific strategies for common tasks (for example: news article extraction) and may also be easily extended for individual problem settings.

Extraction is very fast (milliseconds), needs only the input document (no global or site-level information), and is usually quite accurate.

Boilerpipe is a Java library written by Christian Kohlschütter. It is released under the Apache License 2.0.
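
To make the idea of single-document boilerplate removal concrete, here is a minimal Python sketch. It is not boilerpipe's actual algorithm or API: it assumes a simplified heuristic (keep text blocks that have enough words and few link words), and the names `BlockCollector`, `extract_main_text`, `min_words` and `max_link_density` are hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser

# Tags that start or end a text block in this simplified sketch (an assumption).
BLOCK_TAGS = {"p", "div", "td", "li", "h1", "h2", "h3", "article", "section"}


class BlockCollector(HTMLParser):
    """Splits a page into text blocks and counts the words inside links."""

    def __init__(self):
        super().__init__()
        self.blocks = []        # list of (text, link_word_count)
        self._words = []
        self._link_words = 0
        self._in_link = 0
        self._skip = 0          # depth inside <script>/<style>

    def flush(self):
        # Close the current block, if it contains any text.
        if self._words:
            self.blocks.append((" ".join(self._words), self._link_words))
        self._words, self._link_words = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self._skip:
            return
        words = data.split()
        self._words.extend(words)
        if self._in_link:
            self._link_words += len(words)


def extract_main_text(html, min_words=10, max_link_density=0.3):
    """Keep blocks that are long enough and not dominated by link text."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    parser.flush()
    kept = []
    for text, link_words in parser.blocks:
        word_count = len(text.split())
        if word_count >= min_words and link_words / word_count <= max_link_density:
            kept.append(text)
    return "\n".join(kept)
```

Only the page itself is inspected, which mirrors the "no global or site-level information" property described above. The thresholds are arbitrary defaults and would need tuning on real pages.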

boilerpipe text html extraction java code web api

Jul
11
2008


Yahoo! Search BOSS

BOSS (Build your Own Search Service) is Yahoo!'s open search web services platform. The goal of BOSS is simple: to foster innovation in the search industry. Developers, start-ups, and large Internet companies can use BOSS to build and launch web-scale search products that utilize the entire Yahoo! Search index. BOSS gives you access to Yahoo!'s investments in crawling and indexing, ranking and relevancy algorithms, and powerful infrastructure. By combining your unique assets and ideas with our search technology assets, BOSS is a platform for the next generation of search innovation, serving hundreds of millions of users across the Web.

Jan
20
2010

Mapstraction is a library that provides a common API for various JavaScript mapping APIs, so that switching from one to another is as smooth as possible. Developers can code their applications once, and then easily switch mapping providers based on project needs, terms and conditions, and new functionality.
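
The "code once, switch provider" idea can be sketched as an adapter layer. The Python example below is not Mapstraction's API: the vendor classes, their method names and the `CommonMap` facade are all hypothetical stand-ins that only illustrate how one common interface could hide provider-specific calls.

```python
class VendorAMap:
    """Hypothetical vendor SDK: camelCase methods, (lat, lng) argument order."""

    def __init__(self):
        self.calls = []

    def setCenter(self, lat, lng, zoom):
        self.calls.append(("setCenter", lat, lng, zoom))

    def addOverlay(self, lat, lng, title):
        self.calls.append(("addOverlay", lat, lng, title))


class VendorBMap:
    """Hypothetical vendor SDK: snake_case methods, [lon, lat] argument order."""

    def __init__(self):
        self.calls = []

    def center_on(self, lonlat, level):
        self.calls.append(("center_on", lonlat, level))

    def place_pin(self, lonlat, text):
        self.calls.append(("place_pin", lonlat, text))


class VendorAAdapter:
    """Translates the common calls into VendorAMap calls."""

    def __init__(self):
        self.native = VendorAMap()

    def set_center(self, lat, lon, zoom):
        self.native.setCenter(lat, lon, zoom)

    def add_marker(self, lat, lon, label):
        self.native.addOverlay(lat, lon, label)


class VendorBAdapter:
    """Translates the common calls into VendorBMap calls."""

    def __init__(self):
        self.native = VendorBMap()

    def set_center(self, lat, lon, zoom):
        self.native.center_on([lon, lat], zoom)

    def add_marker(self, lat, lon, label):
        self.native.place_pin([lon, lat], label)


ADAPTERS = {"vendor_a": VendorAAdapter, "vendor_b": VendorBAdapter}


class CommonMap:
    """Facade the application codes against; the provider is just a string."""

    def __init__(self, provider):
        self._adapter = ADAPTERS[provider]()
        self.native = self._adapter.native

    def set_center(self, lat, lon, zoom):
        self._adapter.set_center(lat, lon, zoom)

    def add_marker(self, lat, lon, label):
        self._adapter.add_marker(lat, lon, label)


def show_city(common_map):
    # Application code: identical regardless of the provider behind the facade.
    common_map.set_center(48.85, 2.35, 12)
    common_map.add_marker(48.85, 2.35, "Paris")
```

Switching providers then means changing only the string passed to `CommonMap`, for example `show_city(CommonMap("vendor_b"))` instead of `show_city(CommonMap("vendor_a"))`.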


Users can switch maps as desired based on personal taste and quality of maps in their local area. Various tools built on top of Mapstraction allow users to easily integrate maps into their own sites, and configure them with different controls, styles, and provider

api mapping javascript maps mapstraction googlemaps map framework

in list: Formation Angie

Sep
18
2009

As a real-time news organization, TPM is obsessed with stats. For both our editorial process and publishing strategy, we need to know how our audience is interacting with us. But as an organization built from scratch, we’ve always managed to be Ramen Profitable by using free or inexpensive software solutions. For metrics, that meant Google Analytics. Their application is excellent, but it is slow. Accurate data isn’t reported for hours, sometimes a day later. Chartbeat, by contrast, is real time. It reports exactly how many people are sitting on these stories *right now*. Neither one of these tools paints a full picture of site activity, but put together they form a powerful analytics package.

Using Chartbeat’s API (alongside the new Google Analytics API), I have developed a few tools for TPM that have changed our workflow to account for the instant feedback our readers are giving us with their clicks.

media data Chartbeat api analysis

Oct
2
2008

  • The challenge, then, is to move editorial policies toward a multichannel distribution logic, or more precisely a networked one, that goes beyond the single furrow of the current production chain: selection and validation of works, layout, printing, book production, and distribution.
  • The key to this networked distribution is APIs, as interfaces for accessing the publishing house's digital data.