harry's Profile

Member since Jul 07, 2006, follows 0 people, 0 public groups, 606 public bookmarks (609 total).

Recent Bookmarks and Annotations

  • Welcome — Ruby Enterprise Edition on 2009-11-20
        • A copy-on-write friendly garbage collector. Phusion Passenger uses
          this, in combination with a technique called preforking, to reduce Ruby
          on Rails applications' memory usage by 33% on average.
        • An improved memory allocator called tcmalloc,
          which improves performance quite a bit.
        • The ability to tweak garbage collector settings for maximum server performance,
          and the ability to inspect the garbage collector's state.
          (RailsBench GC patch)
        • The ability to dump stack traces for all running threads
          (caller_for_all_threads),
          making it easier to debug multithreaded Ruby web applications.


  • Moses - Moses/RoadMap on 2009-10-19
      • Chart parse-decoding: to support hierarchical models and syntax-based translation models
      • Depth-first decoding: to provide anytime algorithms for decoding
      • Forced decoding: to compute scores for provided output
      • Suffix-array translation models: an alternative way to store large rule-sets without the need to translate them
      • Maximum entropy translation models: translation models that incorporate additional source-side and context information for scoring translation rules.
  • Moses - Moses/FactoredModels on 2009-10-19
    • The training data (a parallel corpus) has to be annotated with the additional factors. For instance, if we want to add part-of-speech information on the input and output side, we need to obtain part-of-speech tagged training data. Typically this involves running automatic tools on the corpus, since manually annotated corpora are rare and expensive to produce.
  • Performance comparison: key/value stores for language model counts - Brendan O'Connor's Blog on 2009-10-19
    • I’m doing word and bigram counts on a corpus of tweets. I want to store and rapidly retrieve them later for language model purposes. So there’s a big table of counts that get incremented many times. The easiest way to get something running is to use an open-source key/value store; but which? There’s recently been some development in this area so I thought it would be good to revisit and evaluate some options.
  • Moses - FactoredTraining/BuildingLanguageModel on 2009-10-19
    • Typically, LM estimation starts with the collection of n-grams and their frequency counters. Then, smoothing parameters
      are estimated for each n-gram level; infrequent n-grams are possibly pruned and, finally, an LM file is
      created containing n-grams with probabilities and back-off weights. This procedure can be very demanding
      in terms of memory and time if applied to huge corpora. IRSTLM provides a simple way to split LM training
      into smaller and independent steps, which can be distributed among independent processes.
  • Moses Language Model Howto v2 - GuardianiUS on 2009-10-16

    • Warning: This script runs GIZA++'s snt2cooc.out in the background. Use the following command if you have 1.5 GB of RAM or more; otherwise, skip it and try the alternative command below:

  • New Relic .:. On-Demand Application Management on 2009-10-16
    • With New Relic RPM, you can monitor 24x7, detect problems in real-time, drill down to find the causes, and continuously tune for high performance.
  • Twitter on Scala on 2009-10-15
    • But about a year ago, they started replacing some of the back-end Ruby services with applications
      running on the JVM and written in Scala.
  • Extending Ruby with C on 2009-10-14
      • Wrap Up



  • Six Ways to Execute Shell Commands from Ruby - { :Alex Space => " Ruby Notes " } - 51CTO Tech Blog on 2009-10-14
    • When you need to invoke operating system shell commands, Ruby offers six ways to get the job done:
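The Moses FactoredModels bookmark above notes that the parallel corpus has to be annotated with the additional factors, such as part-of-speech tags. A minimal sketch of that annotation step follows. It assumes the tagger output is already available as (word, tag) pairs (the sentence is hypothetical) and that factors are joined with `|`, the separator used in the Moses factored-model documentation; `to_factored` and `FACTOR_SEP` are illustrative names, not part of Moses.

```python
from typing import List, Tuple

FACTOR_SEP = "|"  # assumed separator between the factors of one token

def to_factored(tagged: List[Tuple[str, str]]) -> str:
    """Join each (word, POS tag) pair into a single word|TAG token."""
    for word, tag in tagged:
        if FACTOR_SEP in word or FACTOR_SEP in tag:
            # A literal "|" inside a token would be read as an extra factor.
            raise ValueError(f"separator inside token: {word!r} / {tag!r}")
    return " ".join(f"{word}{FACTOR_SEP}{tag}" for word, tag in tagged)

# Hypothetical tagger output for one sentence of the training corpus.
sentence = [("the", "DT"), ("house", "NN"), ("is", "VBZ"), ("small", "JJ")]
print(to_factored(sentence))  # the|DT house|NN is|VBZ small|JJ
```

If part-of-speech information is wanted on both the input and output side, both halves of the parallel corpus would need this treatment.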
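Brendan O'Connor's post, bookmarked above, describes a big table of word and bigram counts that gets incremented many times and read back later. As a small stand-in for the stores he compares, the sketch below keeps those counters in `dbm.dumb` from the Python standard library. The `1:`/`2:` key prefixes, the helper names and the sample tweets are hypothetical, and this backend is far slower than the dedicated key/value stores the post evaluates.

```python
import dbm.dumb
import os
import tempfile

def get_count(db, key: str) -> int:
    """Return a stored counter; dbm keeps values as bytes, missing keys count as 0."""
    return int(db.get(key, b"0").decode("utf-8"))

def increment(db, key: str, by: int = 1) -> None:
    """Read-modify-write increment of one counter."""
    db[key] = str(get_count(db, key) + by)

def count_tweet(db, text: str) -> None:
    """Increment unigram ("1:") and bigram ("2:") counters for one tweet."""
    tokens = text.lower().split()
    for word in tokens:
        increment(db, "1:" + word)
    for first, second in zip(tokens, tokens[1:]):
        increment(db, f"2:{first} {second}")

tweets = ["Good morning world", "good morning", "morning coffee"]
with tempfile.TemporaryDirectory() as tmp:
    db = dbm.dumb.open(os.path.join(tmp, "counts"), "c")
    try:
        for tweet in tweets:
            count_tweet(db, tweet)
        morning_count = get_count(db, "1:morning")
        bigram_count = get_count(db, "2:good morning")
    finally:
        db.close()  # write the index to disk before the temp dir is removed

print(morning_count, bigram_count)  # 3 2
```

Every increment is a separate read and write, which is exactly the access pattern that makes the choice of store matter at corpus scale.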
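The IRSTLM passage in the BuildingLanguageModel bookmark above describes splitting LM training into smaller, independent steps. The sketch below illustrates only the first stage, n-gram counting: the corpus is split into shards, each shard is counted on its own (so shards could run in separate processes), and the partial counts are summed. It is not IRSTLM's code; the `<s>`/`</s>` sentence markers, the round-robin split and the toy corpus are assumptions, and smoothing, pruning and writing the LM file are left out.

```python
from collections import Counter
from typing import Iterable, List

def ngrams(tokens: List[str], n: int):
    """Yield consecutive n-tuples of tokens."""
    return zip(*(tokens[i:] for i in range(n)))

def count_shard(lines: Iterable[str], n: int) -> Counter:
    """Count n-grams in one shard, independently of all other shards."""
    counts = Counter()
    for line in lines:
        tokens = ["<s>"] + line.split() + ["</s>"]
        counts.update(ngrams(tokens, n))
    return counts

def split(lines: List[str], k: int) -> List[List[str]]:
    """Deal lines round-robin into k shards."""
    return [lines[i::k] for i in range(k)]

def merge(parts: Iterable[Counter]) -> Counter:
    """Sum partial counts from all shards."""
    total = Counter()
    for part in parts:
        total.update(part)
    return total

corpus = ["the cat sat", "the cat ran", "a dog sat"]
partial = [count_shard(shard, 2) for shard in split(corpus, 2)]
bigrams = merge(partial)
assert bigrams == count_shard(corpus, 2)  # same result as one pass over the corpus
print(bigrams[("the", "cat")])  # 2
```

Because counts simply add, merging per-shard results gives the same table as counting the whole corpus at once, which is what makes this stage easy to distribute.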
