harry

Items from 0 people harry follows

harry
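  • Make sure your materials are complete; it is better to prepare too much than too little.

    Below is my list of materials, for reference by anyone who needs it: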
harry
      • A copy-on-write friendly garbage collector. Phusion Passenger uses
        this, in combination with a technique called preforking, to reduce Ruby
        on Rails applications' memory usage by 33% on average.
      • An improved memory allocator called tcmalloc,
        which improves performance quite a bit.
      • The ability to tweak garbage collector settings for maximum server performance,
        and the ability to inspect the garbage collector's state.
        (RailsBench GC patch)
      • The ability to dump stack traces for all running threads
        (caller_for_all_threads),
        making it easier for one to debug multithreaded Ruby web applications.


harry
    • Chart parse-decoding: to support hierarchical models and syntax-based translation models
    • Depth-first decoding: to provide anytime algorithms for decoding
    • Forced decoding: to compute scores for provided output
    • Suffix-array translation models: an alternative way to store large rule-sets without the need to translate them
    • Maximum entropy translation models: translation models that incorporate additional source-side and context information for scoring translation rules.
harry
  • The training data (a parallel corpus) has to be annotated with the additional factors. For instance, if we want to add part-of-speech information on the input and output side, we need to obtain part-of-speech tagged training data. Typically this involves running automatic tools on the corpus, since manually annotated corpora are rare and expensive to produce.
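    As a rough illustration of what an annotated corpus line might look like, the sketch below joins each token with its part-of-speech tag using a `|` separator. The helper name `add_factors` and the hand-written tags are assumptions for illustration only; in practice the tags would come from an automatic tagger run over the corpus.

    ```python
    def add_factors(tokens, tags, sep="|"):
        """Join each surface token with its factor (here a POS tag)."""
        if len(tokens) != len(tags):
            raise ValueError("each token needs exactly one tag")
        return " ".join(f"{word}{sep}{tag}" for word, tag in zip(tokens, tags))


    # Hypothetical, hand-tagged example sentence.
    tokens = ["the", "house", "is", "small"]
    tags = ["DT", "NN", "VBZ", "JJ"]
    factored_line = add_factors(tokens, tags)
    print(factored_line)
    ```

    The same transformation would be applied to both sides of the parallel corpus so that input and output carry matching factor layouts.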
harry
  • I’m doing word and bigram counts on a corpus of tweets. I want to store and rapidly retrieve them later for language model purposes. So there’s a big table of counts that get incremented many times. The easiest way to get something running is to use an open-source key/value store; but which? There’s recently been some development in this area so I thought it would be good to revisit and evaluate some options.
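    A minimal in-memory sketch of the counting step described above, assuming whitespace tokenization and lowercasing (both simplifications; real tweets need a proper tokenizer). The function name `count_ngrams` and the sample tweets are made up for illustration. A key/value store would take the place of the `Counter` objects once the table outgrows memory.

    ```python
    from collections import Counter


    def count_ngrams(tweets):
        """Return (unigram, bigram) counters for an iterable of tweet strings."""
        unigrams = Counter()
        bigrams = Counter()
        for tweet in tweets:
            tokens = tweet.lower().split()
            unigrams.update(tokens)
            # Adjacent token pairs form the bigrams.
            bigrams.update(zip(tokens, tokens[1:]))
        return unigrams, bigrams


    # Hypothetical sample data.
    sample = ["the cat sat", "The cat ran"]
    unigrams, bigrams = count_ngrams(sample)
    print(unigrams["the"], bigrams[("the", "cat")])
    ```

    Each update is an increment on a key, which is the access pattern a key/value store would have to handle efficiently.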
harry
  • Typically, LM estimation starts with the collection of n-grams and their frequency counters. Then, smoothing parameters
    are estimated for each n-gram level; infrequent n-grams are possibly pruned and, finally, a LM file is
    created containing n-grams with probabilities and back-off weights. This procedure can be very demanding
    in terms of memory and time if applied to huge corpora. IRSTLM provides a simple way to split LM training
    into smaller and independent steps, which can be distributed among independent processes.
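    The idea of splitting the work into independent steps can be sketched for the counting stage alone. This is not IRSTLM's own procedure or scripts; it is a generic illustration under the assumption that the corpus is split by sentence, each part is counted separately, and the partial counts are merged afterwards. All function names here are hypothetical.

    ```python
    from collections import Counter


    def count_bigrams(lines):
        """Count bigrams in a list of sentences, with boundary markers."""
        counts = Counter()
        for line in lines:
            tokens = ["<s>"] + line.split() + ["</s>"]
            counts.update(zip(tokens, tokens[1:]))
        return counts


    def split_corpus(lines, parts):
        """Deal sentences round-robin into `parts` independent chunks."""
        return [lines[i::parts] for i in range(parts)]


    def merge_counts(partials):
        """Sum the partial counters produced by independent workers."""
        total = Counter()
        for partial in partials:
            total.update(partial)
        return total


    corpus = ["a b a", "b a", "a b"]
    # Each chunk could be handled by a separate process; here they run in sequence.
    partials = [count_bigrams(chunk) for chunk in split_corpus(corpus, 2)]
    merged = merge_counts(partials)
    ```

    Because counting is additive, the merged result equals what a single pass over the whole corpus would produce, while each worker only needs memory for its own chunk.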
harry

  • Warning: This script runs GIZA++'s snt2cooc.out in the background. Use the following command if you have 1.5 GB of RAM or more. Otherwise, skip it and try the alternative command below:

harry
  • With New Relic RPM, you can monitor 24x7, detect problems in real-time, drill down to find the causes, and continuously tune for high performance.
harry
  • But about a year
    ago they started replacing some of the back-end Ruby services with applications running on the JVM and
    written in Scala
harry
    • Wrap Up


