Member since Jul 07, 2006, follows 0 people, 0 public groups, 606 public bookmarks (609 total).
Recent Bookmarks and Annotations
-
Welcome — Ruby Enterprise Edition on 2009-11-20
-
- A copy-on-write friendly garbage collector. Phusion Passenger uses this, in combination with a technique called preforking, to reduce Ruby on Rails applications' memory usage by 33% on average.
- An improved memory allocator called tcmalloc, which improves performance quite a bit.
- The ability to tweak garbage collector settings for maximum server performance, and the ability to inspect the garbage collector's state. (RailsBench GC patch)
- The ability to dump stack traces for all running threads (caller_for_all_threads), making it easier for one to debug multithreaded Ruby web applications.
-
-
Moses - Moses/RoadMap on 2009-10-19
-
- Chart parse-decoding: to support hierarchical models and syntax-based translation models
- Depth-first decoding: to provide anytime algorithms for decoding
- Forced decoding: to compute scores for provided output
- Suffix-array translation models: an alternative way to store large rule-sets without the need to translate them
- Maximum entropy translation models: translation models that incorporate additional source-side and context information for scoring translation rules.
-
-
Moses - Moses/FactoredModels on 2009-10-19
-
The training data (a parallel corpus) has to be annotated with the additional factors. For instance, if we want to add part-of-speech information on the input and output side, we need to obtain part-of-speech tagged training data. Typically this involves running automatic tools on the corpus, since manually annotated corpora are rare and expensive to produce.
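As a rough illustration of what annotating a corpus with an extra factor can look like, the sketch below joins each token with a part-of-speech tag using a `|` separator, the convention commonly used for factored input to Moses. The function name `annotate_with_factors`, the sample sentence, and the tags are hypothetical; in practice the tags would come from an automatic tagger run over the corpus.

```python
def annotate_with_factors(tokens, pos_tags, sep="|"):
    """Join each surface token with its POS tag into one factored token."""
    if len(tokens) != len(pos_tags):
        raise ValueError("tokens and tags must align one-to-one")
    return " ".join(f"{tok}{sep}{tag}" for tok, tag in zip(tokens, pos_tags))


# Hypothetical tagger output for a single sentence (illustrative only).
tokens = ["the", "house", "is", "small"]
pos_tags = ["DT", "NN", "VBZ", "JJ"]

factored_line = annotate_with_factors(tokens, pos_tags)
print(factored_line)
```

The same function would be applied line by line to both the input and output side of the parallel corpus, so that sentence alignment is preserved.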
-
-
Performance comparison: key/value stores for language model counts - Brendan O'Connor's Blog on 2009-10-19
-
I’m doing word and bigram counts on a corpus of tweets. I want to store and rapidly retrieve them later for language model purposes. So there’s a big table of counts that get incremented many times. The easiest way to get something running is to use an open-source key/value store; but which? There’s recently been some development in this area so I thought it would be good to revisit and evaluate some options.
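A minimal sketch of the counting workload described here, using Python's in-memory `collections.Counter` as a stand-in for a key/value store. The function name `count_ngrams` and the sample tweets are made up for illustration; a real setup would replace the counters with whichever store is being evaluated.

```python
from collections import Counter


def count_ngrams(lines):
    """Count words and adjacent word pairs over an iterable of text lines."""
    unigrams = Counter()
    bigrams = Counter()
    for line in lines:
        tokens = line.lower().split()
        unigrams.update(tokens)
        # zip pairs each token with its successor, giving the bigrams.
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


# Made-up sample data standing in for a corpus of tweets.
sample_tweets = [
    "good morning twitter",
    "good morning world",
    "morning coffee",
]

unigrams, bigrams = count_ngrams(sample_tweets)
print(unigrams["morning"], bigrams[("good", "morning")])
```

Every token triggers one increment per table, which is why increment throughput matters so much when comparing stores for this kind of job.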
-
-
Moses - FactoredTraining/BuildingLanguageModel on 2009-10-19
-
Typically, LM estimation starts with the collection of n-grams and their frequency counters. Then, smoothing parameters
are estimated for each n-gram level; infrequent n-grams are possibly pruned and, finally, a LM file is
created containing n-grams with probabilities and back-off weights. This procedure can be very demanding
in terms of memory and time if applied to huge corpora. IRSTLM provides a simple way to split LM training
into smaller and independent steps, which can be distributed among independent processes.
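The idea of splitting count collection into independent steps can be sketched as follows. This is not IRSTLM's actual tooling, only an illustration under simple assumptions: each chunk of the corpus is counted separately (as a separate process could do), the partial counts are merged, and infrequent n-grams are pruned. Smoothing and back-off weights are left out, and all function names are hypothetical.

```python
from collections import Counter


def count_chunk(sentences, n=2):
    """Collect n-gram counts for one chunk of the corpus."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def merge_counts(partial_counts):
    """Sum the counts produced independently for each chunk."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def prune(counts, min_count):
    """Drop n-grams seen fewer than min_count times."""
    return Counter({ng: c for ng, c in counts.items() if c >= min_count})


# Toy corpus split into two chunks that could be handled by separate workers.
corpus = ["a b a b", "a b c", "b c a b"]
chunks = [corpus[:2], corpus[2:]]

partials = [count_chunk(chunk) for chunk in chunks]
merged = merge_counts(partials)
pruned = prune(merged, min_count=2)
print(sorted(pruned.items()))
```

Because counting is additive, merging the per-chunk results gives the same totals as one pass over the whole corpus, which is what makes the split safe.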
-
-
Moses Language Model Howto v2 - GuardianiUS on 2009-10-16
-
Warning: This script runs GIZA++'s snt2cooc.out in the background. Use the following command if you have 1.5 GB of RAM or more. Otherwise skip it and try the alternative command below:
-
-
New Relic .:. On-Demand Application Management on 2009-10-16
-
With New Relic RPM, you can monitor 24x7, detect problems in real-time, drill down to find the causes, and continuously tune for high performance.
-
-
Twitter on Scala on 2009-10-15
-
But about a year ago they started replacing some of the back-end Ruby services with applications running on the JVM and written in Scala.
-
-
Extending Ruby with C on 2009-10-14
-
Wrap Up
-
-
Six ways to invoke shell commands from Ruby - { :Alex Space => " Ruby Notes " } - 51CTO Tech Blog on 2009-10-14
-
When you need to call operating-system shell commands, Ruby provides six ways to get the job done:
-