Re: [Feed2imap] Dupes
Posted by Lucas Nussbaum on November 30, 2010 - 22:16:
On 30/11/10 at 21:14 +0100, Michael Welle wrote:
> > To do that, feed2imap hashes the whole item, so changes in the item's
> > body trigger an update of the email (not a duplicate: feed2imap replaces
> > the current email with the new version). Some feeds do stupid things, so
> > there's a (per-feed) option to change that: ignore-hash.
> The hash calculation explains the observed behaviour pretty well. What
> are the side effects of ignoring the hash? Does it mean that every
> item is regarded as new?
No, it means that all updates to existing items are ignored.
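For reference, a per-feed ignore-hash setting in ~/.feed2imaprc would look roughly like this (feed name, URL and IMAP target are placeholders):

```yaml
feeds:
  - name: somefeed
    url: http://example.org/feed.xml
    target: imap://user:password@imapserver/INBOX.Feeds.somefeed
    # Ignore item hash changes: updates to already-seen items
    # are silently dropped instead of replacing the email.
    ignore-hash: true
```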
> > Another way to avoid that issue is to filter the feeds to remove the
> > content that changes constantly. There are two ways to do that:
> > - the execurl option specifies a command that outputs the feed content
> >   on stdout, so you can run wget -O - http://foo | grep -v bar,
> > - the filter option specifies a command that will receive the
> >   downloaded feed on stdin, and output the modified feed on stdout.
> > For example, my slashdot definition is:
> > filter: "ruby -p -e '$_ = $_.gsub(/\\/Slashdot\//i,
> > Because the capitalization of "Slashdot" in the slashdot feed changes
> > constantly (or used to, at some point).
> At first glance the filter looks a little bit like the proposed
> per-feed comparison function. But I guess the output of the filter is
> not only used as input for the hashing function? So I can't strip
> everything except the guid of a feed item and expect that feed2imap
> still works as expected?
No, it's also used to generate the content of the email.
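To make the stdin-to-stdout contract concrete, here is a hypothetical standalone filter in the spirit of the (truncated) slashdot one quoted above; the function name and the replacement string are assumptions, not the actual one-liner:

```ruby
# Hypothetical feed2imap filter. A filter receives the downloaded feed
# on stdin and must write the modified feed to stdout; note that the
# output is used both for hashing AND for generating the email content,
# so it must still be a valid feed.
def normalize_slashdot(feed_text)
  # Rewrite any capitalization of "/Slashdot/" to one fixed form, so the
  # item hash no longer changes when the feed flips its spelling.
  feed_text.gsub(%r{/slashdot/}i, "/slashdot/")
end

# As a feed definition it might be wired up as (path assumed):
#   filter: "ruby /path/to/normalize_slashdot.rb"
# where the script body is: puts normalize_slashdot($stdin.read)
```

Unlike ignore-hash, this keeps genuine content updates while neutralizing only the cosmetic churn.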
I've just added an option that does "do not re-upload posts that were
already deleted by the user even if their hash changes", see
Powered by MHonArc, Updated Tue Nov 30 22:20:14 2010