Can Apache Flume be used to extract tweets for a certain period of time?
AFAIK, the TwitterSource from Cloudera only receives tweets as they are generated, in real time. I think something similar happens with the Twitter 1% sample firehose source.
Nevertheless, I see that the Twitter API can also return tweets from timelines, so it may just be a matter of modifying the TwitterSource source code.
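If the source were modified to read timelines, one complication is that the v1.1 timeline and search endpoints bound results with `since_id` and `max_id` (tweet IDs) rather than dates. Below is a minimal sketch, assuming Twitter's Snowflake ID layout: the creation time in milliseconds, offset from the epoch 1288834974657, is stored in the bits above bit 22. It converts a time window into ID bounds that a modified source could pass to those endpoints. The class and method names (`TweetIdWindow`, `minIdAt`, `createdAt`) are hypothetical and are not part of Flume or the Cloudera TwitterSource.

```java
import java.time.Instant;

public class TweetIdWindow {
    // Assumed Snowflake epoch used by Twitter tweet IDs (2010-11-04T01:42:54.657Z), in ms.
    static final long TWITTER_EPOCH_MS = 1288834974657L;

    // Smallest tweet ID that could have been generated at instant t
    // (timestamp bits set, worker and sequence bits all zero).
    static long minIdAt(Instant t) {
        return (t.toEpochMilli() - TWITTER_EPOCH_MS) << 22;
    }

    // Approximate creation time recovered from a tweet ID.
    static Instant createdAt(long id) {
        return Instant.ofEpochMilli((id >> 22) + TWITTER_EPOCH_MS);
    }

    public static void main(String[] args) {
        Instant from = Instant.parse("2014-01-01T00:00:00Z");
        Instant to = Instant.parse("2014-01-02T00:00:00Z");
        // since_id is exclusive (returns IDs greater than it), so subtract 1
        // to include tweets created exactly at `from`.
        long sinceId = minIdAt(from) - 1;
        // max_id is inclusive (returns IDs less than or equal to it), so subtract 1
        // to exclude tweets created at `to`, giving the window [from, to).
        long maxId = minIdAt(to) - 1;
        System.out.println("since_id=" + sinceId + " max_id=" + maxId);
    }
}
```

This only applies to tweets with Snowflake IDs (created after late 2010), and each endpoint may still limit how far back it lets you page, so the bounds narrow a request but do not guarantee access to older tweets.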