
Extracting Contiguous Twitter Social Media Data
Posted by: Informatica Enterprise Data Integration
PowerExchange for Twitter provides high performance connectivity to the Twitter social network. This listing allows you to leverage PowerExchange for Twitter and PowerCenter to search, extract, and continuously accumulate tweets.
Overview
The Twitter Block file demonstrates how you can search for tweets containing a specific topic, determine the latest Tweet ID for each session, and run the session repeatedly to continue extracting tweets. The first time you run the session, the Integration Service extracts the tweets for the topic defined in the query string and stores the tweets in the database. The Twitter chains parameter file is then populated with the latest Tweet ID. If a scheduler is configured to run the session again, the next session uses the latest Tweet ID in the parameter file to extract the next set of tweets.

The demo file contains the following objects:

Twitter Mappings
The m_Twitter_chain mapping maps the Twitter source to a target database. The mapping extracts the latest 1500 tweets per session or the last six to seven days of historical tweets, whichever condition is reached first. If the session is configured to run again, it extracts the next set of matching tweets that were created after the previous session. The mapping extracts the tweets and also stores the latest Tweet ID that is used to extract the next set of tweets.

Twitter Workflows
The wf_m_twitter_chain workflow contains the m_Twitter_chain mapping. The workflow is scheduled to run repeatedly at timed intervals.

Twitter Chain Mapping
The mapping m_twitter_chain contains two pipelines to extract contiguous tweets. The mapping includes the following pipelines:
- Twitter Entry
- Twitter Chain

The Twitter Entry Pipeline
The Twitter Entry pipeline is a pass-through pipeline that extracts tweets based on the search criteria. The Twitter Entry pipeline includes the following objects:
- Twitter Entry source
- Tweets Oracle target
- The default search topic "twitter" that generates a search result of all the tweets that contain the topic "twitter".
- The Twitter Search API parameter, since_id, that returns results with a Tweet ID more recent than the specified ID.
- The workflow variable, $$Tweet_MAX_ID, that takes the value as defined in the parameter file. It specifies the Tweet ID for the since_id parameter.
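
The chaining behavior described above (search with since_id, store the tweets, then save the newest Tweet ID for the next run) can be illustrated outside PowerCenter with a short Python sketch. This is not the PowerExchange implementation. The functions run_session and read_since_id, the in-memory search stub, and the single-line "name=value" parameter text are assumptions made for illustration only; the variable name $$Tweet_MAX_ID, the since_id semantics, and the 1500-tweet cap come from the listing.

```python
from typing import Any, Callable, Dict, List, Tuple

MAX_TWEETS_PER_RUN = 1500       # per-session cap stated in the listing
PARAM_NAME = "$$Tweet_MAX_ID"   # variable name taken from the listing

Tweet = Dict[str, Any]


def read_since_id(param_text: str, name: str = PARAM_NAME) -> int:
    """Return the stored Tweet ID from 'name=value' text, or 0 if absent.

    The one-line 'name=value' layout is an assumption for this sketch.
    """
    for line in param_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == name:
            return int(value.strip())
    return 0


def run_session(search: Callable[[str, int], List[Tweet]],
                query: str,
                since_id: int) -> Tuple[List[Tweet], int]:
    """Run one extraction pass and return (tweets, new_since_id)."""
    # Keep only tweets with an ID more recent than since_id.
    matches = [t for t in search(query, since_id) if t["id"] > since_id]
    # Newest first, then apply the per-session cap.
    matches.sort(key=lambda t: t["id"], reverse=True)
    tweets = matches[:MAX_TWEETS_PER_RUN]
    # If nothing new arrived, keep the previous marker unchanged.
    new_since_id = max((t["id"] for t in tweets), default=since_id)
    return tweets, new_since_id


if __name__ == "__main__":
    # Hypothetical in-memory stand-in for the Twitter search call.
    store: List[Tweet] = [
        {"id": 101, "text": "twitter is busy today"},
        {"id": 102, "text": "unrelated post"},
        {"id": 103, "text": "reading about twitter"},
    ]

    def fake_search(query: str, since_id: int) -> List[Tweet]:
        return [t for t in store if query in t["text"]]

    params = PARAM_NAME + "=0"
    marker = read_since_id(params)

    first, marker = run_session(fake_search, "twitter", marker)
    print([t["id"] for t in first], marker)

    # A later scheduled run only picks up tweets created after the marker.
    store.append({"id": 104, "text": "twitter again"})
    second, marker = run_session(fake_search, "twitter", marker)
    print([t["id"] for t in second], marker)
```

Because each run keeps only the newest matches up to the cap and then advances the marker to the highest ID it saw, tweets older than that cap within a single interval are not revisited, which is consistent with the "latest 1500 tweets per session" limit described above.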
Features
- PowerExchange for Twitter 9.1.0 Hotfix1 and later.