Extracting Contiguous Twitter Social Media Data

Posted by: Informatica Enterprise Data Integration

PowerExchange for Twitter provides high performance connectivity to the Twitter social network. This listing allows you to leverage PowerExchange for Twitter and PowerCenter to search, extract, and continuously accumulate tweets.


The Twitter Block file demonstrates how you can search for tweets on a specific topic, determine the latest Tweet ID for each session, and run the session repeatedly to continue extracting tweets. The first time you run the session, the Integration Service extracts the tweets for the topic defined in the query string and stores them in the database. The Twitter chain parameter file is then populated with the latest Tweet ID. If a scheduler is configured to run the session again, the next session uses the latest Tweet ID in the parameter file to extract the next set of tweets.

The demo file contains the following objects:

Twitter Mappings

The m_twitter_chain mapping maps the Twitter source to a target database. The mapping extracts the latest 1500 tweets per session or the last six to seven days of historical tweets, whichever limit is reached first. If the session is configured to run again, it extracts the next set of matching tweets that were created after the previous session. The mapping extracts the tweets and also stores the latest Tweet ID, which is used to extract the next set of tweets.

Twitter Workflows

The wf_m_twitter_chain workflow contains the m_twitter_chain mapping. The workflow is scheduled to run repeatedly at intervals.

Twitter Chain Mapping

The m_twitter_chain mapping contains the following two pipelines to extract contiguous tweets:
  • Twitter Entry
  • Twitter Chain
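
The interplay between the two pipelines can be illustrated outside PowerCenter. The following is a minimal Python sketch, not the actual mapping: tweets are assumed to be dictionaries with an integer "id" key, and the function names (entry_pipeline, chain_pipeline, run_session) are hypothetical names chosen for illustration. Only the 1500-tweet cap and the "latest Tweet ID" hand-off come from the listing.

```python
MAX_TWEETS_PER_RUN = 1500  # per-session cap stated in the listing


def entry_pipeline(source, since_id, limit=MAX_TWEETS_PER_RUN):
    """Stand-in for the Twitter Entry pipeline (hypothetical helper).

    Returns the newest `limit` tweets whose ID is greater than since_id.
    """
    newer = [tweet for tweet in source if tweet["id"] > since_id]
    newer.sort(key=lambda tweet: tweet["id"], reverse=True)  # newest first
    return newer[:limit]


def chain_pipeline(target, previous_max):
    """Stand-in for the Twitter Chain pipeline (hypothetical helper).

    Returns the latest Tweet ID stored in the target, or the previous
    value if the target is empty.
    """
    return max((tweet["id"] for tweet in target), default=previous_max)


def run_session(source, target, since_id):
    """Run one simulated session and return the Tweet ID for the next run."""
    target.extend(entry_pipeline(source, since_id))
    return chain_pipeline(target, since_id)


if __name__ == "__main__":
    # In-memory stand-ins for the Twitter source and the Tweets target.
    source = [{"id": i, "text": f"tweet {i} about twitter"} for i in range(1, 6)]
    target = []

    max_id = run_session(source, target, since_id=0)  # first scheduled run
    source.append({"id": 6, "text": "a newer tweet about twitter"})
    max_id = run_session(source, target, max_id)      # next scheduled run
    print(max_id, len(target))
```

Because each run only requests tweets with an ID greater than the stored value, repeated runs in this sketch add no duplicates to the target.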
[Figure: the m_twitter_chain mapping]

The Twitter Entry Pipeline

The Twitter Entry pipeline is a pass-through pipeline that extracts tweets based on the search criteria. The Twitter Entry pipeline includes the following objects:
  • Twitter Entry source
  • Tweets Oracle target  

The Twitter Chain Pipeline

The Twitter Chain pipeline contains transformations that determine the latest Tweet ID from the Tweets target database and store it in a parameter file. The Twitter Entry pipeline uses the parameter file details to extract the next set of contiguous tweets when the session repeats.

Twitter Chain Workflow

The wf_m_twitter_chain workflow contains the s_m_twitter_chain session and a Start task. The workflow uses a workflow variable, $$Tweet_MAX_ID, in the query string of the Application Source Qualifier to input the search criteria. The workflow variable is defined in the Variables tab of the workflow properties. The value of the variable is defined in the parameter file.

The Integration Service is configured to use the newline column delimiter in the Twitter_Chain_Params parameter file. Verify the parameter file name and directory in the workflow properties. Configure a scheduler to run the workflow at intervals, each time extracting a contiguous set of tweets.

Twitter Connections

Configure the application connections in the Workflow Manager before you run the social media sessions. Verify the connection to the target database.

Search Criteria Configuration

When you configure a session for a Twitter source, you specify the query string that the Twitter API uses to search for the social media data. The query string is defined in the Application Source Qualifier for the Twitter source in the session s_m_twitter_chain. The query string has the format twitter since_id:$$Tweet_MAX_ID and contains the following parameters:
    • The default search topic "twitter" that generates a search result of all the tweets that contain the topic "twitter".
    • The Twitter Search API parameter, since_id, that returns results with a Tweet ID more recent than the specified ID.
    • The workflow variable, $$Tweet_MAX_ID, that takes the value as defined in the parameter file. It specifies the Tweet ID for the since_id parameter.
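
To make the substitution concrete, here is a small Python sketch of how a value from a parameter file could be expanded into the query string. It only mimics what the Integration Service does. The listing does not show the contents of the parameter file, so the sample text below (a bracketed section header followed by a name=value line, with the folder name MyFolder and the Tweet ID value) is an assumption, and read_parameters and build_query are hypothetical helper names.

```python
import re

# Assumed parameter file contents; folder name and Tweet ID are made up.
SAMPLE_PARAMETER_FILE = """\
[MyFolder.WF:wf_m_twitter_chain]
$$Tweet_MAX_ID=1234567890
"""

# Query string format given in the listing.
QUERY_TEMPLATE = "twitter since_id:$$Tweet_MAX_ID"


def read_parameters(text):
    """Collect name=value pairs, skipping blank lines and [section] headers."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("[") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        params[name.strip()] = value.strip()
    return params


def build_query(template, params):
    """Replace each $$Name in the template with its value from params.

    Names without a value are left unchanged.
    """
    return re.sub(
        r"\$\$\w+",
        lambda match: params.get(match.group(0), match.group(0)),
        template,
    )


if __name__ == "__main__":
    print(build_query(QUERY_TEMPLATE, read_parameters(SAMPLE_PARAMETER_FILE)))
```

With the sample values, the expanded query asks for tweets that contain "twitter" and have a Tweet ID greater than 1234567890.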
    You can download this listing as part of the Informatica for Social Media bundle.


    • PowerExchange for Twitter 9.1.0 Hotfix1 and later.


    Comments (3)

    • Hi, I am getting an error during the import:
      ERR-RGB <Error> : Invalid Database Type: Twitter for source: , or the Database Type is not installed in this repository
      ERR-RGB** Failed to Import: Twitter_Entry
      02/09/2018 10:47:55 **** Importing Target Definition: Tweets ...
      02/09/2018 10:47:55 **** Importing Target Definition: Twitter_chain_params ...
      02/09/2018 10:47:55 **** Importing SessionConfig: default_session_config ...
      Validating Source Definition Twitter_MAX_ID...
      Validating Target Definition Tweets...
      Validating Target Definition Twitter_chain_params...
      Replacing target definition: Tweets
      Replacing target definition: Twitter_chain_params
      Replacing source definition: Twitter_MAX_ID
      Replacing sessionconfig: default_session_config
      02/09/2018 10:48:02 **** Importing Mapping: m_twitter_chain ...
      ERR-RGB <Error> : Could not find Transformation definition for: twitter Twitter_Entry
      ERR-RGB** Failed to Import: m_twitter_chain
      02/09/2018 10:48:02 **** Importing Workflow: wf_m_twitter_chain ...
      ERR-WARN <Warning> : The Integration Service DI_SM_HF specified for wf_m_twitter_chain does not exist in the repository. Please specify another Integration Service.
      ERR-RGB <Error> : the mapping m_twitter_chain used for session s_m_twitter_chain is not found, or is invalid
      ERR-RGB** Failed to Import: wf_m_twitter_chain
    • Hello Viral, do you have any guides for installing and extracting data from the Twitter API on Informatica 9.1 Hotfix 3? If you have any idea about the same for Facebook/LinkedIn, that would be really helpful. Thanks, Santhosh
    • It contains just the workflow XML. Could you please update the package with the mapping XML?