DT- Transform from unstructured complex format to structure

A.B Link Consulting Team Staff asked 10 years ago

Hi,
I am new to Informatica Data Transformation (DT). I need to extract information from binary documents (XLS, Word, PDF) into an XML structure using Informatica.
What is the simplest way to deal with this task?

1 Answer
A.B Link Consulting Team Staff answered 10 years ago

The best way to handle this is to use the UDT module in a mapping. Informatica Data Transformation (DT) is very useful and flexible for transforming documents into relational or hierarchical structures. DT handles binary documents through a set of preprocessors (functions) that translate the binary document into a readable text format. From that point on, you can use the same functionality you would use for any flat-file extraction in the tool.

Steps:

  • Open a new DT project
  • Create a new parser in the project
  • Assign a new example source to the parser
  • Expand the local file option (see screenshots below), and choose the relevant preprocessor
  • For example, you can choose PDFTOTXT_4 for PDF documents
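
To make the idea behind the preprocessor approach concrete (binary to text first, then text to structure), here is a minimal sketch in plain Python. It is not DT code and does not use any Informatica API. The sample text, the "Label: value" layout, and the text_to_xml helper are all assumptions made up for illustration; the text simply stands in for what a PDF-to-text preprocessor might produce.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical text, standing in for the output of a PDF-to-text preprocessor.
SAMPLE_TEXT = """Invoice Number: 1001
Customer: Acme Ltd
Total: 250.00
"""


def text_to_xml(text: str, root_tag: str = "Document") -> ET.Element:
    """Turn 'Label: value' lines into child elements of a single root element."""
    root = ET.Element(root_tag)
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if not sep or not label.strip():
            continue  # skip lines that are not 'Label: value' pairs
        # Make the label usable as an XML element name,
        # e.g. 'Invoice Number' becomes 'Invoice_Number'.
        tag = re.sub(r"\W+", "_", label.strip())
        if tag[0].isdigit():
            tag = "_" + tag  # element names may not start with a digit
        ET.SubElement(root, tag).text = value.strip()
    return root


xml_string = ET.tostring(text_to_xml(SAMPLE_TEXT), encoding="unicode")
print(xml_string)
```

Once the document is plain text, the extraction step is ordinary text parsing, which is why the rest of the DT parser can be built the same way as for a flat file.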