09-03-2024 12:46 AM
Hello everyone, I'm writing to ask if any of you have ever managed to connect a Workato environment (or another ETL tool outside the Azure environment) to a lakehouse or warehouse in the Fabric environment. It's as if the warehouse doesn't behave like a SQL Server, so it's impossible for me to use that type of connector to connect to it. Workato doesn't have any connectors dedicated to Fabric either, and I'm at a loss as to how to sink the result of one of my Workato recipes into my Fabric warehouse. Thank you in advance for your feedback!
09-09-2024 05:58 PM - edited 09-09-2024 06:05 PM
Well, the answer is: "it depends".
To be a little more specific: it depends on how you are managing data within your Fabric environment. The beauty of Fabric is that it isn't prescriptive and gives you many options for how you manage your data. Here are just a few of them:
And that is just four examples - there are many more. So your question really comes down to how you are managing your data and your "lakehouse" design and data pipeline. So, let's give you a more concrete example - obviously lacking in minute details, but I think it will assist you. Let's use the most "logical" of the examples above and describe your data pipeline as such:
You ingest data into OneLake using a Medallion-like Lakehouse framework (ingest into Bronze, transform into Silver, aggregate/consolidate into Gold). The data file format you are using is Delta Lake across each of these layers. You are using Workato for ingestion into Bronze only. Transformation is handled by a capability within Fabric (probably Spark pools and notebooks).
So, the thing is, with Workato out of the box the above can't be done. Workato doesn't (yet) have a Delta Lake connector, nor a OneLake connector. So, for today, to implement the above you are out of luck. However, it's not the end of the world! Here is how you can quasi-realise a Fabric-based data pipeline and still make use of Fabric:
You ingest data into OneLake using a Medallion-like Lakehouse framework (ingest into Bronze, transform into Silver, aggregate/consolidate into Gold). The data file format you are using for Bronze is CSV; Silver and Gold are Delta Lake. You are using Workato for ingestion into Bronze only. Transformation is handled by a capability within Fabric (probably Spark pools and notebooks), as sketched below.
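To make that Fabric-side transformation step a little more tangible, here is a minimal sketch of what such a notebook could look like. The folder, column and table names are placeholders made up for illustration, not anything your lakehouse has to use:

```python
# Minimal sketch of the Fabric-side transformation: a notebook attached to the
# lakehouse reads the CSV files Workato landed in Bronze and promotes them to a
# Delta table in Silver. Paths and table names below are assumptions.

from pyspark.sql import functions as F

# Read the raw CSV files uploaded into the Bronze area of the lakehouse.
bronze_df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("Files/bronze/sales/")              # hypothetical Bronze folder
)

# Light cleanup before promoting to Silver.
silver_df = (
    bronze_df
    .dropDuplicates()
    .withColumn("ingested_at", F.current_timestamp())
)

# Persist as a Delta table in the lakehouse - this becomes the Silver layer.
(
    silver_df.write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("silver_sales")             # hypothetical Silver table name
)
```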
On the Workato side, you realise this by using Workato to extract the data from the source, convert it to CSV, and write the CSV files to OneLake with the HTTP connector, since OneLake lakehouses can be accessed as Azure Data Lake Storage Gen2 accounts. For performance and security, I would use the SQL Server extract-to-CSV (bulk) action running on an on-premises agent to write to a local file on the agent (or an attached network drive if you have an OPA group with multiple nodes), and then the HTTP connector (also running on the OPA) to upload the CSV file to OneLake as an Azure Data Lake Storage Gen2 file via the ADLS Gen2 REST interface.
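For reference, here is roughly what that upload sequence looks like at the ADLS Gen2 REST level: create the file, append the bytes, then flush to commit. This is just a sketch in Python to make the three HTTP calls concrete; the workspace, lakehouse and file names are placeholders, and in Workato the bearer token would come from the connection setup rather than from code:

```python
# Rough sketch of the three ADLS Gen2 REST calls used to land a CSV file in a
# OneLake lakehouse. Workspace, lakehouse and file names are placeholders.

import requests
from azure.identity import DefaultAzureCredential  # pip install azure-identity

# OneLake exposes lakehouses through the ADLS Gen2 DFS endpoint.
ONELAKE = "https://onelake.dfs.fabric.microsoft.com"
WORKSPACE = "MyWorkspace"                        # hypothetical workspace name
LAKEHOUSE = "MyLakehouse.Lakehouse"              # hypothetical lakehouse item
FILE_PATH = "Files/bronze/sales/extract.csv"     # hypothetical Bronze landing path

# Bearer token for the storage resource (handled by the connection in Workato).
token = DefaultAzureCredential().get_token("https://storage.azure.com/.default").token
headers = {"Authorization": f"Bearer {token}", "x-ms-version": "2021-08-06"}

url = f"{ONELAKE}/{WORKSPACE}/{LAKEHOUSE}/{FILE_PATH}"
data = open("extract.csv", "rb").read()          # the CSV produced by the bulk export

# 1) Create the (empty) file.
requests.put(url, params={"resource": "file"}, headers=headers).raise_for_status()

# 2) Append the CSV bytes at offset 0.
requests.patch(
    url,
    params={"action": "append", "position": 0},
    headers={**headers, "Content-Type": "application/octet-stream"},
    data=data,
).raise_for_status()

# 3) Flush to commit the appended bytes.
requests.patch(
    url, params={"action": "flush", "position": len(data)}, headers=headers
).raise_for_status()
```

Once the flush succeeds, the CSV should appear under the lakehouse's Files area, ready for the Fabric notebook to pick up in the Bronze-to-Silver step.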