Stand With Ukraine

In our previous article we took a deep dive into the integration of an LLM within Datuum. This integration gave our users the ability to create custom mappings and build automated dataflows using the ease of conversational English instead of writing formulas or coding. We use Chat GPT to translate user’s prompts into SQL request and embed it into the data pipelines. If you’re curious, go ahead and click the link to see it in action.

Now, in this article, our focus is set on unveiling of the challenges that emerged during the weaving of this integration and the ingenious solutions we engineered to navigate through them.

Why We Need Data Transformations

The data the source seldom aligns perfectly with the destination. While there might be instances where the structure and content are quite similar, they can also exhibit significant disparities. Consequently, the extent of transformations required usually hinges on the configuration of the destination and the dissimilarities between the data source and destination.

Here’s where Datuum comes into play. It conducts a thorough analysis of both the data source and its intended destination, identifying areas of concurrence and divergence. Subsequently, it generates the necessary code to facilitate the transformation of the data, ensuring its alignment with the destination’s specifications. However, it’s important to note that there exist numerous transformations that demand application, which the AI might not be capable of automatically detecting and comprehending.

Let’s consider an example: In your data source, you have two fields, “First Name” and “Last Name,” while in your destination, there’s only one field, “Name.” When Datuum conducts its automated mapping, it naturally selects one field from the source to map to the destination. In this particular case, based on the analysis, it chooses “Last Name.”

However, your preference is to have a single column in your destination that combines “First Name” and “Last Name” into one. To achieve this, you need a mechanism to instruct the system to concatenate these two fields and incorporate the resulting value into the destination.

Typical Approach

Across numerous Data Integration and Data Preparation tools, the customary approach to constructing transformations involves equipping users with a distinct proprietary language reminiscent of Excel. This language serves as a means for users to articulate their desired actions for manipulating the data. Within this framework, users are tasked with formulating expressions that subsequently undergo interpretation and execution within the tool’s interface.

Challenges With This Method

While this method is widespread, we identify significant issues when viewed from a user’s perspective. One substantial impediment hampers the widespread acceptance of such tools. Specifically, each tool employs its distinct transformation language, mandating users to grasp a new linguistic framework to proficiently navigate and utilize the tool.

Moreover, in numerous instances, the interface fails to streamline the user’s workflow. On the contrary, it often exacerbates the complexity of the task at hand, rendering it more arduous rather than simplifying it.

Innovative Approach

As we crafted the user experience for data transformations, our foremost objective revolved around streamlining interaction with the product, striving for an absolutely seamless, 100% no-code interface. While this may seem like an ambitious aspiration, the question arose: How can this be achieved?

Our solution was to introduce a Large Language Model (LLM) into the equation, leveraging its capabilities to interpret plain English and transform it into executable code. This transformation process would be informed by the intricate interplay of context and metadata drawn from user’s Data Source and Destination.

Given that Datuum is already equipped with all the requisite pieces of metadata and the contextual framework for transformations, the challenge lay in optimizing user interaction to harness the full potential of the Large Language Model.

And now, allow us to share the outcome of our endeavors.

Challenges with Innovative Approach

Accessing and querying LLM may seem straightforward, but the real complexity lies in establishing and maintaining the contextual framework that supports LLM’s functions. This involves continuous scanning and updating of tables, column names, data types, and the intricate relationships between these data elements.

LLM encounters several notable challenges:

  • Context Preparation: Anticipating user inquiries necessitates a well-constructed context that is both comprehensive and intelligible to LLM. This entails specifying database particulars and delineating table relationships, ensuring clarity without causing confusion.
  • Effective Querying: While user prompts largely determine query content, it’s imperative to augment these with contextual cues. This contextual awareness hinges on the user’s position within the UI journey and the available metadata.
  • High Standards: Meeting user expectations demands precision. This includes providing SQL outputs without extensive explanations, ensuring data types and functions align with the database in use.
  • Stability in Results: To enhance result consistency, a lower temperature setting is recommended. Nonetheless, validation remains crucial, and function calling should be employed judiciously when relevant to the use case.

Conclusion

At Datuum, we hold a firm belief that Data Engineers and, in particular, Data Analysts require potent tools to wield data effectively. We are equally convinced that AI possesses the potential to revolutionize their workflows by automating tasks that are presently manual, tedious, and time-consuming. We are excited to unveil the strides we are making towards our overarching objective: Simplifying Data Integration.

Book A Demo

Get in touch to find out more about how Datuum can help

    For details about how we process your data please see our Privacy Policy
    Thank you! Your submission has been received!
    Oops! Something went wrong while submitting the form.