Leveraging OpenAI for Creating Compelling Sample Datasets for Microsoft Fabric and Power BI

Data analysis and visualization are key components of business intelligence, and Power BI stands as a leading platform in this domain. A pivotal part of working with Power BI involves dealing with datasets. Unfortunately, it isn’t always easy to access or generate datasets that perfectly illustrate the capabilities of Power BI. This is where ChatGPT, OpenAI’s powerful language model, can lend a hand. Today, we’ll delve into how you can use ChatGPT to create intriguing sample datasets for use in Power BI.

Step 1: Understanding the Desired Data Structure

Before generating your data, it’s essential to understand the structure you require. In Power BI, data is often organized into tables that consist of rows (records) and columns (fields). For example, a simple customer database could contain fields such as CustomerID, Name, Email, Country, and Purchase Amount.

You can sketch out your desired table and decide the kind of data you need for each column. For instance, for a column like “Country,” you might want a mix of countries worldwide, while for “Purchase Amount,” you may need a range of numerical values.

Step 2: Defining the Data Parameters with ChatGPT

Once you understand the structure of the data, the next step is to translate it into a form that ChatGPT can generate. This would typically involve providing the model with examples or templates of what you want. For instance, if you are creating a dataset for customer analysis, you can instruct ChatGPT as follows:

    data_template = """
    {
    "CustomerID": "random alphanumeric string of length 6",
    "Name": "random human name",
    "Email": "random email",
    "Country": "random country",
    "Purchase Amount": "random number between 100 and 5000"
    }
    """

Remember, your instructions need to be as clear and specific as possible to generate the right type of data.

Step 3: Generating the Data

After setting the data parameters, you can now instruct ChatGPT to generate the data. If you’re using the OpenAI API, you can use the openai.ChatCompletion.create() method, passing in the model you’re using (for instance, ‘text-davinci-002’) and the data template you’ve defined. Your code may look something like this:

    import openai
    import json

    openai.api_key = 'your-api-key'
    
    response = openai.ChatCompletion.create(
      model="text-davinci-002",
      messages=[
          {"role": "system", "content": "You are a helpful assistant that's generating a data sample."},
          {"role": "user", "content": data_template},
      ]
    )

    data_sample = json.loads(response['choices'][0]['message']['content'])

    print(data_sample)

This code will generate a single record. If you want to generate more records, you can loop through the data generation process as many times as you need.

Step 4: Compiling and Formatting the Data

Now that you have the data generated, you can compile it into a dataset. Each generated record can be appended to a list which can later be converted into a DataFrame using pandas. Here is how it might look:

    import pandas as pd

    data_records = []

    # Assume you have generated n number of records
    for i in range(n):
        data_records.append(generate_data()) # generate_data function includes the data generation code from step 3

    # Convert the list to DataFrame
    df = pd.DataFrame(data_records)

    # Save the DataFrame as a CSV file for use in Power BI
    df.to_csv('sample_dataset.csv', index=False)

Step 5: Importing the Dataset into Power BI

After your CSV file is ready, you can now import it into Power BI. In Power BI Desktop, you can import your CSV file by navigating to “Home” > “External Data” > “CSV”. From here, you can start creating your visualizations and dashboards.

Here is the complete code as a single block for easier reference:

import openai
import json
import pandas as pd

def generate_data():
    # Define your data template
    data_template = """
    {
    "CustomerID": "random alphanumeric string of length 6",
    "Name": "random human name",
    "Email": "random email",
    "Country": "random country",
    "Purchase Amount": "random number between 100 and 5000"
    }
    """

    # Initialize the OpenAI API
    openai.api_key = 'your-api-key'
    
    # Create a chat completion with the model and data template
    response = openai.ChatCompletion.create(
      model="text-davinci-002",
      messages=[
          {"role": "system", "content": "You are a helpful assistant that's generating a data sample."},
          {"role": "user", "content": data_template},
      ]
    )
    # Parse the response to JSON and return
    return json.loads(response['choices'][0]['message']['content'])

# Initialize a list for storing your data
data_records = []

# Decide the number of records you want to generate
n = 100

# Generate n number of records
for i in range(n):
    data_records.append(generate_data())

# Convert the list to a DataFrame
df = pd.DataFrame(data_records)

# Save the DataFrame as a CSV file
df.to_csv('sample_dataset.csv', index=False)

This script will generate 100 records based on the data template, compile them into a DataFrame, and save it as a CSV file. You can then import this CSV file into Power BI. Remember to replace 'your-api-key' with your actual OpenAI API key. Also, ensure that you have installed the openai and pandas libraries, which you can do with pip:

pip install openai pandas

Wrapping Up

Creating compelling sample datasets for Power BI is crucial for demonstrating its capabilities and experimenting with various features. By leveraging ChatGPT, you can create datasets that are tailored to your specific needs and can offer varied insights when analyzed in Power BI.

It’s important to remember that while ChatGPT is a powerful tool, it’s not perfect. Be sure to verify and clean the generated data before using it in your Power BI projects to ensure accuracy in your data visualizations and analysis.

This blogpost was created with help from ChatGPT Pro