
bifabrik

Microsoft Fabric ETL toolbox

Loading JSON files

Load JSON data to a lakehouse table:

import bifabrik as bif

bif.fromJson('Files/JSON/orders_2023.json').toTable('DimOrders').run()

The table is now in place:

display(spark.sql('SELECT * FROM DimOrders'))

Or you can make use of pattern matching:

# take all files matching the pattern and concat them
bif.fromJson('Files/*/orders*.json').toTable('OrdersAll').run()
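To illustrate which files a pattern like this picks up, here is a small sketch using Python's standard fnmatch module. The file paths are hypothetical, and for a single directory level this mirrors the glob behavior of the underlying Spark file reader:

```python
from fnmatch import fnmatch

# Hypothetical lakehouse file paths
paths = [
    'Files/JSON/orders_2023.json',
    'Files/JSON/orders_2024.json',
    'Files/CSV/orders_2023.csv',
    'Files/JSON/customers.json',
]

pattern = 'Files/*/orders*.json'

# Only the orders JSON files match; the CSV and customers files are skipped
matched = [p for p in paths if fnmatch(p, pattern)]
print(matched)  # ['Files/JSON/orders_2023.json', 'Files/JSON/orders_2024.json']
```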

These are full loads, overwriting the target table if it exists.
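A toy illustration of what full-load semantics mean in practice (a plain dict standing in for the lakehouse, not the library's actual storage code): each run replaces the target table's contents rather than appending to them.

```python
# Toy model: table name -> list of rows
tables = {}

def full_load(table_name, rows):
    # Overwrite the whole table, never extend it
    tables[table_name] = list(rows)

full_load('DimOrders', [{'id': 1}])
full_load('DimOrders', [{'id': 2}, {'id': 3}])

# The second run replaced the first load entirely
print(len(tables['DimOrders']))  # 2, not 3
```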

Configure load preferences

The backend of the JSON source uses the standard PySpark dataframe loader. Most of the aspects that can be configured there can also be set in bifabrik.

To see the options available, use help(bif.fromJson()).

For example, you may need to switch from the default JSON Lines format (one JSON object per line in the file) to multiline JSON:

bif.fromJson('Files/JSON/orders.json').multiLine(True).toTable('DimOrders').run()
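To make the distinction concrete, here is a sketch using only the standard json module (the field names are made up): the first string is JSON Lines, the second is the multiline, pretty-printed array that needs the multiLine option.

```python
import json

# JSON Lines: one complete object per line (the default the loader expects)
json_lines = '{"id": 1, "amount": 10.5}\n{"id": 2, "amount": 7.0}'
rows = [json.loads(line) for line in json_lines.splitlines()]

# Multiline JSON: a single document spread across lines;
# this is the shape that requires multiLine(True)
multiline = """[
  {"id": 1, "amount": 10.5},
  {"id": 2, "amount": 7.0}
]"""
rows2 = json.loads(multiline)

print(rows == rows2)  # True - same data, different file layout
```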

If you prefer the PySpark “option” syntax, you can use that too:

bif.fromJson('Files/JSON/ITA_TabOrders.json').option('multiLine', 'true').toTable('TabOrders1').run()

You can also chain multiple settings together to configure several options at once - for more details, see Configuration
