## Loading JSON files
Load JSON data into a lakehouse table:

```python
import bifabrik as bif

bif.fromJson('Files/JSON/orders_2023.json').toTable('DimOrders').run()
```

The table is now in place:

```python
display(spark.sql('SELECT * FROM DimOrders'))
```
Or you can make use of pattern matching:

```python
# take all the files matching the pattern and concatenate them
bif.fromJson('Files/*/orders*.json').toTable('OrdersAll').run()
```
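As a rough illustration of what such a wildcard pattern selects, here is a plain-Python sketch using the standard `fnmatch` module (no bifabrik or Spark required; the file paths are made up for the example, and Spark's own path globbing differs slightly, e.g. `*` does not cross directory separators there):

```python
from fnmatch import fnmatch

# hypothetical file paths under the lakehouse Files area
paths = [
    'Files/JSON/orders_2023.json',
    'Files/archive/orders_2022.json',
    'Files/JSON/customers.json',
]

# keep only the paths that match the wildcard pattern
pattern = 'Files/*/orders*.json'
matched = [p for p in paths if fnmatch(p, pattern)]
print(matched)  # the two orders files match; customers.json does not
```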
These are full loads, overwriting the target table if it exists.
## Configuring load preferences
Behind the scenes, the JSON source uses the standard PySpark DataFrame reader, and most of the aspects that can be configured there can also be set in bifabrik.

To see the available options, use `help(bif.fromJson())`.
For example, you may need to switch from the default single-line JSON format (one complete object per line in the file, also known as JSON Lines) to multiline:

```python
bif.fromJson('Files/JSON/orders.json').multiLine(True).toTable('DimOrders').run()
```
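To see the difference between the two file shapes, here is a plain-Python sketch using the standard `json` module (no bifabrik required; the sample data is made up). In the default format, each line is a complete JSON object; in a multiline file, a single JSON document spans several lines:

```python
import json

# default: one complete JSON object per line (JSON Lines)
json_lines = '{"id": 1, "item": "a"}\n{"id": 2, "item": "b"}'
records = [json.loads(line) for line in json_lines.splitlines()]

# multiline: a single JSON document formatted across several lines
multiline = '''[
    {"id": 1, "item": "a"},
    {"id": 2, "item": "b"}
]'''
same_records = json.loads(multiline)

print(records == same_records)  # both parse to the same two records
```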
If you prefer the PySpark `option` syntax, you can use that too:

```python
bif.fromJson('Files/JSON/ITA_TabOrders.json').option('multiLine', 'true').toTable('TabOrders1').run()
```
You can also chain multiple settings together to configure several options at once. For more details, see Configuration.