JDBC data source
A JDBC connection can come in handy when the source database is not supported by PySpark out of the box.
For example, if you need to load data from MariaDB, you can use the JDBC MySQL connector like this:
import bifabrik as bif
# You will probably need to configure the user name and password.
# Passwords cannot be set directly in code - store the password in Key Vault
# and run the notebook under an identity that can read the secret.
bif.config.jdbc.jdbcUser = 'my.jdbc.user'
bif.config.security.keyVaultUrl = 'https://contoso-kv.vault.azure.net/'
bif.config.security.passwordKVSecretName = 'Contoso-MariaDb-Password'
db_host = "127.0.0.1"
db_port = "3306"
db_name = "contoso"
url = f'jdbc:mysql://{db_host}:{db_port}/{db_name}?useUnicode=true&tinyInt1isBit=FALSE&useLegacyDatetimeCode=false'
bif.fromJdbcQuery('SELECT * FROM contoso.orders').jdbcUrl(url).toTable('MariaDB_Orders').run()
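If you load from the same source repeatedly, the connection details can live in configuration instead of being repeated on every pipeline. A minimal sketch, assuming the jdbc config object exposes a jdbcUrl property (run help(bif.config.jdbc) to confirm the exact name):
# Sketch: set the URL once in configuration (property name assumed),
# then subsequent pipelines can omit the jdbcUrl(...) call
bif.config.jdbc.jdbcUrl = url
bif.fromJdbcQuery('SELECT * FROM contoso.customers').toTable('MariaDB_Customers').run()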
There is a lot of flexibility with JDBC:
- The JDBC URL usually has options specific to your data source.
- If need be, specify the JDBC driver, e.g.
bif.config.jdbc.jdbcDriver = 'com.mysql.cj.jdbc.Driver'
- To see all the JDBC options, run
help(bif.config.jdbc)
These correspond to their PySpark equivalents - see the sketch after this list.
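For orientation, here is a rough plain-PySpark equivalent of the load above. This is only a sketch, not bifabrik's internal implementation; it assumes spark is the session provided by the Fabric notebook, that url is the JDBC URL built earlier, and that the password has already been retrieved from Key Vault into a db_password variable.
# Plain-PySpark sketch of the same load, showing how the bifabrik
# settings map to the standard JDBC reader options
df = (
    spark.read.format('jdbc')
    .option('url', url)                               # same JDBC URL as above
    .option('query', 'SELECT * FROM contoso.orders')  # maps to fromJdbcQuery(...)
    .option('user', 'my.jdbc.user')                   # maps to bif.config.jdbc.jdbcUser
    .option('password', db_password)                  # assumed to come from Key Vault
    .option('driver', 'com.mysql.cj.jdbc.Driver')     # maps to bif.config.jdbc.jdbcDriver
    .load()
)
df.write.mode('overwrite').saveAsTable('MariaDB_Orders')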
You can save your Key Vault configuration and other repetitive setup in a file - see Configuration.
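As a sketch of that pattern - the method names saveToFile and loadFromFile here are assumptions to verify against the Configuration page - persisting the setup could look like this:
# Hypothetical sketch - check the Configuration page for the actual persistence API
bif.config.security.keyVaultUrl = 'https://contoso-kv.vault.azure.net/'
bif.config.security.passwordKVSecretName = 'Contoso-MariaDb-Password'
bif.config.saveToFile('Files/bifabrik_config.json')    # assumed method name

# in later notebooks, restore the saved settings
bif.config.loadFromFile('Files/bifabrik_config.json')  # assumed method name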