To insert data into the database, we'll need to set up a database connection. sqlx is an async SQL client, so we need to talk about async Rust.
In Rust, the equivalent of JavaScript's Promise type is defined by the Rust standard library. It's called a Future.
That's where the standard library's responsibilities end, though. The runtime that is responsible for running and polling Futures can be swapped out according to your program's needs.
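As a quick illustration (a minimal sketch; the function names are made up for this example), an async fn is sugar for a plain function that returns a value implementing the standard library's Future trait:
use std::future::Future;

// `async fn` sugar...
async fn fetch_number() -> u32 {
    42
}

// ...is roughly equivalent to a function returning an anonymous type
// that implements Future with Output = u32.
fn fetch_number_desugared() -> impl Future<Output = u32> {
    async { 42 }
}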
Setting up the tokio runtime
The runtime we'll be using is in a crate called tokio, which is the longest-lived, most stable async runtime in the Rust ecosystem.
We'll enable the full feature, which makes writing application-level code the easiest. If we were writing a library, we'd want to restrict the feature set as much as possible.
cargo add tokio -p upload-pokemon-data -F full
tokio = { version = "1.32.0", features = ["full"] }
We can use the tokio::main macro on our main function and use the async keyword to turn our main function into an async function.
#[tokio::main]
async fn main() -> miette::Result<()> {
...
}
The macro from the tokio crate will rewrite our main function to wrap it in the tokio runtime.
We could set the runtime up ourselves, but we're only concerned with using async in this workshop, not the internals of async runtimes, so we'll let the macro do the work for us.
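For context, what the macro generates is roughly equivalent to the sketch below (an approximation, not the macro's exact output): it builds a runtime and uses block_on to drive our async code to completion.
fn main() -> miette::Result<()> {
    tokio::runtime::Builder::new_multi_thread()
        .enable_all()
        .build()
        .expect("failed to build the tokio runtime")
        .block_on(async {
            // the body of our async main function runs here
            Ok(())
        })
}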
Having an async function is important for us because it will let us await our SQL queries, which are async functions themselves.
Setting up the connection pool
Before we can make queries, we need to set up our database connection. sqlx offers us the MySqlPoolOptions::new() builder to connect a pool to our database URL. The connect function is async and we can't continue without the database connection, so we await the result before continuing. .connect returns a Result, so we can take the same approach as before and use ? to forward the error to miette's report if anything goes wrong.
Bring MySqlPoolOptions into scope in main.rs.
use sqlx::mysql::MySqlPoolOptions;
and add the connection pool logic below our database_url.
async fn main() -> miette::Result<()> {
    let database_url = ...;
    let pool = MySqlPoolOptions::new()
        .max_connections(5)
        .connect(&database_url)
        .await?;
    ...
}
We can also use miette again to add some help text, since many people forget the format of a MySQL connection string:
let pool = MySqlPoolOptions::new()
    .max_connections(5)
    .connect(&database_url)
    .await
    .map_err(|e| {
        miette!(
            help = "database urls must be in the form `mysql://username:password@host:port/database`",
            "{e}"
        )
    })?;
Inserting Pokemon into the database
With the tokio runtime and the connection pool set up, we can now insert pokemon into our database.
We already have insert_pokemon in scope because we added use db::* earlier in the course. At the bottom of our for loop we can call insert_pokemon with a reference to the MySql connection pool and a reference to the pokemon_row we want to insert. insert_pokemon is async, so we'll await it and use ? to handle the error.
Also note that I've changed the dbg! to a println! so that it's more readable when this script prints out all >1000 pokemon.
for result in rdr.deserialize() {
    let record: PokemonCsv =
        result.into_diagnostic()?;
    let pokemon_row: PokemonTableRow = record.into();
    println!(
        "{} {:?} {}",
        pokemon_row.pokedex_id,
        pokemon_row.id,
        pokemon_row.name
    );
    insert_pokemon(&pool, &pokemon_row)
        .await
        .into_diagnostic()?;
}
At this point we'll be able to run the script and insert the pokemon! If you've lost or stopped your database connection along the way, you can get it started again with this pscale command.
pscale connect pokemon new-tables
and we can run our program to insert all the pokemon into the pokemon table.
DATABASE_URL=mysql://127.0.0.1 cargo run
This will take a few minutes; that's partly why we left the println! in, so we could see progress.
Why is this slow?
There are two reasons this insertion is slow.
First: sqlx doesn't have batching (some other clients do), so instead of sending a single large request we're sending a lot of smaller requests.
Second: we currently stop at every await (because that's how await works), which means we only ever have one SQL query running at a time.
To be clear: this is a perfectly fine way to work with async/await, especially if you're just getting started with Rust or async Rust, but also because we only ever have to run this script once, so how easy it is for us to write and run needs to be traded off against how long it takes to run. Even if this script took a whopping 15 minutes to run, that would only give us 15 minutes worth of time to improve the performance (because we only have to run the script once).
However, there are ways to execute our futures that will result in more requests running simultaneously.
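For example, one common pattern (a sketch only, assuming the futures crate is added as a dependency; the rows variable and the concurrency limit are illustrative, and this isn't necessarily the approach we'll take next) is to build the insert futures without awaiting them one by one, then drive a handful of them concurrently with buffer_unordered:
use futures::{stream, StreamExt};

// `rows` is a hypothetical Vec<PokemonTableRow> collected from the CSV up
// front, instead of inserting inside the deserialize loop.
let results: Vec<_> = stream::iter(
    rows.iter()
        .map(|pokemon_row| insert_pokemon(&pool, pokemon_row)),
)
// run up to 5 inserts at a time, matching our pool's max_connections
.buffer_unordered(5)
.collect()
.await;

// each item is still a Result, so we check them after the fact
for result in results {
    result.into_diagnostic()?;
}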