-
Notifications
You must be signed in to change notification settings - Fork 73
SafeQuery Explained
The StringFormat class offers compile-time safety with interpolation-style string formatting, using named placeholders.
What you shouldn't use it for, is to create SQL queries, because string interpolation with untrusted input can lead to SQL injection attack. For example:
private static final StringFormat FIND_USER_BY_ID =
new StringFormat("SELECT * FROM Users WHERE user_id = '{user_id}'");
...
String query = FIND_USER_BY_ID.format(userIdInput);If the userIdInput comes from untrusted sources, it can be used to steal information about other users. The attacker can just send a string like "' OR user_id = 'victim", and then they can see all information about the victim.
Instead, consider using the com.google.mu.safesql package:
SafeQuery query = SafeQuery.of(
"SELECT * FROM Users WHERE user_id = '{user_id}'", userIdInput);Benefits provided by the safesql package:
-
SafeQueryautomatically escapes special characters to prevent injection attack. - You can interpolate not only literal values, but tables, columns and even sub-queries.
- The same set of compile-time checks ensure that you can't make human errors (like passing the user password as the user id).
- Extra sql-aware compile-time checks ensure that your SQL template is sane.
- You can compose smaller
SafeQueryobjects to create largeSafeQueryobjects, making it easier to manage complex queries. - The
GoogleSqlclass provides extra GoogleSQL conversions such as translatingjava.time.Instantto GoogleSQL'sTIMESTAMP()function. You can useGoogleSqlto generateSafeQueryobjects for e.g. BigQuery.
If you are using Maven, make sure to put mug-errorprone artifact in your annotationProcessorPath in order to get the compile-time guardrails.
Then refer to the javadoc of SafeQuery and GoogleSql for the specifics of the API.
Overall, there are a few best practices and examples worth mentioning.
Create a client library that wraps the interface with the underlying database. The interface should accept SafeQuery as the input instead of raw String so that you can better enforce the protection. The client library internally can call SafeQuery#toString() to get the wrapped query, but the callers of the client library must all pass in SafeQuery. Example:
class BigQueryClient {
public BigQueryResponse sendQuery(SafeQuery query) {
// ...
}
}
...
// a caller that calls BigQuery
private static final StringFormat.Template<SafeQuery> JOBS_TIMELINE_BY_PROJECT_ID =
GoogleSql.template(
"""
SELECT period_start, period_slot_ms
FROM INFORMATION_SCHEMA.JOBS_TIMELINE
WHERE period_start BETWEEN {start_time} AND {end_time}
AND project_id = '{project_id}'
""");
private final BigQueryClient bigQuery;
// ...
Instant startTime = ...;
Instant endTime = ...;
String projectId = ...;
var response = bigQuery.sendQuery(
JOBS_TIMELINE_BY_PROJECT_ID.with(startTime, endTime, projectId));If the query expects a string literal, do quote the placeholder in the template, like the '{project_id}' in the above example. Failing to do so will not compile and even if you manage to fool the compile-time checks, SafeQuery will throw IllegalArgumentException.
This makes the query template clearer in terms of what's expected to be string.
To parameterize by table name, column names etc., use backticks to quote around the placeholder (or else you get a compile-time error).
For example:
private static StringFormat.To<SafeQuery> SLOT_MS_PER_PERIOD =
GoogleSql.template(
"""
SELECT sum(period_slot_ms) as total_slot_ms FROM `{timeline_table}`
GROUP BY period_start
""");
// ...
SafeQuery query = SLOT_MS_PER_PERIOD.with("JOBS_TIMELINE_BY_PROJECT");The above example parameterizes the query by table name, but sometimes you might want to pass in a sub query instead.
For that, you can use a smaller SafeQuery object to interpolate the larger SafeQuery. Example:
private static StringFormat.Template<SafeQuery> WINDOW_TIMELINE =
GoogleSql.template(
"""
SELECT * FROM ({timeline_table})
WHERE period_start BETWEEN {start_time} AND {end_time}
""");
// ...
SafeQuery timelineInWindow = WINDOW_TIMELINE.with(timelineTable, startTime, endTime);
SafeQuery query = GoogleSql.from(
"""
SELECT * FROM ({timeline_table})
ORDER BY period_start
""",
timelineInWindow);SafeQuery and GoogleSql only allow types that are known to be safe (numbers, enums, Instant, ZonedDateTime, LocalDate, compile-time string literals, quoted string values, backtick-quoted identifiers etc., and other SafeQuery objects).
But sometimes you may have string values that come from trusted sources (say, from a command-line flag). These string values are not compile-time string literals, so you won't be able to pass them to SafeQuery.of(), or use them as unquoted placeholder values.
As a work-around, you can create your own TrustedSql class. If you tightly control the construction of this class only from trusted sources, it'll be provably safe. You can then set the system property com.google.mu.safesql.SafeQuery.trusted_sql_type to this class name. SafeQuery will pick it up and objects of this type will be accepted.
For example:
package com.mycompany.safesql;
public final class TrustedSql {
private final String sql;
private TrustedSql(String sql) {
this.sql = sql;
}
public static TrustedSql fromFlag(Flag<String> stringFlag) {
return new TrustedSql(stringFlag.get());
}
public static TrustedSql loadFromResource(String resourceName)
throws IOException {
...
}
/** The class is expected to be annotated with @Query("SELECT ..."). */
public static TrustedSql loadFromQueryAnnotation(Class<?> annotatedClass) {
...
}
@Override public String toString() {
return sql;
}
}
// ...
SafeQuery query = SLOT_MS_PER_PERIOD.with(
TrustedSql.loadFromResource("com/google/mycompany/all_job_timeline.sql"));As a convenience feature, when you pass an Iterable as the value of a template placeholder, it'll be automatically expanded, using , as the delimiter.
For example:
private static final StringFormat.To<SafeQuery> GET_JOBS_BY_ID =
SafeQuery.template("SELECT * FROM Jobs WHERE job_id IN ({ids})");
...
List<Integer> ids = asList(1, 2, 3);
SafeQuery query = GET_JOBS_BY_ID.with(ids);
// produces "SELECT * FROM Jobs WHERE job_id IN (1, 2, 3)If you expect the list elements to be string values, you need to quote the placeholder (the library will throw otherwise):
List<String> ids = asList("foo", "bar's");
SafeQuery query =
SafeQuery.of("SELECT * FROM Jobs WHERE job_id IN ('{ids}')", ids);
// produces "SELECT * FROM Jobs WHERE job_id IN ('foo', 'bar\'s')