I just solved a problem where I needed to rename some columns in an SQLite3 database. Unfortunately, SQLite3′s ALTER TABLE command doesn’t have the ability to rename columns in tables. Most Google results offered solutions such as creating a new table, copying the data over, and then dropping the old table. While this is fine for probably 99% of SQLite3 databases, the table I’m dealing with has almost 3 billion rows (yes, I have an SQLite3 database with 3×109 rows). Recreating the table (specifically, the indices on the columns) would take too long. It took me about 24 hours to create the indices on a supercomputer with 512gb of RAM, and I didn’t want to go through that again.
I stumbled on two separate posts that seemed to offer solutions, and I got it working using them. I just wanted to write up this blog entry to clarify a few things and to show this in action.
Backup your database! Seriously, back it up. The first time I tried this, I corrupted mine because I forgot to update the corresponding index. Were it not for the backup, I’d be spending another several days recreating it from scratch.
I’m on a *nix box, so I’m interacting with the database using sqlite3 from the command line. I think this is simplest and safest because you can double-check your progress as you go before you commit the changes to the database. As an example, I’ll show you the table I was working with:
CREATE TABLE people ( id INTEGER PRIMARY KEY, state_id INTEGER NOT NULL, county INTEGER NOT NULL, age INTEGER NOT NULL, gender TEXT NOT NULL, -- M or F income INTEGER NOT NULL ); CREATE INDEX people_county_index ON people(county); |
What I realized is that I incorrectly named the county column. It should’ve been county_id. In order to change the name of the column, I must change the index as well. Here is how it’s done:
1 2 3 4 5 6 7 8 9 10 11 12 13 | BEGIN; --start a transaction; this ensures that your changes won't be written until you COMMIT .schema --this will show you your current schema PRAGMA writable_schema=1; --this enables you to edit the schema SELECT * FROM sqlite_master; --show the raw data SQLite saves related to your schema UPDATE sqlite_master SET SQL=REPLACE(SQL, 'county', 'county_id') WHERE name='people'; --simple string replace .schema --verify CREATE TABLE is correct UPDATE sqlite_master SET SQL='CREATE INDEX people_county_id_index on people(county_id)' WHERE name='people_county_index'; --update index SQL UPDATE sqlite_master SET name='people_county_id_index' WHERE name='people_county_index'; --update index name .schema --verify index is correct SELECT * FROM sqlite_master; --double-check all raw data once more PRAGMA writable_schema=0; --disable editing the schema COMMIT; --save the changes to the database .exit |
After you verify that things look right, run ANALYZE to update the query planner’s statistics. This hack seemed to mess up the query planner for my database, but ANALYZE fixed it. If you’re still paranoid, run PRAGMA integrity_check; after ANALYZE.
That’s all there is to it. sqlite_master is just a regular SQL table, so you can modify it like any other. Note that the call to replace(x, y, z) will replace all occurrences of y with z in the string x. This means that if “county” existed anywhere else in the CREATE TABLE people statement, it would be replaced. As a result, be careful when using replace().
Also, if anyone has any questions regarding big/huge/gigantic SQLite3 databases, feel free to ask. Mine is 280gb and contains about 3.5 billion records split between 4 tables. Performance is quite good, but it took quite a bit of work to get there, and there were a lot of hard lessons learned along the way.
