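Simple Profiling of a Python Script

I've been busy lately and the blog hasn't been updated for a long time. Today I want to write about a Python performance problem I solved recently.

Background

Some time ago I wrote a Python script for a project that preprocesses a Sqlite database. It mainly creates a new table and fills it with data from other tables. At the time I cared mostly about maintainability and clear structure, so I didn't think much about performance. Later QA found that the module ran 20 times slower than before, but since it was still fairly fast, I didn't fix it right away.

Profiling

This release has to fix the problem. My rule is that any performance improvement must start with profiling, so that the effort is aimed at the real target.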

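Similar to Java's VisualVM, Python 2.7 ships with several built-in modules for profiling, and I chose cProfile. The command is basically python -m cProfile -s tottime followed by the path of the script.

The “-s tottime” option sorts the results by tottime, the total time spent inside each function itself, excluding calls to sub-functions.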
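The same kind of report can also be produced from inside Python, which is handy when only one function is of interest. This is a minimal sketch assuming Python 3; build_rows and profile_call are hypothetical names, and build_rows is only a stand-in workload, not the real preprocessing script.

```python
import cProfile
import io
import pstats


def build_rows(n):
    # Hypothetical workload standing in for the real preprocessing step.
    return [(i, str(i)) for i in range(n)]


def profile_call(func, *args):
    """Run func under cProfile and return a text report sorted by tottime."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # "tottime" is the same sort key as "-s tottime" on the command line.
    stats.sort_stats("tottime").print_stats(10)  # show at most 10 entries
    return out.getvalue()


report = profile_call(build_rows, 10000)
print(report)
```

Limiting the report to the first few entries keeps the focus on the functions that actually dominate the runtime.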

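The result before optimization:

Clearly the culprit is the execute() method of sqlite3's Cursor, which matches my original guess. The fix is just as clear: reduce the number of execute() calls by batching and by merging SQL statements. It was easy to trade space for time.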
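To illustrate the idea (not the actual project code), here is a small sketch with made-up tables src and dst. It compares one execute() per row with a single executemany() call and with one merged INSERT ... SELECT statement. All table, column, and function names are assumptions for the example.

```python
import sqlite3


def make_db(n_rows):
    """Create an in-memory database with a filled src table and an empty dst table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE dst (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO src (id, name) VALUES (?, ?)",
                     [(i, "item-%d" % i) for i in range(n_rows)])
    conn.commit()
    return conn


def copy_row_by_row(conn):
    # Slow pattern: one execute() call per copied row.
    rows = conn.execute("SELECT id, name FROM src").fetchall()
    for row in rows:
        conn.execute("INSERT INTO dst (id, name) VALUES (?, ?)", row)
    conn.commit()


def copy_batched(conn):
    # Batch: keep all rows in memory and hand them over in one executemany() call.
    rows = conn.execute("SELECT id, name FROM src").fetchall()
    conn.executemany("INSERT INTO dst (id, name) VALUES (?, ?)", rows)
    conn.commit()


def copy_merged(conn):
    # Merged SQL: a single statement lets SQLite do the whole copy itself.
    conn.execute("INSERT INTO dst (id, name) SELECT id, name FROM src")
    conn.commit()


def count_dst(conn):
    return conn.execute("SELECT COUNT(*) FROM dst").fetchone()[0]


results = {}
for copy in (copy_row_by_row, copy_batched, copy_merged):
    conn = make_db(1000)
    copy(conn)
    results[copy.__name__] = count_dst(conn)
    conn.close()
print(results)
```

All three variants fill dst with the same rows; they differ only in how many statements Python has to issue, which is what the profile was pointing at.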

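The result after optimization:

After optimization, the runtime is only 9% of the original.

Summary

  • Keep using a profiler to drive performance improvements.
  • Check with the profiler after each change to see whether performance really improved.
  • Don't over-optimize, or the code becomes unreadable. 😀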

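The Power of XPath

In recent project work I often have to parse XML, and I have come to appreciate how powerful and convenient XPath is. In Java you can use DOM4j; since Python 2.5 you can use etree.

As a small example, suppose we have the following XML:

To remove all <B> nodes, how would you do it with XPath?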
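One possible answer, as a minimal sketch with xml.etree.ElementTree. SAMPLE_XML is a made-up stand-in document and remove_elements is a hypothetical helper name. ElementTree only supports a subset of XPath, but the path ".//B/.." is enough to select every element that has a <B> child.

```python
import xml.etree.ElementTree as ET

# Made-up sample document, only for illustration.
SAMPLE_XML = """
<root>
  <A><B>1</B><C>2</C></A>
  <B>3</B>
  <A><C><B>4</B></C></A>
</root>
"""


def remove_elements(root, tag):
    """Remove every descendant element named `tag` from the tree under `root`."""
    # ".//tag/.." selects each element that has a direct child named `tag`.
    for parent in root.findall(".//%s/.." % tag):
        # Collect the matching children first, then detach them from the parent.
        for child in parent.findall(tag):
            parent.remove(child)
    return root


root = ET.fromstring(SAMPLE_XML)
remove_elements(root, "B")
print([element.tag for element in root.iter()])
```

ElementTree elements do not know their parent, so the sketch asks XPath for the parents directly instead of searching for the <B> nodes and walking upwards.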

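The flexibility of XPath lies in using a short, clear path to control which nodes you want to read.

There are many more interesting features; have a look at the XPath Tutorial.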

How to migrate ReviewBoard database from sqlite to MySQL

About 3 years ago, with a lot of help from the RB group, I introduced Review Board to our team to replace CodeStrike. We deployed RB on a WinXP VM with sqlite as the backend database. I can’t remember the reason; maybe I just wanted to save some time. In the end, that choice cost our team a lot of time.

Why? Because sqlite is a lightweight DB and it performs badly under concurrent access. Our team has about 20 developers, and RB is our most heavily used daily tool. By now we depend on it heavily to control the quality of the team’s code.

Recently, the database issue was getting worse. RB pages failed to render properly and showed many warnings and errors. Yesterday the database finally died with “The database is locked!”, a message from Django. I googled it, and the best solution was to switch to a better database to avoid such issues in the future.

It took me about 3 hours to migrate the database to MySQL 5.1. Maybe my experience can save you some time.

Step 1. sqlite dump

Go to sqlite.org and download the command-line tool. For the Windows version, you can find it here.

Unzip it and use it to dump the sqlite database.

reviewboard.db is the sqlite database file.
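If you would rather not download the command-line tool, Python's built-in sqlite3 module can write a similar SQL dump through Connection.iterdump(), which is comparable to the shell's .dump command. This is a swapped-in alternative, not what I ran during the migration. It is a minimal sketch assuming Python 3; dump_sqlite and the demo file names are hypothetical, and the demo runs on a throwaway database rather than on reviewboard.db.

```python
import os
import sqlite3
import tempfile


def dump_sqlite(db_path, dump_path):
    """Write the whole database at db_path to dump_path as SQL statements."""
    conn = sqlite3.connect(db_path)
    try:
        with open(dump_path, "w", encoding="utf-8") as out:
            # iterdump() yields one SQL statement per iteration.
            for statement in conn.iterdump():
                out.write(statement + "\n")
    finally:
        conn.close()


# Demo on a throwaway database; for the real thing, pass the path of reviewboard.db.
work_dir = tempfile.mkdtemp()
demo_db = os.path.join(work_dir, "demo.db")
demo_dump = os.path.join(work_dir, "demo.sql")

conn = sqlite3.connect(demo_db)
conn.execute("CREATE TABLE review (id INTEGER PRIMARY KEY, summary TEXT)")
conn.execute("INSERT INTO review (summary) VALUES ('first request')")
conn.commit()
conn.close()

dump_sqlite(demo_db, demo_dump)
with open(demo_dump, encoding="utf-8") as f:
    dump_text = f.read()
print(dump_text)
```

The resulting file still uses sqlite syntax, so it needs the same conversion as in Step 2 before MySQL will accept it.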

Step 2. Convert to MySQL dump

Now we have the sqlite dump file, but you can’t import it into a MySQL database directly because MySQL does not support some of its syntax.

I googled and found a free converter for that. You can find it here and download sqlite3_mysql.zip.

Deep inside the zip, you can find an executable file, sqlite_mysql.exe. Run it to convert the sqlite dump file into a MySQL one.

Step 3. Import MySQL dump

Before that, please create a database on your MySQL server.

In my case, the dump was about 180 MB and the import took a very long time. (There may be some warning messages. I don’t know why, but the migrated server has worked well so far, so maybe we can ignore them.)

Step 3.1 Alter database structure

Actually, this issue was found after I finished Step 4 and restarted RB. At the beginning, RB worked well. But a minute later, a colleague said he couldn’t submit any comments.

The root cause was that the “id” field of some tables had lost its “AUTO_INCREMENT” property. I didn’t know why, since the SQL in the dump file was right. So I wrote a very simple .py script to fix this issue.

It just goes through all tables and alters the “id” field of every table that has one.
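A rough sketch of what such a script can look like (a reconstruction of the idea, not the exact script). It assumes every “id” column is an integer key, as in Django-generated tables, and it works with any DB-API cursor connected to the MySQL database. The function names are hypothetical.

```python
def list_tables_with_id(cursor):
    """Return the names of all tables that have a column called 'id'."""
    cursor.execute("SHOW TABLES")
    tables = [row[0] for row in cursor.fetchall()]
    result = []
    for table in tables:
        cursor.execute("SHOW COLUMNS FROM `%s`" % table)
        # The first field of each SHOW COLUMNS row is the column name.
        columns = [row[0] for row in cursor.fetchall()]
        if "id" in columns:
            result.append(table)
    return result


def alter_statement(table):
    # Assumption: `id` is an integer key column, so MySQL accepts AUTO_INCREMENT on it.
    return "ALTER TABLE `%s` MODIFY `id` INT NOT NULL AUTO_INCREMENT" % table


def fix_auto_increment(cursor):
    """Restore AUTO_INCREMENT on the 'id' column of every table that has one."""
    fixed = list_tables_with_id(cursor)
    for table in fixed:
        cursor.execute(alter_statement(table))
    return fixed
```

With MySQLdb installed, the cursor would come from something like MySQLdb.connect(host=..., user=..., passwd=..., db=...).cursor(), with your own connection values filled in. Back up the database before running any ALTER statements.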

Step 4. Change RB configuration file

Change rb_site_root/conf/settings_local.py to:
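For orientation, a MySQL database section in Django's standard DATABASES format looks roughly like the sketch below. Every value is a placeholder, not my real configuration. Whether your settings_local.py uses this format depends on the RB and Django version; Django releases before 1.2 use separate DATABASE_ENGINE, DATABASE_NAME, and similar settings instead.

```python
# Placeholder values only; replace them with your own database name and credentials.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',  # Django's MySQL backend, needs MySQLdb
        'NAME': 'reviewboard',                 # the database created in Step 3
        'USER': 'rb_user',
        'PASSWORD': 'change-me',
        'HOST': 'localhost',
        'PORT': '',                            # empty string means the default port
    },
}
```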

Make sure MySQLdb is installed, or there will be an error when RB restarts.

The best moment finally came! RB restarted successfully and all review requests, comments, and diffs were there. Performance improved too: you can feel that pages load faster. Cheers! :grin:

If you have the same problem as me, I hope this post helps and saves you some time.

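Learning Java Technology: Class Files

A Java class file is what the Java compiler produces when it compiles a source file. It holds the famous bytecode. The JVM Specification describes its format in detail, so I used it to write a parser in Python, PyJavap. Yesterday this little tool reached its first milestone: it supports the specification versions prior to 1.5. I feel a real sense of achievement, and I understand bytecode much better now. You can find it on my GitHub.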
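To give a taste of the format, here is a tiny sketch (not taken from PyJavap) that reads the first fields the JVM Specification defines for a class file: the u4 magic number 0xCAFEBABE, followed by the u2 minor_version and u2 major_version, all big-endian. The function name and the hand-built sample bytes are assumptions for illustration.

```python
import struct

CLASS_MAGIC = 0xCAFEBABE


def read_class_header(data):
    """Parse the first 8 bytes of a class file and return (major, minor)."""
    if len(data) < 8:
        raise ValueError("a class file header needs at least 8 bytes")
    # Big-endian layout: u4 magic, u2 minor_version, u2 major_version.
    magic, minor, major = struct.unpack(">IHH", data[:8])
    if magic != CLASS_MAGIC:
        raise ValueError("not a class file: bad magic 0x%08X" % magic)
    return major, minor


# Hand-built header bytes for illustration; major version 49 corresponds to Java 5.
sample = bytes(bytearray([0xCA, 0xFE, 0xBA, 0xBE, 0x00, 0x00, 0x00, 0x31]))
print(read_class_header(sample))  # prints (49, 0)
```

For a real file, pass the first bytes of the .class file read in binary mode; the constant pool and the rest of the structure follow right after this header.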

My interest in Java technology keeps growing, so I'll keep digging! :smile: