joram deletes persistent messages on startup
environment: JOnAS 4.8.6 / 4.9.2 with JORAM 4.3.25 / 4.3.32, collocated/persistent server.
problem: when any queue has an ID starting with #0.0.11 (in our case it was #0.0.1133), all persistent messages of that queue are deleted during startup of JOnAS. it never happens when the queue IDs start with #0.0.10. the problem has been seen at a customer site, losing data on every shutdown/startup cycle. deleting the whole persistent storage directory and starting deployment again works around the problem, since the queue IDs start counting from 1031 (or so) again...
looking at traces and the joram sources, the problem is as follows:
- an internal agent exists: #0.0.11 (joram admin proxy)
- during startup, a MessagePersistenceModule.deleteAll("#0.0.11") is executed
- this deletes all messages in the file system whose names start with msg#0.0.11. that includes all messages from queue #0.0.1133, since their names also start with msg#0.0.11
the problem is that MessagePersistenceModule constructs the message name as follows: "msg" + agentId + messageId.substring(3).
this means that every message name for agent #0.0.11 and for queue #0.0.1133 starts with exactly the same prefix, making it impossible to distinguish between them by prefix.
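the collision can be reproduced outside of JORAM with a minimal sketch. everything in it is an assumption for illustration: the class name, the listByPrefix() helper (a stand-in for a prefix-based lookup such as tx.getList()), and the message IDs "ID:a1" / "ID:b2" (real JORAM message IDs look different; only the 3-character prefix removed by substring(3) matters here).

```java
import java.util.ArrayList;
import java.util.List;

class PrefixCollisionDemo {
    // Naming scheme as described in this report:
    // "msg" + agentId + messageId.substring(3)
    static String messageName(String agentId, String messageId) {
        return "msg" + agentId + messageId.substring(3);
    }

    // Hypothetical stand-in for a prefix-based listing of stored names.
    static List<String> listByPrefix(List<String> stored, String prefix) {
        List<String> result = new ArrayList<String>();
        for (String name : stored) {
            if (name.startsWith(prefix)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> stored = new ArrayList<String>();
        stored.add(messageName("#0.0.11", "ID:a1"));   // msg#0.0.11a1
        stored.add(messageName("#0.0.1133", "ID:b2")); // msg#0.0.1133b2

        // Looking up agent #0.0.11 also returns the message of queue #0.0.1133.
        System.out.println(listByPrefix(stored, "msg" + "#0.0.11"));
        // prints [msg#0.0.11a1, msg#0.0.1133b2]
    }
}
```

a deleteAll() built on such a lookup would remove both entries, which matches the behaviour seen at startup.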
the obvious solution is to include a divider between the agent ID and the message ID: "msg" + agentId + "#" + messageId.substring(3). accordingly, instead of tx.getList("msg" + agentId); the lookup becomes tx.getList("msg" + agentId + "#"); with this, getList() only returns elements that really belong to the specified queue.
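a minimal sketch of the divider approach, again outside of JORAM. the class, listPrefix() and listByPrefix() are hypothetical helpers, and the message IDs are made up. it also assumes that agent IDs consist only of digits and dots after the leading "#", so a "#" directly after the agent ID cannot be confused with part of a longer agent ID.

```java
import java.util.ArrayList;
import java.util.List;

class SeparatedNamingDemo {
    // Proposed naming: "msg" + agentId + "#" + messageId.substring(3)
    static String messageName(String agentId, String messageId) {
        return "msg" + agentId + "#" + messageId.substring(3);
    }

    // Prefix used for lookups; the trailing "#" terminates the agent ID.
    static String listPrefix(String agentId) {
        return "msg" + agentId + "#";
    }

    // Hypothetical stand-in for a prefix-based listing of stored names.
    static List<String> listByPrefix(List<String> stored, String prefix) {
        List<String> result = new ArrayList<String>();
        for (String name : stored) {
            if (name.startsWith(prefix)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> stored = new ArrayList<String>();
        stored.add(messageName("#0.0.11", "ID:a1"));   // msg#0.0.11#a1
        stored.add(messageName("#0.0.1133", "ID:b2")); // msg#0.0.1133#b2

        // Only the message of agent #0.0.11 matches now.
        System.out.println(listByPrefix(stored, listPrefix("#0.0.11")));
        // prints [msg#0.0.11#a1]
    }
}
```

the name of queue #0.0.1133 continues with "3" where the lookup prefix expects "#", so it no longer matches.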
these small changes in about 6 places in MessagePersistenceModule make startup/shutdown work correctly, but they also break backward compatibility when messages stored with the original naming exist in the file system... but at least no messages are deleted during startup...