Login

volcs0 · 01-14-2008, 01:45 AM

I have a hash table keyed on a floating point number (a molecular weight). I need to pick out all the elements who's keys fall within a certain range.

For instance, these might be the keys:

100.1
100.3
100.8
101.2
101.6
102.3
.
.
105.0

And I need to pick out all the elements that fall between say 101 and 103.

I've created a list of keys and sorted it, but I still can't figure out how to pick out the keys (and thus their indices) so that I can get at the elements in the table. I guess I could iterate over the entire list of keys, checking each to see if it fell within the range (and stopping once I'm out of the range), but this seems inefficient.

Sorry if I'm not explaining this well.

I'm using Python, but it shouldn't matter.

Thanks.

space-time · 01-14-2008, 02:10 AM

I don't really understand the question. In your example, the elements are already sorted, so what's the problem?

if you don't want to search the whole array, then you just need to find the 101 and 103 points, right?

you can do that very easily in a sorted array.

Look at the mid point, see if 101 is below or above.

Then look at the midpoint of that section, see where 101 is.

With this method you can find a number in a 1024 array in just 10 steps.

Repeat for 103.

Now you have your low and high limit, 101 and 103. Just copy the old array to a new array starting from the 101 element to the 103 element.

Is that what you wanted?

Sorry if I did not understand your question clearly.

TheTominator · 01-14-2008, 02:20 AM

You have to iterate over the keys.

If you have a good guess at where to start iterating, you can save iterations on the front end. Since you know when to end, you can stop iterating early.

The problem is that you are using keys in a situation where you don't know the keys when it comes time to using the data.

Keys are used when you plan on looking up the data when you know the key. This isn't how you are using the data. This isn't the data structure you should be using.

I question if this will actually work in practice since there is the logical possibility of two values having the same key. Picking the right isotopes, two elements could have the same molecular weight.

volcs0 · 01-14-2008, 03:22 AM

Thanks guys.

This is mass spec data, and my key is actually the query number and the mass, so it will be unique.

But yes, I realize that maybe I need a better data structure.

I need to take all the data and compare each element's molecular mass and scan number against all the other elements - picking out the ones that meet certain criteria (in this case, mass number within a certain tolerance and scan number within a certain window). I thought it would be fast to take all the elements with the MW within a certain tolerance and then quickly find the ones with the scan number within the window.

I'll have to rethink my data structure.

TheTominator · 01-14-2008, 06:45 AM

It sounds like you need to throw the data set into a database. Databases are designed to perform queries like that. They have special optimizations (indexing fields) to make the process efficient internally.

volcs0 · 01-16-2008, 06:17 AM

[quote TheTominator]It sounds like you need to throw the data set into a database. Databases are designed to perform queries like that. They have special optimizations (indexing fields) to make the process efficient internally.
Yes, ultimately, this will be the solution.

Login
Username:
Password:	Lost Password?
	Remember me