.query on large DataFrame yields "None of [Int64Index([...] dtype='int64')] are in the [columns]"

This simple example:
```
import pandas as pd
from sklearn.linear_model import LogisticRegression
from modAL.models import ActiveLearner

X = pd.DataFrame([[1],[2],[3]])
y = pd.Series([True, False, False])
my_learner = ActiveLearner(estimator=LogisticRegression(), X_training=X, y_training=y)
df = pd.concat([X]*2000)
query_idx, _ = my_learner.query(df, n_instances=100)
```
yields:
```
KeyError: "None of [Int64Index([1665, 1662, 5412, 3399, 1758, 4866, 1755, 3402, 1752, 5415, 3405,\n            1749, 1746, 3408, 1743, 5418, 4863, 1740, 3411, 1737, 3414, 1734,\n            5421, 1731, 3417, 1728, 4860, 3420, 1725, 5424, 1722, 3423, 1719,\n            3426, 1716, 5427, 1713, 4857, 3429, 1710, 3432, 1707, 5430, 1704,\n            3435, 1701, 4854, 1698, 5433, 3438, 1695, 3441, 1692, 1689, 5436,\n            3444, 1686, 4851, 1683, 3447, 1680, 5439, 3450, 1677, 1674, 3453,\n            1671, 5442, 4848, 1668, 3456, 1764, 3459, 5469, 1587, 3492, 1608,\n            5463, 3495, 1605, 1602, 3498, 1599, 5466, 4833, 1596, 3501, 1593,\n            3504, 1590, 4836, 1575, 3513, 3519, 4827, 1569, 5475, 1572, 3516,\n            1614],\n           dtype='int64')] are in the [columns]"
```
at:
```
/databricks/python/lib/python3.7/site-packages/modAL/uncertainty.py in uncertainty_sampling(classifier, X, n_instances, random_tie_break, **uncertainty_measure_kwargs)
    157         query_idx = shuffled_argmax(uncertainty, n_instances=n_instances)
    158 
--> 159     return query_idx, X[query_idx]
```

It works fine with a smaller input, like:
```
...
query_idx, _ = my_learner.query(X, n_instances=1)
```

It seems like `query_idx` is a plain array for smaller input, but a different index representation, `Int64Index`, when the number of instances or the input is large. That can't then be used for indexing rows in `X`.

Is it possible that this needs to be `X.iloc[query_idx]`? I don't really know enough pandas to know for sure. Thanks!
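To illustrate the difference, here is a minimal sketch that uses only pandas and NumPy and does not call modAL. The values in `query_idx` are hypothetical stand-ins for whatever the query strategy returns. As far as I understand pandas, `df[array]` with an integer array is treated as a column lookup, while `df.iloc[array]` selects rows by position:

```python
import numpy as np
import pandas as pd

X = pd.DataFrame([[1], [2], [3]])
df = pd.concat([X] * 2000)  # 6000 rows, a single column labelled 0

# Hypothetical positions standing in for the output of the query strategy.
query_idx = np.array([1665, 5412, 3399])

# Plain [] with an integer array looks the values up as column labels,
# so this raises KeyError because no such columns exist.
try:
    df[query_idx]
    bracket_error = None
except KeyError as exc:
    bracket_error = exc

# .iloc selects rows by integer position, independent of the index labels
# (which repeat 0, 1, 2 here because of pd.concat).
rows = df.iloc[query_idx]

# With the value 0, [] succeeds only because a column labelled 0 exists;
# it returns that whole column, not row 0.
small = X[np.array([0])]

print(type(bracket_error).__name__)
print(rows.shape, small.shape)
```

If that reading is right, the small case may only appear to work because the returned position happens to match the single column label `0`, and the size of the input would not be the real cause.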
