fletcher package¶
Subpackages¶
Submodules¶
Module contents¶
-
class
fletcher.
FletcherBaseArray
¶ Bases:
fletcher.string_mixin.StringSupportingExtensionArray
Pandas ExtensionArray implementation base backed by an Apache Arrow structure.
- Attributes
- T
base
Return base object of the underlying data.
dtype
Return the ExtensionDtype of this array.
nbytes
The number of bytes needed to store this object in memory.
ndim
Return the number of dimensions of the underlying data.
shape
Return the shape of the data.
size
Return the number of elements in this array.
Methods
all
([skipna])Compute whether all boolean values are True.
any
([skipna])Compute whether any boolean value is True.
argmax
()Return the index of maximum value.
argmin
()Return the index of minimum value.
argsort
([ascending, kind, na_position])Return the indices that would sort this array.
astype
(dtype[, copy])Cast to a NumPy array with ‘dtype’.
copy
()Return a copy of the array.
dropna
()Return ExtensionArray without NA values.
equals
(other)Return if another array is equivalent to this array.
factorize
([na_sentinel])Encode the extension array as an enumerated type.
fillna
([value, method, limit])Fill NA/NaN values using the specified method.
isna
()Boolean NumPy array indicating if each value is missing.
ravel
([order])Return a flattened view on this array.
repeat
(repeats[, axis])Repeat elements of an ExtensionArray.
searchsorted
(value[, side, sorter])Find indices where elements should be inserted to maintain order.
shift
([periods, fill_value])Shift values by desired number.
sum
([skipna])Return the sum of the values.
take
(indices, *[, allow_fill, fill_value])Take elements from an array.
to_numpy
([dtype, copy, na_value])Convert to a NumPy ndarray.
transpose
(*axes)Return a transposed view on this array.
unique
()Compute the ExtensionArray of unique values.
value_counts
([dropna])Return a Series containing counts of each unique value.
view
([dtype])Return a view on the array.
-
all
(skipna: bool = False) → Optional[bool]¶ Compute whether all boolean values are True.
-
any
(skipna: bool = False, **kwargs) → Optional[bool]¶ Compute whether any boolean value is True.
-
astype
(dtype, copy=True)¶ Cast to a NumPy array with ‘dtype’.
- Parameters
- dtypestr or dtype
Typecode or data-type to which the array is cast.
- copybool, default True
Whether to copy the data, even if not necessary. If False, a copy is made only if the old dtype does not match the new dtype.
- Returns
- arrayndarray
NumPy ndarray with ‘dtype’ for its dtype.
-
property
base
¶ Return base object of the underlying data.
-
property
dtype
¶ Return the ExtensionDtype of this array.
-
isna
() → numpy.ndarray¶ Boolean NumPy array indicating if each value is missing.
This should return a 1-D array the same length as ‘self’.
-
property
ndim
¶ Return the number of dimensions of the underlying data.
-
property
shape
¶ Return the shape of the data.
-
property
size
¶ Return the number of elements in this array.
- Returns
- sizeint
-
sum
(skipna: bool = True)¶ Return the sum of the values.
-
unique
()¶ Compute the ExtensionArray of unique values.
It relies on pyarrow.ChunkedArray.unique and, if that fails, falls back to a naive implementation.
- Returns
- uniquesExtensionArray
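The fallback behaviour described for unique can be pictured with a small pure-Python sketch. It does not call pyarrow or fletcher: fast_unique is a hypothetical stand-in for the accelerated routine, naive_unique is an assumed order-preserving slow path, and catching NotImplementedError is an assumption about how a failure might surface.

```python
from typing import Callable, List, Optional, Sequence


def naive_unique(values: Sequence[Optional[str]]) -> List[Optional[str]]:
    """Order-preserving de-duplication, used here as the slow path."""
    seen = set()
    result = []
    for value in values:
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


def unique_with_fallback(values: Sequence[Optional[str]], fast_unique: Callable) -> list:
    """Try the fast routine first and fall back to the naive one on failure."""
    try:
        return list(fast_unique(values))
    except NotImplementedError:
        # Hypothetical failure mode: the fast path does not support this type.
        return naive_unique(values)


def unsupported(values):
    """Stand-in for a fast path that cannot handle the input."""
    raise NotImplementedError("type not supported by the fast path")


print(unique_with_fallback(["a", "b", "a", None], unsupported))  # ['a', 'b', None]
```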
-
value_counts
(dropna: bool = True) → pandas.core.series.Series¶ Return a Series containing counts of each unique value.
- Parameters
- dropnabool, default True
Don’t include counts of missing values.
- Returns
- countsSeries
See also
Series.value_counts
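As a rough illustration of the dropna flag, the following pure-Python model counts values in a list where None plays the role of a missing value. It is only a sketch of the documented semantics: the real method returns a pandas Series, and the helper name and the plain dict result are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, Hashable, Optional, Sequence


def value_counts_model(values: Sequence[Optional[Hashable]], dropna: bool = True) -> Dict:
    """Count each distinct value; optionally leave out the missing ones."""
    counts = Counter(values)
    if dropna:
        # None stands in for a missing value in this model.
        counts.pop(None, None)
    return dict(counts)


data = ["x", "y", "x", None]
print(value_counts_model(data))                # {'x': 2, 'y': 1}
print(value_counts_model(data, dropna=False))  # {'x': 2, 'y': 1, None: 1}
```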
-
class
fletcher.
FletcherBaseDtype
(arrow_dtype: pyarrow.lib.DataType)¶ Bases:
pandas.core.dtypes.base.ExtensionDtype
Dtype base for a pandas ExtensionArray backed by an Apache Arrow structure.
- Attributes
Methods
construct_array_type
()Return the array type associated with this dtype.
construct_from_string
(string)Construct this type from a string.
example
()Get a simple array with example content.
is_dtype
(dtype)Check if we match ‘dtype’.
-
example
()¶ Get a simple array with example content.
-
property
itemsize
¶
-
property
kind
¶ Return a character code (one of ‘biufcmMOSUV’), default ‘O’.
This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.
See also
numpy.dtype.kind
-
na_value
= <NA>¶
-
property
name
¶ Return a string identifying the data type.
Will be used for display in, e.g., Series.dtype.
-
property
type
¶ Return the scalar type for the array, e.g. int.
It's expected that ExtensionArray[item] returns an instance of ExtensionDtype.type for a scalar item.
-
class
fletcher.
FletcherChunkedArray
(array, dtype=None, copy=None)¶ Bases:
fletcher.base.FletcherBaseArray
Pandas ExtensionArray implementation backed by Apache Arrow.
- Attributes
- T
base
Return base object of the underlying data.
dtype
Return the ExtensionDtype of this array.
nbytes
Return the number of bytes needed to store this object in memory.
ndim
Return the number of dimensions of the underlying data.
shape
Return the shape of the data.
size
Return the number of elements in this array.
Methods
all
([skipna])Compute whether all boolean values are True.
any
([skipna])Compute whether any boolean value is True.
argmax
()Return the index of maximum value.
argmin
()Return the index of minimum value.
argsort
([ascending, kind, na_position])Return the indices that would sort this array.
astype
(dtype[, copy])Cast to a NumPy array with ‘dtype’.
copy
()Return a copy of the array.
dropna
()Return ExtensionArray without NA values.
equals
(other)Return if another array is equivalent to this array.
factorize
([na_sentinel])Encode the extension array as an enumerated type.
fillna
([value, method, limit])Fill NA/NaN values using the specified method.
flatten
()Flatten the array.
isna
()Boolean NumPy array indicating if each value is missing.
ravel
([order])Return a flattened view on this array.
repeat
(repeats[, axis])Repeat elements of an ExtensionArray.
searchsorted
(value[, side, sorter])Find indices where elements should be inserted to maintain order.
shift
([periods, fill_value])Shift values by desired number.
sum
([skipna])Return the sum of the values.
take
(indices[, allow_fill, fill_value])Take elements from an array.
to_numpy
([dtype, copy, na_value])Convert to a NumPy ndarray.
transpose
(*axes)Return a transposed view on this array.
unique
()Compute the ExtensionArray of unique values.
value_counts
([dropna])Return a Series containing counts of each unique value.
view
([dtype])Return a view on the array.
-
copy
() → pandas.core.arrays.base.ExtensionArray¶ Return a copy of the array.
- Parameters
- deepbool, default False
Also copy the underlying data backing this array.
- Returns
- ExtensionArray
-
factorize
(na_sentinel=- 1)¶ Encode the extension array as an enumerated type.
- Parameters
- na_sentinelint, default -1
Value to use in the codes array to indicate missing values.
- Returns
- codesndarray
An integer NumPy array that’s an indexer into the original ExtensionArray.
- uniquesExtensionArray
An ExtensionArray containing the unique values of self.
Note
uniques will not contain an entry for the NA value of the ExtensionArray if there are any missing values present in self.
See also
factorize
Top-level factorize method that dispatches here.
Notes
pandas.factorize()
offers a sort keyword as well.
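The relationship between codes, uniques and na_sentinel can be shown with a minimal pure-Python model. This is a sketch of the documented behaviour only, not fletcher's implementation; the function name is hypothetical and None stands in for a missing value.

```python
from typing import Hashable, List, Optional, Sequence, Tuple


def factorize_model(
    values: Sequence[Optional[Hashable]], na_sentinel: int = -1
) -> Tuple[List[int], List[Hashable]]:
    """Encode values as integer codes that index into a list of uniques."""
    uniques: List[Hashable] = []
    positions = {}
    codes: List[int] = []
    for value in values:
        if value is None:
            # Missing values get the sentinel and no entry in uniques.
            codes.append(na_sentinel)
            continue
        if value not in positions:
            positions[value] = len(uniques)
            uniques.append(value)
        codes.append(positions[value])
    return codes, uniques


codes, uniques = factorize_model(["b", None, "a", "b"])
print(codes)    # [0, -1, 1, 0]
print(uniques)  # ['b', 'a']
```

Every non-sentinel code can be used to look the original value up again, e.g. uniques[codes[2]] is "a".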
-
fillna
(value=None, method=None, limit=None)¶ Fill NA/NaN values using the specified method.
- Parameters
- valuescalar, array-like
If a scalar value is passed it is used to fill all missing values. Alternatively, an array-like ‘value’ can be given. It’s expected that the array-like have the same length as ‘self’.
- method{‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
Method to use for filling holes in reindexed Series.
pad / ffill: propagate the last valid observation forward to the next valid one.
backfill / bfill: use the next valid observation to fill the gap.
- limitint, default None
If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled.
- Returns
- filledExtensionArray with NA/NaN filled
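How value, method and limit interact is easier to see on a small example. The sketch below models the documented semantics on plain Python lists, with None as the missing value; the function names are hypothetical and nothing here is taken from fletcher's code.

```python
from typing import Any, List, Optional, Sequence


def _propagate(values: Sequence[Any], limit: Optional[int]) -> List[Any]:
    """Carry the last valid value forward, at most `limit` times per gap."""
    filled = []
    last = None
    run = 0  # number of fills made in the current gap
    for item in values:
        if item is not None:
            last, run = item, 0
            filled.append(item)
        elif last is not None and (limit is None or run < limit):
            run += 1
            filled.append(last)
        else:
            filled.append(None)
    return filled


def fillna_model(values, value=None, method=None, limit=None) -> List[Any]:
    """Fill missing entries with a scalar or by propagating neighbours."""
    if method in ("pad", "ffill"):
        return _propagate(values, limit)
    if method in ("backfill", "bfill"):
        # Backward fill is forward fill on the reversed sequence.
        return _propagate(values[::-1], limit)[::-1]
    filled, used = [], 0
    for item in values:
        if item is None and (limit is None or used < limit):
            filled.append(value)
            used += 1
        else:
            filled.append(item)
    return filled


data = [1, None, None, None, 5]
print(fillna_model(data, method="ffill", limit=2))  # [1, 1, 1, None, 5]
print(fillna_model(data, method="bfill", limit=1))  # [1, None, None, 5, 5]
print(fillna_model(data, value=0, limit=1))         # [1, 0, None, None, 5]
```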
-
flatten
()¶ Flatten the array.
-
property
nbytes
¶ Return the number of bytes needed to store this object in memory.
-
take
(indices: Union[Sequence[int], numpy.ndarray], allow_fill: bool = False, fill_value: Optional[Any] = None) → pandas.core.arrays.base.ExtensionArray¶ Take elements from an array.
- Parameters
- indicessequence of integers
Indices to be taken.
- allow_fillbool, default False
How to handle negative values in indices.
* False: negative values in indices indicate positional indices from the right (the default). This is similar to numpy.take().
* True: negative values in indices indicate missing values. These values are set to fill_value. Any other negative values raise a ValueError.
- fill_valueany, optional
Fill value to use for NA-indices when allow_fill is True. This may be None, in which case the default NA value for the type, self.dtype.na_value, is used. For many ExtensionArrays, there will be two representations of fill_value: a user-facing “boxed” scalar, and a low-level physical NA value. fill_value should be the user-facing version, and the implementation should handle translating that to the physical version for processing the take if necessary.
- Returns
- ExtensionArray
- Raises
- IndexError
When the indices are out of bounds for the array.
- ValueError
When indices contains negative values other than -1 and allow_fill is True.
See also
numpy.take
pandas.api.extensions.take
Notes
ExtensionArray.take is called by Series.__getitem__, .loc, and .iloc when indices is a sequence of values. Additionally, it’s called by Series.reindex(), or any other method that causes realignment, with a fill_value.
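The two meanings of negative indices are the part of take that most often surprises. The following is a minimal pure-Python model of the rules listed under allow_fill and Raises; the function name is hypothetical and the code is a sketch of the documented contract, not the library's implementation.

```python
from typing import Any, List, Optional, Sequence


def take_model(
    values: Sequence[Any],
    indices: Sequence[int],
    allow_fill: bool = False,
    fill_value: Optional[Any] = None,
) -> List[Any]:
    """Pick elements by position, optionally treating -1 as 'missing'."""
    n = len(values)
    result = []
    for i in indices:
        if allow_fill and i == -1:
            # -1 marks a missing slot that receives fill_value.
            result.append(fill_value)
        elif allow_fill and i < -1:
            raise ValueError("only -1 is allowed as a negative index when allow_fill=True")
        elif not -n <= i < n:
            raise IndexError("index out of bounds")
        else:
            # With allow_fill=False, negative values count from the right.
            result.append(values[i])
    return result


data = ["a", "b", "c"]
print(take_model(data, [0, -1]))                                   # ['a', 'c']
print(take_model(data, [0, -1], allow_fill=True))                  # ['a', None]
print(take_model(data, [0, -1], allow_fill=True, fill_value="?"))  # ['a', '?']
```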
-
class
fletcher.
FletcherChunkedDtype
(arrow_dtype: pyarrow.lib.DataType)¶ Bases:
fletcher.base.FletcherBaseDtype
Dtype for a pandas ExtensionArray backed by Apache Arrow’s pyarrow.ChunkedArray.
- Attributes
- itemsize
kind
Return a character code (one of ‘biufcmMOSUV’), default ‘O’.
name
Return a string identifying the data type.
names
Ordered list of field names, or None if there are no fields.
type
Return the scalar type for the array, e.g.
Methods
construct_array_type
(*args)Return the array type associated with this dtype.
construct_from_string
(string)Attempt to construct this type from a string.
example
()Get a simple array with example content.
is_dtype
(dtype)Check if we match ‘dtype’.
-
classmethod
construct_array_type
(*args) → Type[fletcher.base.FletcherChunkedArray]¶ Return the array type associated with this dtype.
- Returns
- type
-
classmethod
construct_from_string
(string: str) → fletcher.base.FletcherChunkedDtype¶ Attempt to construct this type from a string.
- Parameters
- stringstr
- Returns
- selfinstance of ‘cls’
- Raises
- TypeError
If a class cannot be constructed from this ‘string’.
Examples
If the extension dtype can be constructed without any arguments, the following may be an adequate implementation.
>>> @classmethod
... def construct_from_string(cls, string):
...     if string == cls.name:
...         return cls()
...     else:
...         raise TypeError("Cannot construct a '{}' from "
...                         "'{}'".format(cls, string))
-
class
fletcher.
FletcherContinuousArray
(array, dtype=None, copy: Optional[bool] = None)¶ Bases:
fletcher.base.FletcherBaseArray
Pandas ExtensionArray implementation backed by Apache Arrow’s pyarrow.Array.
- Attributes
- T
base
Return base object of the underlying data.
dtype
Return the ExtensionDtype of this array.
nbytes
Return the number of bytes needed to store this object in memory.
ndim
Return the number of dimensions of the underlying data.
shape
Return the shape of the data.
size
Return the number of elements in this array.
Methods
all
([skipna])Compute whether all boolean values are True.
any
([skipna])Compute whether any boolean value is True.
argmax
()Return the index of maximum value.
argmin
()Return the index of minimum value.
argsort
([ascending, kind, na_position])Return the indices that would sort this array.
astype
(dtype[, copy])Cast to a NumPy array with ‘dtype’.
copy
()Return a copy of the array.
dropna
()Return ExtensionArray without NA values.
equals
(other)Return if another array is equivalent to this array.
factorize
([na_sentinel])Encode the extension array as an enumerated type.
fillna
([value, method, limit])Fill NA/NaN values using the specified method.
flatten
()Flatten the array.
isna
()Boolean NumPy array indicating if each value is missing.
ravel
([order])Return a flattened view on this array.
repeat
(repeats[, axis])Repeat elements of an ExtensionArray.
searchsorted
(value[, side, sorter])Find indices where elements should be inserted to maintain order.
shift
([periods, fill_value])Shift values by desired number.
sum
([skipna])Return the sum of the values.
take
(indices[, allow_fill, fill_value])Take elements from an array.
to_numpy
([dtype, copy, na_value])Convert to a NumPy ndarray.
transpose
(*axes)Return a transposed view on this array.
unique
()Compute the ExtensionArray of unique values.
value_counts
([dropna])Return a Series containing counts of each unique value.
view
([dtype])Return a view on the array.
-
copy
() → pandas.core.arrays.base.ExtensionArray¶ Return a copy of the array.
Currently this is a shallow copy, since pyarrow arrays are supposed to be immutable.
- Returns
- ExtensionArray
-
factorize
(na_sentinel=- 1)¶ Encode the extension array as an enumerated type.
- Parameters
- na_sentinelint, default -1
Value to use in the codes array to indicate missing values.
- Returns
- codesndarray
An integer NumPy array that’s an indexer into the original ExtensionArray.
- uniquesExtensionArray
An ExtensionArray containing the unique values of self.
Note
uniques will not contain an entry for the NA value of the ExtensionArray if there are any missing values present in self.
See also
factorize
Top-level factorize method that dispatches here.
Notes
pandas.factorize()
offers a sort keyword as well.
-
fillna
(value=None, method=None, limit=None)¶ Fill NA/NaN values using the specified method.
- Parameters
- valuescalar, array-like
If a scalar value is passed it is used to fill all missing values. Alternatively, an array-like ‘value’ can be given. It’s expected that the array-like have the same length as ‘self’.
- method{‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
Method to use for filling holes in reindexed Series.
pad / ffill: propagate the last valid observation forward to the next valid one.
backfill / bfill: use the next valid observation to fill the gap.
- limitint, default None
If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled.
- Returns
- filledExtensionArray with NA/NaN filled
-
flatten
()¶ Flatten the array.
-
property
nbytes
¶ Return the number of bytes needed to store this object in memory.
-
take
(indices: Union[Sequence[int], numpy.ndarray], allow_fill: bool = False, fill_value: Optional[Any] = None) → pandas.core.arrays.base.ExtensionArray¶ Take elements from an array.
- Parameters
- indicessequence of integers
Indices to be taken.
- allow_fillbool, default False
How to handle negative values in indices.
* False: negative values in indices indicate positional indices from the right (the default). This is similar to numpy.take().
* True: negative values in indices indicate missing values. These values are set to fill_value. Any other negative values raise a ValueError.
- fill_valueany, optional
Fill value to use for NA-indices when allow_fill is True. This may be None, in which case the default NA value for the type, self.dtype.na_value, is used. For many ExtensionArrays, there will be two representations of fill_value: a user-facing “boxed” scalar, and a low-level physical NA value. fill_value should be the user-facing version, and the implementation should handle translating that to the physical version for processing the take if necessary.
- Returns
- ExtensionArray
- Raises
- IndexError
When the indices are out of bounds for the array.
- ValueError
When indices contains negative values other than -1 and allow_fill is True.
See also
numpy.take
pandas.api.extensions.take
Notes
ExtensionArray.take is called by Series.__getitem__, .loc, and .iloc when indices is a sequence of values. Additionally, it’s called by Series.reindex(), or any other method that causes realignment, with a fill_value.
-
class
fletcher.
FletcherContinuousDtype
(arrow_dtype: pyarrow.lib.DataType)¶ Bases:
fletcher.base.FletcherBaseDtype
Dtype for a pandas ExtensionArray backed by Apache Arrow’s pyarrow.Array.
- Attributes
- itemsize
kind
Return a character code (one of ‘biufcmMOSUV’), default ‘O’.
name
Return a string identifying the data type.
names
Ordered list of field names, or None if there are no fields.
type
Return the scalar type for the array, e.g.
Methods
construct_array_type
(*args)Return the array type associated with this dtype.
construct_from_string
(string)Attempt to construct this type from a string.
example
()Get a simple array with example content.
is_dtype
(dtype)Check if we match ‘dtype’.
-
classmethod
construct_array_type
(*args)¶ Return the array type associated with this dtype.
- Returns
- type
-
classmethod
construct_from_string
(string: str)¶ Attempt to construct this type from a string.
- Parameters
- string
- Returns
- selfinstance of ‘cls’
- Raises
- TypeError
If a class cannot be constructed from this ‘string’.
Examples
If the extension dtype can be constructed without any arguments, the following may be an adequate implementation.
>>> @classmethod
... def construct_from_string(cls, string):
...     if string == cls.name:
...         return cls()
...     else:
...         raise TypeError("Cannot construct a '{}' from "
...                         "'{}'".format(cls, string))
-
class
fletcher.
TextAccessor
(obj)¶ Bases:
fletcher.string_array.TextAccessorBase
Accessor for pandas exposed as .fr_strx.
Methods
cat
(others)Concatenate strings in the Series/Index with given separator.
contains
(pat[, case, regex])Test if pattern or regex is contained within a string of a Series or Index.
endswith
(pat)Check whether a row ends with a certain pattern.
replace
(pat, repl[, n, case, regex])Replace occurrences of pattern/regex in the Series/Index with some other string.
slice
([start, end, step])Extract every step character from strings from start to end.
startswith
(pat)Check whether a row starts with a certain pattern.
strip
([to_strip])Strip whitespaces from both ends of strings.
zfill
(width)Pad strings in the Series/Index by prepending ‘0’ characters.
count
isalnum
isalpha
isdecimal
isdigit
islower
isnumeric
isspace
istitle
isupper
-
cat
(others: Optional[fletcher.base.FletcherBaseArray]) → pandas.core.series.Series¶ Concatenate strings in the Series/Index with given separator.
If others is specified, this function concatenates the Series/Index and elements of others element-wise. If others is not passed, then all values in the Series/Index are concatenated into a single string with a given sep.
-
contains
(pat: str, case: bool = True, regex: bool = True) → pandas.core.series.Series¶ Test if pattern or regex is contained within a string of a Series or Index.
Return boolean Series or Index based on whether a given pattern or regex is contained within a string of a Series or Index.
This implementation differs from the one in pandas:
- We always return a missing value for missing data.
- You cannot pass flags for the regular expression module.
- Parameters
- patstr
Character sequence or regular expression.
- casebool, default True
If True, case sensitive.
- regexbool, default True
If True, assumes the pat is a regular expression.
If False, treats the pat as a literal string.
- Returns
- Series or Index of boolean values
A Series or Index of boolean values indicating whether the given pattern is contained within the string of each element of the Series or Index.
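The interaction of case, regex and missing data can be modelled with the standard re module. This is a sketch of the behaviour described for contains, written against plain Python lists with None as the missing value; the function name is hypothetical and the code is not fletcher's implementation.

```python
import re
from typing import List, Optional, Sequence


def contains_model(
    values: Sequence[Optional[str]], pat: str, case: bool = True, regex: bool = True
) -> List[Optional[bool]]:
    """Test each string for a pattern; missing input stays missing."""
    flags = 0 if case else re.IGNORECASE
    # With regex=False the pattern is escaped so it matches literally.
    pattern = re.compile(pat if regex else re.escape(pat), flags)
    return [None if v is None else pattern.search(v) is not None for v in values]


data = ["apple", None, "Banana"]
print(contains_model(data, "an"))               # [False, None, True]
print(contains_model(data, "b", case=False))    # [False, None, True]
print(contains_model(data, "a.", regex=False))  # [False, None, False]
```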
-
count
(pat: str, regex: bool = True) → pandas.core.series.Series¶
-
endswith
(pat)¶ Check whether a row ends with a certain pattern.
-
isalnum
()¶
-
isalpha
()¶
-
isdecimal
()¶
-
isdigit
()¶
-
islower
()¶
-
isnumeric
()¶
-
isspace
()¶
-
istitle
()¶
-
isupper
()¶
-
replace
(pat: str, repl: str, n: int = - 1, case: bool = True, regex: bool = True)¶ Replace occurrences of pattern/regex in the Series/Index with some other string. Equivalent to str.replace() or re.sub().
Return a string Series where in each row the occurrences of the given pattern or regex pat are replaced by repl.
This implementation differs from the one in pandas:
- We always return a missing value for missing data.
- You cannot pass flags for the regular expression module.
- Parameters
- patstr
Character sequence or regular expression.
- replstr
Replacement string.
- nint
Number of replacements to make from start.
- casebool, default True
If True, case sensitive.
- regexbool, default True
If True, assumes the pat is a regular expression. If False, treats the pat as a literal string.
- Returns
- Series of string values.
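A small model built on re.sub shows how n, case and regex combine. It is a sketch under stated assumptions: the function name is hypothetical, None stands in for a missing value, and treating n=0 as "replace nothing" is an assumption of this example rather than something the reference states.

```python
import re
from typing import List, Optional, Sequence


def replace_model(
    values: Sequence[Optional[str]],
    pat: str,
    repl: str,
    n: int = -1,
    case: bool = True,
    regex: bool = True,
) -> List[Optional[str]]:
    """Replace matches of pat with repl in each string; missing stays missing."""
    if n == 0:
        # Assumption: zero replacements requested means the input is unchanged.
        return list(values)
    flags = 0 if case else re.IGNORECASE
    pattern = re.compile(pat if regex else re.escape(pat), flags)
    # In literal mode the replacement is inserted verbatim, without escape handling.
    replacement = repl if regex else (lambda _match: repl)
    count = 0 if n < 0 else n  # re.sub treats count=0 as "replace all"
    return [None if v is None else pattern.sub(replacement, v, count=count) for v in values]


data = ["banana", None]
print(replace_model(data, "a", "o"))       # ['bonono', None]
print(replace_model(data, "a", "o", n=1))  # ['bonana', None]
print(replace_model(data, "a.", "x"))      # ['bxxa', None]
```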
-
slice
(start=0, end=None, step=1)¶ Extract every step character from strings from start to end.
-
startswith
(pat)¶ Check whether a row starts with a certain pattern.
-
strip
(to_strip=None)¶ Strip whitespaces from both ends of strings.
-
zfill
(width: int) → pandas.core.series.Series¶ Pad strings in the Series/Index by prepending ‘0’ characters.
-
-
fletcher.
pandas_from_arrow
(arrow_object: Union[pyarrow.lib.RecordBatch, pyarrow.lib.Table, pyarrow.lib.Array, pyarrow.lib.ChunkedArray], continuous: bool = False)¶ Convert an Arrow object to its pandas equivalent using fletcher.
- The conversion rules are:
{RecordBatch, Table} -> DataFrame
{Array, ChunkedArray} -> Series
- Parameters
- arrow_objectRecordBatch, Table, Array or ChunkedArray
object to be converted
- continuousbool
Use FletcherContinuousArray instead of FletcherChunkedArray
-
fletcher.
read_parquet
(path, columns: Optional[List[str]] = None, continuous: bool = False) → pandas.core.frame.DataFrame¶ Load a parquet object from the file path, returning a DataFrame with fletcher columns.
- Parameters
- pathstr or file-like
- continuousbool
Use FletcherContinuousArray instead of FletcherChunkedArray
- Returns
- pd.DataFrame